requests and beautifulsoup4 modules

blackbearwow 2022. 5. 12. 21:13

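With the requests and beautifulsoup4 modules, web crawling becomes straightforward: requests fetches pages over HTTP, and beautifulsoup4 parses the markup that comes back.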

-requests module-

1. Installation

Run the following command in cmd, PowerShell, or a terminal.

pip install requests

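2. Making a GET request

Rather than writing the query string directly into the URL, as in url='https://www.tistory.com/?param1=value&param2=value', it is better to pass the parameters through the params dictionary. requests then URL-encodes them and appends them to the URL.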

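import requests
URL = 'http://www.tistory.com'
params = {'param1': 'value1', 'param2': 'value'}
response = requests.get(URL, params=params)  # params become the query string
print(response.status_code)  # HTTP status code, e.g. 200
print(response.text)         # response body decoded as text
print(response.url)          # final URL, including the encoded query string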
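
To see what the params dictionary turns into without sending anything over the network, one option is to build the request and prepare it locally. The sketch below reuses the URL and params from the example above; using requests.Request(...).prepare() for inspection is an addition here, not something the original example does.

```python
import requests

URL = 'http://www.tistory.com'
params = {'param1': 'value1', 'param2': 'value'}

# Build the request locally; prepare() encodes params but sends nothing.
prepared = requests.Request('GET', URL, params=params).prepare()

print(prepared.method)  # the HTTP method of the prepared request
print(prepared.url)     # URL with the encoded query string appended
```

The same dictionary passed to requests.get() should produce the same URL, which is what response.url reports after the request completes.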

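3. Making a POST request

Pass data instead of params. The dictionary is then sent form-encoded in the request body rather than in the URL.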

import requests
URL = 'http://www.tistory.com'
data = {'param1': 'value1', 'param2': 'value'}
res = requests.post(URL, data=data)  # data is sent in the request body
print(res.status_code)  # HTTP status code
print(res.text)         # response body decoded as text
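
As a rough illustration of what data= does, the request can again be prepared locally to show the encoded body and the Content-Type header that requests chooses. This is a sketch for inspection only, reusing the URL and data from the example above; nothing is sent.

```python
import requests

URL = 'http://www.tistory.com'
data = {'param1': 'value1', 'param2': 'value'}

# Prepare the POST locally to inspect how the dictionary is encoded.
prepared = requests.Request('POST', URL, data=data).prepare()

print(prepared.body)                     # form-encoded key=value pairs
print(prepared.headers['Content-Type'])  # content type chosen for form data
print(prepared.url)                      # the data is not added to the URL
```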

4. Setting headers and cookies

Pass request headers through the headers argument and cookies through the cookies argument.

import requests
URL = 'http://www.tistory.com'
headers = {'Content-Type': 'application/json; charset=utf-8'}
cookies = {'session_id': 'sorryidontcare'}
res = requests.get(URL, headers=headers, cookies=cookies)
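
One way to check how these two arguments end up on the wire is to prepare the request locally and print its headers. The sketch below assumes the same URL as the earlier examples and sends nothing.

```python
import requests

URL = 'http://www.tistory.com'
headers = {'Content-Type': 'application/json; charset=utf-8'}
cookies = {'session_id': 'sorryidontcare'}

# prepare() merges the cookies dictionary into a single Cookie header.
prepared = requests.Request('GET', URL, headers=headers, cookies=cookies).prepare()

# Print every header this prepared request carries.
for name, value in prepared.headers.items():
    print(f'{name}: {value}')
```

Headers that a Session adds by default, such as User-Agent, do not appear here because no session is involved in preparing the request this way.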

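5. Retrieving cookies

To read the cookies a server sets through its Set-Cookie response headers, make the request through a Session and inspect the session's cookie jar.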

https://stackoverflow.com/questions/25091976/python-requests-get-cookies

import requests
session = requests.Session()  # a Session keeps cookies across requests
response = session.get('http://google.com')
print(session.cookies.get_dict())  # cookies received so far, as a name -> value dict
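
The cookie jar on a session can also be inspected cookie by cookie. The sketch below seeds the jar by hand so that it runs without a network connection; the cookie name, value, and the domain example.com are made-up placeholders standing in for whatever a real server would set.

```python
import requests

session = requests.Session()

# Placeholder cookie standing in for one received via Set-Cookie.
session.cookies.set('session_id', 'abc123', domain='example.com', path='/')

# Plain name -> value mapping, as in the example above.
cookie_dict = session.cookies.get_dict()
print(cookie_dict)

# Iterating the jar exposes per-cookie details such as domain and path.
for cookie in session.cookies:
    print(cookie.name, cookie.value, cookie.domain, cookie.path)
```

Any later request made through the same session sends the stored cookies back automatically when the domain and path match.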

-beautifulsoup4 module-

1. Installation

pip install beautifulsoup4

2. Selecting tags

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.naver.com")

bs = BeautifulSoup(res.content, "html.parser")
h3 = bs.select("h3")  # all h3 tags
h3_a = bs.select("h3 > a")  # a tags that are direct children of an h3
div_current_box = bs.select("div.current_box")  # div tags with class current_box
_title = bs.select(".title")  # elements with class title
_u_skip = bs.select("#u_skip")  # element with id u_skip
selecter1 = bs.find_all("div", {"class": "partner_box"})  # every div with class partner_box
selecter2 = bs.find("div", {"class": "partner_box"})  # first div with class partner_box, or None

print(h3)
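
Selecting tags is usually only the first step; the text and attributes still have to be pulled out of the results. The sketch below shows one way to do that on a small HTML string so it runs offline. The markup is invented for illustration and only borrows the class and id names from the selectors above; it is not the actual structure of any site.

```python
from bs4 import BeautifulSoup

# Invented sample markup; a real page would come from requests.get(...).content.
html = """
<div id="u_skip"><a href="#top">skip</a></div>
<div class="current_box"><h3><a href="/news">News</a></h3></div>
<div class="partner_box"><span class="title">Partner A</span></div>
<div class="partner_box"><span class="title">Partner B</span></div>
"""

bs = BeautifulSoup(html, "html.parser")

# select() returns a list of tags; index into it to reach one element.
first_link = bs.select("h3 > a")[0]
link_text = first_link.get_text()  # text inside the tag
link_href = first_link["href"]     # attribute access works like a dict

# Descendant selector combined with a list comprehension.
titles = [t.get_text(strip=True) for t in bs.select("div.partner_box .title")]

# find() returns None when nothing matches, so check before using the result.
missing = bs.find("div", {"class": "no_such_box"})

print(link_text, link_href, titles, missing)
```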

3. Parsing XML

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.google.com/finance/historical')
# features='xml' uses the lxml XML parser, so lxml must be installed (pip install lxml)
bs = BeautifulSoup(res.content, features='xml')
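
Once the document is parsed with the XML parser, elements are looked up the same way as with HTML. The sketch below works on a small XML string so it runs offline; the quotes/quote/close element names and the values are invented for illustration, and the lxml package is assumed to be installed.

```python
from bs4 import BeautifulSoup

# Invented sample document; element names and values are placeholders.
xml = """<quotes>
  <quote symbol="AAA"><close>10.5</close></quote>
  <quote symbol="BBB"><close>20.0</close></quote>
</quotes>"""

# features="xml" requires lxml; tag names are treated as case-sensitive.
bs = BeautifulSoup(xml, features="xml")

# Map each symbol attribute to the numeric value of its <close> child.
closes = {
    quote["symbol"]: float(quote.find("close").get_text())
    for quote in bs.find_all("quote")
}

print(closes)
```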
