์ฐธ๊ณ ํ ์ ํ๋ธ > https://youtu.be/yQ20jZwDjTE
1. ์คํฌ๋ํ์ ํ์ํ ํจํค์ง ์ค์น
BeautifulSoup, Requests ํจํค์ง๊ฐ ์ค์น๋์๋์ง ํ์ธํ๋ค.
pip list
๋ชฉ๋ก์ beautifulsoup4 , requests ํจํค์ง๊ฐ ์๋ค๋ฉด ์ค์นํด์ค๋ค.
pip install beautifulsoup4
pip install requests
+ ๊ตฌ๋ฌธ์ ๋ถ์ํด์ฃผ๋ parser์ค์น
pip install lxml
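Once the packages are installed, a quick sanity check (my own snippet, not part of the tutorial) confirms that both libraries import and that BeautifulSoup can use the lxml parser:

```python
# Sanity check: parse a trivial document with the lxml backend.
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>hello</p>", "lxml")
print(soup.p.get_text())  # hello
```

If lxml were missing, the `BeautifulSoup(..., "lxml")` call would raise a `FeatureNotFound` error instead.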
2. Create a Python file
Create the Python file where the parsing code will be written.
vi parser.py
3. Fetching the HTML data
Write code that fetches the play titles from the list of plays on the site we want to parse.
import requests
from bs4 import BeautifulSoup

url = "https://search.shopping.naver.com/search/all?frm=NVSHTTL&origQuery=%EC%97%B0%EA%B7%B9&pagingIndex=1&pagingSize=40&productSet=total&query=%EC%97%B0%EA%B7%B9&sort=rel&timestamp=&viewType=list"
res = requests.get(url)
res.raise_for_status()  # stop immediately if the request failed

soup = BeautifulSoup(res.text, "lxml")

# Naver Shopping > fetch the full list of plays
items = soup.find_all(attrs={"class": "basicList_link__1MaTN"})
for i, item in enumerate(items, start=1):
    print(i, item.get_text())
์ฝ๋๋ฅผ ์์ฑํด์ฃผ๊ณ , ์คํํ๋ฉด ์๋์ ๊ฐ์ด ์ฐ๊ทน ๋ชฉ๋ก ํ์ฑ์ ์ฑ๊ณตํ ๊ฒ์ ํ์ธํ ์ ์๋ค!!๐
๊ทธ๋ฐ๋ฐ ๋ค์ด๋ฒ ์ผํ์ 5๊ฐ๋ง ๊ฐ์ ธ์๋ค. ์ด๋ ์คํฌ๋กคํ ๋๋ง๋ค ์ ํ๋ค์ด 5๊ฐ์ฉ ๋์ค๊ฒ ๋๊ธฐ ๋๋ฌธ์ด๋ค.
4. Fetching the JSON data
- To get more than 5 results, we need the JSON-format response instead of the document-format HTML. (In Naver Shopping's case, the relevant JSON request only shows up once you move to page 2.)
- Find where the product name and price live in that JSON!! (shoppingResult > products > index (0–39) > price and productTitle)
- Copy the JSON request as cURL (bash) and convert it to Python code with a conversion site.
- Then modify the code as follows.
import requests
import json

headers = {
    'authority': 'search.shopping.naver.com',
    'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"',
    'accept': 'application/json, text/plain, */*',
    'sec-ch-ua-mobile': '?1',
    'logic': 'PART',
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Mobile Safari/537.36',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://search.shopping.naver.com/search/all?query=%EC%97%B0%EA%B7%B9&frm=NVSHATC&prevQuery=%EB%8C%80%ED%95%99%EB%A1%9C%20%EC%97%B0%EA%B7%B9',
    'accept-language': 'ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7',
    'cookie': 'NNB=GJPQMV2O2P4GA; _ga=GA1.2.1943308185.1627909138; nx_ssl=2; AD_SHP_BID=28; _shopboxeventlog=false; page_uid=hdzwRlp0JxCssZ9/XqwssssssKs-015485; SHP_BID=4; spage_uid=hdzwRlp0JxCssZ9%2FXqwssssssKs-015485; BMR=s=1628238052835&r=https%3A%2F%2Fm.blog.naver.com%2Fdororong97%2F222060637228&r2=https%3A%2F%2Fm.blog.naver.com%2FPostView.naver%3FisHttpsRedirect%3Dtrue%26blogId%3Ddororong97%26logNo%3D222063604094; listOffset=4; sus_val=PvkquAEXtN7QiMq8cS2jZ7Su',
}

params = (
    ('sort', 'rel'),
    ('pagingIndex', '2'),
    ('pagingSize', '40'),
    ('viewType', 'list'),
    ('productSet', 'total'),
    ('deliveryFee', ''),
    ('deliveryTypeValue', ''),
    ('frm', 'NVSHATC'),
    ('query', '\uC5F0\uADF9'),
    ('origQuery', '\uC5F0\uADF9'),
    ('iq', ''),
    ('eq', ''),
    ('xq', ''),
)

response = requests.get('https://search.shopping.naver.com/api/search/all', headers=headers, params=params)
result_dict = json.loads(response.text)

product_data = result_dict['shoppingResult']['products']
for i in product_data:
    try:
        product = i['productTitle']
        price = i['price']
        # print(product, price)
        smart_farm_data = {
            'product': product,
            'price': price,
        }
        print(smart_farm_data)
    except KeyError:
        # skip entries that lack a title or price field
        pass
ํ์ผ์ ์คํํด์ฃผ๋ฉด ์๋์ ๊ฐ์ด 40๊ฐ์ ๊ฒฐ๊ณผ๊ฐ ์ถ๋ ฅ๋๋ ๊ฒ์ ํ์ธํ ์ ์๋ค. ๐
'๐ก์น ํ๋ก์ ํธ > (ํ์คํ) MOVIEW ์ฌ์ดํธ' ์นดํ ๊ณ ๋ฆฌ์ ๋ค๋ฅธ ๊ธ
webpack์ผ๋ก Django - Vue.js ์ฐ๋ํ๊ธฐ (0) | 2021.08.22 |
---|---|
Web Scraping์ด๋? (0) | 2021.08.06 |