
๐Ÿ’ก์›น ํ”„๋กœ์ ํŠธ/(ํ’€์Šคํƒ) MOVIEW ์‚ฌ์ดํŠธ

๋„ค์ด๋ฒ„ ์‡ผํ•‘ ์•„์ดํ…œ ๋ชฉ๋ก ๊ฐ€์ ธ์˜ค๊ธฐ

YouTube video I referred to > https://youtu.be/yQ20jZwDjTE


1. ์Šคํฌ๋ž˜ํ•‘์— ํ•„์š”ํ•œ ํŒจํ‚ค์ง€ ์„ค์น˜

BeautifulSoup, Requests ํŒจํ‚ค์ง€๊ฐ€ ์„ค์น˜๋˜์—ˆ๋Š”์ง€ ํ™•์ธํ•œ๋‹ค.

pip list

๋ชฉ๋ก์— beautifulsoup4 , requests ํŒจํ‚ค์ง€๊ฐ€ ์—†๋‹ค๋ฉด ์„ค์น˜ํ•ด์ค€๋‹ค.

pip install beautifulsoup4
pip install requests

+ ๊ตฌ๋ฌธ์„ ๋ถ„์„ํ•ด์ฃผ๋Š” parser์„ค์น˜

pip install lxml
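You can also run the same check from Python instead of reading through the pip list output by eye. Below is a minimal sketch that uses the standard library's importlib.util.find_spec. One thing to watch: the pip package beautifulsoup4 is imported as bs4, so the sketch maps pip names to import names. The REQUIRED mapping and the missing_packages helper are hypothetical names made up for this example.

```python
import importlib.util

# pip package name -> name used in `import` (hypothetical mapping for this tutorial)
REQUIRED = {
    "beautifulsoup4": "bs4",
    "requests": "requests",
    "lxml": "lxml",
}


def missing_packages(required):
    """Return the pip names whose import name cannot be found."""
    missing = []
    for pip_name, import_name in required.items():
        # find_spec returns None when a top-level module is not importable
        if importlib.util.find_spec(import_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        # print the command needed to install whatever is missing
        print("pip install " + " ".join(missing))
    else:
        print("All packages are installed.")
```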


2. python ํŒŒ์ผ ์ƒ์„ฑ

ํŒŒ์‹ฑ์„ ์œ„ํ•œ ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•  ํŒŒ์ด์ฌ ํŒŒ์ผ์„ ์ƒ์„ฑํ•œ๋‹ค.

vi parser.py


3. HTML ๋ฐ์ดํ„ฐ ๊ฐ€์ ธ์˜ค๊ธฐ

ํŒŒ์‹ฑํ•ด์˜ฌ ์›น ์‚ฌ์ดํŠธ์—์„œ ์—ฐ๊ทน ๋ชฉ๋ก ์ค‘ ์—ฐ๊ทน๋ช…์„ ๊ฐ€์ ธ์˜ฌ ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•ด์ค€๋‹ค.

import requests
from bs4 import BeautifulSoup

# Naver Shopping search for "연극" (play): page 1, 40 items per page
url = "https://search.shopping.naver.com/search/all?frm=NVSHTTL&origQuery=%EC%97%B0%EA%B7%B9&pagingIndex=1&pagingSize=40&productSet=total&query=%EC%97%B0%EA%B7%B9&sort=rel&timestamp=&viewType=list"
res = requests.get(url)
res.raise_for_status()  # raise an exception right away if the request failed (4xx/5xx)

soup = BeautifulSoup(res.text, "lxml")  # parse the HTML with the lxml parser

# Naver Shopping > get the full list of plays (elements with this class hold the product titles)
items = soup.find_all(attrs={"class": "basicList_link__1MaTN"})
for i, item in enumerate(items, start=1):
    print(i, item.get_text())

Write the code and run it, and you can see that the play list was parsed successfully!! 👍

However, only 5 Naver Shopping items came back. This is because the page shows products 5 at a time as you scroll. The remaining items are loaded by the page's JavaScript afterward, and requests only receives the initial HTML.


4. json ๋ฐ์ดํ„ฐ ๊ฐ€์ ธ์˜ค๊ธฐ

  • 5๊ฐœ์˜ ๊ฒฐ๊ณผ๋งŒ ๋‚˜์˜ค๋Š” ๊ฒƒ์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด์„ , documentํ˜•์‹์˜ ํŒŒ์ผ ๋Œ€์‹  jsonํ˜•์‹์˜ ํŒŒ์ผ์ด ํ•„์š”ํ•˜๋‹ค. (๋„ค์ด๋ฒ„ ์‡ผํ•‘์˜ ๊ฒฝ์šฐ, 2ํŽ˜์ด์ง€๋กœ ๋„˜์–ด๊ฐ€์•ผ ํ•ด๋‹น jsonํŒŒ์ผ์ด ๋ณด์ธ๋‹ค.)
  • ์ œํ’ˆ์˜ ์ด๋ฆ„๊ณผ ๊ฐ€๊ฒฉ์— ํ•ด๋‹นํ•˜๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ์–ด๋””์— ์žˆ๋Š”์ง€ ํ™•์ธํ•ด ์ฃผ์–ด์•ผ ํ•œ๋‹ค!! (shoppingResult > products > ์ธ๋ฑ์Šค(0~39) > price์™€ productTitle)

  • jsonํŒŒ์ผ์„ cURL(bash)๋กœ ์นดํ”ผํ•˜์—ฌ ๋ณ€ํ™˜ ์‚ฌ์ดํŠธ์—์„œ python์ฝ”๋“œ๋กœ ๋ณ€ํ™˜ํ•ด์ค€๋‹ค. 

  • ์ฝ”๋“œ๋ฅผ ์•„๋ž˜์™€ ๊ฐ™์ด ๋ณ€๊ฒฝํ•ด์ค€๋‹ค.
import requests

# Headers copied from the browser with "Copy as cURL (bash)" and converted to Python.
# The cookie belongs to the author's browser session; copy your own if it stops working.
headers = {
    'authority': 'search.shopping.naver.com',
    'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"',
    'accept': 'application/json, text/plain, */*',
    'sec-ch-ua-mobile': '?1',
    'logic': 'PART',
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Mobile Safari/537.36',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://search.shopping.naver.com/search/all?query=%EC%97%B0%EA%B7%B9&frm=NVSHATC&prevQuery=%EB%8C%80%ED%95%99%EB%A1%9C%20%EC%97%B0%EA%B7%B9',
    'accept-language': 'ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7',
    'cookie': 'NNB=GJPQMV2O2P4GA; _ga=GA1.2.1943308185.1627909138; nx_ssl=2; AD_SHP_BID=28; _shopboxeventlog=false; page_uid=hdzwRlp0JxCssZ9/XqwssssssKs-015485; SHP_BID=4; spage_uid=hdzwRlp0JxCssZ9%2FXqwssssssKs-015485; BMR=s=1628238052835&r=https%3A%2F%2Fm.blog.naver.com%2Fdororong97%2F222060637228&r2=https%3A%2F%2Fm.blog.naver.com%2FPostView.naver%3FisHttpsRedirect%3Dtrue%26blogId%3Ddororong97%26logNo%3D222063604094; listOffset=4; sus_val=PvkquAEXtN7QiMq8cS2jZ7Su',
}

params = (
    ('sort', 'rel'),
    ('pagingIndex', '2'),  # page number (copied from the page-2 request)
    ('pagingSize', '40'),  # items per page
    ('viewType', 'list'),
    ('productSet', 'total'),
    ('deliveryFee', ''),
    ('deliveryTypeValue', ''),
    ('frm', 'NVSHATC'),
    ('query', '\uC5F0\uADF9'),      # '연극' (play)
    ('origQuery', '\uC5F0\uADF9'),
    ('iq', ''),
    ('eq', ''),
    ('xq', ''),
)

response = requests.get('https://search.shopping.naver.com/api/search/all', headers=headers, params=params)
response.raise_for_status()  # stop on an HTTP error status

result_dict = response.json()  # same as json.loads(response.text)
product_data = result_dict['shoppingResult']['products']

for item in product_data:
    try:
        product_info = {
            'product': item['productTitle'],
            'price': item['price'],
        }
    except KeyError:
        # skip entries that have no title or price
        continue
    print(product_info)


ํŒŒ์ผ์„ ์‹คํ–‰ํ•ด์ฃผ๋ฉด ์•„๋ž˜์™€ ๊ฐ™์ด 40๊ฐœ์˜ ๊ฒฐ๊ณผ๊ฐ€ ์ถœ๋ ฅ๋˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ๐Ÿ‘