Beautiful Soup web crawling: please, please help..

Posted 2023.08.14 · 1 comment

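I'm trying to extract the book title, author name, and image link, in that order, from a bookstore site and save them to an Excel file. I got the title and author working with this: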

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

url = "https://www.yes24.com/24/category/bestseller?CategoryNumber=001001002006&sumgb=03"  # Life sciences

# Download the page and parse it
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

book_list = []
book_info_list = soup.select('td.goodsTxtInfo')

for book_info in book_info_list:
    book_name = book_info.select_one('p > a')
    author_name = book_info.select_one('div > a')
    if book_name and author_name:
        book_text = book_name.text.strip()
        author_text = author_name.text.strip()
        if book_text and author_text:
            # Column names: 도서명 = book title, 저자 = author
            book_list.append({'도서명': book_text, '저자': author_text})

df = pd.DataFrame(book_list)
df.to_excel('book_list_생명과학ver.xlsx', index=False, header=True)

Then, to extract the book image links as well, I changed it to this:

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

url = "https://www.yes24.com/24/category/bestseller?CategoryNumber=001001002006&sumgb=03"  # Life sciences

html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

book_list = []
book_info_list = soup.select('td.goodsTxtInfo, div.goods_img > a > img')

for book_info in book_info_list:
    book_name = book_info.select_one('p > a')
    author_name = book_info.select_one('div > a')
    book_image = book_info.select_one('div.goods_img > a > img[src]')
    if book_name and author_name and book_image:
        book_text = book_name.text.strip()
        author_text = author_name.text.strip()
        image_url = book_image.get('src')
        book_list.append({'도서명': book_text, '저자': author_text, '이미지 URL': image_url})  # image URL added

df = pd.DataFrame(book_list)
df.to_excel('book_list_생명과학ver.xlsx', index=False, header=True)

After changing the code like this, the Excel file comes out empty... What on earth is wrong? Please help. I also tried the changes GPT suggested, but the Excel file is still empty. I'll accept an answer right away, please help.

**Note: this is the bookstore site's HTML. The code on the right is the part for book #1; there are as many <tr> elements as there are books.

Anonymous · Posted -

I can help. You can use the code below to extract the book title, author name, and image link and save them to an Excel file.

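```python
import urllib.request

from bs4 import BeautifulSoup
import pandas as pd

# Same bestseller page as in the question (Life sciences)
url = "https://www.yes24.com/24/category/bestseller?CategoryNumber=001001002006&sumgb=03"

html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

book_list = []
book_info_list = soup.select('td.goodsTxtInfo')

for book_info in book_info_list:
    book_name = book_info.select_one('p:nth-child(1) > a')
    author = book_info.select_one('p:nth-child(2) > a')
    image = book_info.select_one('a > img')
    # select_one returns None when nothing matches; skip such entries instead of crashing
    if book_name is None or author is None or image is None:
        continue
    book_list.append([book_name.text.strip(), author.text.strip(), image['src']])

# Column names: book title, author name, image link
df = pd.DataFrame(book_list, columns=['도서명', '저자명', '이미지 링크'])
df.to_excel('도서목록.xlsx', index=False)
```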

The code above uses `urllib.request` and `BeautifulSoup` to fetch and parse the web page, then uses `pandas` to turn the extracted information into a DataFrame and save it as an Excel file.
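As for why the second version in the question writes an empty file, one likely cause can be reproduced offline. This is only a sketch on made-up HTML: the `SAMPLE_HTML` markup is an assumption modeled on the selectors in the question (a `td.goodsTxtInfo` cell and a `div.goods_img > a > img` image sitting in separate cells of the same `<tr>`), not the real YES24 page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <tr> per book, cover image and text in sibling cells.
SAMPLE_HTML = """
<table>
  <tr>
    <td><div class="goods_img"><a href="/goods/1"><img src="https://example.com/1.jpg"></a></div></td>
    <td class="goodsTxtInfo"><p><a href="/goods/1">Book One</a></p><div><a href="/author/1">Author One</a></div></td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The comma makes this a union: the result mixes <img> tags and <td> cells.
matched = soup.select("td.goodsTxtInfo, div.goods_img > a > img")
matched_names = [tag.name for tag in matched]

# Count the matched elements that contain title, author AND image at once.
complete = 0
for tag in matched:
    has_title = tag.select_one("p > a") is not None
    has_author = tag.select_one("div > a") is not None
    has_image = tag.select_one("div.goods_img > a > img[src]") is not None
    if has_title and has_author and has_image:
        complete += 1

print(matched_names)  # the <img> and the <td> are separate matches
print(complete)       # 0 on this sample: no single match holds all three
```

On markup like this, the `<img>` has no children and the text cell has no image inside it, so the `if book_name and author_name and book_image` check never passes and nothing is appended.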

Running the code extracts the book title, author name, and image link from the current page and saves them to an Excel file named 도서목록.xlsx. Change the file name as needed.
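If the cover image turns out to sit outside `td.goodsTxtInfo`, one option is to loop over the table rows and look up all three pieces inside each row. The following is a sketch under that assumption, again on made-up HTML rather than the real page. `extract_books` is a hypothetical helper name; `p > a` and `div > a` are the selectors the question reports as working, and `div.goods_img > a > img[src]` is the image selector from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a header row plus one <tr> per book.
SAMPLE_HTML = """
<table>
  <tr><th>Cover</th><th>Info</th></tr>
  <tr>
    <td><div class="goods_img"><a href="/goods/1"><img src="https://example.com/1.jpg"></a></div></td>
    <td class="goodsTxtInfo"><p><a href="/goods/1">Book One</a></p><div><a href="/author/1">Author One</a></div></td>
  </tr>
  <tr>
    <td><div class="goods_img"><a href="/goods/2"><img src="https://example.com/2.jpg"></a></div></td>
    <td class="goodsTxtInfo"><p><a href="/goods/2">Book Two</a></p><div><a href="/author/2">Author Two</a></div></td>
  </tr>
</table>
"""


def extract_books(html):
    """Return one dict per table row that has a title, an author and an image."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for row in soup.select("tr"):
        info = row.select_one("td.goodsTxtInfo")
        image = row.select_one("div.goods_img > a > img[src]")
        if info is None or image is None:
            continue  # header or spacer row
        title = info.select_one("p > a")
        author = info.select_one("div > a")
        if title is None or author is None:
            continue  # incomplete entry
        books.append({
            "도서명": title.get_text(strip=True),
            "저자": author.get_text(strip=True),
            "이미지 URL": image["src"],
        })
    return books


books = extract_books(SAMPLE_HTML)
for book in books:
    print(book)
```

The resulting list of dicts can be passed to `pd.DataFrame(books)` and then `to_excel`, the same way the question's code does.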

If any of the required packages are not installed, install them with `pip install`. For example, to install `pandas`, run `pip install pandas`.
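To see which packages are missing before running the scraper, a small check like the one below may help. It is a sketch, not part of the original answer: `missing_packages` is a hypothetical helper, and including `openpyxl` is an assumption (pandas needs an Excel engine to write `.xlsx` files, and `openpyxl` is a common choice). Note that Beautiful Soup is installed as `beautifulsoup4` but imported as `bs4`.

```python
import importlib.util

# pip package name -> module name it provides (openpyxl is an assumed Excel engine)
REQUIRED = {"beautifulsoup4": "bs4", "pandas": "pandas", "openpyxl": "openpyxl"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```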

I'd really appreciate it if you accepted this answer!
