from bs4 import BeautifulSoup
【1】 urllib import request
wwwd = "【2】://www.douban.com/tag/%E5%B0%8F%E8%AF%B4/?focus=book"  # page URL
res3 = request.urlopen(【3】)  # open the connection
soup = BeautifulSoup(res3, "html.parser")
book33 = soup.find(attrs={"id": "book"})
book 333 = 【4】.findAll(attrs={"class": "【5】"})
for book34 in book 333:
    print(【6】.string)
print("数据爬取成功!")  # i.e. "data scraped successfully!"
Compare the blanks against the code and the answers fall right out:
【1】 from
【2】 https
【3】 wwwd
【4】 soup
【5】 title
【6】 book34
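To see why those answers fit, here is a self-contained sketch of the same `find`/`findAll` calls run against a tiny made-up HTML snippet (the `<div id="book">` markup and book titles below are invented for illustration; the real Douban page is far more complex):

```python
from bs4 import BeautifulSoup

# Minimal HTML mimicking the structure the exercise targets.
html = '''
<div id="book">
  <a class="title">Book A</a>
  <a class="title">Book B</a>
</div>
'''

soup = BeautifulSoup(html, "html.parser")
book33 = soup.find(attrs={"id": "book"})            # locate the container by id
book333 = book33.findAll(attrs={"class": "title"})  # 【5】: the class is "title"
for book34 in book333:
    print(book34.string)  # 【6】: print each matched tag's text
```

Running this prints `Book A` and `Book B`, one per line.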
Most sites now deploy heavy anti-scraping measures; without a spoofed request header (at minimum a User-Agent) you generally can't scrape any data at all. Also note that variable names must not contain spaces: `book 333` should be `book333`.
from bs4 import BeautifulSoup
from urllib import request  # 【1】 -- from
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36'
}
wwwd = "https://www.douban.com/tag/%E5%B0%8F%E8%AF%B4/?focus=book"  # page URL, 【2】 -- https
req = request.Request(url=wwwd, headers=headers)
res3 = request.urlopen(req)
# res3 = request.urlopen(wwwd)  # open the connection, 【3】 -- wwwd
soup = BeautifulSoup(res3, "html.parser")
book33 = soup.find(attrs={"id": "book"})
book333 = book33.findAll(attrs={"class": "title"})  # 【4】 -- book33, 【5】 -- title
for book34 in book333:
    print(book34.string)  # 【6】 -- book34
print("数据爬取成功!")
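Even with the header attached, Douban may still block unauthenticated scrapers (cookies, rate limiting, and so on), so it helps to confirm offline that the spoofed User-Agent really is attached to the `Request` object before debugging the network side. A minimal stdlib-only check (the shortened User-Agent string here is just a placeholder):

```python
from urllib import request

url = "https://www.douban.com/tag/%E5%B0%8F%E8%AF%B4/?focus=book"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# Build the Request exactly as in the fixed code, but do not open it.
req = request.Request(url=url, headers=headers)

# urllib normalizes header names to "Xxxx-yyyy" capitalization internally.
print(req.get_header("User-agent"))
```

If `get_header` returns your string, the header side is correct and any remaining failure comes from the server's other anti-bot checks.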