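My goal is to scrape the movie title, rating, and number of ratings for each film on the Douban movie chart.
This is my code for fetching and parsing the page source: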
import requests
import re

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.3"
}
url = "https://movie.douban.com/chart"
indes = requests.get(url, headers=headers)
page_concert = indes.text
# Capture the title and the score of each entry
obj = re.compile(r'<table.*?<a class="nbg".*?title="(?P<title>.*?)">'
                 r'.*?<span class="rating_nums">(?P<score>.*?)</span>', re.S)
result = obj.finditer(page_concert)
for i in result:
    print(i.group("title"))
    print(i.group("score"))
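This runs and prints each title and score. I then added the following to the pattern to capture the number of ratings: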
<span class="pl">(?P<comment>.*?)</span>
Here is the complete code:
# Fetch the page source with requests
# Extract the useful information with re
import requests
import re

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.3"
}
url = "https://movie.douban.com/chart"
indes = requests.get(url, headers=headers)
page_concert = indes.text
obj = re.compile(r'<table.*?<a class="nbg".*?title="(?P<title>.*?)">'
                 r'.*?<span class="rating_nums">(?P<score>.*?)</span>'
                 r'<span class="pl">(?P<comment>.*?)</span>', re.S)
result = obj.finditer(page_concert)
for i in result:
    print(i.group("title"))
    print(i.group("score"))
    print(i.group("comment"))
But when I run this version, it does not give the expected output.
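Adding .*? before the last span fixes the pattern, presumably because the rating_nums span and the pl span are not directly adjacent in the page source:
obj = re.compile(r'<table.*?<a class="nbg".*?title="(?P<title>.*?)">'
                 r'.*?<span class="rating_nums">(?P<score>.*?)</span>'
                 r'.*?<span class="pl">(?P<comment>.*?)</span>', re.S)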
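To see why the extra .*? matters, here is a minimal offline sketch that runs both versions of the pattern against a small HTML fragment. SAMPLE_HTML is a hypothetical sample written to resemble the structure the patterns expect (it is not copied from the live Douban page), and it assumes whitespace separates the two spans. The names strict, relaxed, and rating_count are introduced only for this illustration.

```python
import re

# Hypothetical sample modeled on the structure the patterns expect;
# the real page may differ. Note the newline and indentation between
# the rating_nums span and the pl span.
SAMPLE_HTML = '''
<table width="100%">
  <tr class="item">
    <td><a class="nbg" href="https://example.com/subject/1/" title="Example Movie">
      <img src="poster.jpg"/></a></td>
    <td>
      <div class="star clearfix">
        <span class="rating_nums">8.5</span>
        <span class="pl">(12345人评价)</span>
      </div>
    </td>
  </tr>
</table>
'''

# Shared first part of the pattern: title and score.
head = (r'<table.*?<a class="nbg".*?title="(?P<title>.*?)">'
        r'.*?<span class="rating_nums">(?P<score>.*?)</span>')

# Original version: </span> must be followed immediately by <span class="pl">.
strict = re.compile(head + r'<span class="pl">(?P<comment>.*?)</span>', re.S)
# Fixed version: .*? lazily skips whatever sits between the two spans.
relaxed = re.compile(head + r'.*?<span class="pl">(?P<comment>.*?)</span>', re.S)

strict_matches = [m.groupdict() for m in strict.finditer(SAMPLE_HTML)]
relaxed_matches = [m.groupdict() for m in relaxed.finditer(SAMPLE_HTML)]

print("strict :", strict_matches)
print("relaxed:", relaxed_matches)

# Pull just the number of ratings out of the captured comment text.
rating_count = int(re.search(r'\d+', relaxed_matches[0]["comment"]).group())
print("ratings:", rating_count)
```

On this sample the original pattern finds nothing, because no closing span tag is followed directly by the pl span, while the version with .*? captures all three fields. If only the number of ratings is wanted, a digit pattern such as \d+ applied to the comment group is one simple way to extract it.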