Scraping the URLs and movie titles from a web page
import re
import requests
url = 'https://www.1905.com/vod/?=0.2791749943538798'
data = {
}
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}
resp = requests.get(url,headers=headers)
nei_rong = resp.text
Change the nei_rong line to this:
nei_rong = resp.content.decode('utf8')
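A likely reason this works: when a response's Content-Type header is text/* but names no charset, requests falls back to ISO-8859-1 for resp.text, so UTF-8 bytes come out garbled. Whether this page actually omits the charset is an assumption, but the effect can be reproduced offline with a made-up sample string:

```python
# Simulate the bytes a server would send for a UTF-8 page
# (the sample string is made up for illustration).
raw = '电影名称'.encode('utf-8')

# Roughly what resp.text yields when requests falls back to ISO-8859-1.
garbled = raw.decode('iso-8859-1')

# What resp.content.decode('utf8') yields instead.
fixed = raw.decode('utf-8')

print(garbled)  # mojibake
print(fixed)    # the original Chinese text
```

If the page were served in another encoding such as GBK, decoding as UTF-8 would raise UnicodeDecodeError, so it is worth checking the charset declared in the page's meta tag first.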
After the request, add one line that sets the encoding: resp.encoding = 'utf-8'
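Garbled text is a very common problem when scraping. The three methods in this blog post are worth trying and fit this case well. The most common fixes are to work with the raw bytes (resp.content) and decode them yourself, or to set the response encoding explicitly.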
https://blog.csdn.net/xx_nm98/article/details/123191514
If this helps, please accept the answer.
resp = requests.get(url,headers=headers)
resp.encoding='utf-8'
nei_rong = resp.text
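Once nei_rong decodes cleanly, the URL and title extraction the question asks about could be done with re. This is only a sketch: the markup and the example.com links below are hypothetical, not taken from 1905.com, so the pattern has to be adjusted to the real page source.

```python
import re

# Hypothetical HTML standing in for the decoded page text.
nei_rong = '''
<a href="https://example.com/vod/1.html" title="电影甲">...</a>
<a href="https://example.com/vod/2.html" title="电影乙">...</a>
'''

# Capture the href and title attributes of each link, in that order.
pattern = re.compile(r'<a\s+href="([^"]+)"\s+title="([^"]+)"')
movies = pattern.findall(nei_rong)

for link, title in movies:
    print(title, link)
```

A regex like this breaks as soon as the attribute order or quoting changes, so an HTML parser is usually the more robust choice for anything beyond a quick script.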