Python crawler: Traceback (most recent call last):

Python is 3.9.0. I have tried many of the fixes found online but none of them solve it; some posts say the Python version is too high.
[screenshot of the traceback]

The `date` you get back is just a response object; you need to call read() on it to get the actual content, so pass date.read() into findall.
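
For reference, a minimal sketch of that fix, assuming the original code fetched a page with urllib and parsed it with re (the URL and the regex below are placeholders, not taken from the question):

import re
import urllib.request

# Hypothetical example: `date` is the HTTPResponse object returned by urlopen()
date = urllib.request.urlopen("https://example.com")

# read() gives the raw bytes of the body; decode() turns them into a str,
# which re.findall() needs when the pattern is a text pattern.
html = date.read().decode("utf-8")

# Placeholder pattern -- replace it with the one from your own code.
links = re.findall(r'href="(.*?)"', html)
print(links)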

It is caused by the site's anti-crawling checks; the key is to add a request header:

import ssl
import urllib.request

context = ssl._create_unverified_context()  # Work around the HTTPS error: urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>

url = "https://read.douban.com/provider/all"
header = {  # Request headers; fixes the anti-crawler HTTPError(req.full_url, code, msg, hdrs, fp)
    'User-Agent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
request = urllib.request.Request(url, headers=header)
conn = urllib.request.urlopen(request, context=context) # <class 'http.client.HTTPResponse'>
data = conn.read()                                      # <class 'bytes'>
html = data.decode("utf8")                              # <class 'str'>
print(html)
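
If the request still fails, wrapping the call in try/except makes it clearer whether the problem is the anti-crawler check or the SSL setup; a small sketch reusing the request and context defined above:

import urllib.error

try:
    conn = urllib.request.urlopen(request, context=context)
except urllib.error.HTTPError as e:
    # The server answered with an error status (e.g. 403 from an
    # anti-crawler check); the status code and reason tell you which.
    print("HTTP error:", e.code, e.reason)
except urllib.error.URLError as e:
    # Lower-level failure such as an SSL certificate problem or a
    # connection error; e.reason carries the underlying cause.
    print("URL error:", e.reason)
else:
    html = conn.read().decode("utf8")
    print(html)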