Target site: Zhihu
Python version: 3.6
import requests

# Pose as a desktop Firefox browser
agent = 'Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/59.0'
header = {
    'HOST': 'www.zhihu.com',
    'Referer': 'https://www.zhihu.com',
    'User-Agent': agent
}
response = requests.get('https://www.zhihu.com', headers=header)
Error output:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda3\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Anaconda3\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Anaconda3\lib\site-packages\requests\sessions.py", line 518, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Anaconda3\lib\site-packages\requests\sessions.py", line 639, in send
    r = adapter.send(request, **kwargs)
  File "C:\Anaconda3\lib\site-packages\requests\adapters.py", line 438, in send
    timeout=timeout
  File "C:\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "C:\Anaconda3\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Anaconda3\lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Anaconda3\lib\http\client.py", line 1280, in _send_request
    self.putheader(hdr, value)
  File "C:\Anaconda3\lib\http\client.py", line 1212, in putheader
    values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2026' in position 30: ordinal not in range(256)
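It looks like \u2026 is a space. From reading the source, it seems the failure happens when the space is encoded to latin-1.
The only string in my code with spaces is agent, which I need in order to pose as a browser, so what can I do?
Also, in the Zhihu crawler video I was following, his agent had spaces too and it didn't fail. Is something wrong with my environment?
Any pointers would be appreciated.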
The header is the problem! 'Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/59.0' somehow ended up with a … character in it. Copy the User-Agent from your browser yourself.
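A quick way to confirm this in Python 3: the traceback shows http.client encoding each header value with latin-1, so encoding the value yourself reproduces the failure without sending a request. In the sketch below (the helper name first_unencodable is just for illustration), a plain space encodes fine, and the offending character is the "…" (U+2026) at index 30, which matches "position 30" in the error message.

```python
# Hypothetical helper for illustration: http.client does the equivalent
# value.encode('latin-1') for each header value, as the question's traceback shows.
def first_unencodable(value, encoding='latin-1'):
    """Return (index, char) for the first character `encoding` cannot encode, or None."""
    try:
        value.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character the codec rejected
        return exc.start, value[exc.start]
    return None


# The User-Agent from the question, with the "…" written as an explicit escape.
bad_agent = 'Mozilla/5.0 (Windows NT 6.1; W\u2026) Gecko/20100101 Firefox/59.0'

# A space is ASCII U+0020 and exists in latin-1, so spaces are not the problem.
assert ' '.encode('latin-1') == b' '

print(first_unencodable(bad_agent))  # (30, '…')
```

This also explains why the video's code worked: its User-Agent had spaces but no "…".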
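Your user agent is the one from when you were not logged in, while the URL you GET below is the logged-in URL. Log in to Zhihu first, then get the user-agent again, and it should work. If you want to simulate a user login, you can look for other material online.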
Is every character in the agent string ASCII, or are there other characters in it?
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0
Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/59.0
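To answer that question mechanically rather than by eye, one option is to scan every header value for characters outside ASCII before sending the request. A minimal sketch, assuming the headers are a plain dict of str like the one in the question; non_ascii_report is a hypothetical name. It uses ord() instead of str.isascii(), which only exists from Python 3.7, since the question is on 3.6.

```python
def non_ascii_report(headers):
    """Return {header name: [(index, char, code point), ...]} for non-ASCII characters."""
    report = {}
    for name, value in headers.items():
        # ord() > 127 means the character lies outside 7-bit ASCII
        bad = [(i, ch, 'U+{:04X}'.format(ord(ch)))
               for i, ch in enumerate(value) if ord(ch) > 127]
        if bad:
            report[name] = bad
    return report


# The two strings compared above: complete vs. truncated when copied.
full_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0'
truncated_agent = 'Mozilla/5.0 (Windows NT 6.1; W\u2026) Gecko/20100101 Firefox/59.0'

print(non_ascii_report({'User-Agent': full_agent}))       # {}
print(non_ascii_report({'User-Agent': truncated_agent}))  # {'User-Agent': [(30, '…', 'U+2026')]}
```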
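When I copied it from the browser's F12 developer tools, the value wasn't fully expanded, so the copy was incomplete and picked up the … character. That should be what caused the encoding failure.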
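Putting the fix together: build the headers from the complete User-Agent string (the full one quoted above) and fail fast with a readable message if a stray character slips in again. This is a sketch under those assumptions; make_headers is a hypothetical helper, and the returned dict is meant to be passed as headers= to requests.get exactly as in the question. The check is stricter than what http.client strictly needs (latin-1) because real browser User-Agent strings are plain ASCII. The actual request is left commented out so the snippet runs without network access.

```python
# Complete User-Agent string, copied in full from the browser.
FULL_AGENT = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0'


def make_headers(agent):
    """Build the question's request headers, rejecting a User-Agent with non-ASCII characters."""
    try:
        agent.encode('ascii')
    except UnicodeEncodeError as exc:
        raise ValueError('User-Agent has non-ASCII character {!r} at index {}; '
                         'was it truncated when copied?'.format(agent[exc.start], exc.start)) from exc
    return {
        'HOST': 'www.zhihu.com',
        'Referer': 'https://www.zhihu.com',
        'User-Agent': agent,
    }


header = make_headers(FULL_AGENT)
# response = requests.get('https://www.zhihu.com', headers=header)
```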