Scraping a web page with utf-8 still gives garbled text



```python
import requests
res=requests.get('https://zhidao.baidu.com/daily/view?id=241570')
res.encoding='utf-8'
print(res.text)

```

Change utf-8 to gbk.
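To see why decoding with the wrong codec produces garbled text, here is a minimal offline sketch. It encodes a sample Chinese string as gbk to stand in for the page bytes (an assumption, since no request is made) and then decodes those bytes both ways.

```python
# Simulate the raw bytes of a gbk-encoded page (no network needed)
raw = "百度知道日报".encode("gbk")

# Decoding gbk bytes as utf-8 yields replacement characters, i.e. garbled text
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the page's real encoding recovers the original text
right = raw.decode("gbk")

print(wrong)
print(right)
```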

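Use whatever encoding the page itself uses.
Press F12 in the browser and view the source: the HTML head contains the following tag, which states plainly that the charset is gbk: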

`<meta http-equiv="content-type" content="text/html;charset=gbk" />`
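Instead of reading the tag by eye, you could extract the declared charset from the raw bytes. This is a rough regex-based sketch; `charset_from_meta` is a hypothetical helper, and the regex only covers the common `charset=...` forms, not every valid HTML variant. With requests you would call it on `res.content` and assign the result to `res.encoding`.

```python
import re

# Matches charset=... inside a <meta> tag, e.g.
#   <meta http-equiv="content-type" content="text/html;charset=gbk" />
#   <meta charset="utf-8">
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)


def charset_from_meta(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset declared in a <meta> tag (lowercased), or default."""
    # The declaration lives in <head>, so scanning the first few KB is enough
    match = META_CHARSET.search(raw[:4096])
    return match.group(1).decode("ascii").lower() if match else default


page = b'<html><head><meta http-equiv="content-type" content="text/html;charset=gbk" /></head>'
print(charset_from_meta(page))  # gbk
```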

The page is encoded in gbk, so you should set
res.encoding='gbk'
Viewing the page source confirms that the encoding is gbk.

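In practice, when scraped Chinese content comes out garbled, it is worth trying both utf-8 and gbk, since most Chinese pages today use one of these two encodings.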
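The "try both" advice can be automated with strict decoding. This is a heuristic sketch (`decode_with_fallback` is a hypothetical name), assuming the page is in one of the listed encodings. utf-8 goes first because its byte rules are strict, so gbk bytes usually fail to decode as utf-8, whereas many utf-8 byte sequences decode "successfully" as gbk into the wrong characters.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "gbk")):
    """Try each encoding in order; return (text, encoding) for the first that decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # invalid byte sequence for this codec, try the next one
    # Nothing decoded strictly: use the last encoding and replace undecodable bytes
    return raw.decode(encodings[-1], errors="replace"), encodings[-1]


text, enc = decode_with_fallback("你好".encode("gbk"))
print(enc, text)  # gbk 你好
```

With requests, you would pass `res.content` (the undecoded bytes) rather than `res.text`.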

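`apparent_encoding` infers the encoding by analyzing the page content itself, so it is usually more reliable than `encoding`, which requests derives from the HTTP response headers. Just change `res.encoding = 'utf-8'` in your code to `res.encoding = res.apparent_encoding`.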

```python
import requests

res = requests.get('https://zhidao.baidu.com/daily/view?id=241570')
# Guess the encoding from the response body instead of the headers
res.encoding = res.apparent_encoding
print(res.text)
```
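One reason the header-based `encoding` goes wrong: for `text/*` responses whose `Content-Type` header has no charset, requests falls back to ISO-8859-1. If `res.text` was already decoded that way, the bytes can still be recovered, because ISO-8859-1 maps every byte 0-255 to exactly one character. The offline sketch below simulates this, assuming the real page encoding is gbk; the requests equivalent is `res.text.encode('ISO-8859-1').decode('gbk')`, valid only when `res.encoding` really was ISO-8859-1.

```python
# Simulate a gbk page that was decoded with the ISO-8859-1 fallback
raw = "百度知道".encode("gbk")
garbled = raw.decode("ISO-8859-1")  # what the garbled text would look like

# Encoding back with ISO-8859-1 restores the original bytes exactly,
# which can then be decoded with the correct codec
repaired = garbled.encode("ISO-8859-1").decode("gbk")

print(garbled)
print(repaired)  # 百度知道
```

Setting `res.encoding` before reading `res.text` is simpler; this round trip is mainly useful when you only have the already-garbled string.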

Decompressing with gzip fails with an error saying the compression format is wrong.
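A likely cause, stated as an assumption: requests already decompresses responses sent with `Content-Encoding: gzip`, so `res.content` is plain bytes, and calling `gzip.decompress` on it again raises `gzip.BadGzipFile` ("Not a gzipped file"; an `OSError` subclass). This sketch (`maybe_gunzip` is a hypothetical helper) checks the two-byte gzip magic number before decompressing.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def maybe_gunzip(raw: bytes) -> bytes:
    """Decompress raw only if it actually looks like gzip data."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    # Already plain bytes, e.g. requests has decoded Content-Encoding: gzip
    return raw


page = "<html>百度</html>".encode("gbk")
print(maybe_gunzip(gzip.compress(page)) == page)  # True
print(maybe_gunzip(page) == page)                 # True
```

After this step the bytes still need to be decoded with the right charset (gbk here).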
