New to the requests library, using VS Code as the editor
import requests
url = 'https://www.baidu.com/s?'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
data = {'wd': '北京'}
# url: path of the requested resource
# params: query-string parameters
# kwargs: dict of additional keyword arguments (such as headers)
response = requests.get(url=url, params=data, headers=headers)
content = response.text
print(content)
Output:
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="utf-8">
<title>ç¾åº¦å®å¨éªè¯</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">
<meta name="format-detection" content="telephone=no, email=no">
<link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon">
<link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
<link rel="stylesheet" href="https://ppui-static-wap.cdn.bcebos.com/static/touch/css/api/mkdjump_0635445.css" />
</head>
<body> è¯div>
<button type="button" class="timeout-button">è¿å页button>
div>iv class="timeout-img">div> ç»å¼è¯·ç¨
<div class="timeout-feedback hide">ä¸
<div class="timeout-feedback-icon">p>
div> class="timeout-feedback-title">é®é¢
<script src="https://wappass.baidu.com/static/machine/js/api/mkd.js"></script>
<script src="https://ppui-static-wap.cdn.bcebos.com/static/touch/js/mkdjump_eac1ee5.js"></script>
</body>
</html>
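Why does the Chinese text come out garbled? I thought requests did not need the encoding to be set by hand.
How do I get the page source decoded correctly?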
Just set the character encoding:
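import requests
url = 'https://www.baidu.com/s?'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
data = {'wd': '北京'}
# url: path of the requested resource
# params: query-string parameters
# kwargs: dict of additional keyword arguments (such as headers)
response = requests.get(url=url, params=data, headers=headers)
# Decode the raw response bytes as UTF-8 instead of relying on response.text
content = response.content.decode('utf-8')
print(content)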
import requests
url = 'https://www.baidu.com/s?'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
data = {'wd': '北京'}
# url: path of the requested resource
# params: query-string parameters
# kwargs: dict of additional keyword arguments (such as headers)
response = requests.get(url=url, params=data, headers=headers)
# Set the encoding before reading response.text
response.encoding = 'utf-8'
content = response.text
print(content)
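Cause: this is an encoding problem. When no encoding is specified (the response's Content-Type header carries no charset), requests defaults to ISO-8859-1 for text responses, so the UTF-8 bytes of the Chinese text are decoded incorrectly and show up garbled. The fix is shown in the code below.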
import requests
url = 'https://www.baidu.com/s?'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
data = {'wd': '北京'}
# url: path of the requested resource
# params: query-string parameters
# kwargs: dict of additional keyword arguments (such as headers)
response = requests.get(url=url, params=data, headers=headers)
# Option 1: decode the raw bytes yourself
content = response.content.decode('utf-8')
# print(content)
# Option 2: change the default encoding to utf-8
# Inspect the encoding requests picked by default
print(response.encoding)
response.encoding = 'utf-8'
print(response.text)
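The cause described above can be reproduced offline, without sending any request. The snippet below is only a minimal sketch: the sample string and the variable names are chosen for illustration and are not taken from the original code. It encodes Chinese text as UTF-8, decodes it with ISO-8859-1 the way a wrong default would, and then reverses the mistake.

```python
# Offline reproduction of the garbling; no network request is made.
original = '百度安全验证'  # sample text, chosen for illustration

# Bytes as a server using UTF-8 would send them.
raw_bytes = original.encode('utf-8')

# Decoding those bytes as ISO-8859-1 never fails, because every byte
# maps to some character, but the result is unreadable.
garbled = raw_bytes.decode('iso-8859-1')

# Reversing the wrong decode and applying the right one recovers the text.
repaired = garbled.encode('iso-8859-1').decode('utf-8')

print(repr(garbled))
print(repaired)
```

Because ISO-8859-1 maps each byte to exactly one character, the garbled string has one character per original byte, which is why a few Chinese characters turn into a much longer run of accented Latin letters.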
Specify the encoding explicitly:
import requests
url = 'https://www.baidu.com/s?'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
data = {'wd': '北京'}
# url: path of the requested resource
# params: query-string parameters
# kwargs: dict of additional keyword arguments (such as headers)
response = requests.get(url=url, params=data, headers=headers)
# Let requests infer the encoding from the body; you can also set 'utf-8' directly
response.encoding = response.apparent_encoding
content = response.text
print(content)
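The idea of using a declared encoding when there is one and falling back to UTF-8 otherwise can be sketched with the standard library alone. This is an illustration under stated assumptions, not how requests is implemented: `charset_from_content_type` and `decode_body` are hypothetical helper names, and the Content-Type values are made-up examples.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() lower-cases the value and returns None if absent.
    return msg.get_content_charset()


def decode_body(raw: bytes, content_type: str, fallback: str = 'utf-8') -> str:
    """Decode raw bytes with the declared charset, else with the fallback."""
    charset = charset_from_content_type(content_type) or fallback
    # errors='replace' keeps going on bad bytes instead of raising.
    return raw.decode(charset, errors='replace')


body = '北京'.encode('utf-8')  # stand-in for response.content
print(decode_body(body, 'text/html'))                 # no charset declared
print(decode_body(body, 'text/html; charset=UTF-8'))  # charset declared
```

Falling back to UTF-8 rather than ISO-8859-1 is a design choice that suits pages like the one in the question; a page in another encoding such as GBK would still need its charset declared or detected.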
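This article, "python 解决requests中文乱码" (fixing garbled Chinese text in requests with Python), may have the answer you are looking for; it is worth a look.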