Web scraping, requests


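When I use requests to fetch a page's source code, why do I get a URL back instead of the source, and only see the content I want after opening that URL? I am using PyCharm Community Edition.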

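Isn't the output shown below HTML? It is garbled, so the actual content cannot be read. You have most likely been hit by anti-scraping measures.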

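I tested the same code: on a China Unicom broadband connection the request was blocked by anti-scraping, while on China Telecom it worked normally. Try restarting your router to get a different IP address.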

Also add the full set of request headers, such as Accept and Referer, and append the parameters &usm=3&rsv_idx=2&rsv_page=1 to the request URL.
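The extra query parameters mentioned above can also be assembled programmatically rather than typed into the URL by hand. The following is only a minimal sketch using the standard library; `build_search_url` is a hypothetical helper name, and the parameter values are simply the ones quoted in this answer, not anything Baidu documents.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.baidu.com/s"


def build_search_url(keyword):
    """Build the search URL with the extra parameters suggested in this answer."""
    # Parameter names and values are taken from the answer above (assumed, not documented).
    params = {
        "ie": "UTF-8",
        "wd": keyword,
        "usm": 3,
        "rsv_idx": 2,
        "rsv_page": 1,
    }
    # urlencode percent-encodes non-ASCII keywords as UTF-8.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("周杰伦"))
```

With requests, the same dictionary can instead be passed as `params=` to `requests.get`, which performs the encoding itself.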


import requests

# Baidu search URL with the extra query parameters appended
url='https://www.baidu.com/s?ie=UTF-8&wd=周杰伦&usm=3&rsv_idx=2&rsv_page=1'

# Browser-like request headers
dic={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
     'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
     # 'br' is left out: requests can only decode Brotli bodies when the brotli (or brotlicffi) package is installed
     'Accept-Encoding':'gzip, deflate',
     'Referer':'https://www.baidu.com/',
     'Accept-Language':'zh-CN,zh-TW;q=0.9,zh-HK;q=0.8,zh;q=0.7,en;q=0.6',
     'cookie':'BIDUPSID=47898BA6FC80B054D168CDFD5EA87B14; PSTM=1665470090; BAIDUID=C3A34FF5941A5F7026CD6E69E5C6A28D:SL=0:NR=10:FG=1; BD_UPN=12314753; BDUSS=Wk2N1BtZktJLVRsWGgyNmNrd2RyU0RXYlJ1ZGtUOXBkaXZlQXdsVEl-QzR4U3hrSUFBQUFBJCQAAAAAAAAAAAEAAACI8Ls3ysC959POzfgxNjgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALg4BWS4OAVkN; BDUSS_BFESS=Wk2N1BtZktJLVRsWGgyNmNrd2RyU0RXYlJ1ZGtUOXBkaXZlQXdsVEl-QzR4U3hrSUFBQUFBJCQAAAAAAAAAAAEAAACI8Ls3ysC959POzfgxNjgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALg4BWS4OAVkN; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BDSFRCVID=dp0OJeC62RpdmRvfyRYbUUvHtPoRalJTH6aoqGQpGNiKwJ34RQgOEG0Phx8g0KAbRyJSogKKBgOTH4FF_2uxOjjg8UtVJeC6EG0Ptf8g0f5; H_BDCLCKID_SF=tbAD_IIMtDK3HnRY-P4_-tAt2qoXetJyaR3bQIJvWJ5TMCo-K-OV0RKE2J5k5qQPbebJbn74KRRkShPC-tnaQp_QQPT0BbbZtbnNbR3v3l02Vbn9e-t2yU_V0lrZQ-RMW23G0h7mWIbPsxA45J7cM4IseboJLfT-0bc4KKJxbnLWeIJIjj6jK4JKDNAeJT3P; H_PS_PSSID=38185_36554_38105_38131_38439_37861_38170_38289_38380_37925_38312_38382_38285_38040_26350_38417_38283_37881; BAIDUID_BFESS=C3A34FF5941A5F7026CD6E69E5C6A28D:SL=0:NR=10:FG=1; BDSFRCVID_BFESS=dp0OJeC62RpdmRvfyRYbUUvHtPoRalJTH6aoqGQpGNiKwJ34RQgOEG0Phx8g0KAbRyJSogKKBgOTH4FF_2uxOjjg8UtVJeC6EG0Ptf8g0f5; H_BDCLCKID_SF_BFESS=tbAD_IIMtDK3HnRY-P4_-tAt2qoXetJyaR3bQIJvWJ5TMCo-K-OV0RKE2J5k5qQPbebJbn74KRRkShPC-tnaQp_QQPT0BbbZtbnNbR3v3l02Vbn9e-t2yU_V0lrZQ-RMW23G0h7mWIbPsxA45J7cM4IseboJLfT-0bc4KKJxbnLWeIJIjj6jK4JKDNAeJT3P; BD_HOME=1; delPer=0; BD_CK_SAM=1; PSINO=7; BA_HECTOR=2k0h840424a18la02l0l846k1i1kr4p1m; channel=baidusearch; baikeVisitId=dcbc89d4-52a6-4b9f-a0bb-a85f8ee65668; ZFY=7aXZHJMnkSVSa1uI:AqRXQ:AUSt4eX6mkMYgQxZkOJKi4:C; B64_BOT=1; sug=3; sugstore=0; ORIGIN=2; bdime=0; H_PS_645EC=b428Vxu0iZX7iP6uksAmwhNbBXs%2Blz7ExHxUKjZa5SjnKtZpMYa5Cv%2FtNME; BDSVRTM=0'
     }

# Send the request with the headers above and decode the body as UTF-8
resp=requests.get(url,headers=dic)
resp.encoding='utf-8'

# Print the decoded HTML
print(resp.text)
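To tell whether the printed output is merely garbled or is an anti-scraping page, a few quick checks on the response can help. This is a rough sketch under stated assumptions: `diagnose` is a hypothetical helper, the threshold of 10 replacement characters is arbitrary, and the marker string "百度安全验证" (the title assumed for Baidu's verification page) is a guess that may not match what the site actually returns.

```python
def diagnose(status_code, headers, text):
    """Return a list of hints about why a response might look wrong."""
    hints = []
    # Normalise header names so a plain dict works as well as resp.headers.
    normalized = {k.lower(): v for k, v in headers.items()}

    if status_code != 200:
        hints.append(f"unexpected HTTP status {status_code}")

    encodings = [p.strip() for p in normalized.get("content-encoding", "").lower().split(",")]
    if "br" in encodings:
        hints.append("body was sent Brotli-compressed; if the text is unreadable, "
                     "install brotli or drop 'br' from Accept-Encoding")

    # requests decodes with errors='replace', so undecodable bytes become U+FFFD.
    if text.count("\ufffd") > 10:  # arbitrary threshold (assumption)
        hints.append("many replacement characters; wrong charset or undecoded compression")

    # Assumed marker for the verification page; adjust to what you actually see.
    if "百度安全验证" in text:
        hints.append("looks like a security verification page (anti-scraping)")

    return hints
```

It could be called as `diagnose(resp.status_code, resp.headers, resp.text)` right after the request; an empty list means none of these particular checks fired, not that the page is guaranteed to be the real result.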