import requests
url = 'https://www.xbiquge.la/58/58814/24298867.html'
headers = {'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding':'gzip, deflate, br',
'Accept-Language':'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
'Cache-Control':'max-age=0',
'Connection':'keep-alive',
'Cookie':'Hm_lvt_169609146ffe5972484b0957bd1b46d6=1632444573,1632445219,1632445377,1632459924; Hm_lpvt_169609146ffe5972484b0957bd1b46d6=1632459933',
'Host':'www.xbiquge.la',
'Referer':'https://www.xbiquge.la/58/58814/',
'Sec-Fetch-Dest':'document',
'Sec-Fetch-Mode':'navigate',
'Sec-Fetch-Site':'same-origin',
'Sec-Fetch-User':'?1',
'Upgrade-Insecure-Requests':'1',
'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:92.0) Gecko/20100101 Firefox/92.0'}
html = requests.get(url,headers).content.decode('utf-8')
print(html)
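The content is actually being fetched. The catch is that some line breaks in this page use only a carriage return (\r) with no line feed (\n).
When printed to the console, a carriage return (\r) and a line feed (\n) behave differently.
A carriage return (\r) moves the cursor back to the start of the current line without advancing to a new line,
so whatever is printed next overwrites what was already printed on that line.
For example,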
print("abcd\ref")
prints this in the console:
efcd
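Because the overwrite effect hides the text on screen, it can help to inspect the string itself rather than its printed form. The following is a minimal sketch, not part of the original answer: count_bare_cr is a hypothetical helper name, and the sample string is made up for illustration.

```python
import re

def count_bare_cr(text: str) -> int:
    # Count carriage returns that are NOT immediately followed by a line feed
    return len(re.findall(r"\r(?!\n)", text))

# Made-up sample: one bare \r, one \r\n pair, one plain \n
sample = "first\rsecond\r\nthird\n"

# repr() shows the escape sequences instead of letting the console act on them
print(repr(sample))
print(count_bare_cr(sample))
```

Running the same two calls on the fetched html (for example on html[:500]) should show whether bare \r characters are present.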
Just replace \r with \n:
html = html.replace("\r","\n")
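One caveat with a plain replace: if the page also contains Windows-style \r\n pairs, each pair turns into two newlines and the output gains blank lines. A slightly more careful variant, offered only as a sketch (normalize_newlines is a hypothetical helper, not something from the question), converts the pairs first:

```python
def normalize_newlines(text: str) -> str:
    # Convert \r\n pairs first so they do not become two newlines
    text = text.replace("\r\n", "\n")
    # Then convert any remaining bare carriage returns
    return text.replace("\r", "\n")

# Made-up sample mixing all three line-ending styles
sample = "line one\rline two\r\nline three\n"
print(normalize_newlines(sample))
```

If the page only ever uses bare \r, this behaves the same as the single replace.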
The full code answering your question is as follows:
import requests
url = 'https://www.xbiquge.la/58/58814/24298867.html'
headers = {'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding':'gzip, deflate, br',
'Accept-Language':'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
'Cache-Control':'max-age=0',
'Connection':'keep-alive',
'Cookie':'Hm_lvt_169609146ffe5972484b0957bd1b46d6=1632444573,1632445219,1632445377,1632459924; Hm_lpvt_169609146ffe5972484b0957bd1b46d6=1632459933',
'Host':'www.xbiquge.la',
'Referer':'https://www.xbiquge.la/58/58814/',
'Sec-Fetch-Dest':'document',
'Sec-Fetch-Mode':'navigate',
'Sec-Fetch-Site':'same-origin',
'Sec-Fetch-User':'?1',
'Upgrade-Insecure-Requests':'1',
'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:92.0) Gecko/20100101 Firefox/92.0'}
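# Pass the dict by keyword: the second positional argument of requests.get is params, not headers
# (if decoding then fails, try dropping 'br' from Accept-Encoding unless a brotli package is installed)
html = requests.get(url, headers=headers).content.decode('utf-8')
# Replace carriage returns so the console does not overwrite earlier output
html = html.replace("\r", "\n")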
print(html)
When I open the URL you are trying to scrape directly, I can't reach it.
Check whether you copied the correct URL.