My code fails to scrape the page content.
The parts circled in green (in my screenshot) are the titles and links I want.
The red circle marks the code path in the post.
Could anyone take a look and tell me where it goes wrong?
import requests
from lxml import etree
baseurl="https://www.91mjw.cc/meiju/1.html"
head={"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36",
"cookie": "__music_index__=2; count_h=7; count_m=1; first_m=1632399441828; count_h_kp=7; count_m_kp=1; first_m_kp=1632399441834; kglshhhhudd=18731%2C0; UM_distinctid=178cf37c472392-01b05182e14a8f-c3f3568-1fa400-178cf37c4735a0; CNZZDATA1279766435=960012558-1618381794-https%253A%252F%252Fwww.baidu.com%252F%7C1618381794; CNZZDATA1279766437=1854628013-1618382786-https%253A%252F%252Fwww.baidu.com%252F%7C1618382786; CNZZDATA1273438972=1848411145-1618381865-https%253A%252F%252Fwww.baidu.com%252F%7C1618381865; PHPSESSID=rud9qtr1bsdijibhb2fpifm4a0; qzdjhhhhnrfr=2; qzdjhhhhuuxs=6976; qzdjhhhhuuxx=12; qzdjhhhhudd=18731%2C1; qzdjhhhhph=22262257_1; qzdjhhhhfgp=2559712742; Hm_lvt_b70fae4308afda05c9b81ae6b2c57926=1632399361; CNZZDATA1279702724=879698640-1618380166-https%253A%252F%252Fwww.baidu.com%252F%7C1632393255; first_h=1632399361126; count_h=1; first_m=1632399361128; count_m=1; first_h_kp=1632399361130; count_h_kp=1; count_m_kp=1; first_m_kp=1632399361131; __music_index__=1; _ga=GA1.1.1315327741.1632399361; CNZZDATA1279773277=121716844-1618381574-https%253A%252F%252Fwww.baidu.com%252F%7C1632393256; 2740_2507_1.85.41.44=1; 2740_2604_1.85.41.44=1; 2740_2320_1.85.41.44=1; 2740_2470_1.85.41.44=1; 2740_2557_1.85.41.44=1; _ga_MNZLM49G3M=GS1.1.1632399361.1.1.1632399441.0; Hm_lpvt_b70fae4308afda05c9b81ae6b2c57926=1632399442; beitouviews_2740=qJxfG01YwqV8naJ%252Bd5CF2InJ5Moh%252BMLmZRtqtWf2ZEcJd5vCI0eoW3tjWzFoKR0TfjI6vEfs3rja0sKvUE155EpyFUw0FjwlqCSXixVRGdL%252Bw4Ghwn5Wl5m9XDRottCEpoxAP%252FHafCRJdUziUj7HRFY1Wp91mdmo6pc29Z3dasAW2ZpwXcKMr12hAMkk%252BJDi7adobIrVsYOHsNYcYy42wjMeNFthS3mkm%252FDd6Po4BntXzG92q%252FijQVzvf0FUi8n%252BcKSTpkQTTlt0vhQJir7ZQw1nXBnqrUltgnJER5%252Brihnyl0rog%252FRrOkS9cBRsdpFbZjjV63ljd6DkpDCSg5orwLMgrTcy2rlAQmOJFO61ula3o%252BNyNNZWgia%252B3Pqgms4BKwmCSaVdI1VBjHptpr3TwtHZY12Kzl0k3%252FlSA0A3jK3bWYKvlXTUnS%252Bs5Y7ewGGY9sipzNn4xQSKOLHmPPaRXydvJ76RYeQ%252Bp3PfHqDGjvX%252FoxdRvlaCZo6EfP8DKtn955HmE4rn9P44DMATGREPYqvciPxJ20HvgIh5XY9bJMaqq18ZGsoOZc2CIhSGcDoLC%252BX8Rg5e%252B3%252BQ1Z9CU0XCkrx43dGh0SvFdHZ7MUAITbTQMpfBPjjyF0ILRa8RuFtubX7aPJeGGbnrgQPvsnpojrnYqN7ML40wycE1YhdspMg%253D"}
response = requests.get(url=baseurl, headers=head)  # the keyword is 'headers', not 'head' (passing head= raises a TypeError)
response1=response.content.decode(encoding='utf-8',errors='ignore')
response2 = etree.HTML(response1)  # parse the decoded text, not the Response object
title1 = response2.xpath('//div[@class="list-content"]/div/article/a/@href')  # the closing quote after list-content was missing
print(title1)
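For reference, the corrected XPath can be verified offline against a small HTML snippet. The class name "list-content" and the div/article/a structure are taken from the code above; the link values below are made up for illustration:

```python
from lxml import etree

# Minimal HTML mimicking the structure the XPath targets
# (hypothetical hrefs; only the markup shape matters here).
html = '''
<div class="list-content">
  <div><article><a href="/meiju/101.html">Show A</a></article></div>
  <div><article><a href="/meiju/102.html">Show B</a></article></div>
</div>
'''

tree = etree.HTML(html)
# Note the closing double quote inside the predicate -- without it,
# lxml raises XPathEvalError: Invalid expression.
links = tree.xpath('//div[@class="list-content"]/div/article/a/@href')
print(links)  # ['/meiju/101.html', '/meiju/102.html']
```

If this prints the hrefs but the real request still returns an empty list, the problem is with fetching the page (headers, cookies, or blocking), not with the XPath.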
Failing to get the page content may mean the site uses anti-scraping measures. If you want to fetch the site, you can add some extra parameters to your request headers. I explain the reasons in detail in one of my blog posts about scraping past a login page; follow me and check it out if you need it. Please accept this answer!
Are you unable to fetch the page at all, or just unable to extract the content you want from it?