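Hey everyone, why is the data I scraped coming out empty? Could someone help me fix it?
The script runs fine when I test it, with no errors,
but the output file is empty.
The code is below. Could one of the experts here fix it? Neither Baidu nor bilibili has solved the problem.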
import requests
from lxml import etree
url = "https://www.ruiwen.com/meiwensuibi/1047794.html"
response = requests.get(url)
print(type(response))
html = etree.HTML(response.content)
print(html)
data = html.xpath("//diy[@class = 'main']/ul/li/a/text()")
# //diy selects every diy element on the page
with open("articllist.txt", "w+", encoding="gb2312") as flie:
    for item in data:
        print(item)
        flie.write(item + "\n")
Are you sure it's diy and not div?
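To illustrate why a misspelled tag name produces an empty file without any error, here is a minimal sketch. The SAMPLE markup is a hypothetical stand-in, not the real page, whose structure may differ. An XPath query for a tag that does not exist is still valid, so lxml simply returns an empty list.

```python
from lxml import etree

# Hypothetical stand-in for the real page; the actual markup may differ.
SAMPLE = """
<html><body>
  <div class="main">
    <ul>
      <li><a href="/a.html">First article</a></li>
      <li><a href="/b.html">Second article</a></li>
    </ul>
  </div>
</body></html>
"""

tree = etree.HTML(SAMPLE)

# Misspelled tag name: the XPath is valid, no exception is raised,
# but nothing matches, so the result is an empty list.
wrong = tree.xpath("//diy[@class = 'main']/ul/li/a/text()")

# Correct tag name: the link texts are returned.
right = tree.xpath("//div[@class = 'main']/ul/li/a/text()")

print(wrong)  # []
print(right)  # ['First article', 'Second article']
```

With an empty list the for loop body never runs, so the file is created but nothing is written to it.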
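[The following answer was generated by GPT]
The likely cause is the XPath expression: diy is not an HTML tag, so the query matches nothing, raises no error, and leaves the output file empty.
Here is a possible improved solution: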
import requests
from lxml import etree
url = "https://www.ruiwen.com/meiwensuibi/1047794.html"
# Send the request and set the response encoding to utf-8
response = requests.get(url)
response.encoding = 'utf-8'
# Check whether the request succeeded
if response.status_code == 200:
    print("Request succeeded")
    # Parse the HTML into an element tree
    html = etree.HTML(response.text)
    # Extract the target data with the corrected XPath expression
    data = html.xpath("//div[@class='main']//ul//li//a/text()")
    # Open the file with utf-8 encoding and write the data
    with open("articllist.txt", "w", encoding='utf-8') as file:
        for item in data:
            print(item)
            file.write(item + "\n")
else:
    print("Request failed")
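The script above also switches the output file from gb2312 to utf-8. A small sketch of why that can matter: GB2312 covers only a limited character set, so writing text that contains a character outside it raises UnicodeEncodeError. The sample string is made up for illustration; whether the scraped page actually contains such characters is an assumption.

```python
# Made-up sample text; the euro sign is not part of GB2312.
text = "Price: 5€"

try:
    text.encode("gb2312")
    gb2312_ok = True
except UnicodeEncodeError:
    # open(..., encoding="gb2312") would fail the same way on write
    gb2312_ok = False

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = text.encode("utf-8")

print(gb2312_ok)                          # False
print(utf8_bytes.decode("utf-8") == text)  # True
```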
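Improvements in this solution: diy is changed to div, and the "/" steps inside the expression are changed to "//", so that the target data is still matched when it is nested more deeply than the original path assumed.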
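The difference between "/" and "//" can be sketched as follows. The SAMPLE markup is hypothetical and only assumes that the list sits inside an extra wrapper element; the real page may be laid out differently. "/" matches direct children only, while "//" matches descendants at any depth.

```python
from lxml import etree

# Hypothetical markup: the <ul> is wrapped in an extra <div>,
# and the <a> is wrapped in a <span>.
SAMPLE = """
<div class="main">
  <div class="wrap">
    <ul><li><span><a href="/a.html">First article</a></span></li></ul>
  </div>
</div>
"""

tree = etree.HTML(SAMPLE)

# "/" requires each step to be a direct child, so the wrappers break the match.
child_only = tree.xpath("//div[@class='main']/ul/li/a/text()")

# "//" accepts any depth between the steps.
any_depth = tree.xpath("//div[@class='main']//ul//li//a/text()")

print(child_only)  # []
print(any_depth)   # ['First article']

# A simple guard makes an empty result visible instead of silently
# producing an empty file.
if not any_depth:
    print("XPath matched nothing; check the tag names and the page structure")
```

The trade-off is that "//" is looser and may also pick up links from nested lists that were not intended, so it is worth checking the result against the page source.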