I started learning Python out of personal interest, and while following an online course on web scraping I ran into a problem.
def get_text(url):
    # ... (part of the function is folded in the original post)
    r = requests.get(url)
    r.encoding = 'utf-8'
    contents_All = etree.HTML(r.text)
    contents_title = contents_All.xpath('//*[@id="wrapper"]/div[4]/div[1]/div[2]/h1/text()')
    contents_word = contents_All.xpath('//*[@id="content"]/text()')
    with open(path + contents_title[0]+'.txt', "w", encoding="utf-8") as f:
        f.write(contents_word)
    print(contents_title[0], "下载成功")  # "downloaded successfully"
    time.sleep(2)

if __name__ == '__main__':
    for url in contents_list:
        get_text(url)
Result:
Traceback (most recent call last):
  File "D:\pythonProject\爬虫\爬取小说\xpath.py", line 50, in <module>
    get_text(url)
  File "D:\pythonProject\爬虫\爬取小说\xpath.py", line 42, in get_text
    with open(path + contents_title[0]+'.txt', "w", encoding="utf-8") as f:
IndexError: list index out of range
Most likely contents_title did not match anything, so the list is empty.
Check your XPath.
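def get_text(url):
    # ... (part of the function is folded in the original post)
    r = requests.get(url)
    r.encoding = 'utf-8'
    contents_All = etree.HTML(r.text)
    contents_title = contents_All.xpath('//*[@id="wrapper"]/div[4]/div[1]/div[2]/h1/text()')
    # xpath() returns a list; it is empty when nothing matches
    if len(contents_title) == 1:
        contents_word = contents_All.xpath('//*[@id="content"]/text()')
        with open(path + contents_title[0] + '.txt', "w", encoding="utf-8") as f:
            # contents_word is a list of strings, so join it before writing
            f.write(''.join(contents_word))
        print(contents_title[0], "下载成功")  # "downloaded successfully"
    else:
        print("没找到标题,下载失败")  # "title not found, download failed"
    time.sleep(2)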
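To see why the index fails, here is a small sketch that runs the same kind of XPath against a made-up page. The HTML string, the looser `//h1/text()` expression, and the variable names `strict`, `loose`, and `body` are my own assumptions for illustration; the real site's markup may differ, so treat this as a way to experiment rather than the actual fix.

```python
from lxml import etree

# Hypothetical page: the real site's structure is unknown, this is only a stand-in.
html = (
    '<html><body>'
    '<div id="wrapper">'
    '<div class="bookname"><h1>Chapter 1</h1></div>'
    '<div id="content">First line.<br/>Second line.</div>'
    '</div>'
    '</body></html>'
)
tree = etree.HTML(html)

# The position-based path from the question: there is no 4th <div> under
# #wrapper in this page, so nothing matches and xpath() returns [].
strict = tree.xpath('//*[@id="wrapper"]/div[4]/div[1]/div[2]/h1/text()')
print(strict)  # []

# A looser expression that does not depend on the exact nesting.
loose = tree.xpath('//h1/text()')
print(loose)  # ['Chapter 1']

# text() yields one string per text node, so join them before writing to a file.
body = '\n'.join(tree.xpath('//*[@id="content"]/text()'))
print(body)
```

Paths copied from the browser's developer tools often contain positions like `div[4]` that only hold for the page they were copied from, so one chapter page can match while another returns an empty list.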
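The index in contents_title[0] is out of range, which means contents_title = contents_All.xpath('//*[@id="wrapper"]/div[4]/div[1]/div[2]/h1/text()') returned no data. Add print(contents_title) on the line right below it to confirm; if the list is empty, change the argument passed to .xpath().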
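Along the same lines, a tiny helper can replace the bare `[0]` so that an empty result never raises. `first_text` is a hypothetical name of mine, not something from lxml, and the sample lists below just imitate what `.xpath()` might return.

```python
def first_text(results):
    """Return the first non-blank string from an XPath result list, or None."""
    for item in results:
        text = item.strip()
        if text:
            return text
    return None


# Simulated XPath results (assumed values, not real scraped data).
print(first_text([]))                  # None: nothing matched
print(first_text(['  Chapter 1 ']))    # Chapter 1
print(first_text(['', ' Chapter 2']))  # Chapter 2
```

In `get_text` you could then write `title = first_text(contents_title)` and skip the page (or print the URL for inspection) when `title` is `None`, instead of letting the script crash.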