Code related to the question:
import requests
from bs4 import BeautifulSoup
url = 'https://www.shicimingju.com/book/sanguoyanyi.html'
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"}
url = "https://www.shicimingju.com/book/sanguoyanyi.html"
response = requests.get(url=url,headers=headers)
soup = BeautifulSoup(response.content,'lxml')
print("正在请求章节内容")  # prints "Requesting chapter content"
gettitle = soup.select("#main>#main_left>.book-mulu a").get_text()
for title in gettitle:
    print(title)
C:\Users\Administrator\AppData\Local\Programs\Python\Python310\python.exe E:/编程/python/作品/实验/pycharm项目/爬虫/爬取三国演义.py
正在请求章节内容
Traceback (most recent call last):
  File "E:\编程\python\作品\实验\pycharm项目\爬虫\爬取三国演义.py", line 9, in <module>
    gettitle = soup.select("#main>#main_left>.book-mulu a").get_text()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\bs4\element.py", line 2253, in __getattr__
    raise AttributeError(
AttributeError: ResultSet object has no attribute 'get_text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
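BeautifulSoup assumes that a document may contain many elements matching the same selector, so select() always returns a list of elements (a ResultSet), even when only one element matches. get_text() exists on a single element, not on the list, so pick an element by index first.
Solution:
gettitle = soup.select("#main>#main_left>.book-mulu a")[0].get_text()  # 0 selects the first matched element
Note that gettitle is now a single string, so the for loop that follows would iterate over its characters. To print every title, loop over the list returned by select() and call get_text() on each element.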
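To make the difference between the list and a single element concrete, here is a minimal sketch that runs without network access. The HTML fragment is a made-up stand-in for the table of contents (an assumption, not the real page markup), and the built-in html.parser is used instead of lxml only to keep the example dependency-light.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates a chapter list; the real page may differ.
html = """
<div class="book-mulu">
  <ul>
    <li><a href="/1.html">第一回 宴桃园豪杰三结义</a></li>
    <li><a href="/2.html">第二回 张翼德怒鞭督邮</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a ResultSet (a list of Tag objects), never a single Tag.
links = soup.select(".book-mulu a")

# Calling get_text() on the list itself reproduces the AttributeError.
try:
    links.get_text()
    raised = False
except AttributeError:
    raised = True

# Indexing gives one Tag, which does have get_text().
first_title = links[0].get_text()

# select_one() returns the first match directly (or None if nothing matches).
same_first = soup.select_one(".book-mulu a").get_text()

# To collect every title, call get_text() on each element in turn.
titles = [a.get_text(strip=True) for a in links]

for title in titles:
    print(title)
```

If only the first match is ever needed, select_one() avoids the index; when all matches are needed, the list comprehension keeps one title per link.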
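You accepted the answer too quickly. Looking at your code again, the selector that locates the elements also has a problem, but I have fixed that for you: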
import requests
from bs4 import BeautifulSoup
url = 'https://www.shicimingju.com/book/sanguoyanyi.html'
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}
# url = "https://www.shicimingju.com/book/sanguoyanyi.html"
response = requests.get(url=url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
print("正在请求章节内容")  # prints "Requesting chapter content"
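data = soup.select("#main_left > div > div.book-mulu > ul")  # returns a list; the original selector "#main>#main_left>.book-mulu a" was wrong and matched no data
# print(data)
gettitle = data[0].get_text().split("第")  # split the text on "第", which starts each chapter title
# print(gettitle)
# print(type(gettitle))
for title in gettitle:
    if title.strip():  # skip the empty piece that precedes the first "第"
        print("第" + title.strip())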
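The split on "第" can be explored on a plain string. The sketch below uses made-up sample text (an assumption about what get_text() roughly returns for the list) to show two things: the piece before the first "第" is not a title, and a title that contains "第" a second time would be cut in two. Splitting on line breaks is shown as one possible alternative, assuming each title sits on its own line.

```python
# Hypothetical text, similar in shape to get_text() on a <ul> of chapter links.
text = "\n第一回 宴桃园豪杰三结义\n第二回 张翼德怒鞭督邮\n"

# split("第") also returns whatever precedes the first "第" (here a newline).
parts = text.split("第")

# Re-attach the separator and drop pieces that are empty after stripping.
titles_by_split = ["第" + p.strip() for p in parts if p.strip()]

# A made-up title with a second "第" inside shows the weakness of this approach.
tricky = "第三回 天下第一"
tricky_parts = [p for p in tricky.split("第") if p.strip()]

# Alternative: one title per non-empty line, independent of the character "第".
titles_by_line = [line.strip() for line in text.splitlines() if line.strip()]

print(titles_by_split)
print(tricky_parts)
print(titles_by_line)
```

For the sample text both approaches give the same two titles; they differ only when "第" occurs inside a title, which is why the per-line (or per-element) variant is the more robust choice.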