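This is my first attempt at downloading images with a script.
I found that the retrieved text contains stray "" characters, and they break the XPath matching that follows.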
import requests
from lxml import etree

index_url = 'https://baike.sogou.com/v64864633.htm'
header = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}
response = requests.get(index_url, headers=header)
print(response)
# The attribute is `encoding`; it must be set before reading response.text
response.encoding = 'utf-8'
print(response.text)
# Parse the downloaded HTML into an element tree
selector = etree.HTML(response.text)
# Collect the title attribute of every link with class "ed_image_link"
image_urls = selector.xpath('//a[@class="ed_image_link"]/@title')
# Print each match
offset = 0
for image_url in image_urls:
    print(image_url)
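One way to find out what the stray characters actually are is to list any invisible format characters in the text and strip them before parsing. The sketch below is only a guess at the cause: it assumes the characters are zero-width or BOM-style code points (Unicode category Cf), which the question does not confirm. The helper names `find_format_chars` and `strip_format_chars` and the sample string are hypothetical.

```python
import unicodedata


def find_format_chars(text):
    """Return (code point, name) pairs for invisible format characters (category Cf)."""
    found = {ch for ch in text if unicodedata.category(ch) == 'Cf'}
    return sorted((f'U+{ord(ch):04X}', unicodedata.name(ch, 'UNKNOWN')) for ch in found)


def strip_format_chars(text):
    """Drop every character whose Unicode category is Cf."""
    return ''.join(ch for ch in text if unicodedata.category(ch) != 'Cf')


# Hypothetical sample: a class name polluted by a zero-width space and a BOM.
sample = 'ed_image\u200b_link\ufeff'
print(find_format_chars(sample))
cleaned = strip_format_chars(sample)
print(repr(cleaned))
```

With the real page, `strip_format_chars(response.text)` could be passed to `etree.HTML` in place of the raw text. If `find_format_chars` reports nothing, the stray characters are something else and this approach does not apply.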
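What your code ends up printing is everything that matches the rule
image_urls = selector.xpath('//a[@class="ed_image_link"]/@title')
that is, the title attribute of every a element with class ed_image_link, at any depth in the document (nested elements included).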
If you need more detail, locate the elements themselves inside the for loop and extract finer, more precise content from each one.
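A minimal sketch of that idea: select the a elements instead of a single attribute, then read whatever is needed from each element inside the loop. The HTML below is a made-up stand-in for the real page, and the assumption that each link wraps an img tag with a src attribute is not confirmed by the original post. `extract_images` is a hypothetical helper name.

```python
from lxml import etree

# Hypothetical markup standing in for the downloaded page; the real structure may differ.
SAMPLE_HTML = """
<div>
  <a class="ed_image_link" title="First image" href="/pic/1"><img src="https://example.com/1.jpg"/></a>
  <a class="ed_image_link" title="Second image" href="/pic/2"><img src="https://example.com/2.jpg"/></a>
  <a class="other" title="Ignored"><img src="https://example.com/3.jpg"/></a>
</div>
"""


def extract_images(html):
    """Return (title, image src) pairs for every link with class ed_image_link."""
    selector = etree.HTML(html)
    results = []
    # Select the elements themselves, not just one of their attributes.
    for link in selector.xpath('//a[@class="ed_image_link"]'):
        title = link.get('title')
        # The leading dot keeps the search inside the current link only.
        srcs = link.xpath('.//img/@src')
        results.append((title, srcs[0] if srcs else None))
    return results


images = extract_images(SAMPLE_HTML)
for title, src in images:
    print(title, src)
```

Working on elements rather than attribute strings means the title, the link target, and the image address can all be read from the same node, so they stay paired up.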
For reference:
https://www.cnblogs.com/qican/p/12976088.html
https://www.cnblogs.com/qican/p/13131445.html
https://www.cnblogs.com/qican/p/13183791.html