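When scraping the full list of Anhui attractions from Mafengwo (http://www.mafengwo.cn/jd/12719/gonglve.html), I cannot get the li tags.
Parsing the page with BeautifulSoup returns an empty ul.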
soup = BeautifulSoup(html, 'html.parser')
print(soup.select('html body div#container div.row-allScenic div.wrapper div.bd ul.scenic-list '))
The result is:
[<ul class="scenic-list clearfix">
</ul>]
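The HTML inside the ul on the live page is shown below (it is probably generated dynamically; in the raw page source the ul is simply empty):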
<li>
<a href="/poi/9602.html" target="_blank" title="黄山风景区">
<div class="img"><img src="http://b1-q.mafengwo.net/s13/M00/6E/FE/wKgEaVyFR3SAKchQAAJXQXSOpZc87.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>黄山风景区</h3>
</a>
</li>
<li>
<a href="/poi/7730080.html" target="_blank" title="宏村">
<div class="img"><img src="http://p1-q.mafengwo.net/s15/M00/E6/DF/CoUBGV5HaamAcXx3AAGlNmbI4_U76.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>宏村</h3>
</a>
</li>
<li>
<a href="/poi/9684.html" target="_blank" title="西海大峡谷">
<div class="img"><img src="http://b1-q.mafengwo.net/s14/M00/13/97/wKgE2l1ipPeAO6aYAATuez1Jq3U09.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>西海大峡谷</h3>
</a>
</li>
<li>
<a href="/poi/6328735.html" target="_blank" title="西递">
<div class="img"><img src="http://b1-q.mafengwo.net/s15/M00/3B/4B/CoUBGV2kNdqADjy0AAPBiWhgJBo736.jpg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>西递</h3>
</a>
</li>
<li>
<a href="/poi/9720.html" target="_blank" title="屯溪老街">
<div class="img"><img src="http://b1-q.mafengwo.net/s13/M00/B3/0D/wKgEaV2bMp6AMMdwAAQqIROm1GA735.jpg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>屯溪老街</h3>
</a>
</li>
<li>
<a href="/poi/5426908.html" target="_blank" title="徽州古城">
<div class="img"><img src="http://n1-q.mafengwo.net/s10/M00/4F/A7/wKgBZ1jrgESAHGHQAAHt-nVAMu051.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>徽州古城</h3>
</a>
</li>
<li>
<a href="/poi/5426501.html" target="_blank" title="黄山翡翠谷景区">
<div class="img"><img src="http://b1-q.mafengwo.net/s12/M00/60/C4/wKgED1xIMMeAL4quAAqdKm2SP-Q74.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>黄山翡翠谷景区</h3>
</a>
</li>
<li>
<a href="/poi/9605.html" target="_blank" title="光明顶">
<div class="img"><img src="http://p1-q.mafengwo.net/s12/M00/58/27/wKgED1vkGQOAI7zOAAYh6jFZne054.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>光明顶</h3>
</a>
</li>
<li>
<a href="/poi/1548.html" target="_blank" title="月沼湖">
<div class="img"><img src="http://p1-q.mafengwo.net/s17/M00/92/D4/CoUBXl-Np1iEZLaDAAAAADwBCO0947.jpg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>月沼湖</h3>
</a>
</li>
<li>
<a href="/poi/9724.html" target="_blank" title="南湖">
<div class="img"><img src="http://b1-q.mafengwo.net/s10/M00/2F/F2/wKgBZ1nty7uAPRz6AAT5d2JPkUw44.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>南湖</h3>
</a>
</li>
<li>
<a href="/poi/6328738.html" target="_blank" title="木坑竹海">
<div class="img"><img src="http://b1-q.mafengwo.net/s12/M00/89/29/wKgED1wPqW2AQO81AA5hRvN8lqU60.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>木坑竹海</h3>
</a>
</li>
<li>
<a href="/poi/5429154.html" target="_blank" title="查济古镇">
<div class="img"><img src="http://n1-q.mafengwo.net/s12/M00/73/A6/wKgED1uTK3iAJGhOAEVmsM4Yp5c20.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>查济古镇</h3>
</a>
</li>
<li>
<a href="/poi/6625188.html" target="_blank" title="徽杭古道">
<div class="img"><img src="http://n1-q.mafengwo.net/s12/M00/C1/45/wKgED1veKgeAWJimAB4yzt6mrKE05.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>徽杭古道</h3>
</a>
</li>
<li>
<a href="/poi/5426678.html" target="_blank" title="三河古镇">
<div class="img"><img src="http://p1-q.mafengwo.net/s12/M00/55/06/wKgED1xD5QKAAeOgAAyamhBQPlM35.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>三河古镇</h3>
</a>
</li>
<li>
<a href="/poi/5426350.html" target="_blank" title="呈坎">
<div class="img"><img src="http://n1-q.mafengwo.net/s10/M00/E0/CB/wKgBZ1t-zXeAEEM6AG0HFweCAxw84.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
<h3>呈坎</h3>
</a>
</li>
With webdriver I can get the text, but I don't know how to get the tags' attribute values (this is the problem I need to solve right now).
text_class=browser.find_element_by_css_selector('.scenic-list.clearfix')
text=text_class.text  # get the visible text
print(text)
Locating the element with XPath does not give me the information either:
print(browser.find_element_by_xpath('//div[@class="row row-allScenic"]//div[@class="wrapper"]//div[@class="bd"]//ul[@class="scenic-list clearfix"]//li[1]'))
It returns:
<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="c4104180-ab74-44de-a274-620ffff68289", element="382f1abb-1795-48db-987d-80e5985cdef5")>
You can first use webdriver to get the HTML after it has been updated dynamically, then hand that HTML to BeautifulSoup.
from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
browser = webdriver.Chrome()
browser.get('http://www.mafengwo.cn/jd/12719/gonglve.html')
# Give the page's JavaScript a few seconds to fill in the list.
sleep(3)
# outerHTML of <html> is the rendered DOM, not the raw page source.
html = browser.find_element_by_tag_name("html").get_attribute("outerHTML")
soup = BeautifulSoup(html, 'html.parser')
print(soup.select('html body div#container div.row-allScenic div.wrapper div.bd ul.scenic-list '))
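Once the rendered HTML is in hand, the attribute values the question asks about can be read from each tag with BeautifulSoup. The following is only a sketch: the function name parse_scenic_list, the BASE_URL constant, and the sample markup (with an example.com image URL) are made up for illustration, and it assumes the list keeps the li > a > img structure shown in the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: relative links such as /poi/9602.html resolve against the site root.
BASE_URL = "http://www.mafengwo.cn"


def parse_scenic_list(html):
    """Return one dict per link found inside ul.scenic-list."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for a in soup.select("ul.scenic-list li a"):
        img = a.find("img")
        items.append({
            "title": a.get("title"),                        # attribute of <a>
            "url": urljoin(BASE_URL, a.get("href", "")),    # make the link absolute
            "image": img.get("src") if img else None,       # attribute of <img>
        })
    return items


# Shortened sample in the same shape as the markup quoted in the question.
sample = """<ul class="scenic-list clearfix"><li>
<a href="/poi/9602.html" target="_blank" title="黄山风景区">
<div class="img"><img src="http://example.com/a.jpeg" width="192" height="130"></div>
<h3>黄山风景区</h3></a></li></ul>"""

for item in parse_scenic_list(sample):
    print(item["url"], item["image"])
```

In the real script you would pass the html string obtained from the browser instead of sample.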
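You need to find the target URL, fetch everything it returns, and then analyze that.
Not necessarily; you have to test and analyze it yourself.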
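To read a tag's attribute with webdriver, use the .get_attribute("attribute name") method.
The attribute name can be outerHTML, innerHTML, id, value, or any other attribute of the DOM element.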
print(browser.find_element_by_xpath('//div[@class="row row-allScenic"]//div[@class="wrapper"]//div[@class="bd"]//ul[@class="scenic-list clearfix"]//li[1]').get_attribute("outerHTML"))
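The same call works in a loop if you want the attributes of every item rather than the outerHTML of the first li. A rough sketch follows; collect_scenic_links and main are hypothetical names, and the selector assumes the markup quoted in the question. It uses find_elements with a locator strategy because the find_element_by_* helpers were removed in Selenium 4.3; on older versions find_elements_by_css_selector does the same job.

```python
def collect_scenic_links(browser):
    """Read title and href from every <a> in the rendered scenic list."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    anchors = browser.find_elements("css selector", "ul.scenic-list li a")
    return [(a.get_attribute("title"), a.get_attribute("href")) for a in anchors]


def main():
    # Imported here so the helper above does not need a browser to be loaded.
    from selenium import webdriver

    browser = webdriver.Chrome()
    try:
        # Let element lookups wait up to 10 s for the dynamic list to appear.
        browser.implicitly_wait(10)
        browser.get("http://www.mafengwo.cn/jd/12719/gonglve.html")
        for title, href in collect_scenic_links(browser):
            print(title, href)
    finally:
        browser.quit()
```

Call main() to run it against a live browser. Printing a WebElement only shows its object representation, as in the question, so the values have to be pulled out with get_attribute or .text.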
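You can look for the content you want in the browser's Network panel, find the corresponding request, and check its request headers to get the URL. If the content is split into chunks, the URLs will follow a pattern as well.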
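As a rough illustration of that approach, the sketch below builds a series of page URLs from a pattern and fetches each one with the standard library. The URL template is a placeholder, not the real Mafengwo endpoint, and the page parameter is an assumption; substitute whatever request URL and headers the Network panel actually shows.

```python
import urllib.request

# Hypothetical endpoint: replace with the request URL seen in the Network panel.
URL_TEMPLATE = "https://example.com/scenic/list?page={page}"
# Copy further headers (Referer, Cookie, ...) from the real request if the server needs them.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def page_urls(template, first, last):
    """Build the URLs for pages first..last (inclusive) from the pattern."""
    return [template.format(page=n) for n in range(first, last + 1)]


def fetch(url, headers=HEADERS, timeout=10):
    """Download one URL and return the response body as text."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetcher=fetch):
    """Fetch every URL and return a mapping of url to body."""
    return {url: fetcher(url) for url in urls}


print(page_urls(URL_TEMPLATE, 1, 3))
```

The bodies returned by fetch_all can then be parsed the same way as the rendered HTML, or as JSON if that is what the request returns.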