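I tried two approaches and neither one managed to scrape the data. I can't tell whether the cause is how the page is set up or a problem in my code. Any help is appreciated.
Attempt 1: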
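import requests
from bs4 import BeautifulSoup

# Define the headers before the request and actually pass them to it
headers = {
    'origin': 'www.csrc.gov.cn',
    'referer': 'http://www.csrc.gov.cn/pub/newsite/',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36',
}
res_public = requests.get('http://www.csrc.gov.cn/pub/zjhpublic/', headers=headers)
# Parse the page
bs_public = BeautifulSoup(res_public.text, 'html.parser')
# Find the smallest parent tags
list_publics = bs_public.find_all('li', class_='mc')
print(list_publics)
# Create an empty list to store the results
list_all = []
for publish in list_publics:
    # Take the first <a> tag inside this parent tag
    # (find_all() returns a list, which has no .text attribute)
    tag_a = publish.find('a')
    if tag_a is None:
        continue
    # Announcement title; strip() removes surrounding whitespace
    name = tag_a.text.strip()
    # Keep only titles containing '会议审核结果公告' (meeting review result announcement)
    if '会议审核结果公告' in name:
        URL = tag_a['href']
        print(URL)
        # Fetch the announcement page and collect its <p class="content"> tags
        res_p = requests.get(URL, headers=headers)
        bs_p = BeautifulSoup(res_p.text, 'html.parser')
        list_p = bs_p.find_all('p', class_='content')
        # Announcement body: join the paragraphs, since the result list has no .text
        brief = '\n'.join(p.text.strip() for p in list_p)
        list_all.append([name, URL, brief])
# Print the results
print(list_all)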
Attempt 2:
from selenium import webdriver
import time
driver = webdriver.Ie()
driver.get('http://www.csrc.gov.cn/pub/zjhpublic/')
time.sleep(5)
driver.find_element_by_tag_name('documentContainer').find_elements_by_class_name('mc')
The content you want is inside an iframe. With selenium you have to locate that frame first, frame = driver.find_element_by_css_selector('iframe#DataList'),
then call driver.switch_to.frame(frame). Only after that can you read the tags inside it.
Use by_id, not by_tag: documentContainer is an id, not a tag name.
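The iframe point above may also explain why Attempt 1 finds nothing: requests only downloads the outer page, and the list lives in a separate document that the iframe loads. A minimal sketch of locating that document's URL follows, using only the standard library so it runs without a browser. The iframe id `DataList` comes from the reply above; the helper names `IframeSrcFinder` and `find_iframe_url`, and the `src` value in `SAMPLE_HTML`, are made up for illustration. If the real page sets the iframe's `src` from JavaScript, this will not find it and selenium is the safer route.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcFinder(HTMLParser):
    """Collect the src attribute of every <iframe> with a given id."""

    def __init__(self, iframe_id):
        super().__init__()
        self.iframe_id = iframe_id
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == 'iframe' and attr_map.get('id') == self.iframe_id:
            src = attr_map.get('src')
            if src:
                self.sources.append(src)


def find_iframe_url(page_url, html, iframe_id='DataList'):
    """Return the absolute URL the iframe loads, or None if it is not found."""
    finder = IframeSrcFinder(iframe_id)
    finder.feed(html)
    if not finder.sources:
        return None
    # The src is often relative, so resolve it against the outer page URL
    return urljoin(page_url, finder.sources[0])


# Hypothetical outer page: the src value is an assumption, not the site's real markup.
SAMPLE_HTML = '<html><body><iframe id="DataList" src="list/index.htm"></iframe></body></html>'
iframe_url = find_iframe_url('http://www.csrc.gov.cn/pub/zjhpublic/', SAMPLE_HTML)
print(iframe_url)
```

In real use, `html` would be `res_public.text` from Attempt 1, and the returned URL would be requested and parsed the same way as the outer page.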
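I followed the replies above. There is no error now and something does get scraped, but the result looks like this. Any ideas?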
driver = webdriver.Ie('IEDriverServer.exe')
driver.get('http://www.csrc.gov.cn/pub/zjhpublic/')
time.sleep(3)
frame = driver.find_element_by_css_selector('iframe#DataList')
driver.switch_to.frame(frame)
results = driver.find_element_by_id('documentContainer').find_elements_by_class_name('mc')
results
[<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="89f5bf14-d9ac-4887-8b9c-b93f90e58750")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="85469763-34fd-4272-86de-eb530c0f3857")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="dd4cc2c9-b77d-4209-9288-06e1aa8d38e2")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="6edc51c2-e2ce-4f45-89b5-c924d4c43c22")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="9b911d28-0728-4b8b-aef0-1dc957a6e34a")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="76bc916b-3fc6-4578-971a-9d8b221c7297")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="5ba0ead2-315c-48a8-bd01-672e77d0f735")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="d179c82b-6a2a-45dd-94f2-83733c1ed08b")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="98167125-57b8-42e0-a903-67fe141357ec")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="a49045ba-a489-435f-b5af-d4fd8a5adf8a")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="b2314c37-7a35-42d0-aae7-bcca05e26c25")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="45a7d658-3af4-4d48-bc32-a37160af3584")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="8611ef5e-99f1-4646-9a19-297f0ec34d4f")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="e46998e8-4700-4916-be11-2c385c353bb6")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="5894210d-02e4-42ce-b1fd-053b481a4876")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="06e979a0-f282-4a9b-936e-48bc3e9c9201")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="8ab4076d-6b67-48ac-977a-677139391911")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="9b68860c-863b-4168-b48b-8dcec42e2abe")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="f525cb41-3bd7-4a75-9783-fed37c4770c4")>,
<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="f8c8bb71-f10f-4915-938f-19da46d2cbbb")>]
OP, did you get this solved? Please share the code~
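The `results` list printed above is not a failure: it holds WebElement handles, and the text has to be read from each one through its `.text` property (and link targets through `get_attribute('href')`). A minimal sketch of that step follows. `describe_elements` is a hypothetical helper, and it assumes each `mc` item may contain an `<a>` tag, as Attempt 1 does. `FakeElement` is only a stand-in so the sketch runs without a browser; it is not part of selenium.

```python
def describe_elements(elements):
    """Turn WebElement-like objects into (text, href) pairs."""
    rows = []
    for element in elements:
        # .text is the visible text of the element
        text = element.text.strip()
        # Selenium 3 style lookup, matching the calls used in this thread
        links = element.find_elements_by_tag_name('a')
        href = links[0].get_attribute('href') if links else None
        rows.append((text, href))
    return rows


class FakeElement:
    """Stand-in for a selenium WebElement (illustration only)."""

    def __init__(self, text, href=None, children=()):
        self.text = text
        self._href = href
        self._children = list(children)

    def find_elements_by_tag_name(self, name):
        return self._children

    def get_attribute(self, name):
        return self._href if name == 'href' else None


link = FakeElement('Sample announcement', href='http://example.com/a.htm')
item = FakeElement('  Sample announcement  ', children=[link])
rows = describe_elements([item, FakeElement('No link here')])
print(rows)
```

With the real driver, `describe_elements(results)` would be called while the driver is still switched into the iframe. Recent Selenium 4 releases removed the `find_elements_by_*` helpers, so there the lookup would be `element.find_elements(By.TAG_NAME, 'a')` with `By` imported from `selenium.webdriver.common.by`.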