How can I scrape the CSRC Issuance Examination Committee's public meeting result announcements?

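I tried two approaches and neither one scraped anything. I can't tell whether the cause is how the page is built or a problem in my code. Any help is appreciated.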

Attempt 1:

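from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


# Browser-like headers, sent with every request
headers = {
    'origin':'www.csrc.gov.cn',
    'referer':'http://www.csrc.gov.cn/pub/newsite/',
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36',
    }
res_public = requests.get('http://www.csrc.gov.cn/pub/zjhpublic/', headers=headers)
# Parse the page
bs_public = BeautifulSoup(res_public.text,'html.parser')
# Find the smallest enclosing tag of each entry
list_publics = bs_public.find_all('li',class_= 'mc')
print(list_publics)
# Empty list used to store the results
list_all = []

for publish in list_publics:
    # First <a> tag inside this entry (find() returns one tag, find_all() a list)
    tag_a = publish.find('a')
    if tag_a is None:
        continue
    # Announcement title; strip() removes surrounding whitespace
    name = tag_a.text.strip()
    # Keep only "meeting review result announcement" entries
    if '会议审核结果公告' in name:
        # Resolve the link in case it is relative
        URL = urljoin(res_public.url, tag_a['href'])
        print(URL)
        # Fetch the announcement page and collect its <p class="content"> tags
        res_p = requests.get(URL, headers=headers)
        bs_p = BeautifulSoup(res_p.text,'html.parser')
        list_p = bs_p.find_all('p',class_='content')
        # Announcement body: join the stripped text of every paragraph
        brief = '\n'.join(p.text.strip() for p in list_p)
        list_all.append([name,URL,brief])

# Print the results
print(list_all)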

Attempt 2:

import time
from selenium import webdriver

driver = webdriver.Ie()
driver.get('http://www.csrc.gov.cn/pub/zjhpublic/')
time.sleep(5)
driver.find_element_by_tag_name('documentContainer').find_elements_by_class_name('mc')

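The content you want is inside an iframe, so with selenium you first have to locate that frame: frame=driver.find_element_by_css_selector('iframe#DataList')

Then call driver.switch_to.frame(frame); only after that can you read the information inside the tags.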
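
The two steps described above can be sketched as one helper. This is only a minimal sketch, not a tested scraper for the site: the function name collect_row_texts is hypothetical, and the selectors iframe#DataList, documentContainer and mc are simply the ones quoted in this thread. The locator strings "css selector", "id" and "class name" are the values of selenium's By.CSS_SELECTOR, By.ID and By.CLASS_NAME, so the function works on any driver object without importing selenium itself.

```python
def collect_row_texts(driver, frame_css="iframe#DataList",
                      container_id="documentContainer", row_class="mc"):
    """Switch into the iframe, read the text of every row, then switch back."""
    # Locate the iframe element in the top-level document.
    frame = driver.find_element("css selector", frame_css)
    # Element lookups after this call are resolved inside the iframe.
    driver.switch_to.frame(frame)
    try:
        container = driver.find_element("id", container_id)
        rows = container.find_elements("class name", row_class)
        # Read the text while still inside the frame.
        return [row.text.strip() for row in rows]
    finally:
        # Always return to the top-level document.
        driver.switch_to.default_content()
```

With a real browser this would be called after driver.get(...) and a short wait, e.g. texts = collect_row_texts(driver). The text is read before switching back because element handles obtained inside a frame are generally only usable while that frame is selected.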

Also, it should be find_element_by_id, not find_element_by_tag_name: documentContainer is an id, not a tag name.

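I followed the reply above. There is no error any more and something does get scraped, but the result is only what is shown below. Any ideas?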

driver = webdriver.Ie('IEDriverServer.exe')
driver.get('http://www.csrc.gov.cn/pub/zjhpublic/')
time.sleep(3)
frame=driver.find_element_by_css_selector('iframe#DataList')
driver.switch_to.frame(frame)
results=driver.find_element_by_id('documentContainer').find_elements_by_class_name('mc')
results
[<selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="89f5bf14-d9ac-4887-8b9c-b93f90e58750")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="85469763-34fd-4272-86de-eb530c0f3857")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="dd4cc2c9-b77d-4209-9288-06e1aa8d38e2")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="6edc51c2-e2ce-4f45-89b5-c924d4c43c22")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="9b911d28-0728-4b8b-aef0-1dc957a6e34a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="76bc916b-3fc6-4578-971a-9d8b221c7297")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="5ba0ead2-315c-48a8-bd01-672e77d0f735")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="d179c82b-6a2a-45dd-94f2-83733c1ed08b")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="98167125-57b8-42e0-a903-67fe141357ec")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="a49045ba-a489-435f-b5af-d4fd8a5adf8a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="b2314c37-7a35-42d0-aae7-bcca05e26c25")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="45a7d658-3af4-4d48-bc32-a37160af3584")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="8611ef5e-99f1-4646-9a19-297f0ec34d4f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="e46998e8-4700-4916-be11-2c385c353bb6")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="5894210d-02e4-42ce-b1fd-053b481a4876")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="06e979a0-f282-4a9b-936e-48bc3e9c9201")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="8ab4076d-6b67-48ac-977a-677139391911")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="9b68860c-863b-4168-b48b-8dcec42e2abe")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="f525cb41-3bd7-4a75-9783-fed37c4770c4")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfd4619d-2449-46d9-b5a0-f0d9cff45576", element="f8c8bb71-f10f-4915-938f-19da46d2cbbb")>]
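
The list printed above contains WebElement handles, not the page text itself; the text and link still have to be read from each handle. A minimal sketch of that step follows, under the assumption (the same one Attempt 1 makes) that every mc element contains an <a> tag whose text is the announcement title. The function name extract_announcements is hypothetical, and "tag name" is the string value of selenium's By.TAG_NAME.

```python
def extract_announcements(elements, keyword="会议审核结果公告"):
    """Return (title, href) pairs for elements whose text contains keyword."""
    results = []
    for el in elements:
        # .text is the visible text of the element.
        title = el.text.strip()
        if keyword not in title:
            continue
        # Assumes an <a> exists inside the element; selenium raises
        # NoSuchElementException here if it does not.
        link = el.find_element("tag name", "a")
        results.append((title, link.get_attribute("href")))
    return results
```

It would be called as extract_announcements(results) while the driver is still switched into the iframe. Passing keyword="" keeps every entry, which may help when checking what the titles actually look like.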

OP, did you get this solved? Please share the code~