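How should I scrape a table like this? For 英飞拓, the third and fourth columns contain multiple files. When I print by the publish time in the fourth column, the page has 51 files in total, but when I print by the company code in the first column, I get only 30.
Error: InvalidSchema: No connection adapters were found for '2022-01-27 11:46'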
Request the data endpoint directly to get the data; there is no need to scrape with selenium. The code is as follows:
import requests
import time

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36 Edg/97.0.1072.69',
    'referer': 'http://www.cninfo.com.cn/new/commonUrl?url=disclosure/list/notice',
    'X-Requested-With': 'XMLHttpRequest'
}
data = {
    'column': 'szse_latest',
    'pageNum': 1,
    'pageSize': 30,
    'sortName': '',
    'sortType': '',
    'clusterFlag': 'true'
}
d = requests.post('http://www.cninfo.com.cn/new/disclosure', headers=headers, data=data).json()

# classifiedAnnouncements is a list of lists, so two loops are needed
for items in d['classifiedAnnouncements']:
    for item in items:
        print(item['secCode'])
        print(item['secName'])
        print(item['announcementTitle'])
        # The publish time is a millisecond timestamp; convert it to seconds first
        announcementTime = time.localtime(item['announcementTime'] // 1000)
        print(time.strftime("%Y-%m-%d", announcementTime))
        print(item['adjunctUrl'])
        print('---------------')
    # Blank lines between groups
    print()
    print()
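As for the InvalidSchema error in the question: requests raises it when the string it is given as a URL has no scheme it can handle. The message quotes '2022-01-27 11:46', which suggests the publish-time text was passed where the file link was expected. Below is a minimal sketch of a guard you could run before each request. The helper name `is_http_url` is made up for illustration and is not part of requests.

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True only if value looks like an http(s) URL (hypothetical helper)."""
    parsed = urlparse(value)
    return parsed.scheme in ('http', 'https') and bool(parsed.netloc)


# The string from the error message is a date, not a link
for candidate in ['2022-01-27 11:46', 'http://www.cninfo.com.cn/new/disclosure']:
    if is_http_url(candidate):
        print('ok to request:', candidate)
    else:
        print('not a URL, skipping:', candidate)
```

If the `adjunctUrl` value turns out to be a relative path rather than a full link, it would also fail this check and would need to be joined to the site's file host first; that host is not shown here.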
@CSDN专家-showbo
How do I parse the whole thing? I'm a complete beginner, please help.
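One way to parse the whole response, sketched under assumptions: the response appears to be grouped, with each inner list of `classifiedAnnouncements` holding the announcements for one row of the table. That would explain why counting files gives a larger number than counting company codes. The function name `flatten_announcements` and the sample payload below are made up for illustration; only the field names come from the answer above.

```python
import time


def flatten_announcements(payload):
    """Turn the grouped response into one flat list of records, one per file."""
    rows = []
    for group in payload.get('classifiedAnnouncements') or []:
        for item in group:
            # announcementTime is a millisecond timestamp
            published = time.localtime(item['announcementTime'] / 1000)
            rows.append({
                'secCode': item['secCode'],
                'secName': item['secName'],
                'title': item['announcementTitle'],
                'date': time.strftime('%Y-%m-%d', published),
                'adjunctUrl': item['adjunctUrl'],
            })
    return rows


# Made-up sample with the same shape: two groups, three files in total
sample = {
    'classifiedAnnouncements': [
        [
            {'secCode': '000001', 'secName': 'Company A', 'announcementTitle': 'Notice 1',
             'announcementTime': 1643255160000, 'adjunctUrl': 'a/1.PDF'},
            {'secCode': '000001', 'secName': 'Company A', 'announcementTitle': 'Notice 2',
             'announcementTime': 1643255160000, 'adjunctUrl': 'a/2.PDF'},
        ],
        [
            {'secCode': '000002', 'secName': 'Company B', 'announcementTitle': 'Notice 3',
             'announcementTime': 1643255160000, 'adjunctUrl': 'b/3.PDF'},
        ],
    ]
}

rows = flatten_announcements(sample)
companies = {row['secCode'] for row in rows}
print(len(rows), 'files from', len(companies), 'companies')
```

With the real response you would pass `d` from the code above instead of `sample`, then write `rows` out however you like, for example with `csv.DictWriter`.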