A Python crawler fails with requests.exceptions.SSLError: HTTPSConnectionPool(host='gs.amac.org.cn', port=443): Max retries exceeded with url: /amac-infodisc/api/pof/securities (Caused by SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1131)')))
Searching suggested an SSL problem, but passing verify=False, as in response = requests.post(url, verify=False), still raises the same error. Adding or removing headers makes no difference. I also tried switching cryptography to 36.0.2 and pyopenssl to 22.0.0, with no effect.
The requests code is as follows:
def get_data(self, page):
    # Method of the crawler class; relies on `import requests` and `from random import random`.
    r = random()
    url = f'https://gs.amac.org.cn/amac-infodisc/api/pof/securities?rand={r}&page={page}&size=100'
    response = requests.post(url, verify=False, headers=self.headers)
    return response.json()
I then switched to scrapy, with the following code:
def start_requests(self):
    r = random.random()
    headers = {
        "Content-Length": "2",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.79",
    }
    data = {}
    url = 'https://gs.amac.org.cn/amac-infodisc/api/pof/securities'
    yield scrapy.Request(
        url=url,
        method='POST',
        callback=self.parse,
        headers=headers,
        # body=json.dumps(data)
    )
The error is [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
If "Content-Length": "2" is removed from the request headers, the error becomes ERROR: Gave up retrying <POST https://gs.amac.org.cn/amac-infodisc/api/pof/securities> (failed 3 times): 500 Internal Server Error
If the body argument is added to scrapy.Request, the log shows INFO: Ignoring response <400 https://gs.amac.org.cn/amac-infodisc/api/pof/securities>: HTTP status code is not handled or not allowed
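The page being crawled is https://gs.amac.org.cn/amac-infodisc/res/pof/securities/index.html
Packet capture: [screenshot not preserved]
Sending the POST request from an online request tool also returns no data.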
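Problem: the POST request is built incorrectly. The requests version sends no request body at all, and the scrapy version declares "Content-Length": "2" while its body argument is commented out, so the declared length does not match what is actually sent.
The corrected code is below; the scrapy request can follow the same approach.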
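To illustrate why a hand-written Content-Length is fragile, here is a minimal standard-library sketch that builds a JSON POST request without sending it. The helper name build_json_post is hypothetical. Reading the "2" in the captured header as the length of an empty JSON object is an inference, not something the original post confirms.

```python
import json
import urllib.request

API_URL = "https://gs.amac.org.cn/amac-infodisc/api/pof/securities"


def build_json_post(payload):
    """Build (but do not send) a POST request carrying `payload` as JSON.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Derived from the body, so header and body cannot disagree.
            "Content-Length": str(len(body)),
        },
    )


# An empty JSON object serializes to the two bytes "{}".
empty = build_json_post({})
# urllib stores header names capitalized as "Content-length".
print(empty.data, empty.get_header("Content-length"))  # b'{}' 2
```

If a header claims 2 bytes and no body follows, the server is left waiting for data that never arrives, which would be consistent with the dropped connection reported above.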
import requests

# Browser-like request headers. Content-Length is left out on purpose:
# requests computes it from the JSON body.
headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Content-Type": "application/json",
    "Host": "gs.amac.org.cn",
    "Origin": "https://gs.amac.org.cn",
    "Referer": "https://gs.amac.org.cn/amac-infodisc/res/pof/securities/index.html",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}


def get_data(page):
    url = 'https://gs.amac.org.cn/amac-infodisc/api/pof/securities'
    # The paging parameters go in a JSON body rather than the query string.
    data = {
        "rand": 0.18000891596398572,
        "page": page,
        "size": 20
    }
    # json=data serializes the dict into the request body.
    response = requests.post(url, json=data, headers=headers)
    return response.json()


if __name__ == '__main__':
    print(get_data(1))