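I want to scrape the response of an API that a web page calls. When the page makes the request, I copied the corresponding JS fetch code from Chrome DevTools; pasted into Node.js it runs fine and I can capture what the API returns. But when I copy the same request from Chrome as cURL (bash) and send it from Postman, I get a firewall message instead.
See the screenshot below:
My JS code is as follows: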
fetch("https://api.example.com/api", {
  headers: {
    accept: "application/json, text/plain, */*",
    "accept-language": "zh-CN,zh;q=0.9",
    "cache-control": "no-cache",
    "content-type": "application/json",
    "firebase-auth": "true",
    "firebase-token": "",
    pragma: "no-cache",
    "sec-ch-ua":
      '"Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-site",
  },
  referrer: "https://app.example.com/",
  referrerPolicy: "strict-origin-when-cross-origin",
  body: '{"action":"test","draftId":"","start":0,"end":17,"text":"hey what\'s up man","isBatch":false,"lookaheadIndex":0,"selection":{"bulletText":"","start":0,"end":17,"wholeText":"hey what\'s up man"},"languageCode":"en"}',
  method: "POST",
  mode: "cors",
  credentials: "omit",
})
  .then((res) => res.json())
  .then((d) => console.log(d));
I rewrote the JS code as a Python requests call, but running it gives the same error as Postman.
import requests
import json

url = "https://api.example.com/api"
payload = json.dumps({
    "action": "REWRITE",
    "draftId": "",
    "start": 0,
    "end": 8,
    "text": "hey,man.",
    "isBatch": False,
    "lookaheadIndex": 0,
    "selection": {
        "bulletText": "",
        "start": 0,
        "end": 8,
        "wholeText": "hey,man."
    },
    "languageCode": "en"
})
headers = {
    'authority': 'https://api.example.com/api',
    'accept': 'application/json, text/plain, */*',
    'accept-language': 'zh-CN,zh;q=0.9',
    'cache-control': 'no-cache',
    'content-type': 'application/json',
    'firebase-auth': 'true',
    'firebase-token': '',
    'origin': 'https://app.example.com/',
    'pragma': 'no-cache',
    'referer': 'https://app.example.com/',
    'sec-ch-ua': '"Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-site',
    #'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
How should I change this so that I get the API's response normally? Any pointers would be much appreciated.
As of today the request header has changed to token. The version below works:
import json
import http.client
from urllib import request, error

# Work around "http.client.HTTPException: got more than 100 headers"
http.client._MAXHEADERS = 1000

data = {"action":"REWRITE","draftId":"","start":0,"end":7,"text":"hey man","isBatch":False,"lookaheadIndex":0,"selection":{"bulletText":"","start":0,"end":7,"wholeText":"hey man"},"languageCode":"en"}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
    'Content-Type': 'application/json',
    #'firebase-auth': 'true',
    # Put the token value here. Check whether the browser actually sends
    # `token` or `firebase-token`: the header name changes, so go by the browser.
    'token': '<token value>'
}
url = 'https://api.wordtune.com/rewrite'
body = json.dumps(data).encode('utf-8')
try:
    req = request.Request(url=url, data=body, headers=headers, method='POST')
    response = request.urlopen(req)
    #print(response.status, response.reason)
    print(response.read().decode('utf-8'))
except error.HTTPError as e:
    # Catching the exception lets us read the response body when the status is not 2xx
    print(e.read().decode('utf-8'))
The following is a fallback; the asker can switch between the two and see which one works.
import json
import http.client
from urllib import request, error

# Work around "http.client.HTTPException: got more than 100 headers"
http.client._MAXHEADERS = 1000

data = {"action":"REWRITE","draftId":"","start":0,"end":7,"text":"hey man","isBatch":False,"lookaheadIndex":0,"selection":{"bulletText":"","start":0,"end":7,"wholeText":"hey man"},"languageCode":"en"}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
    'Content-Type': 'application/json',
    'firebase-auth': 'true',
    'firebase-token': 'aaa'
}
url = 'https://api.wordtune.com/rewrite'
body = json.dumps(data).encode('utf-8')
try:
    req = request.Request(url=url, data=body, headers=headers, method='POST')
    response = request.urlopen(req)
    print(response.status, response.reason)
    print(response.read().decode('utf-8'))
except error.HTTPError as e:
    # Catching the exception lets us read the response body when the status is not 2xx
    print(e.read().decode('utf-8'))
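One visible difference between the asker's requests script and the two urllib scripts is the User-Agent header: it is commented out in the former and set to a browser string in the latter. When no User-Agent is supplied, urllib identifies itself as Python-urllib/3.x (and requests as python-requests/x.y.z), which a firewall may treat differently from a browser. Whether this particular site filters on that header is an assumption, not something confirmed in this thread. A minimal offline sketch that shows the default and the override without sending anything (BROWSER_UA is just the string from the scripts above):

```python
from urllib import request

# Browser User-Agent string, copied from the urllib scripts in this thread.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
)

# Headers that urllib adds on its own when the caller supplies none.
default_headers = dict(request.build_opener().addheaders)
print(default_headers)

# Building a Request does not touch the network; it only stores the headers.
req = request.Request(
    "https://api.example.com/api",
    data=b"{}",
    headers={"User-Agent": BROWSER_UA, "Content-Type": "application/json"},
    method="POST",
)

# urllib normalises header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

If re-enabling the user-agent line in the requests version changes the response, that points at the header rather than the body.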
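I saved the JS code directly to a local file and it runs fine under Node.js. It also runs fine in the DevTools Console of the target page.
That suggests the JS request headers are complete, with nothing missing. Yet the Python code carries the same number of headers and the request still fails, so I suspect the request body format in the Python code is wrong.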
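One way to test that suspicion without hitting the server is to compare the two bodies locally. The sketch below takes the body string from the fetch call and a Python dict with the same fields (the variable names are illustrative). It shows that json.dumps with default separators differs from the browser's body only in whitespace, and that compact separators reproduce it character for character.

```python
import json

# Body exactly as it appears in the fetch call (JS \' is a plain apostrophe).
browser_body = '{"action":"test","draftId":"","start":0,"end":17,"text":"hey what\'s up man","isBatch":false,"lookaheadIndex":0,"selection":{"bulletText":"","start":0,"end":17,"wholeText":"hey what\'s up man"},"languageCode":"en"}'

# The same fields as a Python dict, in the same key order.
payload = {
    "action": "test",
    "draftId": "",
    "start": 0,
    "end": 17,
    "text": "hey what's up man",
    "isBatch": False,
    "lookaheadIndex": 0,
    "selection": {
        "bulletText": "",
        "start": 0,
        "end": 17,
        "wholeText": "hey what's up man",
    },
    "languageCode": "en",
}

default_body = json.dumps(payload)                         # ", " and ": " separators
compact_body = json.dumps(payload, separators=(",", ":"))  # no extra whitespace

print(json.loads(default_body) == json.loads(browser_body))  # True: same JSON value
print(compact_body == browser_body)                          # True: same characters
```

Both forms are valid JSON with the same content, so if the server accepts one it would normally accept the other; in that case the difference is more likely in the headers than in the body.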
Step 1: In the browser's network capture, right-click the request and copy it as cURL.
Step 2: Paste it here: https://spidertools.cn/#/curl2Request
Step 3: Copy the generated Python code.
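What such a converter does can be approximated in a few lines. The sketch below is not the tool's actual implementation: parse_curl is a hypothetical helper that handles only the common flags in Chrome's "Copy as cURL (bash)" output (-H, -X, and the --data variants). It ignores anything else and does not understand bash's $'...' quoting, which Chrome sometimes uses for bodies containing apostrophes.

```python
import shlex


def parse_curl(command: str) -> dict:
    """Split a 'Copy as cURL (bash)' string into url, method, headers and body."""
    # Join backslash-newline continuations so shlex sees one logical line.
    tokens = shlex.split(command.replace("\\\n", " "))
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")

    parsed = {"url": None, "method": None, "headers": {}, "body": None}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        has_value = i + 1 < len(tokens)
        if tok in ("-H", "--header") and has_value:
            name, _, value = tokens[i + 1].partition(":")
            parsed["headers"][name.strip()] = value.strip()
            i += 2
        elif tok in ("-X", "--request") and has_value:
            parsed["method"] = tokens[i + 1]
            i += 2
        elif tok in ("-d", "--data", "--data-raw", "--data-binary") and has_value:
            parsed["body"] = tokens[i + 1]
            i += 2
        elif tok.startswith(("http://", "https://")):
            parsed["url"] = tok
            i += 1
        else:
            i += 1  # unsupported flag or value, e.g. --compressed: skip it

    # Like curl, default to POST when a body is present, GET otherwise.
    if parsed["method"] is None:
        parsed["method"] = "POST" if parsed["body"] is not None else "GET"
    return parsed


# Shortened, made-up example in the shape Chrome produces.
example = """curl 'https://api.example.com/api' \\
  -H 'content-type: application/json' \\
  -H 'firebase-auth: true' \\
  --data-raw '{"action":"test"}' \\
  --compressed"""

parsed = parse_curl(example)
print(parsed)
```

The resulting dict maps directly onto urllib.request.Request(url, data=body.encode("utf-8"), headers=headers, method=method), which is the same call the urllib scripts in this thread use.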