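For example: http://12366.beijing.chinatax.gov.cn:8080/
How can I search that site using the keywords stored in an Excel sheet?
Use pandas to read the keywords from the Excel file, then use requests to fetch the search results for each keyword from that site.
Here is code that answers your question: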
import pandas as pd
import requests

# Read the input sheet; it must contain a column named '关键词' (keywords).
df = pd.read_excel('xxx.xlsx')

# Headers that mimic the site's own AJAX search request.
headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36',
    "Host": "12366.beijing.chinatax.gov.cn:8080",
    "Origin": "http://12366.beijing.chinatax.gov.cn:8080",
    "Pragma": "no-cache",
    "Referer": "http://12366.beijing.chinatax.gov.cn:8080/",
    "X-Requested-With": "XMLHttpRequest",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
}
url = "http://12366.beijing.chinatax.gov.cn:8080/zsk/zsksearch/search"

li = []
for v in df['关键词']:
    print(v)
    # Form fields of the search request; only "keywords" changes per row.
    data = {
        "page": "1",
        "pageSize": "5",
        "zltype": "1",
        "zlflag": "1",
        "keywords": v,
        "order": "",
        "sortField": ""
    }
    r = requests.post(url, data=data, headers=headers)
    res = r.json()
    # Keep only the first hit: its title (TITLE) and body text (ZLNR).
    if 'pageContent' in res and len(res['pageContent']) > 0:
        title = res['pageContent'][0]['TITLE']
        zlnr = res['pageContent'][0]['ZLNR']
        li.append(title + " " + zlnr)
    else:
        li.append("没有搜索结果")  # placeholder meaning "no search results"
print(li)

# Store the results in a new '搜索结果' (search results) column and save a copy.
df['搜索结果'] = li
df.to_excel(r'xxx2.xlsx', index=False)
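The code above indexes straight into the first hit, so a missing or null TITLE/ZLNR would raise. A slightly more defensive version of that extraction step might look like the sketch below. The helper name `first_result`, the `strip_html` option, and the sample dictionary are all made up for illustration; whether ZLNR really contains HTML markup is an assumption, so check a real response first.

```python
import html
import re

NO_RESULT = "没有搜索结果"  # same placeholder string the answer's code uses


def first_result(res, strip_html=True):
    """Return 'TITLE ZLNR' for the first hit of a parsed response, or NO_RESULT."""
    items = res.get("pageContent") if isinstance(res, dict) else None
    if not items:
        return NO_RESULT
    first = items[0]
    # Fields may be missing or null; fall back to empty strings instead of raising.
    title = first.get("TITLE") or ""
    body = first.get("ZLNR") or ""
    if strip_html:
        # Assumption: ZLNR may carry HTML; drop tags, then unescape entities.
        body = html.unescape(re.sub(r"<[^>]+>", "", body))
    return f"{title} {body}".strip()


# Hand-written sample shaped like the response the code above expects (not real data).
sample = {"pageContent": [{"TITLE": "Example title", "ZLNR": "<p>Example &amp; body</p>"}]}
print(first_result(sample))
print(first_result({"pageContent": []}))
```

In the loop you would then write `li.append(first_result(r.json()))` instead of the if/else block.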
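If you solve this in Python, the approach is: first use openpyxl or pandas to read the column of search terms into a list. Loop over the keywords, use selenium to simulate the request by submitting each keyword, then click through to fetch and parse the page content. Finally, write the content to Excel, or use docx to write it to Word.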
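The loop in that approach can be sketched independently of how the page is actually fetched. Below, `collect_results` and `fake_search` are hypothetical names; `fake_search` only stands in for a selenium-backed (or requests-backed) function that you would write yourself, and the "nan" check assumes blank cells arrive as pandas NaN values.

```python
from typing import Callable, Iterable, List, Tuple


def collect_results(keywords: Iterable, search: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Run `search` for each non-empty keyword and collect (keyword, result) rows."""
    rows = []
    for raw in keywords:
        keyword = str(raw).strip() if raw is not None else ""
        # Skip blank cells; a pandas NaN becomes the string "nan" after str().
        if not keyword or keyword.lower() == "nan":
            continue
        try:
            result = search(keyword)
        except Exception as exc:
            # Record the failure and keep going so one bad keyword does not stop the run.
            result = f"ERROR: {exc}"
        rows.append((keyword, result))
    return rows


def fake_search(keyword: str) -> str:
    """Stand-in for a real search function; fails on purpose for 'bad'."""
    if keyword == "bad":
        raise RuntimeError("page did not load")
    return f"result for {keyword}"


rows = collect_results(["发票", "  ", None, "bad"], fake_search)
for row in rows:
    print(row)
```

The returned rows can then be written out with whichever library you prefer, for example pandas for Excel or docx for Word.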