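A web page's content is visible in F12 (the browser DevTools), but when I scrape it, the value of the `content` field comes back as `None`. Everything else in the response is fine; only `content` is affected.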
Here is the code I used:
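```python
import asyncio
import json

import aiohttp
import requests
from fake_useragent import UserAgent


async def downloadcontent(session, title, cid, b_id):
    # Chapter request payload; the cid is "<book_id>|<cid>" with no space after the "|"
    data = {"book_id": b_id, "cid": f"{b_id}|{cid}", "need_bookinfo": 1}
    url = "https://dushu.baidu.com/api/pc/getChapterContent"
    # Let aiohttp URL-encode the JSON string instead of splicing it into the URL by hand
    params = {"data": json.dumps(data, separators=(",", ":"))}
    async with session.get(url, params=params) as resp:
        dic = await resp.json()
        print(title, dic)


async def getcatalog(url, b_id):
    # The header name is "User-Agent" (with a hyphen), not "UserAgent"
    headers = {"User-Agent": UserAgent().chrome}
    resp = requests.get(url, headers=headers)
    resp.encoding = "utf-8"
    # One shared session for all chapter requests, sending the same headers
    async with aiohttp.ClientSession(headers=headers) as session:
        tasks = []
        for item in resp.json()["data"]["novel"]["items"]:
            title = item["title"]
            cid = item["cid"]
            tasks.append(downloadcontent(session, title, cid, b_id))
        # asyncio.wait() rejects bare coroutines on Python 3.11+; gather runs them concurrently
        await asyncio.gather(*tasks)


if __name__ == "__main__":
    b_id = "4306063500"
    url = 'https://dushu.baidu.com/api/pc/getCatalog?data={"book_id":"' + b_id + '"}'
    asyncio.run(getcatalog(url, b_id))
```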
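Building the `data` query parameter by string concatenation makes it easy to send malformed JSON, or to slip an extra character such as a space in after the `|`, which would likely make the `cid` stop matching any chapter. A minimal sketch of building the chapter URL with the standard library instead, assuming the endpoint expects a JSON object with `book_id`, `cid` in the form `book_id|cid`, and `need_bookinfo`, as in the code above. `build_chapter_url` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode

CHAPTER_API = "https://dushu.baidu.com/api/pc/getChapterContent"


def build_chapter_url(b_id: str, cid: str, need_bookinfo: int = 1) -> str:
    """Build the getChapterContent URL with a properly encoded `data` parameter."""
    payload = {
        "book_id": b_id,
        # Assumed format "<book_id>|<cid>", with nothing between the pipe and the cid
        "cid": f"{b_id}|{cid}",
        "need_bookinfo": need_bookinfo,
    }
    # Compact JSON (no spaces), then let urlencode percent-escape quotes, braces and the pipe
    data = json.dumps(payload, separators=(",", ":"))
    return f"{CHAPTER_API}?{urlencode({'data': data})}"
```

For example, `build_chapter_url("4306063500", "12345")` (the chapter ID is hypothetical) returns a URL whose decoded `data` value is `{"book_id":"4306063500","cid":"4306063500|12345","need_bookinfo":1}`. Comparing it with the request URL shown in the DevTools Network tab is a quick way to spot a mismatch.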
Here is a screenshot of what F12 (DevTools) shows:
Filling in the missing request parameters, such as cookies and the Referer header, and then sending the request again should fix it.
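A minimal sketch of that suggestion with aiohttp: copy the headers the browser actually sent for the `getChapterContent` request (visible in the DevTools Network tab) and attach them to the session. The Referer value and the empty Cookie placeholder are assumptions, and whether this endpoint really checks them is not confirmed here; `build_headers` and `fetch_chapter` are hypothetical names.

```python
import json

# Placeholders: copy the real values from the request shown in DevTools (F12) > Network
REFERER = "https://dushu.baidu.com/"  # assumption: the page the browser request came from
COOKIE = ""  # hypothetical: paste the browser's Cookie header string here


def build_headers(user_agent: str, referer: str = REFERER, cookie: str = COOKIE) -> dict:
    """Mirror the browser's request headers; an empty cookie is left out."""
    headers = {"User-Agent": user_agent, "Referer": referer}
    if cookie:
        headers["Cookie"] = cookie
    return headers


async def fetch_chapter(b_id: str, cid: str, headers: dict) -> dict:
    # Imported here so build_headers() can be used without aiohttp installed
    import aiohttp

    data = json.dumps({"book_id": b_id, "cid": f"{b_id}|{cid}", "need_bookinfo": 1})
    url = "https://dushu.baidu.com/api/pc/getChapterContent"
    # Session-level headers are sent with every request made through this session
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url, params={"data": data}) as resp:
            return await resp.json()
```

Usage, with a hypothetical chapter ID: `asyncio.run(fetch_chapter("4306063500", "12345", build_headers(UserAgent().chrome, cookie="...")))`, then check whether the `content` field in the returned dict is still `None`. If it is not, add the headers one at a time to find which one the server requires.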