How can I use the format method to page through a website?

# -*- coding: utf-8 -*-
import requests
from lxml import etree
import json

url = 'https://i.news.qq.com/trpc.qqnews_web.kv_srv.kv_srv_http_proxy/list?sub_srv_id=world&srv_id=pc&offset=0&limit=20&strategy=1&ext={%22pool%22:[%22high%22,%22top%22],%22is_filter%22:10,%22check_type%22:true}'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36 Edg/91.0.864.37'}
res = requests.get(url, headers=headers)
res.encoding = 'utf-8'
data = json.loads(res.content.decode("utf-8"))
title = [i['title'] for i in data['data']['list']]
update = [i['update_time'] for i in data['data']['list']]
publish = [i['publish_time'] for i in data['data']['list']]
urls = [i['url'] for i in data['data']['list']]
comment = [i['comment_num'] for i in data['data']['list']]
media = [i['media_name'] for i in data['data']['list']]
category = [i['category_cn'] for i in data['data']['list']]
sub_category = [i['sub_category_cn'] for i in data['data']['list']]

info = {
        'title': title,
        'update_time': update,
        'publish_time': publish,
        'url': urls,
        'comment_num': comment,
        'media': media,
        'category': category,
        'sub_category': sub_category
}
print(info)
urls_info = ['https://i.news.qq.com/trpc.qqnews_web.kv_srv.kv_srv_http_proxy/list?sub_srv_id=world&srv_id=pc&offset={}&limit=20&strategy=1&ext={%22pool%22:[%22top%22],%22is_filter%22:2,%22check_type%22:true}'.format(str(i)) for i in range(0, 200, 20)]

 

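I noticed that the offset parameter in the URL (srv_id=pc&offset=0) controls paging: changing the number after offset= moves to another page. But building the URLs with format(str(i)) for i in range(0, 200, 20) raises this error: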

urls_info = ['https://i.news.qq.com/trpc.qqnews_web.kv_srv.kv_srv_http_proxy/list?sub_srv_id=world&srv_id=pc&offset={}&limit=20&strategy=1&ext={%22pool%22:[%22top%22],%22is_filter%22:2,%22check_type%22:true}'.format(str(i)) for i in range(0, 200, 20)]
KeyError: '%22pool%22'

How should I change this so that I can page through the results and scrape the data?

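The KeyError comes from str.format treating every {...} in the string as a replacement field, so the literal braces in the ext= part are parsed as a field named %22pool%22. Rewrite the urls_info list comprehension so that only the part containing offset is formatted and the part with literal braces is appended as a plain string, then loop over the list and call get on each URL. Build it like this: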

urls_info = [f'https://i.news.qq.com/trpc.qqnews_web.kv_srv.kv_srv_http_proxy/list?sub_srv_id=world&srv_id=pc&offset={i}&limit=20&strategy=1&ext=' + '{%22pool%22:[%22top%22],%22is_filter%22:2,%22check_type%22:true}' for i in range(0, 200, 20)]
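As an alternative to splitting the string, the literal braces can be escaped by doubling them, which is how str.format represents a literal { or }. The sketch below keeps the question's URL and only introduces the name TEMPLATE for illustration:

```python
# Doubled braces ({{ and }}) are emitted as literal { and } by str.format;
# only the single {} after offset= is treated as a replacement field.
TEMPLATE = (
    'https://i.news.qq.com/trpc.qqnews_web.kv_srv.kv_srv_http_proxy/list'
    '?sub_srv_id=world&srv_id=pc&offset={}&limit=20&strategy=1'
    '&ext={{%22pool%22:[%22top%22],%22is_filter%22:2,%22check_type%22:true}}'
)

# One URL per page: offsets 0, 20, 40, ..., 180.
urls_info = [TEMPLATE.format(offset) for offset in range(0, 200, 20)]

print(urls_info[1])
```

The same doubling rule applies inside an f-string, so either style works once the braces are escaped.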

Just change the URL itself.

After that, requests.get will work.
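Looping over the page URLs could look roughly like the sketch below. The function name fetch_pages, the get and delay parameters, and the shortened User-Agent are assumptions made for illustration; the data['data']['list'] layout is taken from the question's code, not from any documented API.

```python
import time

# Placeholder header; in practice reuse the full User-Agent from the question.
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def fetch_pages(urls, get=None, delay=0.0):
    """Fetch each URL in order and collect the items under data['data']['list']."""
    if get is None:
        # Imported lazily so the function can also be exercised with a stub.
        import requests
        get = requests.get

    items = []
    for url in urls:
        res = get(url, headers=HEADERS, timeout=10)
        res.raise_for_status()  # stop early on HTTP errors
        payload = res.json()
        # Assumed response layout, matching the question's single-page code.
        items.extend(payload['data']['list'])
        time.sleep(delay)  # optional pause between requests
    return items
```

With the list built earlier, `titles = [item['title'] for item in fetch_pages(urls_info, delay=1.0)]` would gather the titles from all pages, provided the endpoint still responds in that shape.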

Another way to build the list, instead of a comprehension, is to start with an empty list and append each URL to it.

url_list = []  # avoid the name `list`, which would shadow the built-in

url = 'abcd'

url_list.append(url)
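Applied to the paginated URLs, the append approach might look like the sketch below. PREFIX, SUFFIX, and url_list are illustrative names; plain concatenation avoids str.format entirely, so the literal braces need no escaping. The result is the same list a comprehension would produce.

```python
PREFIX = ('https://i.news.qq.com/trpc.qqnews_web.kv_srv.kv_srv_http_proxy/list'
          '?sub_srv_id=world&srv_id=pc&offset=')
SUFFIX = ('&limit=20&strategy=1'
          '&ext={%22pool%22:[%22top%22],%22is_filter%22:2,%22check_type%22:true}')

# Build the list step by step with append.
url_list = []
for offset in range(0, 200, 20):
    url_list.append(PREFIX + str(offset) + SUFFIX)

# The equivalent list comprehension yields an identical list.
same_as_comprehension = url_list == [PREFIX + str(o) + SUFFIX
                                     for o in range(0, 200, 20)]
```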
