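When scraping some websites, the code needs to send headers and cookies. Besides user-agent, fields such as referer, accept, content-type, and accept-encoding sometimes also have to be written into the headers. For how to add these parameters with Scrapy, see https://blog.csdn.net/weixin_44508906/article/details/87895868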
import scrapy
from bilibili_video.items import BilibiliVideoItem


class VideoSpider(scrapy.Spider):
    name = 'video'
    allowed_domains = ['bilibili.com']
    # Wait 2 seconds between requests. This replaces time.sleep(2),
    # which would block Scrapy's event loop.
    custom_settings = {'DOWNLOAD_DELAY': 2}

    def start_requests(self):
        # temp_url = "https://www.bilibili.com/v/life/funny/?spm_id_from=333.5.b_6c6966655f66756e6e79.3#/all/click/0/1/2021-05-30,2021-06-06"
        for page_num in range(1, 3836):
            url = "https://www.bilibili.com/v/life/funny/?spm_id_from=333.5.b_6c6966655f66756e6e79.3#/all/click/0/" + str(page_num) + "/2021-05-30,2021-06-06"
            yield scrapy.Request(url=url, callback=self.parse, dont_filter=True)

    def parse(self, response):
        # Each XPath returns one list per field; the lists are zipped into items below.
        titles = response.xpath('//div[@class="r"]/a/text()').extract()
        introduces = response.xpath('//div[@class="v-desc"]/text()').extract()
        play_nums = response.xpath('//div[@class="v-info"]/span[@class="v-info-i"]/span[@class]/text()').extract()
        danmus = response.xpath('//div[@class="v-info"]/span[2][@class="v-info-i"]/span/text()').extract()
        stores = response.xpath('//div[@class="v-info"]/span[3][@class="v-info-i"]/span/text()').extract()
        up_names = response.xpath('//div[@class="up-info"]/a/text()').extract()
        dates = response.xpath('//div[@class="up-info"]/span[@class="v-date"]/text()').extract()
        length_times = response.xpath('//*[@id="videolist_box"]/div[2]/ul/li[1]/div[1]/div/a/div/span').extract()
        websites = response.xpath('//div[@class="up-info"]/a/@href').extract()
        for title, introduce, play_num, danmu, store, up_name, date, length_time, website in zip(
                titles, introduces, play_nums, danmus, stores, up_names, dates, length_times, websites):
            item = BilibiliVideoItem()
            item['title'] = title
            item['introduce'] = introduce
            item['play_num'] = play_num
            item['danmu'] = danmu
            item['store'] = store
            item['up_name'] = up_name
            item['date'] = date
            item['length_times'] = length_time
            item['website'] = website
            yield item
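Checking the referer is one of the common anti-scraping strategies.
The value of this header tells the server which page you came from.
For example, when I request Taobao product reviews, the referer is the product detail page, which shows that the reviews were requested from that product's detail page. Without a referer, the server will not return the reviews.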