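Is there a website for writing Python spiders online, similar to LeetCode's online coding environment, where you can build a spider project in the browser?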
No, there isn't.
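I'm not sure whether your problem has been solved yet. If not: this time we are not scraping a single page but crawling page by page, so the most important idea is the recursive one. The line yield scrapy.Request(url=new_url, callback=self.parse) makes parse run again for each new page, so data from every page gets scraped. (Strictly speaking, parse never calls itself: Scrapy's engine downloads new_url and then invokes parse on that response.)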
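To see why this behaves like recursion without growing the call stack, here is a minimal stdlib-only sketch of the idea. It is a toy model, not Scrapy's actual engine, and it fetches nothing from the network: FakeRequest, MiniSpider, run and LAST_PAGE are hypothetical names, and each "page" just produces one fake item. The point is that parse hands a request back to a loop, and the loop (like Scrapy's engine) calls parse again later.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterator, Union

TEMPLATE = "https://pic.netbian.com/4kmeinv/index_%d.html"
LAST_PAGE = 5  # hypothetical small limit instead of the spider's 137


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: a URL plus the callback to run."""
    url: str
    callback: Callable[[str], Iterator]


class MiniSpider:
    def __init__(self) -> None:
        self.page_num = 2

    def parse(self, url: str) -> Iterator[Union[str, FakeRequest]]:
        # Pretend each page contains exactly one item.
        yield f"item from {url}"
        if self.page_num <= LAST_PAGE:
            new_url = TEMPLATE % self.page_num
            self.page_num += 1
            # Like `yield scrapy.Request(url=new_url, callback=self.parse)`:
            # the request goes back to the loop; parse does not call itself.
            yield FakeRequest(new_url, self.parse)


def run(spider: MiniSpider, start_url: str) -> list:
    """Toy scheduler: pop a request, run its callback, queue any new requests."""
    queue = deque([FakeRequest(start_url, spider.parse)])
    items = []
    while queue:
        request = queue.popleft()
        for result in request.callback(request.url):
            if isinstance(result, FakeRequest):
                queue.append(result)
            else:
                items.append(result)
    return items


items = run(MiniSpider(), "https://pic.netbian.com/4kmeinv/")
for item in items:
    print(item)
```

With LAST_PAGE = 5 this collects one item from the start page and one from each of index_2 through index_5, and the stopping condition works exactly like `if self.page_num <= 137` in the spider.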
meinv.py
import scrapy


class MeinvSpider(scrapy.Spider):
    name = 'meinv'
    # allowed_domains = ['www.xxx.com']
    start_urls = ['https://pic.netbian.com/4kmeinv/']
    # A generic URL template for pages 2 and up (it never changes)
    url = 'https://pic.netbian.com/4kmeinv/index_%d.html'
    page_num = 2

    def parse(self, response):
        li_list = response.xpath('//div[2]/div/div[3]/ul/li')
        for li in li_list:
            img_name = li.xpath('./a/b/text()').extract_first()
            print(img_name)
        print('------------------')
        if self.page_num <= 137:
            # % formatting already returns a str, so no format() call is needed
            new_url = self.url % self.page_num
            self.page_num += 1
            # Send the request manually; the callback function is what parses the data
            yield scrapy.Request(url=new_url, callback=self.parse)
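An alternative to chaining one request per page is to build every page URL up front. The sketch below is a hypothetical helper (page_urls is not part of Scrapy) and assumes, as the spider's start_urls and url template imply, that page 1 is the plain listing URL and later pages follow index_%d.html; the upper bound of 137 is taken from the spider and not verified against the site.

```python
from typing import List

START_URL = "https://pic.netbian.com/4kmeinv/"
TEMPLATE = "https://pic.netbian.com/4kmeinv/index_%d.html"


def page_urls(last_page: int) -> List[str]:
    """Return the URLs for pages 1..last_page (hypothetical helper).

    Page 1 is the plain listing URL; pages 2 and up follow the index_%d.html template.
    """
    urls = [START_URL]
    for n in range(2, last_page + 1):
        # % formatting already returns a str, so no extra format() call is needed.
        urls.append(TEMPLATE % n)
    return urls


urls = page_urls(137)
print(len(urls))  # 137
print(urls[1])    # https://pic.netbian.com/4kmeinv/index_2.html
```

In a Scrapy spider, a list like this could be yielded as requests from `start_requests()` instead of relying on a page_num counter; Scrapy can then fetch the pages concurrently rather than strictly one after another.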
settings.py
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Only log errors, so the print() output from the spider stays readable
LOG_LEVEL = 'ERROR'