I can't find where the driver should be closed, and setting the proxy IP as below also raises an error. There is only one spider. What is the difference between creating the driver in the spider versus in the middleware, and how does creating the driver differ between single-threaded and multi-threaded use?
def process_request(self, request, spider):
    # Called for each request that goes through the downloader
    # middleware.
    # Must either:
    # - return None: continue processing this request
    # - or return a Response object
    # - or return a Request object
    # - or raise IgnoreRequest: process_exception() methods of
    #   installed downloader middleware will be called
    ip_list = self.get_ip()
    # request.meta['proxy'] = 'http://' + choice(ip_list)
    self.driver.get(request.url)
    sleep(2)
    response = HtmlResponse(request.url, body=self.driver.page_source,
                            request=request, encoding='utf8')
    # print(self.driver.page_source)
    # self.driver.quit()
    return response
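On where to close the driver: in a Scrapy downloader middleware the usual place is a `spider_closed` handler connected via `from_crawler`, not inside `process_request` (quitting there kills the driver after the first request). Below is a minimal runnable sketch of that lifecycle pattern; `FakeDriver` is a stand-in for `selenium.webdriver.Chrome` so the shape is clear without a browser, and the real `from_crawler` wiring is shown in the docstring:

```python
class FakeDriver:
    """Stand-in for selenium.webdriver.Chrome in this sketch."""
    def __init__(self):
        self.closed = False

    def get(self, url):
        pass  # a real driver would load the page here

    def quit(self):
        self.closed = True


class SeleniumMiddleware:
    """One driver per middleware instance; quit it when the spider closes.

    In real Scrapy code, from_crawler connects the handler to the
    scrapy.signals.spider_closed signal:

        @classmethod
        def from_crawler(cls, crawler):
            mw = cls()
            crawler.signals.connect(mw.spider_closed,
                                    signal=signals.spider_closed)
            return mw
    """
    def __init__(self):
        self.driver = FakeDriver()  # real code: webdriver.Chrome(...)

    def process_request(self, request_url):
        self.driver.get(request_url)
        # real code: build and return an HtmlResponse
        # from self.driver.page_source

    def spider_closed(self, spider=None):
        # Runs exactly once, when the spider finishes; do NOT call
        # quit() inside process_request, or the next request fails.
        self.driver.quit()


mw = SeleniumMiddleware()
mw.process_request("https://example.com")
mw.spider_closed()
print(mw.driver.closed)  # → True
```

On the single- vs multi-thread point: Scrapy runs callbacks on a single Twisted reactor thread, so one driver per middleware instance is normally enough; if you do create drivers from your own worker threads, keep one driver per thread (e.g. via `threading.local`), since WebDriver instances are not thread-safe.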
I use the web API, connecting directly to a tunnel proxy from Kuaidaili. It is simple to set up: just add the code below to your middleware and add your machine's public IP to the whitelist in the Kuaidaili console, and it works out of the box.
# Add the following middleware to middlewares.py:
# IP pool
from scrapy import signals
from w3lib.http import basic_auth_header

class ProxyDownloaderMiddleware:

    def process_request(self, request, spider):
        proxy = "tps191.kdlapi.com:15818"
        request.meta['proxy'] = "http://%(proxy)s" % {'proxy': proxy}
        # Username/password authentication:
        # request.headers['Proxy-Authorization'] = basic_auth_header('${username}', '${password}')  # comment this line out when using whitelist auth
        return None
Note: if you hit a bug, or simply as a precaution even if you don't, add the following configuration to settings.py:
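The settings.py fragment itself is not shown above. A typical configuration for this kind of middleware registers it in `DOWNLOADER_MIDDLEWARES` and enables retries; the sketch below assumes a project named `myproject`, so adjust the dotted path to your own project:

```python
# settings.py (sketch; 'myproject' is a placeholder project name)
DOWNLOADER_MIDDLEWARES = {
    # Enable the custom proxy middleware; 543 is the conventional
    # slot for user-defined downloader middlewares in Scrapy.
    "myproject.middlewares.ProxyDownloaderMiddleware": 543,
}

# Retry requests that fail through the proxy instead of dropping them.
RETRY_ENABLED = True
RETRY_TIMES = 3
```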