During the pandemic I wrote a script that scrapes Baidu search results with urllib.request, but two months later it suddenly stopped working: every request now triggers a verification page, so Baidu has presumably identified me as a crawler. Has Baidu tightened its anti-crawler checks recently? Neither spoofing the User-Agent nor configuring proxy IPs seems to help anymore. Can anyone suggest a way around this?
My User-Agent spoofing and proxy setup:
import gzip
import random
import string
import time
import urllib.parse
import urllib.request

from fake_useragent import UserAgent

proxies = GetIpLive5()  # my own helper that returns a working proxy dict
ua = UserAgent()
if proxies.get('http') is not None:
    # can also be set up for https, depending on what the proxy supports
    proxy_support = urllib.request.ProxyHandler(proxies)
    opener = urllib.request.build_opener(proxy_support)
else:
    opener = urllib.request.build_opener()
opener.addheaders = [
    ('Host', 'www.baidu.com'),
    ('User-Agent', ua.random),
    ('Accept-Encoding', 'gzip'),  # advertise only gzip: gzip.GzipFile below cannot decode br or deflate
    ('Accept', 'application/json, text/javascript, */*; q=0.01'),
    ('Accept-Language', 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2'),
    ('Connection', 'keep-alive'),
    #('Cookie', '__gads=ID=138080209be66bf8:T=1592037395:S=ALNI_Ma-g9wHmfxFL4GCy9veAjJrJRsNmg; Hm_lvt_dd4738b5fb302cb062ef19107df5d2e4=1592449208,1592471447,1592471736,1594001802; uid=rBADnV7m04mi8wRJK3xYAg==')
]
urllib.request.install_opener(opener)
while True:  # url is the Baidu search URL built earlier
    try:
        time.sleep(3 + random.randint(1, 8))  # random delay between requests
        url = urllib.parse.quote(url, safe=string.printable)  # percent-encode non-ASCII query terms
        req = urllib.request.Request(url)
        response = opener.open(req)
        break
    except Exception as e:
        print("Error: " + str(e))
        time.sleep(3)
backurl = response.geturl()  # final URL after any redirects
response = gzip.GzipFile(fileobj=response)  # body is gzip-compressed, per Accept-Encoding
html = response.read().decode("utf-8")
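A side note on backurl: geturl() returns the final URL after redirects, which is a cheap way to detect the verification bounce described above, since a flagged request no longer lands on a normal result page. A minimal sketch of that check (the host comparison is my assumption about what the verification redirect looks like; the code above stores backurl but never tests it):

from urllib.parse import urlparse

# Assumption: a flagged request gets redirected off the normal search host,
# so a changed netloc means "verification page, don't parse html".
if urlparse(backurl).netloc != 'www.baidu.com':
    print("Flagged by Baidu, redirected to:", backurl)
    # rotate to a fresh proxy / User-Agent and retry instead of parsing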
The proxy dict, as printed:
Current proxy verified successfully: {'http': 'http://61.135.185.90:80/'}
Does anyone know of a stable and affordable high-anonymity proxy service? A recommendation would be much appreciated.
The server's verification targets your IP, so you need to switch IPs or go through a high-anonymity (elite) proxy.
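For what it's worth, a minimal sketch of that rotation, assuming a pool of high-anonymity proxies from whatever provider you end up with (PROXY_POOL and fetch_with_rotation below are placeholder names I made up, not part of the original code):

import random
import urllib.request

# Placeholder pool; in practice, fill this from your proxy provider's API.
PROXY_POOL = [
    {'http': 'http://1.2.3.4:8080'},
    {'http': 'http://5.6.7.8:3128'},
]

def fetch_with_rotation(url, attempts=5):
    """Retry the URL through a different randomly chosen proxy each time."""
    for _ in range(attempts):
        proxies = random.choice(PROXY_POOL)
        handler = urllib.request.ProxyHandler(proxies)
        opener = urllib.request.build_opener(handler)
        try:
            return opener.open(url, timeout=10)
        except Exception as e:
            print("Proxy failed, rotating:", proxies, e)
    raise RuntimeError("all proxies in the pool failed")

Each attempt builds its own opener so a flagged proxy never carries over to the next request; combining this with a fresh ua.random per attempt should improve the odds further.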