Question about scraping Baidu search results - Baidu snapshot crawler

During the pandemic I wrote a crawler with urllib.request to scrape Baidu search results. It ran fine, but two months later it suddenly stopped working: every request now gets bounced to a verification page, so Baidu has presumably flagged me as a crawler. Has Baidu tightened its anti-crawler checks recently? Spoofing the User-Agent and configuring proxy IPs no longer seem to help. Can anyone offer some advice?

The URL Baidu returns: https://wappass.baidu.com/static/captcha/tuxing.html?&ak=c27bbc89afca0463650ac9bde68ebe06&backurl=https%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3D%25E8%25BD%25AF%25E4%25BB%25B6%2520%25E7%2599%25BE%25E7%25A7%2591%26pn%3D10&logid=7264631951170682870&signature=2f7c5e2965de16cec04b77643c3af12c&timestamp=1602146193
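Incidentally, the backurl parameter of that wappass URL is just the original search URL, percent-encoded. A minimal sketch of recovering it with the standard library (parse_qs already decodes one level of percent-encoding):

from urllib.parse import urlparse, parse_qs

captcha_url = "https://wappass.baidu.com/static/captcha/tuxing.html?&ak=c27bbc89afca0463650ac9bde68ebe06&backurl=https%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3D%25E8%25BD%25AF%25E4%25BB%25B6%2520%25E7%2599%25BE%25E7%25A7%2591%26pn%3D10&logid=7264631951170682870&signature=2f7c5e2965de16cec04b77643c3af12c&timestamp=1602146193"
# parse_qs skips the empty leading "&" pair and percent-decodes each value once
backurl = parse_qs(urlparse(captcha_url).query)["backurl"][0]
print(backurl)  # -> https://www.baidu.com/s?wd=%E8%BD%AF%E4%BB%B6%20%E7%99%BE%E7%A7%91&pn=10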

User-Agent spoofing:

import gzip
import random
import string
import time
import urllib.parse
import urllib.request

from fake_useragent import UserAgent

proxies = GetIpLive5()  # my own helper that fetches a live proxy IP
ua = UserAgent()
url = "https://www.baidu.com/s?wd=软件%20百科&pn=10"  # example search URL (the one from the backurl above)

if proxies.get('http') is not None:
    # print(proxies.get('http'))
    # can also be set to https, depending on whether your proxy supports it
    proxy_support = urllib.request.ProxyHandler(proxies)
    opener = urllib.request.build_opener(proxy_support)
else:
    opener = urllib.request.build_opener()

opener.addheaders = [
    ('Host', 'www.baidu.com'),
    ('User-Agent', ua.random),
    # only advertise gzip, since the body is decoded with gzip.GzipFile below
    ('Accept-Encoding', 'gzip'),
    ('Accept', 'application/json, text/javascript, */*; q=0.01'),
    ('Accept-Language', 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2'),
    ('Connection', 'keep-alive'),
    # ('Cookie', '__gads=ID=138080209be66bf8:T=1592037395:S=ALNI_Ma-g9wHmfxFL4GCy9veAjJrJRsNmg; Hm_lvt_dd4738b5fb302cb062ef19107df5d2e4=1592449208,1592471447,1592471736,1594001802; uid=rBADnV7m04mi8wRJK3xYAg=='),
]
urllib.request.install_opener(opener)

while True:
    try:
        # random delay between requests
        time.sleep(3 + random.randint(1, 8))
        # percent-encode any non-ASCII characters in the query
        url = urllib.parse.quote(url, safe=string.printable)
        req = urllib.request.Request(url)
        response = opener.open(req)
        break
    except Exception as e:
        print("Error: " + str(e))
        time.sleep(3)

# final URL after redirects; a wappass.baidu.com address means we hit the captcha
backurl = response.geturl()

# decompress the gzip-encoded body
response = gzip.GzipFile(fileobj=response)
html = response.read().decode("utf-8")
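Since backurl already holds the final URL after redirects, one simple way to tell whether a given request was blocked is to check it for the wappass.baidu.com host. A short sketch, reusing backurl and html from the code above:

# Sketch: detect the captcha redirect after the fetch, before parsing the page.
if "wappass.baidu.com" in backurl:
    print("Blocked: Baidu redirected to the captcha page; rotate proxy/UA and retry")
else:
    print("OK, fetched %d characters" % len(html))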

The proxy format as printed:
Proxy verified successfully: {'http': 'http://61.135.185.90:80/'}
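The GetIpLive5() helper isn't shown, but for anyone reproducing this, a liveness check along the following lines would produce that output. The test URL and timeout here are assumptions, not the poster's actual code:

import urllib.request

def check_proxy_alive(proxy_dict, test_url="https://www.baidu.com", timeout=5):
    """Return True if a request through the proxy succeeds; a hypothetical
    stand-in for whatever GetIpLive5() does internally."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxy_dict))
    try:
        opener.open(test_url, timeout=timeout)
        print("Proxy verified successfully:", proxy_dict)
        return True
    except Exception:
        return False

check_proxy_alive({'http': 'http://61.135.185.90:80/'})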

Mods, do you know of any stable and cheap high-anonymity proxies? A recommendation would be much appreciated.

The server's verification is keyed to your IP, so you need to switch IPs or use a high-anonymity (elite) proxy.
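In concrete terms, that advice amounts to pulling a fresh proxy (and ideally a fresh User-Agent) for every request instead of installing one opener globally. A rough sketch, assuming the poster's GetIpLive5() helper returns a dict like the one printed above:

import urllib.request
from fake_useragent import UserAgent

def fetch_with_rotation(url, max_tries=5):
    """Retry with a fresh proxy + User-Agent until Baidu stops redirecting
    to the wappass captcha page. GetIpLive5() is the poster's own helper."""
    ua = UserAgent()
    for _ in range(max_tries):
        proxies = GetIpLive5()  # fetch a new live proxy on every attempt
        opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
        opener.addheaders = [('User-Agent', ua.random)]
        try:
            response = opener.open(url, timeout=10)
            if "wappass.baidu.com" not in response.geturl():
                return response  # not blocked; caller reads/decodes the body
        except Exception as e:
            print("Proxy failed, rotating:", e)
    return None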