Python error: 'gbk' codec can't decode byte 0x8b in position 1: illegal multibyte sequence

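Here's my situation. While scraping a website, I first tried to loop over 5 pages. Sometimes three pages were scraped successfully and then it suddenly failed with 'gbk' codec can't decode byte 0x8b in position 1: illegal multibyte sequence; other times four pages succeeded before the same error. Later it couldn't fetch even a single page and kept raising the gbk error.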

Below is my askURL(url) crawler function. When it runs, the line of * gets printed, but the line of ? never does.


import urllib.error
import urllib.request

def askURL(url):
    # Send a browser-like User-Agent so the site treats us as a normal visitor
    head = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
    }
    request = urllib.request.Request(url, headers=head)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        print("***********")
        html = response.read().decode('gbk')  # this is the line that raises
        print(html)
        print("?????????")
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html

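I checked the page's headers in the browser and the encoding really is gbk, so why does it still fail? Why does it keep raising this gbk error? Thanks in advance to anyone who can explain!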
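One thing worth ruling out before blaming anti-scraping: byte 0x8b at position 1 is exactly what a gzip-compressed body looks like, because every gzip stream starts with the bytes 1f 8b. urllib.request does not decompress responses for you, so if the server sends gzip (shown in the Content-Encoding response header, which is separate from the charset in Content-Type), decoding the raw bytes as GBK fails at position 1 with this very message. A server that compresses only some responses would also fit the intermittent failures, though that part is a guess. The sketch below assumes gzip is the cause; decode_body and ask_url are hypothetical helper names, and gbk is only a fallback for when the server declares no charset.

```python
import gzip
import urllib.request
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes

UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36")


def decode_body(raw: bytes, content_encoding: Optional[str] = None,
                charset: Optional[str] = None) -> str:
    """Decompress a gzip body if needed, then decode it to text."""
    # urllib returns the bytes exactly as sent, so a gzip body must be
    # decompressed before any charset (gbk, utf-8, ...) can apply.
    if (content_encoding or "").lower() == "gzip" or raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # Prefer the charset the server declared; fall back to gbk (assumption).
    return raw.decode(charset or "gbk")


def ask_url(url: str) -> str:
    """Hypothetical replacement for askURL that handles gzip responses."""
    request = urllib.request.Request(url, headers={"User-Agent": UA})
    with urllib.request.urlopen(request) as response:
        return decode_body(
            response.read(),
            response.headers.get("Content-Encoding"),
            response.headers.get_content_charset(),
        )
```

To check the theory offline: gzip.compress('中文'.encode('gbk')).decode('gbk') raises the same "can't decode byte 0x8b in position 1" error, while decode_body on those same bytes returns '中文'.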

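Judging from your description, you've probably hit anti-scraping measures: the server returned content in some other encoding, such as UTF-8, and decoding that as GBK is bound to fail. Restructure your code as below, and in the except branch decode the content as UTF-8 and print it so you can see what it actually is: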

import urllib.error
import urllib.request

def askURL(url):
    head = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
    }
    request = urllib.request.Request(url, headers=head)
    html = ""
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.URLError as e:  # network/HTTP errors come from urlopen
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
        return html
    data = response.read()  # keep the raw bytes so we can retry the decode ########
    try:
        print("***********")
        html = data.decode('gbk')  ########
        print(html)
        print("?????????")
    except UnicodeDecodeError:  ########
        print('decode error')  ########
        # errors='replace' so this diagnostic print cannot crash as well
        print(data.decode('utf-8', errors='replace'))  ########
    return html
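The answer's idea of retrying with another encoding can be generalized into a small helper that tries a list of codecs in order and reports which one worked. This is only a sketch: decode_with_fallback and its default order (gbk, then utf-8) are assumptions, not part of the original answer. A decode that "succeeds" does not prove the codec is right, since GBK will turn many UTF-8 byte sequences into mojibake without raising, so still eyeball the output.

```python
from typing import Iterable, List, Tuple


def decode_with_fallback(data: bytes,
                         encodings: Iterable[str] = ("gbk", "utf-8")) -> Tuple[str, str]:
    """Return (text, encoding) for the first encoding that decodes data without error."""
    failures: List[str] = []
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError as exc:
            # Remember why this codec failed, then try the next one.
            failures.append(f"{enc}: {exc}")
    # Nothing worked: the body may not be text at all (e.g. still compressed).
    raise ValueError("no encoding matched; " + "; ".join(failures))
```

Usage inside askURL would look like text, enc = decode_with_fallback(data), then print(enc, text[:200]). If every codec fails, the ValueError message lists each failure position, which is a hint that the bytes are not plain text.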

Also, when I try to fetch just one page, sometimes it works and sometimes I get the gbk error again??? Is this anti-scraping?

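This is an anti-scraping mechanism. If you keep getting this decode error no matter which encoding you switch to, then one of the encodings is actually correct, and the anti-scraping engineers have simply slipped a few bytes in a different encoding into the content. You need to strip them out one at a time. I've seen cases where it took removing 7 or 8 of them before the content would decode; in bad cases even removing twenty wasn't enough, and there's no telling how many there are. Give it a try.
The error says byte 0x8b can't be decoded, right? So strip that byte before calling decode, e.g. data = data.replace(b'\x8b', b''). If it fails again, strip the next offending byte too, chaining the calls like data.replace(...).replace(...).replace(...).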
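Rather than chaining .replace() calls by hand, Python's bytes.decode already accepts an errors argument: errors="replace" turns each undecodable sequence into U+FFFD and errors="ignore" drops it, which is the built-in way to skip bad bytes. The sketch below also lists where the bad bytes sit, which helps tell a few stray bytes apart from a body that is not GBK text at all. find_bad_bytes, decode_lenient and the limit parameter are hypothetical names for illustration. If the data begins with 1f 8b (the gzip signature), no amount of byte removal will yield readable text.

```python
from typing import List


def find_bad_bytes(data: bytes, encoding: str = "gbk", limit: int = 20) -> List[int]:
    """Return offsets (at most `limit`) where `data` fails to decode."""
    positions: List[int] = []
    start = 0
    while start < len(data) and len(positions) < limit:
        try:
            data[start:].decode(encoding)
            break  # the rest decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start/exc.end are relative to data[start:]
            positions.append(start + exc.start)
            start += exc.end  # resume just after the offending sequence
    return positions


def decode_lenient(data: bytes, encoding: str = "gbk", errors: str = "replace") -> str:
    """Decode without raising: bad sequences become U+FFFD ("replace") or vanish ("ignore")."""
    return data.decode(encoding, errors=errors)
```

Printing find_bad_bytes(data)[:5] next to data[:8].hex() is usually enough to see whether a handful of stray bytes or the whole body is the problem.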