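Python scraper: I request the target URL in a loop, with time.sleep between requests. No error is raised, but the data I get back is incomplete, and every run returns different results. Part of the code, for reference: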
import time

import requests
from lxml import etree

# headers is defined elsewhere in the original script (not shown)
r_id = []
session = requests.Session()
for num in range(1000):
    url2 = f'https://shop313316165.taobao.com/i/asynSearch.htm?_ksTS=1663162546671_235&callback=jsonp236&input_charset=gbk&mid=w-23998201141-0&wid=23998201141&path=/category.htm&spm=a1z10.3-c.w4002-23998201141.58.106d39daCeBZWd&input_charset=gbk&search=y&pageNo={num+1}'
    response = session.get(url2, headers=headers, timeout=3)
    time.sleep(0.1)
    # print(response.text)
    # The response appears to be JSONP wrapping escaped HTML, hence the \" in the class name
    html = etree.HTML(response.text)
    hrefs = html.xpath(r"//dd/a[@class='\"item-name']/@href")  # renamed from id to avoid shadowing the builtin
    for i in hrefs:
        # Strip the escaped link prefix and the trailing quote, leaving only the item id
        i = i.replace('\\"//item.taobao.com/item.htm?id=', '')
        i = i.replace('\\"', '')
        r_id.append(i)
        # Build the detail URL from this item's id (the original interpolated the whole list)
        aim_url = f'https://item.taobao.com/item.htm?spm=a1z10.3-c.w4002-23998201141.11.262339dafU7GVl&id={i}'
print(r_id)
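Since the asynSearch.htm response appears to be JSONP wrapping escaped HTML, the XPath plus two `replace` calls can be swapped for a regular expression that pulls the numeric id straight out of the raw text. This is a minimal sketch, not the original author's code: `extract_item_ids` and `ITEM_ID_RE` are hypothetical names, and the pattern assumes links of the form `item.htm?id=<digits>`. Unlike the XPath, it matches every such link on the page, not only the `item-name` anchors, so it drops duplicates while keeping page order.

```python
import re

# Assumed link shape: ...//item.taobao.com/item.htm?id=<digits>... (quotes may be escaped)
ITEM_ID_RE = re.compile(r'item\.htm\?id=(\d+)')


def extract_item_ids(page_text):
    """Return the item ids found in one page of raw response text,
    in order of first appearance, without duplicates."""
    seen = set()
    ids = []
    for item_id in ITEM_ID_RE.findall(page_text):
        if item_id not in seen:
            seen.add(item_id)
            ids.append(item_id)
    return ids
```

Inside the loop, `r_id.extend(extract_item_ids(response.text))` would stand in for the XPath and `replace` steps, and printing the per-page count makes it obvious which pages came back empty.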
I tried adding a timeout parameter; the results are still incomplete and no error is raised.
Expected result: get the complete data.
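`timeout` in requests only raises when the connection or the read stalls; it cannot tell that a normal 200 response holds a verification page or an empty list instead of products. One way to surface that is to treat a page that parses to zero items as a soft failure and retry it after an exponential, randomized delay instead of a fixed `time.sleep(0.1)`. A sketch under assumptions: `fetch_with_retry`, `fetch_page`, and `parse_ids` are hypothetical names, and the delay values are illustrative, not values known to get past Taobao's checks.

```python
import random
import time
from typing import Callable, List


def fetch_with_retry(
    fetch_page: Callable[[int], str],
    parse_ids: Callable[[str], List[str]],
    page_no: int,
    max_attempts: int = 4,
    base_delay: float = 2.0,
) -> List[str]:
    """Fetch one page, retrying while it parses to zero items.

    A response that parses to no items is treated as a soft failure
    (for example a verification page) rather than as real data.
    """
    for attempt in range(max_attempts):
        ids = parse_ids(fetch_page(page_no))
        if ids:
            return ids
        if attempt < max_attempts - 1:
            # Exponential backoff scaled by a random factor (jitter),
            # so the request timing is not perfectly regular
            time.sleep(base_delay * (2 ** attempt) * random.uniform(1.0, 1.5))
    raise RuntimeError(f"page {page_no}: no items after {max_attempts} attempts")
```

`fetch_page` could wrap `session.get(...).text` for a given `pageNo`. Keep in mind that an empty page may also just mean `pageNo` is past the shop's last page (the loop asks for 1000 pages), so the `RuntimeError` may be the signal to stop rather than to keep retrying.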
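You don't even need to look closely to know this is anti-scraping.
Most likely you've hit the site's anti-scraping measures; try adding a proxy.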
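To try the proxy suggestion, requests accepts a `proxies` mapping from URL scheme to proxy URL on each call. The sketch below rotates through a list so consecutive requests leave from different addresses. `proxy_rotation` and `get_via_next_proxy` are hypothetical helpers, the proxy URLs are placeholders you would have to supply, and a proxy on its own does not guarantee the anti-scraping check is passed.

```python
import itertools
from typing import Iterable, Iterator


def proxy_rotation(proxy_urls: Iterable[str]) -> Iterator[dict]:
    """Yield requests-style proxies dicts, cycling through proxy_urls forever."""
    for url in itertools.cycle(list(proxy_urls)):
        yield {"http": url, "https": url}


def get_via_next_proxy(session, url, proxies_iter, **kwargs):
    """Send a GET through the next proxy in the rotation.

    session is expected to be a requests.Session; extra kwargs
    (headers, timeout, ...) are passed straight to session.get.
    """
    return session.get(url, proxies=next(proxies_iter), **kwargs)
```

Usage in the original loop would look like `rot = proxy_rotation(["http://proxy1:8080", "http://proxy2:8080"])` before the loop, then `response = get_via_next_proxy(session, url2, rot, headers=headers, timeout=10)` inside it.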