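I'm using Python to scrape images from a comic site. I've obtained the URL of every image, but when I open one manually it redirects to a different page. How can I fix this?
Here is my code: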
import re

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36 Edg/97.0.1072.69'
}
url = 'https://www.gufengmh9.com/manhua/limingzhijian/1719432.html'
path = r'C:\Users\13153\Desktop\test\ '
conten = requests.get(url=url, headers=headers, timeout=10).text

# Comic image URLs: pull the file list and the chapter path out of the page's inline script
image_url_last = re.findall(';;var chapterImages = (.*?);var chapterPath = ', conten)
image_url_head = re.findall('chapterPath = "(.*?)";var pageTitle', conten)
li = []  # collected image URLs
for i in image_url_last[0].split('"'):
    # Splitting on '"' leaves the file names plus the '[', ',' and ']' separators
    if i != ',' and i != '[' and i != ']':
        w = 'https://res.xiaoqinre.com/' + image_url_head[0] + i
        li.append(w)
for i in li:
    print('____', i, '____')
Opening the first image URL by hand: https://res.xiaoqinre.com/images/comic/860/1719432/1639217672oiKQxe6F66bH-rCE.jpg
automatically redirects to: https://www.gufengmh9.com/sy.png
How can I fix this?
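Opening the URL directly in a browser is not going to work, because the site uses an anti-scraping (hotlink protection) check: it only returns the image when the request carries a specific Referer header. If the Referer is anything else, the request is redirected. Send that Referer with the request and the image downloads normally: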
import requests

# Image URL to download
Download_addres = 'https://res.xiaoqinre.com/images/comic/860/1719432/1639217672oiKQxe6F66bH-rCE.jpg'
# Referer header that the image host expects
new_header = {
    "Referer": "https://www.gufengmh9.com/"
}
# Request the image with the Referer attached
f = requests.get(Download_addres, headers=new_header)
# Save the image to disk
with open("12.jpg", "wb") as code:
    code.write(f.content)
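
To apply the same Referer fix to every image in the chapter, the question's parsing step and the download step could be combined roughly as follows. This is a minimal sketch, not something verified against this site: it assumes `chapterImages` is a JSON array of file names (which the `split('"')` logic in the question suggests), that the Referer above is accepted for every image, and the helper names `build_image_urls` and `download_images` are made up for illustration. Redirects are disabled so that a bounce to `sy.png` shows up as a non-200 status instead of being saved as if it were the comic page.

```python
import json
import re
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

IMAGE_HOST = "https://res.xiaoqinre.com/"  # image host taken from the question
REFERER = "https://www.gufengmh9.com/"     # Referer taken from the answer


def build_image_urls(html, host=IMAGE_HOST):
    """Build full image URLs from the page's inline script (hypothetical helper)."""
    images = re.search(r'var chapterImages = (\[.*?\]);', html)
    chapter_path = re.search(r'var chapterPath = "(.*?)";', html)
    if not images or not chapter_path:
        return []
    # Assumption: the array literal is valid JSON, e.g. ["a.jpg","b.jpg"]
    names = json.loads(images.group(1))
    return [host + chapter_path.group(1) + name for name in names]


def download_images(urls, out_dir, referer=REFERER):
    """Download each URL with the Referer header set (hypothetical helper)."""
    # Third-party dependency; imported here so the parser above works without it
    import requests

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with requests.Session() as session:
        session.headers.update({"Referer": referer})
        for index, url in enumerate(urls, start=1):
            # allow_redirects=False exposes the hotlink redirect as a 3xx status
            resp = session.get(url, timeout=10, allow_redirects=False)
            if resp.status_code != 200:
                print(f"skipped {url}: HTTP {resp.status_code}")
                continue
            suffix = PurePosixPath(urlparse(url).path).suffix or ".jpg"
            target = out / f"{index:03d}{suffix}"
            target.write_bytes(resp.content)
            saved.append(target)
    return saved
```

Usage would be something like `download_images(build_image_urls(conten), "out")`, where `conten` is the page HTML fetched in the question. If images are still skipped with a 3xx status, the site may expect the full chapter URL as the Referer rather than the site root.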