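Hi, quick question: why does parsing with lxml.etree.HTML() give me an empty list []? BeautifulSoup also just returns an empty result, even though I can fetch the page's HTML without any problem.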
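Print the page's raw text first and check whether the data is actually in it. If it isn't, the site is blocking scrapers (or the content is loaded some other way, e.g. by JavaScript)!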
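A minimal sketch of that check, assuming you already have the response body as a string (for example `web_data.text` from requests) and know some text that should appear on the page. The helper `find_in_raw_html` and the sample HTML are hypothetical, for illustration only.

```python
def find_in_raw_html(html_text: str, marker: str, context: int = 40):
    """Return a snippet of html_text around the first occurrence of marker,
    or None if marker does not appear in the raw HTML at all."""
    idx = html_text.find(marker)
    if idx == -1:
        return None
    start = max(0, idx - context)
    end = idx + len(marker) + context
    return html_text[start:end]


# Hypothetical response body; in practice pass web_data.text from requests.
SAMPLE = '<html><body><ul><li><a href="/n/1">Notice 1</a></li></ul></body></html>'

snippet = find_in_raw_html(SAMPLE, "Notice 1")
if snippet is None:
    print("Not in the raw HTML: probably blocked or rendered by JavaScript")
else:
    print("Found in the raw HTML:", snippet)
```

If the marker is found, the server does send the data, and an empty result points at the selector or parser rather than at anti-scraping.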
I tried your code and it does come back empty. But if I switch to BeautifulSoup with the html.parser parser, I get the data:
import requests
from lxml import etree
from bs4 import BeautifulSoup

headers = {
    # Send a normal desktop-browser User-Agent instead of the requests default
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36'
}
url = 'https://www.plap.cn'
web_data = requests.get(url, headers=headers)

# BeautifulSoup with html.parser: this selector returns the matching <a> element(s)
soup = BeautifulSoup(web_data.text, 'html.parser')
name = soup.select('body > div.container.content.job-content > div > div.col-md-4.col-sm-12.col-xs-12 > div:nth-child(2) > ul > li:nth-child(1) > a')

# lxml version (commented out): this absolute XPath returned []
# web_html = etree.HTML(web_data.text)
# name = web_html.xpath('/html/body/div[5]/div/div[2]/div[2]/ul/li[1]/a')
print(name)
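One possible explanation (an assumption, since the live page was not inspected here): an absolute XPath copied from browser DevTools describes the browser's DOM, after scripts have run and the browser has repaired the markup. lxml and html.parser each build their own tree from the raw response and may repair broken HTML differently, so positional steps like `div[5]` can point at an element that doesn't exist in lxml's tree. The sketch below uses a made-up HTML string (not the real plap.cn markup) where the target block is the 4th `<div>` under `<body>`, and compares the copied absolute path with a relative XPath anchored on class names.

```python
from lxml import etree

# Hypothetical raw HTML: here the container is body's 4th <div>, whereas the
# XPath copied from the browser assumes it is the 5th.
RAW_HTML = """
<html><body>
<div class="header">top</div>
<div class="nav">menu</div>
<div class="banner">ad</div>
<div class="container content job-content">
  <div>
    <div class="col-md-8">left</div>
    <div class="col-md-4 col-sm-12 col-xs-12">
      <div>first box</div>
      <div>
        <ul>
          <li><a href="/notice/1">Notice 1</a></li>
          <li><a href="/notice/2">Notice 2</a></li>
        </ul>
      </div>
    </div>
  </div>
</div>
</body></html>
"""

tree = etree.HTML(RAW_HTML)

# Absolute, position-based path (as copied from DevTools): finds nothing here.
absolute = tree.xpath('/html/body/div[5]/div/div[2]/div[2]/ul/li[1]/a')

# Relative path anchored on class attributes: survives shifted div indexes.
relative = tree.xpath(
    '//div[contains(@class, "job-content")]'
    '//div[contains(@class, "col-md-4")]/div[2]/ul/li[1]/a'
)

print(absolute)                    # []
print([a.text for a in relative])  # ['Notice 1']
```

Anchoring on stable attributes such as class names, rather than counting siblings from `/html/body`, tends to hold up better when different parsers or injected elements shift element positions.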