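import requests
from fake_useragent import UserAgent
from lxml import etree

r = requests.get('https://www.baidu.com/', headers={"User-Agent": UserAgent().random})
html = etree.HTML(r.content.decode('utf-8', 'ignore'))
span_tag = html.xpath("//span[@class='big']/*")
for i in span_tag:
    print(i.tag)
    print(i.xpath("./name()"))  # the attempt from the question; "./name()" is not a valid XPath expression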
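Suppose there is a web page like this. I have selected all the child nodes of the element with class big, and I want the tag name of every one of those child nodes. How do I get it?
I tried .tag and ./name(), but neither of them gave me the tag names.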
from lxml import etree
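html = "<a id='1'>world</a>"  # assumed markup: an <a> element with id='1', as the XPath below expects
a = etree.HTML(html)
# XPath local-name() returns the tag name of the first matching node
print(a.xpath("local-name(//a[@id='1'])"))
# the .tag attribute of the matched element gives the same name
print(a.xpath("//a[@id='1']")[0].tag)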
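For plain HTML, .tag, name() and local-name() all give the same string. They only differ once the document uses namespaces. Here is a minimal sketch of that difference; the XML snippet, the namespace URI and the element names are made up for illustration and do not come from the question.

```python
from lxml import etree

# Hypothetical namespaced XML; the URI and element names are invented for this example.
xml = '<root xmlns:x="http://example.com/ns"><x:item>world</x:item></root>'
root = etree.fromstring(xml)
item = root[0]

tag = item.tag                             # Clark notation: '{namespace}localname'
qualified = item.xpath('name()')           # prefixed name as written in the document
local = item.xpath('local-name()')         # name without prefix or namespace
qname_local = etree.QName(item).localname  # the same local name via the lxml API

print(tag, qualified, local, qname_local)
```

If you only care about the bare tag name, local-name() or etree.QName(...).localname is probably the safer choice for namespaced input.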
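First, confirm that you really did get the child nodes, i.e. that span_tag is not empty. Both the element's tag attribute and the XPath name() function return a child node's tag name. Note how name() is called: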
for i in span_tag:  # if span_tag is not empty, this prints what you are after
    print('tag: {}, name:{}'.format(i.tag, i.xpath('name()')))
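To check this without depending on the live page, here is a small self-contained sketch. The HTML fragment is a hypothetical stand-in for the downloaded page, not Baidu's real markup. It also shows what an empty result looks like, which is the likely reason nothing was printed in the question.

```python
from lxml import etree

# Hypothetical page fragment standing in for the real response body.
page = """
<div>
  <span class="big"><a href="#">one</a><b>two</b><em>three</em></span>
</div>
"""
html = etree.HTML(page)

span_tag = html.xpath("//span[@class='big']/*")
if not span_tag:
    # An empty list means the XPath matched nothing, so a for loop would print nothing.
    print("no child elements matched; check the XPath or the downloaded page")

tags = [i.tag for i in span_tag]                # via the element's tag attribute
names = [i.xpath("name()") for i in span_tag]   # via the XPath name() function
print(tags)
print(names)

# A selector that matches nothing returns an empty list rather than raising an error.
missing = html.xpath("//span[@class='missing']/*")
print(missing)
```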
I would not recommend parsing HTML with lxml directly; I suggest BeautifulSoup instead. To get a node's tag name there, use .name, for example:
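from bs4 import BeautifulSoup
soup = BeautifulSoup(r.content, 'html.parser')  # assumption: r is the response fetched in the question
print(soup.title.name)
# Result: title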
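Applied to the original question, the BeautifulSoup version could look roughly like the sketch below. The HTML string is a hypothetical stand-in for the real page, and the built-in html.parser backend is used so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page.
page = '<span class="big"><a href="#">one</a><b>two</b><em>three</em></span>'
soup = BeautifulSoup(page, "html.parser")

span = soup.find("span", class_="big")
# find_all(True, recursive=False) returns only the direct child tags, skipping text nodes.
child_names = [child.name for child in span.find_all(True, recursive=False)] if span else []
print(child_names)
```

The `if span else []` guard covers the case where no element with class big exists, since soup.find() returns None then.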