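How can I extract the full link from HTML in Python by matching a given substring?

Given the following HTML source: <a href="http://tieba.baidu.cn" target="_blank">

How can I search for a substring such as "tieba" and get back the whole link, http://tieba.baidu.cn?

The goal is to check a website when I am not sure it has entered the URL http://tieba.baidu.cn correctly. I want to search for a substring, return the full link as it appears on that site, and then verify it.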

import re

# html holds the page source as a string
for link in re.findall(r'<a href="([\s\S]*?)" target="_blank">', html):
    if 'tieba' in link:
        print(link)
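The regular expression above only matches when href comes first and is followed directly by target="_blank". As a rough sketch of an alternative that does not depend on attribute order, the standard library's html.parser can collect the links instead. The names LinkCollector and links_containing are made up for this illustration, and the sample HTML is an assumption, not the page from the question.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that contain a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and self.needle in value:
                self.links.append(value)


def links_containing(html, needle):
    """Return every <a href> in html whose value contains needle."""
    parser = LinkCollector(needle)
    parser.feed(html)
    parser.close()
    return parser.links


# Sample input (assumed): attribute order differs from the regex version.
html = (
    '<a target="_blank" href="http://tieba.baidu.cn">Tieba</a> '
    '<a href="http://example.com">Other</a>'
)
print(links_containing(html, "tieba"))
```

With the sample input this should print only the tieba link, even though target comes before href.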

Send me a private message if you have further questions.

from lxml.etree import HTML

html_str = '<a href="http://tieba.baidu.cn" target="_blank">'
result = HTML(html_str).xpath('//a[contains(@href, "tieba")]/@href')
# result = ['http://tieba.baidu.cn']

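The important part is the XPath expression: '//a[contains(@href, "tieba")]/@href'

It means: from the source, extract the link of every a tag whose link contains "tieba".

Explanation:

//a -> all a tags (links live in a tags)

contains(@href, "tieba") -> the href attribute value contains "tieba", i.e. the link contains tieba

/@href -> the href attribute of the selected a tags, i.e. the link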

If your string (tieba) changes often, you can turn it into a variable:

from lxml.etree import HTML

uri = 'tieba'
html_str = '<a href="http://tieba.baidu.cn" target="_blank">'
result = HTML(html_str).xpath('//a[contains(@href, "{}")]/@href'.format(uri))
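
One caveat with building the expression through str.format: if the search string ever contains a double quote, the resulting XPath is no longer valid. lxml's xpath() also accepts XPath variables passed as keyword arguments, which sidesteps the quoting problem. A minimal sketch, assuming lxml is installed; the function name find_links_xpath is made up for this example.

```python
from lxml.etree import HTML


def find_links_xpath(html_str, needle):
    """Return href values of <a> tags whose href contains needle."""
    # $needle is an XPath variable; lxml fills it in from the keyword argument,
    # so the search string is never spliced into the expression text.
    hrefs = HTML(html_str).xpath('//a[contains(@href, $needle)]/@href', needle=needle)
    return [str(h) for h in hrefs]


html_str = '<a href="http://tieba.baidu.cn" target="_blank">Tieba</a>'
print(find_links_xpath(html_str, 'tieba'))
```

A search string with a quote in it, such as 'tie"ba', simply returns an empty list here instead of raising an XPath syntax error.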