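I'm new to Python and don't really understand how to choose regular expressions. When reading a page's source in particular, I often find it hard to pick the right pattern (of the form: variable = '<XXXXXXX="(.*?)">').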
Example HTML:
<td class="scbar_hot_td">
<div id="scbar_hot">
<strong class="xw1">热搜: </strong>
<a href="search.php?mod=forum&srchtxt=%E7%BA%A2%E8%8E%B2%E4%B9%8B%E7%8E%8B&formhash=39eefbd3&searchsubmit=true&source=hotsearch" target="_blank" class="xi2" sc="1">红莲之王</a>
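I need a regular expression that extracts the board names from a forum (one of the boards is 红莲之王), but the <a href="XXXXX"> value is different for every board. How do I handle that?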
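For context on why the href differs between boards: in the sample link, the part that changes is the srchtxt query parameter, which is simply the board name percent-encoded as UTF-8. A minimal sketch using only the standard library (the href below is copied from the sample HTML) shows this:

```python
from urllib.parse import urlparse, parse_qs

# href copied from the sample <a> tag above
href = ("search.php?mod=forum&srchtxt=%E7%BA%A2%E8%8E%B2%E4%B9%8B%E7%8E%8B"
        "&formhash=39eefbd3&searchsubmit=true&source=hotsearch")

# parse_qs splits the query string on "&" and percent-decodes each value
# (UTF-8 by default); every value comes back as a list
query = parse_qs(urlparse(href).query)
print(query["srchtxt"][0])  # 红莲之王
```

So a pattern does not need to match the href exactly; it only needs to skip over it, as the answers below do.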
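If the names are all inside a tags, just use re.findall(r'<a .*>(.*)</a>', text). This works here because each <a> tag is on its own line and . does not match a newline by default: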
s='''
<td class="scbar_hot_td">
<div id="scbar_hot">
<strong class="xw1">热搜: </strong>
<a href="search.php?mod=forum&srchtxt=%E7%BA%A2%E8%8E%B2%E4%B9%8B%E7%8E%8B&formhash=39eefbd3&searchsubmit=true&source=hotsearch" target="_blank" class="xi2" sc="1">红莲之王</a>
<a href="search.php?mod=forum&srchtxt=abcd;formhash=39eefbd3&searchsubmit=true&source=hotsearch" target="_blank" class="xi2" sc="1">蓝莲花</a>
'''
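import re
# Greedy .* is safe here only because each <a> tag sits on its own line
# (re.M is not needed: it only changes how ^ and $ behave)
res = re.findall(r'<a .*>(.*)</a>', s)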
print(res)
#['红莲之王', '蓝莲花']
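The greedy pattern above breaks as soon as two <a> tags share a line, because the first .* runs on to the last ">" it can find. A minimal sketch (the one-line HTML string is made up for illustration) compares it with a non-greedy version:

```python
import re

# Hypothetical HTML with two <a> tags on the same line
s = '<a href="x1" class="xi2">红莲之王</a> <a href="x2" class="xi2">蓝莲花</a>'

# Greedy: <a .*> swallows everything up to the last tag's ">"
greedy = re.findall(r'<a .*>(.*)</a>', s)

# Non-greedy: [^>]* stops at the end of the opening tag,
# and (.*?) stops at the first "</a>"
lazy = re.findall(r'<a [^>]*>(.*?)</a>', s)

print(greedy)  # ['蓝莲花']
print(lazy)    # ['红莲之王', '蓝莲花']
```

Only the non-greedy form returns both names, so it is the safer default when you cannot rely on line breaks in the page source.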
Find what distinguishes the a tags you want from the other a tags. For example, the ones you want have class="xi2":
import re
txt = '''
<td class="scbar_hot_td">
<div id="scbar_hot">
<strong class="xw1">热搜: </strong>
<a href="search.php?mod=forum&srchtxt=%E7%BA%A2%E8%8E%B2%E4%B9%8B%E7%8E%8B&formhash=39eefbd3&searchsubmit=true&source=hotsearch" target="_blank" class="xi2" sc="1">红莲之王</a>
<a href="search.php?mod=forum&srchtxt=%E7%BA%A2%E8%8E%B2%E4%B9%8B%E7%8E%8B&formhash=39eefbd3&searchsubmit=true&source=hotsearch" target="_blank" class="xi2" sc="1">target a tag</a>
<a href="xxxx" target="_blank" class="xxx" sc="1">other a tag</a>
'''
# [^<>]+ cannot cross a tag boundary, so class="xi2" must be inside the same <a ...>
arr = re.findall(r'<a[^<>]+class="xi2"[^<>]+>(.*?)</a>',txt,re.S)
print(arr)
# ['红莲之王', 'target a tag']
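As a variation on the class="xi2" pattern above, named groups can return the href and the link text together. This is a sketch under one stated assumption: href appears before class inside the tag, as it does in the forum HTML (the shortened href value is made up for illustration).

```python
import re

txt = '''
<a href="search.php?mod=forum&srchtxt=abc" target="_blank" class="xi2" sc="1">红莲之王</a>
<a href="xxxx" target="_blank" class="xxx" sc="1">other a tag</a>
'''

# Assumption: href comes before class="xi2" in the opening tag.
# [^<>]*? keeps every part of the match inside a single <a ...> tag.
pattern = re.compile(
    r'<a\s[^<>]*?href="(?P<href>[^"]*)"[^<>]*?class="xi2"[^<>]*>(?P<text>.*?)</a>',
    re.S,
)

# Collect (href, text) pairs for the class="xi2" links only
links = [(m.group("href"), m.group("text")) for m in pattern.finditer(txt)]
print(links)  # [('search.php?mod=forum&srchtxt=abc', '红莲之王')]
```

If the attribute order can vary on the real page, the href and class conditions would need to be checked separately.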