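Could someone help? I've been stuck on this for a while and still can't figure it out: how do I use a regular expression to extract the text from nested HTML like this?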
<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6401 td-center x-wrap-cell" data-columnid="gridcolumn-6401" style="width:75px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">消息通知1</div></td>
<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6402 td-center x-wrap-cell" data-columnid="gridcolumn-6402" style="width:75px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">消息通知2</div></td>
<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6403 td-center x-wrap-cell" data-columnid="gridcolumn-6403" style="width:75px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">消息通知3</div></td>
I'd also like to ask about this one:
<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6670 td-center x-wrap-cell x-grid-cell-first" data-columnid="gridcolumn-6670" style="width: 108px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">特殊消息</div></td>
The special message can be told apart by its style="width: 108px;".
import re
list1 = ['<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6401 td-center x-wrap-cell" data-columnid="gridcolumn-6401" style="width:75px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">消息通知1</div></td>',
'<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6402 td-center x-wrap-cell" data-columnid="gridcolumn-6402" style="width:75px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">消息通知2</div></td>',
'<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6403 td-center x-wrap-cell" data-columnid="gridcolumn-6403" style="width:75px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">消息通知3</div></td>',
'<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6670 td-center x-wrap-cell x-grid-cell-first" data-columnid="gridcolumn-6670" style="width: 108px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">特殊消息</div></td>']
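for i in list1:
    # Capture the width declaration from the <td> style attribute
    pat0 = r'style="(width:.*?);"'
    a = re.compile(pat0).findall(i)
    print(a)
    # Capture the text between the inner <div> and the closing tags
    pat1 = r';">(.*?)</div></td>'
    b = re.compile(pat1).findall(i)
    print(b)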
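The loop above prints the width and the text separately. As a rough sketch of how the two could be paired so the special message can be picked out by its width, the following uses one pattern with two named groups. The names `cells`, `CELL_RE`, `pairs`, and `special` are made up for this example, and it assumes every cell has the same `<td ... style="width: Npx;" ...><div ...>text</div></td>` shape shown in the question.

```python
import re

# Two sample cells copied from the question: one ordinary, one special.
cells = [
    '<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6401 td-center x-wrap-cell" data-columnid="gridcolumn-6401" style="width:75px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">消息通知1</div></td>',
    '<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6670 td-center x-wrap-cell x-grid-cell-first" data-columnid="gridcolumn-6670" style="width: 108px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">特殊消息</div></td>',
]

# "width" captures the pixel count from the <td> style (optional space after the colon);
# "text" captures the inner <div> text. [^>]* keeps each part inside a single tag.
CELL_RE = re.compile(
    r'<td[^>]*style="width:\s*(?P<width>\d+)px;"[^>]*>'
    r'<div[^>]*>(?P<text>.*?)</div></td>'
)

pairs = []
for cell in cells:
    m = CELL_RE.search(cell)
    if m:
        pairs.append((int(m.group("width")), m.group("text")))

# Keep only the cells whose width is 108px (the "special" ones in the question).
special = [text for width, text in pairs if width == 108]

print(pairs)
print(special)
```

Matching on the numeric width rather than the literal string means `width:108px` and `width: 108px` are treated the same, which may or may not be what you want.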
Use XPath.
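A minimal sketch of what the XPath route might look like, using only the limited XPath subset in the standard library's `xml.etree.ElementTree`. It assumes the fragment is well-formed XML, which happens to hold for the cells in the question but often does not for real pages (a third-party HTML parser such as lxml is the usual choice there). The `<tr>` wrapper and the names `fragment`, `all_texts`, and `special_texts` are only for this example.

```python
import xml.etree.ElementTree as ET

# Two cells from the question, wrapped in a <tr> so the fragment has a single root.
# Assumption: the markup parses as XML; messy real-world HTML will not.
fragment = '''<tr>
<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6401 td-center x-wrap-cell" data-columnid="gridcolumn-6401" style="width:75px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">消息通知1</div></td>
<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6670 td-center x-wrap-cell x-grid-cell-first" data-columnid="gridcolumn-6670" style="width: 108px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">特殊消息</div></td>
</tr>'''

root = ET.fromstring(fragment)

# Text of every <div class="x-grid-cell-inner"> that sits directly under a <td>.
all_texts = [div.text for div in root.findall(".//td/div[@class='x-grid-cell-inner']")]

# Only the special cell: select the <td> by the exact value of its style attribute.
special_texts = [div.text for div in root.findall(".//td[@style='width: 108px;']/div")]

print(all_texts)
print(special_texts)
```

The attribute predicate here is an exact string comparison, so `width:108px;` without the space would not match.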
import re
html='''<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6401 td-center x-wrap-cell" data-columnid="gridcolumn-6401" style="width:75px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">消息通知1</div></td>
<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6402 td-center x-wrap-cell" data-columnid="gridcolumn-6402" style="width:75px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">消息通知2</div></td>
<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6403 td-center x-wrap-cell" data-columnid="gridcolumn-6403" style="width:75px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">消息通知3</div></td>
<td class="x-grid-cell x-grid-td x-grid-cell-gridcolumn-6670 td-center x-wrap-cell x-grid-cell-first" data-columnid="gridcolumn-6670" style="width: 108px;" tabindex="-1"><div class="x-grid-cell-inner" style="text-align:center;">特殊消息</div></td>
'''
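# Anchor on the end of the inner <div>'s style attribute, then capture the text up to the closing tags
result = re.findall(r'center;">(.*?)</div></td>', html)
print(result)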