text = '''<tr class="rich-table"><td class="rich-table-cell" id="time1-1"width="333">2021-05-11 10:00</td><td class="rich-table-cell" id="time2-1"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time3-1"width="333">2021-05-11 12:00</td></tr><tr class="rich-table"><td class="rich-table-cell" id="time1-2"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-2"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-2"width="333"></td></tr><tr class="rich-table"><td class="rich-table-cell" id="time1-3"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-3"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-3"width="333"></td>2021-05-11 13:00</tr><tr class="rich-table"><td class="rich-table-cell" id="time1-4"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-4"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-4"width="333"></td></tr>'''
row1 = '''<tr class="rich-table"><td class="rich-table-cell" id="time1-1"width="333">2021-05-11 10:00</td><td class="rich-table-cell" id="time2-1"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time3-1"width="333">2021-05-11 12:00</td></tr>'''  # there is a time at the end
row2 = '''<tr class="rich-table"><td class="rich-table-cell" id="time1-2"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-2"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-2"width="333"></td></tr>'''  # the end is empty!!!
row3 = '''<tr class="rich-table"><td class="rich-table-cell" id="time1-3"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-3"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-3"width="333"></td>2021-05-11 13:00</tr>'''  # there is a time at the end
row4 = '''<tr class="rich-table"><td class="rich-table-cell" id="time1-4"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-4"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-4"width="333"></td></tr>'''  # the end is empty!!!
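The real content may have more or fewer than 4 rows; this is only an example. The rows are not split up: they arrive as one continuous string, like text.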
With .*? and (.*?) I can pull all 4 rows out of text, but I don't need rows 1 and 3. I only want rows like row 2, which have no time at the end.
The pattern I tried:
r'''<tr class="rich-table">(.*?)id=.*?width="333"></td></tr>'''
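Why that pattern does not isolate single rows: a lazy .*? is not limited to one row; it only stops at the first place where the rest of the pattern matches, and the first `width="333"></td></tr>` after row 1 is at the end of row 2. The sketch below runs the same pattern on the same sample and lists the cell ids each match covers. It uses `re.finditer` and `group(0)` because `re.findall` returns only the capture group when the pattern has exactly one.

```python
import re

# The sample string from the question: four rows, no separators between them.
text = '''<tr class="rich-table"><td class="rich-table-cell" id="time1-1"width="333">2021-05-11 10:00</td><td class="rich-table-cell" id="time2-1"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time3-1"width="333">2021-05-11 12:00</td></tr><tr class="rich-table"><td class="rich-table-cell" id="time1-2"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-2"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-2"width="333"></td></tr><tr class="rich-table"><td class="rich-table-cell" id="time1-3"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-3"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-3"width="333"></td>2021-05-11 13:00</tr><tr class="rich-table"><td class="rich-table-cell" id="time1-4"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-4"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-4"width="333"></td></tr>'''

# The pattern from the question.
pattern = r'''<tr class="rich-table">(.*?)id=.*?width="333"></td></tr>'''

# With one capture group, findall returns only what (.*?) captured.
print(re.findall(pattern, text))

# finditer + group(0) exposes the whole match: list the cell ids inside each one.
spans = [re.findall(r'id="(time\d-\d)"', m.group(0)) for m in re.finditer(pattern, text)]
for ids in spans:
    print(ids)
```

On this sample there are two matches: one covering rows 1 and 2, the other covering rows 3 and 4.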
import re
text = '''<tr class="rich-table"><td class="rich-table-cell" id="time1-1"width="333">2021-05-11 10:00</td><td class="rich-table-cell" id="time2-1"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time3-1"width="333">2021-05-11 12:00</td></tr><tr class="rich-table"><td class="rich-table-cell" id="time1-2"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-2"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-2"width="333"></td></tr><tr class="rich-table"><td class="rich-table-cell" id="time1-3"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-3"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-3"width="333"></td>2021-05-11 13:00</tr><tr class="rich-table"><td class="rich-table-cell" id="time1-4"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-4"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-4"width="333"></td></tr>'''
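# (?:(?!<tr).)*? is a lazy "tempered" dot: it only consumes characters that do not
# start "<tr", so a match can never spill over into the next row.
# Only rows whose last cell is empty and directly followed by </tr> can then match.
s = re.findall(r'''<tr class="rich-table">(?:(?!<tr).)*?id=(?:(?!<tr).)*?width="333"></td></tr>''', text)
print(*s, sep="\n\n")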
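A variant of the same idea, swapping the tempered dot for negated character classes: [^>]* cannot leave a tag and [^<]* cannot leave a cell's text, so the match stays pinned to one row without a lookahead. This is a sketch assuming every row has exactly three cells laid out as in the sample; the capture groups (the row number taken from the time1-N id, plus the two times) are added for illustration and are not part of the original answer.

```python
import re

text = '''<tr class="rich-table"><td class="rich-table-cell" id="time1-1"width="333">2021-05-11 10:00</td><td class="rich-table-cell" id="time2-1"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time3-1"width="333">2021-05-11 12:00</td></tr><tr class="rich-table"><td class="rich-table-cell" id="time1-2"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-2"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-2"width="333"></td></tr><tr class="rich-table"><td class="rich-table-cell" id="time1-3"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-3"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-3"width="333"></td>2021-05-11 13:00</tr><tr class="rich-table"><td class="rich-table-cell" id="time1-4"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-4"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-4"width="333"></td></tr>'''

# [^>]* cannot leave the current tag and [^<]* cannot leave the current cell's text,
# so no part of this pattern can run past "</tr>" into the next row.
pattern = (
    r'<tr class="rich-table">'
    r'<td[^>]*id="time1-(\d+)"[^>]*>([^<]*)</td>'  # cell 1: row number and its time
    r'<td[^>]*>([^<]*)</td>'                       # cell 2: its time
    r'<td[^>]*></td></tr>'                         # cell 3 empty, then </tr> right away
)

empty_rows = re.findall(pattern, text)
for n, t1, t2 in empty_rows:
    print(n, t1, t2)
```

On the sample this yields only rows 2 and 4. The trade-off is rigidity: the tempered-dot pattern tolerates any cell layout inside a row, while this one breaks if a row gains or loses a cell.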
What does "no time at the end" mean?
Impressive, impressive!
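On what "no time at the end" means: in the sample, rows 1 and 3 have text after the third cell's opening tag (12:00 inside the cell in row 1, 13:00 after </td> in row 3), while rows 2 and 4 have nothing there. A minimal two-step sketch that makes this test explicit, assuming rows never nest; ends_with_time is a hypothetical helper name, not from the thread.

```python
import re

text = '''<tr class="rich-table"><td class="rich-table-cell" id="time1-1"width="333">2021-05-11 10:00</td><td class="rich-table-cell" id="time2-1"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time3-1"width="333">2021-05-11 12:00</td></tr><tr class="rich-table"><td class="rich-table-cell" id="time1-2"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-2"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-2"width="333"></td></tr><tr class="rich-table"><td class="rich-table-cell" id="time1-3"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-3"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-3"width="333"></td>2021-05-11 13:00</tr><tr class="rich-table"><td class="rich-table-cell" id="time1-4"width="333">2021-05-11 11:00</td><td class="rich-table-cell" id="time2-4"width="333">2021-05-11 12:00</td><td class="rich-table-cell" id="time3-4"width="333"></td></tr>'''

# Step 1: cut the continuous string into rows. The lazy .*? stops at the first
# "</tr>", which is always the end of the current row because rows do not nest.
rows = re.findall(r'<tr class="rich-table">.*?</tr>', text)

def ends_with_time(row: str) -> bool:
    # Take everything from the last "<td" to the end of the row and drop the tags;
    # any text left over is a time inside or after the last cell.
    last_cell = row[row.rfind("<td"):]
    tail = re.sub(r"<[^>]*>", "", last_cell)
    return tail.strip() != ""

# Step 2: keep only the rows with nothing at the end.
wanted = [r for r in rows if not ends_with_time(r)]
print(len(rows), len(wanted))
```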