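sentences is currently ordered by the order of the keywords in keyword. How can I change the code so that sentences is ordered by where each sentence appears in the article?

I have a piece of code that matches each keyword (one per line in the keyword file) against an article file and collects the sentences in which the keywords occur. Each keyword is matched only once, and the matched keyword is wrapped in markers inside its sentence. At the moment, sentences follows the order of the keyword list. How can I modify the code so that sentences follows the order in which the sentences appear in the article?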

import re

with open('keywords.txt', 'r') as file:
    keyword = [word.strip() for word in file.readlines()]

with open('file.txt', 'r') as file:
    article = file.read()

unmatched_keywords = []
matched_sentences = []
matched_keywords = []

for word in keyword:
    # Compile the regular expression into a Pattern object
    pattern = re.compile(fr'\b{re.escape(word)}\b')
    # Scan the whole string and return the first successful match
    match = re.search(pattern, article)
    if match:
        article = re.sub(pattern, f"<b>{match.group()}<b>", article, count=1)
        sentence_pattern = r'[^.!?]*' + re.escape(word) + r'[^.!?]*[.!?]'
        matched_keywords.append(word)
        # Search the string with the regex and return a list of all matching strings
        sentences = re.findall(sentence_pattern, article)
        matched_sentence = next((s for s in sentences if word in s), None)
        if matched_sentence:
            article = article.replace(matched_sentence, f"{matched_sentence}", 1)
            matched_sentences.append(matched_sentence)
        else:
            unmatched_keywords.append(word)
    else:
        unmatched_keywords.append(word)

with open('unmatched_keywords.txt', 'w') as file:
    for word in unmatched_keywords:
        file.write(word + '\n')
with open("sentences.txt", "w") as file:
    file.write('\n'.join(matched_sentences))
with open('matched_keywords.txt', 'w') as file:
    for word in matched_keywords:
        file.write(word + '\n')
with open("new.txt", "w") as file:
    file.write(article)


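To order the sentences by where they appear in the article, record the position of each match in the article together with its sentence as it is found. Then sort the list by that position and write the sorted sentences to the file.

The modified code is as follows: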
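
The sorting step can be seen in isolation on a toy string. The text and keywords below are made up for illustration, and the snippet is only a sketch of the idea (keep a position next to each item, then sort on it), not the full script.

```python
import re

# Hypothetical sample data, not taken from the question's files
text = "Cats sleep a lot. Dogs bark loudly. Birds sing at dawn."
keywords = ["Birds", "Cats", "Dogs"]  # deliberately not in article order

found = []
for word in keywords:
    match = re.search(fr"\b{re.escape(word)}\b", text)
    if match:
        # Keep the position next to the keyword so it can serve as the sort key
        found.append((match.start(), word))

# Sort on the stored position, then drop it again
found.sort(key=lambda pair: pair[0])
ordered = [word for _, word in found]
print(ordered)  # ['Cats', 'Dogs', 'Birds']
```

Storing the position in a tuple means the sort never has to search the text a second time.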

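import re

with open('keywords.txt', 'r') as file:
    # Skip blank lines so an empty keyword can never match
    keywords = [line.strip() for line in file if line.strip()]

with open('file.txt', 'r') as file:
    article = file.read()

unmatched_keywords = []
matched_sentences = []  # (position of the keyword in the article, marked sentence)
matched_keywords = []
marker_positions = []   # offsets in the article where a <b> marker will be inserted

for word in keywords:
    # Search the unmodified article so that all recorded positions are comparable
    match = re.search(fr'\b{re.escape(word)}\b', article)
    if not match:
        unmatched_keywords.append(word)
        continue
    start, end = match.span()
    # The sentence runs from just after the previous terminator to the next one
    # (or to the end of the text if no terminator follows)
    begin = max(article.rfind(mark, 0, start) for mark in '.!?') + 1
    stops = [i for i in (article.find(mark, end) for mark in '.!?') if i != -1]
    finish = min(stops) + 1 if stops else len(article)
    sentence = f"{article[begin:start]}<b>{match.group()}<b>{article[end:finish]}"
    matched_sentences.append((start, sentence.strip()))
    matched_keywords.append(word)
    marker_positions += [start, end]

# Sort by position in the article rather than by keyword-list order
matched_sentences.sort(key=lambda x: x[0])
sorted_sentences = [sentence for _, sentence in matched_sentences]

# Insert the markers from right to left so that earlier offsets stay valid
for position in sorted(marker_positions, reverse=True):
    article = article[:position] + '<b>' + article[position:]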

with open('unmatched_keywords.txt', 'w') as file:
    for word in unmatched_keywords:
        file.write(word + '\n')

with open("sentences.txt", "w") as file:
    file.write('\n'.join(sorted_sentences))

with open('matched_keywords.txt', 'w') as file:
    for word in matched_keywords:
        file.write(word + '\n')

with open("new.txt", "w") as file:
    file.write(article)

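With this change, the matched sentences are sorted by where they occur in the article rather than by the order of the keyword list, and are saved to "sentences.txt" in that order.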
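
To check the ordering without creating any of the input files, the core idea can be wrapped in a function that works on in-memory strings. This is a sketch under stated assumptions: the function name `sentences_in_article_order`, its `marker` parameter, and the sample text are hypothetical, and sentence boundaries are found by looking for `.`, `!` or `?` around the keyword instead of using a sentence regex.

```python
import re


def sentences_in_article_order(article, keywords, marker="<b>"):
    """Return marked sentences ordered by where each keyword first occurs."""
    found = []
    for word in keywords:
        match = re.search(fr"\b{re.escape(word)}\b", article)
        if not match:
            continue
        start, end = match.span()
        # Sentence boundaries: just after the previous terminator, up to the next one
        begin = max(article.rfind(ch, 0, start) for ch in ".!?") + 1
        stops = [i for i in (article.find(ch, end) for ch in ".!?") if i != -1]
        finish = min(stops) + 1 if stops else len(article)
        sentence = (article[begin:start] + marker + match.group()
                    + marker + article[end:finish])
        found.append((start, sentence.strip()))
    # Order by the keyword's position in the article, not by keyword-list order
    found.sort(key=lambda pair: pair[0])
    return [sentence for _, sentence in found]


# Hypothetical sample: the keywords are listed in a different order than the text
sample = "The fox ran away. A dog barked! Was the cat asleep?"
result = sentences_in_article_order(sample, ["cat", "fox", "dog"])
for line in result:
    print(line)
```

Because every position is taken from the same unmodified string, the sort key stays meaningful no matter how many keywords are matched.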