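I have a piece of code that matches keywords (one per line in keywords.txt) against an article file (file.txt) and collects the sentence containing each keyword. Each keyword is matched only once, and the keyword is wrapped in markers inside the sentence. At the moment the sentences come out in the order the keywords are listed. How can I modify the code so that the sentences are ordered by where they appear in the article?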
import re
with open('keywords.txt', 'r') as file:
    keyword = [word.strip() for word in file.readlines()]
with open('file.txt', 'r') as file:
    article = file.read()
unmatched_keywords = []
matched_sentences = []
matched_keywords = []
for word in keyword:
    # Compile the regular expression into a Pattern object
    pattern = re.compile(fr'\b{re.escape(word)}\b')
    # Scan the whole string and return the first successful match
    match = re.search(pattern, article)
    if match:
        article = re.sub(pattern, f"<b>{match.group()}<b>", article, count=1)
        sentence_pattern = r'[^.!?]*' + re.escape(word) + r'[^.!?]*[.!?]'
        matched_keywords.append(word)
        # Search the string with the regular expression and return a list of the matches
        sentences = re.findall(sentence_pattern, article)
        matched_sentence = next((s for s in sentences if word in s), None)
        if matched_sentence:
            article = article.replace(matched_sentence, f"{matched_sentence}", 1)
            matched_sentences.append(matched_sentence)
        else:
            unmatched_keywords.append(word)
    else:
        unmatched_keywords.append(word)
with open('unmatched_keywords.txt', 'w') as file:
    for word in unmatched_keywords:
        file.write(word + '\n')
with open("sentences.txt", "w") as file:
    file.write('\n'.join(matched_sentences))
with open('matched_keywords.txt', 'w') as file:
    for word in matched_keywords:
        file.write(word + '\n')
with open("new.txt", "w") as file:
    file.write(article)
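To order the sentences by where they appear in the article, store each matched sentence together with its position in the article at the moment it is found. Once all keywords have been processed, sort the list by position and write the sorted sentences to the file.
Here is the modified code: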
import re
with open('keywords.txt', 'r') as file:
    keywords = [word.strip() for word in file.readlines()]
with open('file.txt', 'r') as file:
    article = file.read()
unmatched_keywords = []
matched_sentences = []  # (position, sentence) pairs
matched_keywords = []
for word in keywords:
    pattern = re.compile(fr'\b{re.escape(word)}\b')
    match = re.search(pattern, article)
    if match:
        # Mark only the first occurrence of the keyword
        article = re.sub(pattern, f"<b>{match.group()}<b>", article, count=1)
        sentence_pattern = r'[^.!?]*' + re.escape(word) + r'[^.!?]*[.!?]'
        sentences = re.findall(sentence_pattern, article)
        matched_sentence = next((s for s in sentences if word in s), None)
        if matched_sentence:
            # As written, this replaces the sentence with itself and leaves the article unchanged
            article = article.replace(matched_sentence, f"{matched_sentence}", 1)
            # Every inserted <b> marker shifts later offsets, so measure the
            # sentence start with the markers stripped. That keeps positions
            # recorded in different iterations comparable.
            start = article.index(matched_sentence)
            position = len(article[:start].replace('<b>', ''))
            matched_sentences.append((position, matched_sentence))
        else:
            unmatched_keywords.append(word)
        matched_keywords.append(word)
    else:
        unmatched_keywords.append(word)
# Sort by position in the article, then drop the positions
matched_sentences.sort(key=lambda x: x[0])
sorted_sentences = [sentence for _, sentence in matched_sentences]
with open('unmatched_keywords.txt', 'w') as file:
    for word in unmatched_keywords:
        file.write(word + '\n')
with open("sentences.txt", "w") as file:
    file.write('\n'.join(sorted_sentences))
with open('matched_keywords.txt', 'w') as file:
    for word in matched_keywords:
        file.write(word + '\n')
with open("new.txt", "w") as file:
    file.write(article)
With this change, the matched sentences are sorted by where they occur in the article rather than by keyword order, and the sorted list is written to sentences.txt.
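If it helps to see the sorting idea in isolation, here is a minimal sketch that works on strings in memory instead of files. It is only an illustration under some assumptions: the function name sentences_in_article_order and its mark parameter are made up for this example, a sentence is taken to be any run of text ending in ".", "!" or "?", and keywords are assumed not to contain those characters. It looks up every keyword in the unmodified article, so the recorded positions never shift, and only adds the markers when building the output.

```python
import re

def sentences_in_article_order(article, keywords, mark="<b>"):
    """Return the sentence of each keyword's first match, in article order."""
    # Spans of all sentences in the unmodified article
    # (text up to and including the next '.', '!' or '?').
    sentence_spans = [m.span() for m in re.finditer(r'[^.!?]*[.!?]', article)]
    hits = []
    for word in keywords:
        match = re.search(fr'\b{re.escape(word)}\b', article)
        if match is None:
            continue  # keyword does not occur in the article
        # Pick the sentence whose span contains the start of the match.
        span = next(((s, e) for s, e in sentence_spans
                     if s <= match.start() < e), None)
        if span is None:
            continue  # keyword sits in trailing text without a terminator
        hits.append((match.start(), match.end(), span))
    hits.sort()  # tuples sort by keyword start position first
    results = []
    for start, end, (s, e) in hits:
        # Rebuild the sentence with the keyword wrapped in the marker.
        sentence = (article[s:start] + mark + article[start:end]
                    + mark + article[end:e])
        results.append(sentence.strip())
    return results

article = "Dogs bark loudly. Cats meow at night! Do birds sing?"
print(sentences_in_article_order(article, ["birds", "Dogs", "fish", "Cats"]))
# ['<b>Dogs<b> bark loudly.', '<b>Cats<b> meow at night!', 'Do <b>birds<b> sing?']
```

Because all positions come from the same untouched string, there is no need to correct for the length of markers inserted earlier. If two keywords fall in the same sentence, the sentence is returned once per keyword, each copy with its own keyword marked.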