import re

# Load the keywords, one per line.
with open('keyword.txt', 'r') as file:
    keyword = [word.strip() for word in file.readlines()]
with open('file.txt', 'r') as file:
    article = file.read()

unmatched_keywords = []
matched_sentences = []
for word in keyword:
    # Match the keyword as a whole word only.
    pattern = re.compile(fr'\b{re.escape(word)}\b')
    match = re.search(pattern, article)
    if match:
        # A sentence is a run of text containing the keyword, up to the next . ! or ?
        sentence_pattern = r'[^.!?]*' + re.escape(word) + r'[^.!?]*[.!?]'
        sentences = re.findall(sentence_pattern, article)
        matched_sentence = next((s for s in sentences if word in s), None)
        if matched_sentence:
            # Wrap the first matching sentence in <b>...</b> (closing tag fixed).
            article = article.replace(matched_sentence, f"<b>{matched_sentence}</b>", 1)
            matched_sentences.append(matched_sentence)
        else:
            unmatched_keywords.append(word)
    else:
        unmatched_keywords.append(word)

with open("new.txt", "w") as file:
    file.write(article)
with open('unmatched_keywords.txt', 'w') as file:
    for word in unmatched_keywords:
        file.write(word + '\n')
with open("sentences.txt", "w") as file:
    file.write('\n'.join(matched_sentences))
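The code above matches the keywords in keyword against file and writes three files: the modified file, the unmatched keywords (unmatched), and the sentences that contain the keywords (sentences). Each keyword is matched only once.
How can I change the code so that the keywords in file and in sentences are marked in this form?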
You can do this by using a regular expression to insert HTML tags around a keyword when it matches. The changes are as follows:
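import re
# ... code omitted
for word in keyword:
    pattern = re.compile(fr'\b{re.escape(word)}\b')
    match = re.search(pattern, article)
    if match:
        # Insert the tags with a regex substitution. \g<0> refers to the whole
        # match; the pattern has no capture group, so \1 would raise re.error.
        # count=1 marks only the first occurrence of the keyword.
        article = re.sub(pattern, r'<b>\g<0></b>', article, count=1)
        # ... code omitted
The main change is that, when a keyword matches, re.sub() replaces it with itself wrapped in HTML tags. The replacement string refers back to the matched word with \g<0>, the whole-match reference, because the pattern defines no capture group.
This puts the tags around the matched keyword, which marks it in the output file.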
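The question also asks for the keyword to be marked inside the extracted sentences. Below is a minimal sketch of one way that might be done, not the original poster's method: mark the first whole-word occurrence in the article, then cut the surrounding sentence out of the already-marked text. The names `mark_keywords`, `SENTENCE_END` and the `tag` parameter are hypothetical, and the sketch assumes sentences end at `.`, `!` or `?`, so abbreviations such as "e.g." would split a sentence early.

```python
import re

SENTENCE_END = ".!?"  # assumed sentence terminators, as in the question's pattern


def mark_keywords(article, keywords, tag="b"):
    """Mark the first whole-word occurrence of each keyword with <tag>...</tag>.

    Returns (marked_article, marked_sentences, unmatched_keywords).
    """
    unmatched, sentences = [], []
    for word in keywords:
        pattern = re.compile(rf"\b{re.escape(word)}\b")
        match = pattern.search(article)
        if match is None:
            unmatched.append(word)
            continue
        # \g<0> is the whole match; count=1 marks only the first occurrence.
        article = pattern.sub(rf"<{tag}>\g<0></{tag}>", article, count=1)
        # The marked keyword now ends this many characters further right.
        marked_end = match.end() + len(f"<{tag}></{tag}>")
        # Sentence start: just after the last terminator before the keyword.
        start = max(article.rfind(ch, 0, match.start()) for ch in SENTENCE_END) + 1
        # Sentence end: the first terminator after the marked keyword, if any.
        ends = [i for i in (article.find(ch, marked_end) for ch in SENTENCE_END) if i != -1]
        end = min(ends) + 1 if ends else len(article)
        sentences.append(article[start:end].strip())
    return article, sentences, unmatched


text = "Cats sleep a lot. Dogs bark at night! Birds sing."
marked, found, missing = mark_keywords(text, ["Dogs", "sing", "fish"])
print(marked)   # Cats sleep a lot. <b>Dogs</b> bark at night! Birds <b>sing</b>.
print(found)    # ['<b>Dogs</b> bark at night!', 'Birds <b>sing</b>.']
print(missing)  # ['fish']
```

The three return values correspond to new.txt, sentences.txt and unmatched_keywords.txt in the question. One caveat: because later keywords are searched in the already-marked text, a keyword equal to the tag name (for example `b`) would match inside an inserted tag.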