import re

with open('keyword.txt', 'r') as file:
    keyword = [word.strip() for word in file.readlines()]
with open('file.txt', 'r') as file:
    article = file.read()

unmatched_keywords = []
for word in keyword:
    pattern = re.compile(fr'\b{re.escape(word)}\b')
    match = re.search(pattern, article)
    if match:
        # Wrap only the first occurrence of the keyword in bold tags
        article = re.sub(pattern, f"<b>{match.group()}</b>", article, count=1)
    else:
        unmatched_keywords.append(word)

with open("new.txt", "w") as file:
    file.write(article)
with open('unmatched_keywords.txt', 'w') as file:
    for word in unmatched_keywords:
        file.write(word + '\n')

# Use a regular expression to find all sentences that contain a keyword
pattern = r'[^.]*' + str(keyword) + r'[^.]*'
sentences = re.findall(pattern, article)
# Write the result
with open("sentences.txt", "w") as file:
    #file.write(sentences)
    file.write('\n'.join(sentences))
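This code is supposed to match the keywords against file and generate three files: the modified file, the unmatched keywords (unmatched_keywords), and the sentences that contain the keywords (sentences). But when I run the code above, the sentences file contains the entire article.
How can I change the code so that sentences contains only the sentences in which a keyword appears (each keyword matched only once)?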
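Then split the article into sentences first, and loop over the keywords to match them against each sentence. Don't match against the whole article.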
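
A minimal sketch of that approach, for illustration only. The likely reason the original pattern returns everything is that `str(keyword)` turns the whole list into text like `['a', 'b']`, which the regex engine reads as a character class, so almost any run of text without a period matches. The helper names `split_sentences` and `find_keyword_sentences` are made up here, and splitting on `.`, `!` or `?` followed by whitespace is a simplifying assumption that will break on abbreviations such as "e.g.".

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (a simplifying assumption)."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]


def find_keyword_sentences(article, keywords):
    """Return (matched, unmatched).

    matched:   the first sentence containing each keyword, without duplicates
    unmatched: the keywords that appear in no sentence
    """
    sentences = split_sentences(article)
    matched, unmatched = [], []
    for word in keywords:
        pattern = re.compile(rf'\b{re.escape(word)}\b')
        # Each keyword is matched only once: stop at the first sentence that contains it
        hit = next((s for s in sentences if pattern.search(s)), None)
        if hit is None:
            unmatched.append(word)
        elif hit not in matched:
            matched.append(hit)
    return matched, unmatched


if __name__ == "__main__":
    article = "Python is popular. Regex can be tricky. Python has a re module."
    keywords = ["Python", "Regex", "Java"]
    matched, unmatched = find_keyword_sentences(article, keywords)
    print(matched)    # ['Python is popular.', 'Regex can be tricky.']
    print(unmatched)  # ['Java']
```

The `matched` list can be written to sentences.txt with `'\n'.join(matched)` as before. Pass the original article text if the `<b>` markup should not show up in the sentences, or the modified text if it should.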