import re

with open('keyword.txt', 'r') as file:
    keyword = [word.strip() for word in file.readlines()]
with open('file.txt', 'r') as file:
    article = file.read()
unmatched_keywords = []
for word in keyword:
    pattern = re.compile(fr'\b{re.escape(word)}\b')
    match = re.search(pattern, article)
    if match:
        article = re.sub(pattern, f"<b>{match.group()}</b>", article, count=1)
    else:
        unmatched_keywords.append(word)
with open("new.txt", "w") as file:
    file.write(article)
with open('unmatched_keywords.txt', 'w') as file:
    for word in unmatched_keywords:
        file.write(word + '\n')
# Use a regular expression to find every sentence that contains a keyword
pattern = r'[^.]*' + str(keyword) + r'[^.]*'
sentences = re.findall(pattern, article)
# Write out the result
with open("sentences.txt", "w") as file:
    # file.write(sentences)
    file.write('\n'.join(sentences))
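This code is supposed to match the keywords against the file and produce three files: the modified file, the unmatched keywords (unmatched_keywords), and the sentences in which the keywords occur (sentences). When I run the code above, however, the sentences file contains the entire article.
How can I change the code so that sentences contains only the sentences in which a keyword occurs (each keyword matched only once)? I know I have to split the text into sentences and then loop over them to match, but whatever I write does not come out right.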
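This answer draws on ChatGPT-3.5. It is for reference only and is not guaranteed to be fully correct.
To make sentences contain only the sentences that include a keyword, with each keyword matched only once, the code can be modified as follows: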
import re

with open('keyword.txt', 'r') as file:
    # Skip blank lines; an empty keyword would match almost anywhere
    keyword = [word.strip() for word in file if word.strip()]
with open('file.txt', 'r') as file:
    article = file.read()
# Search the untouched text so that inserted tags never leak into later matches
original = article
unmatched_keywords = []
matched_sentences = []
for word in keyword:
    pattern = re.compile(fr'\b{re.escape(word)}\b')
    match = re.search(pattern, original)
    if match:
        # A sentence: non-terminator characters around the whole-word keyword,
        # closed by ., ! or ? (or by the end of the text)
        sentence_pattern = r'[^.!?]*\b' + re.escape(word) + r'\b[^.!?]*(?:[.!?]|$)'
        sentences = re.findall(sentence_pattern, original)
        # next() keeps only the first hit, so each keyword is matched once
        matched_sentence = next((s for s in sentences if word in s), None)
        if matched_sentence:
            article = article.replace(matched_sentence, f"<b>{matched_sentence}</b>", 1)
            matched_sentences.append(matched_sentence.strip())
        else:
            unmatched_keywords.append(word)
    else:
        unmatched_keywords.append(word)
with open("new.txt", "w") as file:
    file.write(article)
with open('unmatched_keywords.txt', 'w') as file:
    for word in unmatched_keywords:
        file.write(word + '\n')
with open("sentences.txt", "w") as file:
    file.write('\n'.join(matched_sentences))
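This modified code uses the regular expression sentence_pattern to match the sentences that contain a keyword and appends the matched sentence to the matched_sentences list. It then wraps each matched sentence in the article in <b> tags. Finally, the sentences in matched_sentences are written to "sentences.txt". Note that, to make sure each keyword is matched only once, we use next() with None as the default value, which takes just the first matching sentence.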
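As for why the original sentences.txt came out as the whole article: `str(keyword)` produces the text of the list itself, something like `['cat', 'dog']`, and inside a regular expression the square brackets turn that into a character class. The pattern therefore only asks for a single character out of the quote, comma, space, and the letters of the keywords, which almost every sentence has. Here is a small sketch of that effect; the keywords and the sample text are made up for illustration.

```python
import re

keyword = ["cat", "dog"]  # hypothetical keywords
article = "The cat sat. It rained! A dog barked. Nothing here."  # hypothetical text

# str() on a list gives its repr, not the words themselves.
fragment = str(keyword)
print(fragment)  # ['cat', 'dog']

# In a regex, [...] is a character class, so this pattern needs just ONE
# character from the set: quote, comma, space, c, a, t, d, o, g.
broken = r'[^.]*' + fragment + r'[^.]*'
broken_hits = re.findall(broken, article)

# Every stretch between periods is returned, even the one without a keyword.
for hit in broken_hits:
    print(repr(hit))
```

Building the pattern from one escaped keyword at a time, as in the modified code, avoids this.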
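Without the article and the data, it is hard to tell from the code alone what exactly you want to match. Paste an excerpt of the text and I will see how it can be matched.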
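In the meantime, here is a rough sketch of the "split into sentences, then loop" idea from the question, run on a made-up excerpt. The function name `mark_keywords`, the sample text, and the splitting rule (break after `.`, `!` or `?` followed by whitespace) are all assumptions, not something taken from the real data. Unlike the code above, it bolds only the keyword, as the original script did, and it rejoins sentences with single spaces, so the original line breaks are not preserved.

```python
import re

def mark_keywords(article, keywords):
    """Bold the first whole-word hit of each keyword and collect its sentence.

    Returns (marked_article, sentences, unmatched).
    """
    # Naive splitter: break after ., ! or ? when whitespace follows.
    plain = re.split(r'(?<=[.!?])\s+', article.strip())
    marked = list(plain)  # copy that receives the <b> tags
    sentences, unmatched = [], []
    for word in keywords:
        pattern = re.compile(fr'\b{re.escape(word)}\b')
        for i, sentence in enumerate(plain):
            if pattern.search(sentence):
                sentences.append(sentence)  # untagged sentence for sentences.txt
                # A function as replacement avoids backslash surprises.
                marked[i] = pattern.sub(lambda m: f"<b>{m.group()}</b>",
                                        marked[i], count=1)
                break  # stop at the first sentence: one match per keyword
        else:
            unmatched.append(word)  # loop ended without a break: no sentence hit
    return ' '.join(marked), sentences, unmatched

sample = ("The cat sat on the mat. It rained all day! "
          "A dog barked at the cat. Nothing else happened.")
new_text, found, missing = mark_keywords(sample, ["cat", "dog", "bird"])
print(new_text)
print(found)
print(missing)
```

The three return values correspond to new.txt, sentences.txt and unmatched_keywords.txt. Abbreviations such as "e.g." or keywords that themselves contain a period would need a smarter splitter than this one.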