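Saving the sentences that contain keywords into a new txt file (a sentence means the text running from one period "." to the next, not the whole article), but it raises an error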

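I have a piece of code that batch-matches keywords and bolds them, and also generates an unmatched_keywords file that stores the keywords that were not matched. It should also save each sentence containing a keyword into a new txt file, where a sentence means the text running from one period "." to the next, not the whole article. But it raises an error: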


import re

with open('keyword.txt', 'r') as file:
    keyword = [word.strip() for word in file.readlines()]

with open('file.txt', 'r') as file:
    article = file.read()

unmatched_keywords = []
for word in keyword:
    pattern = re.compile(fr'\b{re.escape(word)}\b')
    match = re.search(pattern, article)
    if match:
        article = re.sub(pattern, f"<b>{match.group()}</b>", article, count=1)
    else:
        unmatched_keywords.append(word)


with open("new.txt", "w") as file:
    file.write(article)


with open('unmatched_keywords.txt', 'w') as file:
    for word in unmatched_keywords:
        file.write(word + '\n')

# Use a regular expression to find every sentence containing a keyword
pattern = r'[^.]*' + str(keyword) + r'[^.]*'
sentences = re.findall(pattern,article)
 
# Write the results
with open("sentences.txt", "w") as file:
    file.write(sentences)

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\notepad5.txt", line 29, in <module>
    sentences = re.findall(pattern,article)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\re.py", line 241, in findall
    return _compile(pattern, flags).findall(string)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\sre_parse.py", line 598, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range l-b at position 1923

How should I fix this?
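The error most likely comes from str(keyword): it turns the list into text such as ['well-being', 'full-blown'], and inside a regular expression square brackets form a character class, so a hyphen between two letters is read as a range. A range like l-b runs backwards (l comes after b), which gives "bad character range l-b". The sketch below rebuilds the question's pattern with made-up keywords to reproduce this; the helper name show_bad_pattern is hypothetical.

```python
import re


def show_bad_pattern(keywords):
    """Rebuild the question's pattern and return the re.error message, or None if it compiles."""
    # str() on a list produces "['kw1', 'kw2']", which regex parses as a character class
    bad_pattern = r'[^.]*' + str(keywords) + r'[^.]*'
    print('pattern:', bad_pattern)
    try:
        re.compile(bad_pattern)
    except re.error as exc:
        return str(exc)
    return None


# Hypothetical keywords: an "l-b" inside the brackets is an invalid (reversed) range
print(show_bad_pattern(['well-being', 'full-blown']))
```

With keywords that contain no such hyphen, the pattern compiles, but it still matches single characters from the class rather than whole keywords.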

I ran your code, and the only error is on the last line: "illegal write of non-character data".

#!/usr/bin/env python
# coding: utf-8
import re


mypath = '/sdcard/001/test/'

with open(f'{mypath}keyword.txt', 'r') as file:
    keyword = [word.strip() for word in file.readlines()]

with open(f'{mypath}file.txt', 'r') as file:
    article = file.read()

unmatched_keywords = []

for word in keyword:
    pattern = re.compile(fr'\b{re.escape(word)}\b')
    match = re.search(pattern, article)

    if match:
        article = re.sub(pattern, f"<b>{match.group()}</b>", article, count=1)
    else:
        unmatched_keywords.append(word)

with open(f"{mypath}new.txt", "w") as file:
    file.write(article)

with open(f'{mypath}unmatched_keywords.txt', 'w') as file:
    for word in unmatched_keywords:
        file.write(word + '\n')

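# Use a regular expression to find every sentence containing a keyword
# (note: str(keyword) turns the list into a regex character class such as
# ['a', 'b'], so this matches single characters rather than whole keywords)
pattern = r'[^.]*' + str(keyword) + r'[^.]*'
sentences = re.findall(pattern,article)
# Write the results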

with open(f"{mypath}sentences.txt", "w") as file:
    #file.write(sentences)
    file.write('\n'.join(sentences))

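The variable sentences is a list of strings; join it into a single string before writing and it works.
The file is created normally.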
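file.write() accepts only a string, so passing the list itself raises TypeError; joining the list with "\n" first writes one sentence per line. A minimal sketch with made-up sentences, a hypothetical helper write_sentences, and a temporary directory so nothing is left behind:

```python
import os
import tempfile


def write_sentences(path, sentences):
    """Write one sentence per line; file.write() only accepts a str."""
    with open(path, 'w', encoding='utf-8') as file:
        file.write('\n'.join(sentences))


sentences = ['First sentence.', 'Second sentence.']  # made-up sample data
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sentences.txt')
    try:
        with open(path, 'w', encoding='utf-8') as file:
            file.write(sentences)  # passing the list itself raises TypeError
    except TypeError as exc:
        print('TypeError:', exc)
    write_sentences(path, sentences)
    with open(path, encoding='utf-8') as file:
        print(file.read())
```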

The generated file content doesn't seem to match what you expected, though I didn't check it closely.
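The content is off because, even when the str(keyword) pattern compiles, the brackets are still a character class. A sketch of the intended behavior, assuming a sentence is the text up to and including a period and that matching is case-sensitive like the original code: escape each keyword, join them into one alternation, split the text at periods, and keep the pieces that contain a keyword. The function name and sample article are hypothetical. It is best run on the original article (before the bold tags are inserted) so the saved sentences stay clean.

```python
import re


def sentences_with_keywords(text, keywords):
    """Return the sentences (text up to and including a '.') that contain a keyword."""
    # Escape each keyword and join them into one alternation: \b(?:kw1|kw2|...)\b
    words = [re.escape(k) for k in keywords if k]
    if not words:
        return []
    pattern = re.compile(r'\b(?:' + '|'.join(words) + r')\b')
    # A "sentence" here is a run of non-period characters plus its closing period
    pieces = re.findall(r'[^.]+\.?', text)
    return [p.strip() for p in pieces if pattern.search(p)]


article = 'Cats sleep a lot. Dogs bark. A well-being study of cats. Birds sing.'
print(sentences_with_keywords(article, ['Dogs', 'well-being']))
```

Splitting only on "." is a simplification: decimals such as 3.5 or abbreviations such as "e.g." will also break a sentence.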

I'm not sure whether your problem has been solved yet; if not:

The following answer comes from ChatGPT:

To solve this, you can modify the code in the following steps:

First, read the original text file and store its contents in a string variable.

with open('original_text.txt', 'r') as file:
    text = file.read()

Next, extract the keywords and sentences from the text. This can be done with regular expressions.

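import re

# Extract keywords (replace with your own keyword list)
keywords = ['keyword1', 'keyword2', 'keyword3']
# Escape each keyword so characters such as "-" or "." are matched literally;
# matching is case-sensitive, so the set difference below compares exact spellings
pattern = re.compile(r'\b(?:{})\b'.format('|'.join(map(re.escape, keywords))))
matched_keywords = pattern.findall(text)

# Extract sentences: each piece runs up to and including a period,
# and only the pieces that contain a keyword are kept
sentences = [s.strip() for s in re.findall(r'[^.]+\.?', text) if pattern.search(s)]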

Then save the matched sentences to a new txt file.

with open('matched_sentences.txt', 'w') as file:
    for sentence in sentences:
        file.write(sentence + '\n')

Finally, generate a file of unmatched keywords: go through all the keywords and write the ones that were not matched to the file.

unmatched_keywords = list(set(keywords) - set(matched_keywords))

with open('unmatched_keywords.txt', 'w') as file:
    for keyword in unmatched_keywords:
        file.write(keyword + '\n')
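set(keywords) - set(matched_keywords) loses the order of the keyword list, and when re.IGNORECASE is used a keyword found in a different case (e.g. "KEYWORD1") is wrongly reported as unmatched. A small order-preserving alternative, assuming case-sensitive whole-word matching; find_unmatched is a hypothetical helper:

```python
import re


def find_unmatched(text, keywords):
    """Return keywords that never occur in text as whole words, in input order."""
    unmatched = []
    for kw in keywords:
        if not kw:
            continue  # skip blank lines from keyword.txt
        if re.search(r'\b' + re.escape(kw) + r'\b', text) is None:
            unmatched.append(kw)
    return unmatched


text = 'Dogs bark. Cats sleep.'
print(find_unmatched(text, ['Cats', 'birds', 'Dogs', 'fish']))
```

Note that \b only works at word characters, so a keyword such as "C++" would need a different boundary check.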

These steps save the sentences containing keywords to a new txt file and produce a file of unmatched keywords. If you have any other questions, feel free to ask.

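If you have already solved the problem, it would be great if you could share your solution as a blog post and put the link in the comments to help more people ^-^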