Extracting words from a file with Python

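1. Read the hamlet.txt file.
2. Extract the words from hamlet.txt.
3. Extract the 10 most frequent words in hamlet.txt and print them in sorted order.
4. (Optional) Extract the 10 most frequent nouns in hamlet.txt and print them in sorted order (you may sort the words by frequency and then remove the pronouns, conjunctions, adjectives, and so on).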

This answer draws on ChatGPT:

import string
from collections import Counter

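# Read the file (assumes hamlet.txt is UTF-8 encoded)
with open('hamlet.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# Extract the words
text = text.lower()  # lowercase the text so "The" and "the" count as the same word
text = text.translate(str.maketrans('', '', string.punctuation))  # strip ASCII punctuation
words = text.split()

# Count word frequencies and print the 10 most common words
word_counts = Counter(words)
top10_words = word_counts.most_common(10)
print('Top 10 most frequent words:')
for word, count in top10_words:
    print(f'{word}: {count}')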

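# Extract the nouns with nltk (optional task)
# Requires `pip install nltk` plus the tokenizer and tagger data, e.g.
# nltk.download('punkt') and nltk.download('averaged_perceptron_tagger');
# newer nltk releases may ask for 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead
from nltk import pos_tag
from nltk.tokenize import word_tokenize

# Tag the whole text once; Penn Treebank noun tags all start with 'NN' (NN, NNS, NNP, NNPS)
tagged = pos_tag(word_tokenize(text))

# On lowercased, punctuation-free text the tagger sometimes labels pronouns
# and conjunctions as nouns, so exclude them explicitly
excluded = {'i', 'you', 'he', 'she', 'it', 'we', 'they',           # pronouns
            'and', 'but', 'or', 'if', 'when', 'while', 'that'}     # conjunctions
nouns = [word for word, pos in tagged if pos.startswith('NN') and word not in excluded]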

# Count noun frequencies and print the 10 most common nouns
noun_counts = Counter(nouns)
top10_nouns = noun_counts.most_common(10)
print('Top 10 most frequent nouns:')
for noun, count in top10_nouns:
    print(f'{noun}: {count}')
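
For the optional task, the question itself suggests sorting by frequency and then discarding pronouns, conjunctions, and similar words. Below is a minimal sketch of that approach without nltk. It assumes a small hand-written stop list; the names STOP_WORDS and top_content_words are illustrative and do not come from the answer above. Note that this only filters out common function words and does not actually check whether a word is a noun.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop list; extend it for real use.
STOP_WORDS = {
    'the', 'and', 'to', 'of', 'a', 'i', 'you', 'my', 'in', 'it', 'that',
    'is', 'not', 'his', 'this', 'but', 'with', 'for', 'your', 'me', 'he',
    'be', 'as', 'what', 'him', 'so', 'have', 'will', 'do', 'we', 'are',
    'or', 'if', 'when', 'while', 'she', 'they', 'our', 'no', 'on', 'by',
}

def top_content_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text that are not stop words."""
    # Keep runs of letters, allowing inner apostrophes (e.g. "o'er").
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

# Small inline sample so the sketch runs without hamlet.txt.
sample = "The king and the queen. The king's ghost! O, the king is dead."
for word, count in top_content_words(sample, n=3):
    print(f'{word}: {count}')
```

To run it on the real file, pass the contents of hamlet.txt read with open() instead of the sample string. Using a regular expression instead of stripping punctuation keeps contractions such as "o'er" intact rather than merging them into "oer".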
