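1. Read the file hamlet.txt;
2. Extract the words from hamlet.txt;
3. Extract the 10 most frequent words in hamlet.txt and print them in sorted order;
4. (Optional) Extract the 10 most frequent nouns in hamlet.txt and print them in sorted order (you can rank the words by frequency and then remove pronouns, conjunctions, adjectives, and so on).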
This answer quotes ChatGPT:
import string
from collections import Counter
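# Read the file (assumes hamlet.txt is UTF-8; an explicit encoding avoids platform-dependent decoding errors)
with open('hamlet.txt', 'r', encoding='utf-8') as file:
    text = file.read()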
# Extract the words
text = text.lower()  # lowercase the text so that 'The' and 'the' are counted as the same word
text = text.translate(str.maketrans('', '', string.punctuation))  # remove ASCII punctuation
words = text.split()
# Count word frequencies and print the 10 most frequent words
word_counts = Counter(words)
top10_words = word_counts.most_common(10)
print('Top 10 most frequent words:')
for word, count in top10_words:
    print(f'{word}: {count}')
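# Optional task: extract the nouns with NLTK
# Requires the NLTK tokenizer and tagger data, e.g. nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger') (the resource names differ in newer NLTK versions)
from nltk import pos_tag
from nltk.tokenize import word_tokenize
# Tag the tokens in context and keep every noun tag (NN, NNS, NNP, NNPS)
nouns = [word for (word, pos) in pos_tag(word_tokenize(text)) if pos.startswith('N')]
# Drop pronouns and conjunctions that the tagger may still have labelled as nouns
nouns = [noun for noun in nouns if noun not in ['i', 'you', 'he', 'she', 'it', 'we', 'they']]  # pronouns
nouns = [noun for noun in nouns if noun not in ['and', 'but', 'or', 'if', 'when', 'while', 'that']]  # conjunctions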
# Count noun frequencies and print the 10 most frequent nouns
noun_counts = Counter(nouns)
top10_nouns = noun_counts.most_common(10)
print('Top 10 most frequent nouns:')
for noun, count in top10_nouns:
    print(f'{noun}: {count}')
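To check the word extraction and counting steps without needing hamlet.txt, the same logic can be wrapped in a small function and run on a short string. This is a minimal sketch; the function name top_words and the sample sentence are hypothetical, and it uses only the standard-library calls that the script above already relies on.

```python
import string
from collections import Counter


def top_words(text, n=10):
    """Lowercase the text, strip ASCII punctuation, split on whitespace,
    and return the n most frequent words as (word, count) pairs."""
    cleaned = text.lower().translate(str.maketrans('', '', string.punctuation))
    return Counter(cleaned.split()).most_common(n)


# Hypothetical sample text standing in for the contents of hamlet.txt
sample = "To be, or not to be: that is the question. To die, to sleep."
print(top_words(sample, 3))
```

Counter.most_common lists words with equal counts in the order they were first encountered, so 'or' is the first of the words that appear once.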
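Task 4 also suggests a route that does not need a part-of-speech tagger: rank the words by frequency, then discard pronouns, conjunctions, adjectives and similar words. Below is a minimal sketch of that idea. The EXCLUDED set and the function name top_candidate_nouns are hypothetical, and the list is far too short to be used as-is on the real hamlet.txt.

```python
import string
from collections import Counter

# Hypothetical, deliberately short exclusion list (pronouns, conjunctions,
# articles, prepositions, common verbs and a few adjectives); extend it by hand.
EXCLUDED = {
    'i', 'you', 'he', 'she', 'it', 'we', 'they', 'me', 'him', 'her', 'us', 'them',
    'my', 'your', 'his', 'its', 'our', 'their', 'thy', 'thou', 'thee',
    'and', 'but', 'or', 'if', 'when', 'while', 'that', 'as', 'so', 'for', 'nor',
    'the', 'a', 'an', 'of', 'to', 'in', 'on', 'with', 'at', 'by', 'from',
    'is', 'be', 'are', 'was', 'not', 'this', 'what', 'good', 'great', 'more',
}


def top_candidate_nouns(text, n=10):
    """Rank all words by frequency, then keep the first n words not in EXCLUDED."""
    cleaned = text.lower().translate(str.maketrans('', '', string.punctuation))
    ranked = Counter(cleaned.split()).most_common()  # full ranking, most frequent first
    return [(word, count) for word, count in ranked if word not in EXCLUDED][:n]


sample = "The king and the queen. The king is dead, my lord; the queen weeps for the king."
print(top_candidate_nouns(sample, 3))
```

On this sample, 'dead' (an adjective) still makes the top 3 because it is not in the list. That is the main weakness of the approach: the exclusion list has to be extended by hand after inspecting the output.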