The requirements are on the left and the expected output is at the bottom; could someone help me write this? The code template is on the right (please ignore my messy code).

The following was generated jointly by ChatGPT and 阿里嘎多学长; hope it helps:
You can count the occurrences of Chinese words in a text with the jieba library, following these steps:

1. Install jieba:
pip install jieba

2. Import jieba and read the text file:
import jieba
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

3. Use the jieba.cut function to split the text into individual words:
words = list(jieba.cut(text))

4. Use the Counter class from the collections library to count how often each word occurs:
from collections import Counter
word_counts = Counter(words)

5. Print the 10 most common words:
print(word_counts.most_common(10))
The complete code:

import jieba
from collections import Counter

with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

words = list(jieba.cut(text))
word_counts = Counter(words)
print(word_counts.most_common(10))
Expected output:
[('的', 10), ('是', 4), ('在', 3), ('Python', 3), ('使用', 2), ('jieba', 2), ('库', 2), ('统计', 2), ('文本', 2), ('中', 2)]
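Note that Counter counts every token jieba.cut emits, including punctuation and whitespace. A minimal sketch of filtering to Chinese-only tokens before counting, using a hypothetical token list in place of jieba's output:

```python
import re
from collections import Counter

# hypothetical tokens standing in for the output of jieba.cut
words = ['这', '是', '一句', '测试', ',', 'jieba', '的', '分词', '。', '的']

# keep only tokens that contain at least one CJK character
zh = re.compile(r'[\u4e00-\u9fff]')
zh_words = [w for w in words if zh.search(w)]

counts = Counter(zh_words)
print(counts.most_common(3))  # [('的', 2), ('这', 1), ('是', 1)]
```

Punctuation marks and the Latin token 'jieba' are dropped, so most_common reflects Chinese words only.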
wc.generate_from_frequencies(worddict)
image_colors = ImageColorGenerator(back_coloring)  # generate color values from the background image
plt.figure(figsize=(18, 18))
# word cloud shaped like the background image, drawn in the default colors
plt.imshow(wc)
plt.axis("off")
plt.show()  # display the word cloud
wc.to_file(imgname1)  # save the image
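In the snippet above, generate_from_frequencies expects worddict to be a mapping from word to frequency. One way to build such a mapping is from a Counter; a sketch with hypothetical tokens:

```python
from collections import Counter

# hypothetical segmented tokens; in practice these would come from jieba.cut
words = ['数据', '分析', '数据', '可视化', '数据', '分析']

# word -> frequency mapping, the shape generate_from_frequencies expects
worddict = dict(Counter(words))
print(worddict)  # {'数据': 3, '分析': 2, '可视化': 1}
```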
Code implementation:
Use the jieba library to segment the text and print each Chinese word together with its occurrence count.
import jieba

text = '这是一句简单的中文句子,用于测试jieba库的分词功能,希望可以成功。'
words = jieba.cut(text)

word_counts = {}
for word in words:
    if len(word) == 1:  # skip single-character tokens (including punctuation)
        continue
    word_counts[word] = word_counts.get(word, 0) + 1

for word, count in word_counts.items():
    print("{}: {}".format(word, count))
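The final loop above prints words in insertion order; to list the most frequent words first, sort the dict by count. A sketch using a hypothetical counts dict in place of the one built from jieba:

```python
# hypothetical counts, as the counting loop above might produce
word_counts = {'中文': 1, '句子': 1, '分词': 2, '测试': 1, '功能': 2}

# sort by count descending; sorted() is stable, so ties keep insertion order
ranked = sorted(word_counts.items(), key=lambda kv: kv[1], reverse=True)
for word, count in ranked:
    print("{}: {}".format(word, count))
```

This prints '分词: 2' and '功能: 2' before the words that occur once.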