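How should I write this? I can't get the word segmentation results to print and it keeps raising an error. The items method is for dictionaries and lists don't have it. Also, how do I save the counts in the variable result?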
The later code already calls result.items(), so result has to be a dictionary (dict). Store the word frequencies from the jieba segmentation in the result dictionary as key/value pairs.
For example:
result = {
    '我们': 1,
    '日常': 1,
    '开发': 1,
    '中': 1,
    ',': 3,
    ...,
}
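A minimal sketch of how such a dictionary could be filled, and of why the items error appears. The token list here is a hand-written stand-in for the output of jieba.lcut, and the name ranked is made up for this illustration; neither comes from the original question.

```python
# Hand-written stand-in for the list that jieba.lcut(doc) would return (assumption).
tokens = ["我们", "日常", "开发", "中", ",", "我们", ","]

# Count each token in a plain dict: word -> number of occurrences.
result = {}
for token in tokens:
    result[token] = result.get(token, 0) + 1

# A dict has .items(), so this loop works.
for word, count in result.items():
    print(word, count)

# sorted() returns a list of (word, count) tuples, not a dict.
# Keeping it under a different name (hypothetical: ranked) leaves result usable as a dict.
ranked = sorted(result.items(), key=lambda pair: pair[1], reverse=True)
print(type(result).__name__, type(ranked).__name__)
```

If the sorted list were assigned back to result, any later call to result.items() would fail, because a list has no items method.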
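Answer: From your description, you have already used the jieba library to segment a document and now want to count how often each word occurs, but the code you tried raised errors. Following the reference material, the word frequencies can be counted with a prefix dictionary. The solution in detail: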
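pip install jieba  # shell command, run once outside Python
import jieba
# Read the whole document, then segment it into a list of words
with open('example.txt', 'r', encoding='utf-8') as f:
    doc = f.read()
words = jieba.lcut(doc)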
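def pre_dict(path):
    """
    Build the prefix dictionary.
    :param path: path to the dictionary file
    :return: word_dict (the prefix dictionary) and total (the sum of all word frequencies)
    """
    word_dict = dict()
    # Running sum of the word frequencies
    total = 0
    with open(path, 'rb') as file_obj:
        for line in file_obj:
            line = line.strip().decode('utf-8')
            if not line:
                continue  # skip blank lines
            # Take the word and its frequency; any further field on the line is ignored
            word, freq = line.split()[:2]
            word_dict[word] = int(freq)
            total += int(freq)
            n = len(word)
            # Register every prefix of the word; prefixes that are not dictionary words get frequency 0
            for ch in range(n):
                pre_word = word[:ch + 1]
                if pre_word not in word_dict:
                    word_dict[pre_word] = 0  # not a dictionary word, so frequency 0
    return word_dict, total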
path = './dict.txt'
word_dict, total = pre_dict(path)
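result = {}
for word in words:
    temp_word = ''
    # Note: every prefix of the word that is found in word_dict is counted, not only the word itself
    for ch in word:
        temp_word += ch
        if temp_word in word_dict:
            if temp_word not in result:
                result[temp_word] = 1
            else:
                result[temp_word] += 1
# sorted() returns a list of (word, count) tuples, so store it under a new name and keep result as a dict
sorted_result = sorted(result.items(), key=lambda k: k[1], reverse=True)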
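To see what the prefix dictionary and the counting loop actually produce, here is a small sketch that runs the same logic on in-memory data. The three dictionary lines, the token list, and the function name build_prefix_dict are all invented for this illustration; a real run would read a dictionary file and use the list returned by jieba.lcut.

```python
import io

# Made-up dictionary content in a "word freq tag" layout (assumption).
SAMPLE_DICT = "开发 100 v\n开发者 50 n\n日常 80 n\n"

def build_prefix_dict(lines):
    """Return (word_dict, total): every word plus all of its prefixes, and the frequency sum."""
    word_dict = {}
    total = 0
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        word, freq = parts[0], int(parts[1])
        word_dict[word] = freq
        total += freq
        # Proper prefixes that are not dictionary words are stored with frequency 0.
        for end in range(1, len(word)):
            word_dict.setdefault(word[:end], 0)
    return word_dict, total

word_dict, total = build_prefix_dict(io.StringIO(SAMPLE_DICT))

# Hand-written stand-in for the output of jieba.lcut (assumption).
tokens = ["开发者", "日常", "开发"]

# Same counting loop as above: each prefix found in word_dict is counted.
result = {}
for token in tokens:
    prefix = ""
    for ch in token:
        prefix += ch
        if prefix in word_dict:
            result[prefix] = result.get(prefix, 0) + 1

print(word_dict)
print(result)
```

In this sample the prefix 开 ends up with a count of 2 although it never appears as a token of its own, so the numbers are prefix counts rather than plain token counts. Whether that is what you want depends on your goal.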
The complete code, with comments, is as follows:
# Import the jieba library
import jieba
# Open the document and segment it into words
with open('example.txt', 'r', encoding='utf-8') as f:
    doc = f.read()
words = jieba.lcut(doc)
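# Build the prefix dictionary
def pre_dict(path):
    """
    Build the prefix dictionary.
    :param path: path to the dictionary file
    :return: word_dict (the prefix dictionary) and total (the sum of all word frequencies)
    """
    word_dict = dict()
    # Running sum of the word frequencies
    total = 0
    with open(path, 'rb') as file_obj:
        for line in file_obj:
            line = line.strip().decode('utf-8')
            if not line:
                continue  # skip blank lines
            # Take the word and its frequency; any further field on the line is ignored
            word, freq = line.split()[:2]
            word_dict[word] = int(freq)
            total += int(freq)
            n = len(word)
            # Register every prefix of the word; prefixes that are not dictionary words get frequency 0
            for ch in range(n):
                pre_word = word[:ch + 1]
                if pre_word not in word_dict:
                    word_dict[pre_word] = 0  # not a dictionary word, so frequency 0
    return word_dict, total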
path = './dict.txt'
word_dict, total = pre_dict(path)
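# Count the frequencies and store them in the result dictionary
result = {}
for word in words:
    temp_word = ''
    # Note: every prefix of the word that is found in word_dict is counted, not only the word itself
    for ch in word:
        temp_word += ch
        if temp_word in word_dict:
            if temp_word not in result:
                result[temp_word] = 1
            else:
                result[temp_word] += 1
# Sort by frequency, highest first, and print the result.
# sorted() returns a list of (word, count) tuples, so result itself stays a dict.
sorted_result = sorted(result.items(), key=lambda k: k[1], reverse=True)
print(sorted_result)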
Note that the code above reads a sample file named example.txt; to process a different file, change the file name accordingly.
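If only whole tokens need to be counted, a shorter route that does not need a prefix dictionary is collections.Counter from the standard library. This is an alternative sketch, not the approach described above; the token list is again a hand-written stand-in for the output of jieba.lcut, and the name top_two is made up.

```python
from collections import Counter

# Hand-written stand-in for the list that jieba.lcut(doc) would return (assumption).
tokens = ["我们", "日常", "开发", "中", ",", "我们", ",", ","]

counter = Counter(tokens)

# A plain dict of word -> count, so result.items() keeps working.
result = dict(counter)

# most_common(n) returns the n most frequent (word, count) pairs as a list.
top_two = counter.most_common(2)

for word, count in top_two:
    print(word, count)
```

Here the comma is the most frequent token, so in practice you may want to drop punctuation from the token list before counting.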