Counting word frequencies in a document with jieba word segmentation

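How should I write this? I can't get the segmentation result to print and I keep getting an error. The items method is for dictionaries; lists don't have it. Also, how do I save the counts into the variable result?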

The later code already calls result.items(), so result has to be a dict. Store the word counts from jieba's segmentation in the result dictionary as key-value pairs.

For example:

result = {
    '我们': 1,
    '日常': 1,
    '开发': 1,
    '中': 1,
    '，': 3,
    # ...
}
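
As a minimal sketch of how such a dictionary could be filled in: the `tokens` list below is hypothetical sample data standing in for what `jieba.lcut(text)` would return, and `count_tokens` is an illustrative helper name, not part of jieba.

```python
# Hypothetical sample data standing in for the list returned by jieba.lcut(text).
tokens = ['我们', '日常', '开发', '中', '，', '我们', '，']


def count_tokens(tokens):
    """Return a dict mapping each token to the number of times it occurs."""
    result = {}
    for token in tokens:
        # dict.get returns 0 the first time a token is seen
        result[token] = result.get(token, 0) + 1
    return result


result = count_tokens(tokens)

# result is a dict, so .items() is available; the tokens list has no such method
for word, count in result.items():
    print(word, count)
```

Because the counts live in a dict, any later code that calls result.items() keeps working.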


The following answer is from ChatGPT:

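Answer: from the problem description, you have already segmented a document with jieba and want to count how often each word occurs, but the code you tried raised errors. Following the reference material, the word frequencies can be counted with the help of a prefix dictionary. The solution is as follows: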

First, install the jieba library from the command line:

pip install jieba

Then import it in your Python script:

import jieba

Open the document and segment it. jieba.lcut returns the tokens as a list:

with open('example.txt', 'r', encoding='utf-8') as f:
    doc = f.read()
words = jieba.lcut(doc)

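Build the prefix dictionary. Following the code in the reference material, implement a pre_dict function, put the dict.txt file at the correct path, and call pre_dict to build the prefix dictionary. Each line of dict.txt is expected to hold three whitespace-separated fields: the word, its frequency, and a third field that is ignored here: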

def pre_dict(path):
    """
    Build a prefix dictionary.
    :param path: path to the dictionary file
    :return: word_dict (word -> frequency) and total (sum of all frequencies)
    """
    word_dict = dict()
    total = 0  # running sum of all word frequencies
    with open(path, 'r', encoding='utf-8') as file_obj:
        for line in file_obj:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            word, freq, _ = line.split()  # word, frequency, ignored third field
            word_dict[word] = int(freq)
            total += int(freq)
            # Add every prefix of the word; a prefix that is not itself
            # a registered word gets frequency 0
            for ch in range(len(word)):
                pre_word = word[:ch + 1]
                if pre_word not in word_dict:
                    word_dict[pre_word] = 0
    return word_dict, total

path = './dict.txt'
word_dict, total = pre_dict(path)
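
To make the shape of the prefix dictionary concrete, here is a small sketch that builds one from two in-memory lines instead of a file. The helper name `build_prefix_dict` and the sample frequencies are made up for illustration; only the 'word freq tag' line layout is taken from the code above.

```python
def build_prefix_dict(lines):
    """Build a prefix dictionary from lines of the form 'word freq tag'."""
    word_dict = {}
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, freq, _ = line.split()
        word_dict[word] = int(freq)
        total += int(freq)
        # Register every prefix; setdefault leaves existing entries untouched
        for end in range(1, len(word) + 1):
            word_dict.setdefault(word[:end], 0)
    return word_dict, total


# Illustrative entries; the frequencies are invented for this example.
sample_lines = ['北京大学 5 nt', '北京 10 ns']
word_dict, total = build_prefix_dict(sample_lines)

print(word_dict)
print(total)
```

Here '北京大学' and '北京' keep their own frequencies, while the pure prefixes '北' and '北京大' are stored with frequency 0, and total is the sum 15.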

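Count how often each word occurs and save the counts in the result variable. Iterate over the segmentation output and increment the count of every token that is a registered word in the prefix dictionary, that is, one with a frequency greater than 0. Only whole tokens are counted, never their prefixes; tokens with no dictionary entry (for example punctuation, if the dictionary does not list it) are skipped:

result = {}
for word in words:
    # Prefix-only entries have frequency 0 and are not counted
    if word_dict.get(word, 0) > 0:
        result[word] = result.get(word, 0) + 1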
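
If the goal is to count whole tokens only, the standard library's collections.Counter offers a compact way to write it. The sketch below assumes small stand-in values for `words` and `word_dict` (hypothetical sample data, not real jieba output) and keeps only tokens whose dictionary frequency is greater than 0.

```python
from collections import Counter

# Hypothetical stand-ins for the jieba token list and the prefix dictionary.
words = ['北京', '大学', '，', '北京', '北']
word_dict = {'北京': 10, '大学': 7, '北': 0}

# Count each whole token once per occurrence; skip tokens that are missing
# from the dictionary or present only as a prefix (frequency 0).
result = Counter(w for w in words if word_dict.get(w, 0) > 0)

# Counter is a dict subclass, so .items() works as usual
for word, count in result.items():
    print(word, count)
```

In this sample '，' is dropped because it has no dictionary entry, and '北' is dropped because it appears only as a prefix.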

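Sort the entries of result by frequency in descending order. sorted() returns a list of (word, count) tuples, so after this line result is a list rather than a dict and no longer has an items method: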

result = sorted(result.items(), key=lambda k: k[1], reverse=True)
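
A short sketch of the sorting step on hypothetical sample counts. It stores the sorted output under a separate, illustrative name (`ranked`) so that `result` stays a dict, and it breaks ties by word so the order is deterministic.

```python
# Hypothetical sample counts.
result = {'开发': 1, '我们': 2, '，': 3, '日常': 1}

# Sort by count (highest first), then by word to break ties.
ranked = sorted(result.items(), key=lambda kv: (-kv[1], kv[0]))

# ranked is a list of (word, count) tuples; slice it for a top-N view
top_two = ranked[:2]

for word, count in ranked:
    print(word, count)
```

Keeping the sorted list in its own variable means later code can still call result.items() on the original dictionary.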

The complete code is as follows:

# Import the jieba library
import jieba

# Read the document and segment it into a list of tokens
with open('example.txt', 'r', encoding='utf-8') as f:
    doc = f.read()
words = jieba.lcut(doc)

# Build the prefix dictionary
def pre_dict(path):
    """
    Build a prefix dictionary.
    :param path: path to the dictionary file
    :return: word_dict (word -> frequency) and total (sum of all frequencies)
    """
    word_dict = dict()
    total = 0  # running sum of all word frequencies
    with open(path, 'r', encoding='utf-8') as file_obj:
        for line in file_obj:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            word, freq, _ = line.split()  # word, frequency, ignored third field
            word_dict[word] = int(freq)
            total += int(freq)
            # Add every prefix of the word; a prefix that is not itself
            # a registered word gets frequency 0
            for ch in range(len(word)):
                pre_word = word[:ch + 1]
                if pre_word not in word_dict:
                    word_dict[pre_word] = 0
    return word_dict, total

path = './dict.txt'
word_dict, total = pre_dict(path)

# Count whole tokens that are registered words in the dictionary
# (prefix-only entries have frequency 0 and are skipped)
result = {}
for word in words:
    if word_dict.get(word, 0) > 0:
        result[word] = result.get(word, 0) + 1

# Sort by frequency, highest first, and print the result.
# sorted() returns a list of (word, count) tuples, so result is a list from here on.
result = sorted(result.items(), key=lambda k: k[1], reverse=True)
print(result)

Note that the code above uses a simple example file, example.txt; to process a different file, change the file name accordingly.
