Here is my code:
import jieba
import os
import warnings
warnings.filterwarnings('ignore')
from gensim.corpora.dictionary import Dictionary
from gensim.models.ldamodel import LdaModel
from wordcloud import WordCloud
import matplotlib.pyplot as plt
textfile=input("Enter the text file name: ")
num_topics=int(input("Number of topics: "))
f=open(textfile, "r",encoding="utf-8")
lines=f.readlines()
f.close()
stoplist=open(r'C:\Users\26552\Desktop\stopword.txt','r',encoding="utf-8").read()
stoplist = set(w.strip() for w in stoplist)
segtexts=[]
for line in lines:
    doc=[]
    for w in list(jieba.cut(line,cut_all=True)):
        if len(w)>1 and w not in stoplist:
            doc.append(w)
    segtexts.append(doc)
dictionary = Dictionary(segtexts)
dictionary.filter_extremes(2,1.0,keep_n=1000)
corpus = [dictionary.doc2bow(text) for text in segtexts]
lda = LdaModel(corpus,id2word=dictionary, num_topics=num_topics)
topics=lda.print_topics(num_topics=num_topics,num_words=10)
print(topics)
font = r'C:\Users\26552\Desktop\simfang.ttf'
wc=WordCloud(collocations=False, font_path=font, width=2800, height=2800, max_words=20,margin=2)
for topicid in range(0,num_topics):
    tlist=lda.get_topic_terms(topicid, topn=20)
    wdict={}
    for wv in tlist:
        wdict[dictionary[wv[0]]]=wv[1]
    print(wdict)
    wordcloud = wc.generate_from_frequencies(wdict)
    wordcloud.to_file('topic_'+str(topicid)+'.png')
Here is the stopword text file:
The path is fine. Could anyone tell me where my code goes wrong?
Any pointers would be much appreciated!
The problem is in this line: stoplist = set(w.strip() for w in stoplist). Here stoplist is the string read from the file, not a list, so iterating it yields individual characters rather than stopwords, and the stopword filtering later has no effect. Change it like this:
stoplist = set(w for w in stoplist.split('\n'))  # split the stopword text by line; each element is one stopword
# Or, if you prefer, a plain list works too:
stoplist = [w for w in stoplist.split('\n')]
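A minimal sketch (with made-up stopwords, not your actual file) showing why iterating the raw string fails:

```python
text = "不过\n而且\n"  # pretend this is the string read() returned from stopword.txt

# Iterating the string directly yields single characters, not stopwords:
bad = set(w.strip() for w in text)

# Splitting on newlines first yields whole stopwords:
good = set(w.strip() for w in text.split('\n') if w.strip())

print(bad)   # single characters
print(good)  # whole stopwords
```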
If this solves your problem, please click accept.
For the stopwords, use readlines() instead of read(), so the file is read line by line.
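That approach might look like the sketch below; load_stopwords is a hypothetical helper name, and the demo writes a throwaway file rather than using your real stopword.txt path:

```python
import os
import tempfile

# Hypothetical helper: read the stopword file line by line via readlines(),
# stripping the newline from each entry; a set makes the 'not in' check fast.
def load_stopwords(path):
    with open(path, 'r', encoding='utf-8') as f:
        return set(line.strip() for line in f.readlines() if line.strip())

# Quick check against a temporary file standing in for stopword.txt:
tmp = tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8')
tmp.write("不过\n而且\n")
tmp.close()
print(load_stopwords(tmp.name))
os.unlink(tmp.name)
```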