Inconsistent LDA topic classification results

Hi everyone, why does my LDA topic model still give inconsistent results even though I set random_state? Here is the relevant code:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

n_features = 1000
# Bag-of-words counts; data.content_cutted holds the pre-segmented text
tf_vectorizer = CountVectorizer(strip_accents='unicode',
                                max_features=n_features,
                                stop_words='english',
                                max_df=0.5,
                                min_df=10)
tf = tf_vectorizer.fit_transform(data.content_cutted)

# Main model: 10 topics, seed 0
n_topics = 10
lda = LatentDirichletAllocation(n_components=n_topics, max_iter=50,
                                learning_method='batch',
                                learning_offset=50,
                                random_state=0)
lda.fit(tf)

# Compute perplexity for 1 to 19 topics
import matplotlib.pyplot as plt
plexs = []
scores = []
n_max_topics = 20
for i in range(1, n_max_topics):
    print(i)
    lda = LatentDirichletAllocation(n_components=i, max_iter=50,
                                    learning_method='batch',
                                    learning_offset=50, random_state=1)
    lda.fit(tf)
    plexs.append(lda.perplexity(tf))
    scores.append(lda.score(tf))
n_t = 19
x = list(range(1, n_t + 1))
plt.plot(x, plexs[0:n_t])
plt.xlabel("number of topics")
plt.ylabel("perplexity")
plt.show()

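LDA supports parallel computation to improve performance. Try disabling it by setting n_jobs=1 so that only a single worker is used, and see whether that helps.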
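A quick way to test this suggestion is to fit the same model twice with n_jobs=1 and a fixed seed, then compare the learned topics. This is a minimal sketch: the Poisson count matrix X is a hypothetical stand-in for tf (not the asker's data), and fit_lda is a hypothetical helper.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy document-term count matrix standing in for `tf`
rng = np.random.RandomState(42)
X = rng.poisson(lam=1.0, size=(60, 30))

def fit_lda(X, seed):
    """Fit a batch-mode LDA with a fixed seed and a single worker (n_jobs=1)."""
    lda = LatentDirichletAllocation(n_components=5, max_iter=10,
                                    learning_method='batch',
                                    random_state=seed, n_jobs=1)
    return lda.fit(X)

run_a = fit_lda(X, seed=0)
run_b = fit_lda(X, seed=0)

# Same data + same seed + single worker: the topic-word matrices should match
print(np.allclose(run_a.components_, run_b.components_))
print(run_a.perplexity(X), run_b.perplexity(X))
```

If these two runs already agree, parallelism is unlikely to be the cause and the seeds used in the script are worth checking next.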

[The following answer was generated by GPT]

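The cause is that the code uses two different random seeds: the standalone 10-topic model is fitted with random_state=0, while every model in the perplexity loop is fitted with random_state=1. The 10-topic model built inside the loop therefore starts from a different initialization and ends up with different topics than the standalone one. In addition, the loop reassigns lda, so after the loop that name refers to the 19-topic model rather than the 10-topic model.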

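To fix this, define the seed once, outside the loop, and pass that same variable to every LatentDirichletAllocation call. With a fixed seed, the same data, and the same parameters, each fit is reproducible, and the 10-topic model fitted inside the loop matches the standalone one.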
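The seed mismatch can be reproduced on toy data. This sketch (the Poisson count matrix X and the helper topic_word are hypothetical) fits 10 topics with seed 0 and with seed 1, mimicking the standalone model and the loop model, and then fits twice with one shared seed variable.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.RandomState(7)
X = rng.poisson(lam=1.0, size=(60, 30))  # hypothetical toy count matrix

def topic_word(X, n_components, seed):
    """Fit batch LDA and return its topic-word matrix (components_)."""
    lda = LatentDirichletAllocation(n_components=n_components, max_iter=5,
                                    learning_method='batch', random_state=seed)
    return lda.fit(X).components_

# What the question's code effectively does for n_components=10:
standalone = topic_word(X, 10, seed=0)  # model fitted before the loop
in_loop = topic_word(X, 10, seed=1)     # same k inside the loop, different seed
print(np.allclose(standalone, in_loop))

# With one shared seed variable, the two fits agree
random_state = 0
print(np.allclose(topic_word(X, 10, random_state),
                  topic_word(X, 10, random_state)))
```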

The modified code:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import matplotlib.pyplot as plt

# `data` is assumed to be a DataFrame whose content_cutted column holds pre-segmented text
n_features = 1000
tf_vectorizer = CountVectorizer(strip_accents='unicode',
                                max_features=n_features,
                                stop_words='english',
                                max_df=0.5,
                                min_df=10)

tf = tf_vectorizer.fit_transform(data.content_cutted)
n_topics = 10
random_state = 0  # one fixed seed shared by every model
# Note: learning_offset only affects learning_method='online'; batch mode ignores it
lda = LatentDirichletAllocation(n_components=n_topics, max_iter=50,
                                learning_method='batch',
                                learning_offset=50,
                                random_state=random_state)
lda.fit(tf)

# Compute perplexity for 1 to n_max_topics - 1 topics
plexs = []
scores = []
n_max_topics = 20

for i in range(1, n_max_topics):
    print(i)
    # Use a separate name so the 10-topic `lda` above is not overwritten
    lda_i = LatentDirichletAllocation(n_components=i, max_iter=50,
                                      learning_method='batch',
                                      learning_offset=50,
                                      random_state=random_state)  # same seed as above
    lda_i.fit(tf)
    plexs.append(lda_i.perplexity(tf))
    scores.append(lda_i.score(tf))

x = list(range(1, n_max_topics))
plt.plot(x, plexs)
plt.xlabel("number of topics")
plt.ylabel("perplexity")
plt.show()

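With one seed variable shared by all models, the standalone model and the loop models start from the same initialization, so rerunning the script on the same data gives the same topics and the same perplexity curve.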
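One way to check this, while also keeping every model from the sweep instead of only the last one, is to wrap the sweep in a function and run it twice. This is a sketch under assumptions: perplexity_sweep is a hypothetical helper, the Poisson matrix X stands in for tf, and max_iter is lowered to 10 to keep it fast.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def perplexity_sweep(X, topic_counts, seed):
    """Fit one LDA per topic count; keep each model instead of overwriting `lda`."""
    models, perplexities = {}, {}
    for k in topic_counts:
        model = LatentDirichletAllocation(n_components=k, max_iter=10,
                                          learning_method='batch',
                                          random_state=seed)
        model.fit(X)
        models[k] = model
        perplexities[k] = model.perplexity(X)
    return models, perplexities

rng = np.random.RandomState(0)
X = rng.poisson(lam=1.0, size=(80, 40))  # hypothetical stand-in for `tf`

models_1, perp_1 = perplexity_sweep(X, range(1, 6), seed=0)
models_2, perp_2 = perplexity_sweep(X, range(1, 6), seed=0)

# Two sweeps with the same seed should give the same perplexity curve
print(all(np.isclose(perp_1[k], perp_2[k]) for k in perp_1))

# Pick a topic count afterwards without refitting; the model was kept in the dict
chosen = models_1[3]
print(chosen.n_components)
```

Storing models in a dict keyed by topic count avoids refitting the chosen model and removes any doubt about which model a shared variable name currently points to.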
