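- Question: `doc = corpus[534]` picks one arbitrary abstract, so the code below only computes the tf-idf values of that single abstract. Goal: get the tf-idf values of all abstracts and print them. How can I do that?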


from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer(smooth_idf=True, use_idf=True)
# bag_of_words is the term-count matrix built earlier (not shown here)
tfidf_transformer.fit(bag_of_words)
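# Get the feature names (the 10000 features limited earlier)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
feature_names = cv.get_feature_names_out()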
# TF-IDF vector of a single abstract; the result is sparse (scipy.sparse.csr_matrix)
doc = corpus[534]  # an arbitrary abstract: this code only looks at one abstract's tf-idf values
tf_idf_vector = tfidf_transformer.transform(cv.transform([doc]))
# Convert the format: csr_matrix -> coo_matrix
# (named `coo` so it does not shadow scipy's coo_matrix class)
coo = tf_idf_vector.tocoo()
# coo.col holds the column indices of the non-zero entries; coo.data holds the values at those indices
tuples = zip(coo.col, coo.data)
sorted_items = sorted(tuples, key=lambda x: (x[1], x[0]), reverse=True)

# Keep the 10 largest tf-idf values
sorted_items = sorted_items[:10]
score_vals = []
feature_vals = []

# idx: feature index; score: tf-idf value
for idx, score in sorted_items:
    score_vals.append(round(score, 3))
    feature_vals.append(feature_names[idx])
# Map each of the top-10 feature names to its tf-idf value in the results dict
results = {}
for idx in range(len(feature_vals)):
    results[feature_vals[idx]] = score_vals[idx]
# Print the results
print('\nAbstract:')
print(doc)
print("\nkeywords:")
for k, v in results.items():  # results is a dict, not callable: iterate over .items()
    print(k, v)

Could anyone help? Ideally the output would be a numbered list, one row per abstract, for example:
0 offshoring 0.227 outsourcing offshoring decision 0.214 decision 0.208
1 geographically 0.172 outsourcing offshoring decision 0.214 decision 0.208
2 geographically 0.227 outsourcing offshoring decision 0.214 decision 0.208
3 offshoring 0.227 outsourcing offshoring decision 0.214 decision 0.208
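If only the layout is the sticking point, per-abstract keyword lists can be turned into numbered rows like the ones above with a small stdlib helper. `format_rows` is a hypothetical name, and its input is assumed to be a list with one list of (term, score) pairs per abstract, in corpus order; the sample values are taken from the example rows.

```python
# Minimal sketch: format per-abstract (term, score) lists as "index term score term score ..." rows.
# Assumption: keywords_per_doc[i] holds the keywords of abstract i; format_rows is a hypothetical helper.
from typing import List, Tuple


def format_rows(keywords_per_doc: List[List[Tuple[str, float]]]) -> List[str]:
    rows = []
    for idx, pairs in enumerate(keywords_per_doc):
        cells = " ".join(f"{term} {score:.3f}" for term, score in pairs)
        rows.append(f"{idx} {cells}".rstrip())  # rstrip keeps abstracts with no keywords as just "idx"
    return rows


example = [
    [("offshoring", 0.227), ("decision", 0.208)],
    [("geographically", 0.172)],
    [],
]
for line in format_rows(example):
    print(line)
```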

https://b23.tv/crIkrVI
Take a look at this.