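Hello. I'm learning text analysis in Python for my thesis. My goal is to count how many times specified keywords appear in one column of text data in an Excel file, with the count taken separately for each cell. My code only gives the right result when there is a single keyword (e.g. `keyword = '推进'`). When I use several keywords (e.g. `keyword = {'推进', '提升'}`), it still runs, but the results are wrong. Where is the problem in my code? I only started learning Python for my thesis and barely know anything, so any help would be greatly appreciated!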
Sample data:
My code:
```python
import pandas as pd
import jieba
data = pd.read_excel(r'C:\Users\86158\PycharmProjects\pythonProject3\文本分析\业绩说明会问答文本分析.xlsx')
data.head()
data1 = data.iloc[1:, :]
data1.head()
def cutword(mda):
    keywords = {'持续', '发展', '提升', '促进'}
    wordcut = jieba.cut(mda)
    wordict = {}
    for keyword in keywords:
        for word in wordcut:
            if word in wordict.keys():
                wordict[word] += 1
            else:
                wordict[word] = 1
        if keyword in wordict.keys():
            wordcount = wordict[keyword]
        else:
            wordcount = 0
    return wordcount
data1['风险词频'] = data1['Acntet'].apply(cutword)
print(data1)
```
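To reproduce the problem without the Excel file or jieba, the same loop structure as `cutword` can be run on a hand-written token list. This is a sketch under assumptions: `cutword_demo` is a hypothetical name, the token list is made up and stands in for jieba's output, and `iter()` mimics the one-shot generator that `jieba.cut` returns. The keywords are passed as a list so their order is fixed.

```python
def cutword_demo(tokens, keywords):
    # iter() gives a one-shot iterator, standing in for jieba.cut's generator
    wordcut = iter(tokens)
    wordict = {}
    for keyword in keywords:
        # Only the first pass consumes tokens; later passes find the iterator empty
        for word in wordcut:
            if word in wordict:
                wordict[word] += 1
            else:
                wordict[word] = 1
        # wordcount is overwritten on every pass, so only the last keyword survives
        if keyword in wordict:
            wordcount = wordict[keyword]
        else:
            wordcount = 0
    return wordcount


tokens = ['持续', '推进', '改革', '推进', '提升', '效率']
print(cutword_demo(tokens, ['推进']))          # 2 (correct)
print(cutword_demo(tokens, ['推进', '提升']))  # 1 (only '提升' is reported; the total should be 3)
```

With one keyword the result is correct; with two, the function reports only the count of the keyword visited last.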
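In your code, the word-counting loop `for word in wordcut` sits inside `for keyword in keywords`, and `wordcount` is reassigned on every pass. So the function returns only the count of whichever keyword the loop visits last; the counts of the other keywords are never added together. Because `jieba.cut` returns a one-shot generator, the inner loop actually runs only on the first pass; if `wordcut` were a list, every word would be counted once per keyword, i.e. counted repeatedly. Also, a set such as `{'持续','发展','提升','促进'}` has no guaranteed order, so which keyword comes last can change between runs. You can modify it along these lines: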
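```python
import pandas as pd
import jieba

def count_keyword(text, keywords):
    """Return the total number of times any of the keywords appears in one cell."""
    # Empty cells are read as NaN (a float), which jieba.cut cannot segment
    if not isinstance(text, str):
        return 0
    # Build the word-frequency dict once per cell, outside any keyword loop
    wordict = {}
    for word in jieba.cut(text):
        if word in wordict:
            wordict[word] += 1
        else:
            wordict[word] = 1
    # Add up the counts of all keywords instead of overwriting them
    count = 0
    for keyword in keywords:
        count += wordict.get(keyword, 0)
    return count

data = pd.read_excel(r'C:\Users\86158\PycharmProjects\pythonProject3\文本分析\业绩说明会问答文本分析.xlsx')
# Drop the first row as in the question; .copy() gives an independent DataFrame to add the column to
data = data.iloc[1:, :].copy()
keywords = ['持续', '发展', '提升', '促进']
data['风险词频'] = data['Acntet'].apply(lambda x: count_keyword(x, keywords))
print(data)
```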
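If the thesis needs each keyword's count separately as well as the total, `collections.Counter` from the standard library can replace the hand-built dict. A minimal sketch, assuming each cell has already been segmented into a list of tokens; `keyword_counts` is a hypothetical helper name and the tokens are made up.

```python
from collections import Counter


def keyword_counts(tokens, keywords):
    """Return {keyword: count} for one cell's tokens."""
    freq = Counter(tokens)  # word -> number of occurrences
    # Indexing a Counter with a missing word returns 0 instead of raising KeyError
    return {kw: freq[kw] for kw in keywords}


tokens = ['持续', '推进', '改革', '推进', '提升', '效率']
keywords = ['推进', '提升', '促进']
counts = keyword_counts(tokens, keywords)
print(counts)                # {'推进': 2, '提升': 1, '促进': 0}
print(sum(counts.values()))  # 3
```

To apply it to the sheet (assuming every cell in `Acntet` is a string), something like `pd.DataFrame([keyword_counts(jieba.lcut(x), keywords) for x in data['Acntet']], index=data.index)` should give one column per keyword; `jieba.lcut` is the list-returning counterpart of `jieba.cut`.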
Hope this helps. Good luck!