I'm working on a sentiment analysis of Chinese pop song lyrics from the past twenty years. Here is the complete code:
import pandas as pd
import jieba
from textblob import TextBlob
import matplotlib.pyplot as plt
from collections import Counter
from wordcloud import WordCloud, STOPWORDS
# Load the Chinese stopword list
stopwords = []
with open('D:/桌面/停用词表.txt', 'rb') as f:
    for line in f:
        stopwords.append(line.strip().decode('utf-8'))
# Load the Chinese sentiment lexicons; both lists were compiled by sorting the word list our teacher handed out
positive_words = set()
negative_words = set()
with open('D:/桌面/积极情感词表.txt', 'rb') as f:
    for line in f:
        positive_words.add(line.strip().decode('utf-8'))
with open('D:/桌面/消极情感词表.txt', 'rb') as f:
    for line in f:
        negative_words.add(line.strip().decode('utf-8'))
# Read the data from the Excel file, then clean and preprocess it
lyrics_data = pd.read_excel('D:/桌面/近十年中文流行歌曲歌词数据集(1).xlsx')
lyrics_data = lyrics_data.dropna(subset=['lyrics'])  # drop rows with missing lyrics
# Run sentiment analysis on the lyrics and build a per-year time series
lyrics_dict = {}
for _, data in lyrics_data.iterrows():
    release_year = int(data['year'])
    if release_year in lyrics_dict:
        lyrics_dict[release_year][0].append(data['lyrics'])
    else:
        lyrics_dict[release_year] = [[data['lyrics']], []]
for year in lyrics_dict:
    for lyrics in lyrics_dict[year][0]:
        words = jieba.lcut(lyrics)
        print("分词结果:", words)
        words = [word for word in words if word not in stopwords]  # filter out stopwords
        print("过滤后:", words)
        emotion_words = []
        for word in words:
            if word in positive_words or word in negative_words:
                emotion_words.append(word)  # keep only sentiment-related words
        blob = TextBlob(" ".join(words))
        sentiment_score = blob.sentiment.polarity
        lyrics_dict[year][1].append(sentiment_score)
sentiment_data = []
for year in lyrics_dict:
    mean_score = sum(lyrics_dict[year][1]) / len(lyrics_dict[year][1])
    sentiment_data.append((year, mean_score))
# Convert the year to datetime, then sort by date
sentiment_df = pd.DataFrame(sentiment_data, columns=['year', 'sentiment_score'])
sentiment_df['year'] = pd.to_datetime(sentiment_df['year'], format='%Y')
sentiment_df = sentiment_df.sort_values('year')
# Plot the time series
x = [str(pair[0]) for pair in sentiment_data]
y = [pair[1] for pair in sentiment_data]
plt.plot(x, y)
# Add chart labels
plt.title('Sentiment Analysis of Chinese Pop Music Lyrics Over Time')
plt.xlabel('Year')
plt.ylabel('Sentiment Score')
plt.show()
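But the plot the code produces looks like this:
This output seems very strange to me. Given the lyrics for each year in the original Excel file, the sentiment scores cannot possibly all be 0, yet the plot shows a positive sentiment score only for the 2004 lyrics. Is there something wrong with my code? If so, please point it out. Thanks, everyone!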
The line blob = TextBlob(" ".join(words)) is wrong; it should use only emotion_words:
blob = TextBlob(" ".join(emotion_words))
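One caveat worth checking: as far as I know, TextBlob's default sentiment analyzer is built on an English lexicon, so Chinese tokens will most likely still score 0 even after switching to emotion_words. Since you already have positive_words and negative_words, a possible alternative is to score each song directly from those two sets. The following is only a minimal sketch under that assumption, not a definitive method: the function names lexicon_score and yearly_means and the toy data are made up for illustration, and the input is assumed to be lyrics already tokenized (for example by jieba.lcut) and stopword-filtered. It also returns the yearly means sorted by year, because the original code sorts sentiment_df but then plots from the unsorted sentiment_data.

```python
from collections import defaultdict


def lexicon_score(words, positive_words, negative_words):
    """Score one song as (positives - negatives) / emotion-word count, in [-1, 1]."""
    pos = sum(1 for w in words if w in positive_words)
    neg = sum(1 for w in words if w in negative_words)
    total = pos + neg
    # A song with no lexicon hits is treated as neutral (0.0).
    return (pos - neg) / total if total else 0.0


def yearly_means(songs, positive_words, negative_words):
    """songs: iterable of (year, token list). Returns [(year, mean score)] sorted by year."""
    scores = defaultdict(list)
    for year, words in songs:
        scores[year].append(lexicon_score(words, positive_words, negative_words))
    return [(year, sum(v) / len(v)) for year, v in sorted(scores.items())]


# Toy data for illustration only; these are not values from the real dataset.
positive = {"快乐", "喜欢"}
negative = {"难过", "孤单"}
songs = [
    (2005, ["我", "很", "难过", "孤单"]),
    (2004, ["快乐", "喜欢", "你"]),
    (2005, ["快乐", "的", "歌"]),
]
result = yearly_means(songs, positive, negative)
print(result)
```

In your script, the (year, mean score) pairs returned this way could replace sentiment_data before plotting, so the x axis runs in chronological order.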