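Read the data from http://www.digitalanalytics.id.au/static/files/artists-spotify-clean.csv.
Describe the variables popularity and followers statistically and visually.
Analyze the relationship between popularity and followers statistically and visually.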
The code from the original question is:
import pandas as pd
import researchpy as rp
import matplotlib.pyplot as plt
df = pd.read_csv('http://www.digitalanalytics.id.au/static/files/artists-spotify-clean.csv',sep=';')
print(df['popularity'].describe())
plt.hist(df['popularity'],bins=100)
plt.ticklabel_format(style='plain')
plt.xticks(rotation='vertical')
plt.tight_layout()
# Use a separate file name per histogram so the second does not overwrite the first
plt.savefig('histo_popularity.pdf')
plt.clf()
print(df['followers'].describe())
plt.hist(df['followers'],bins=100)
plt.ticklabel_format(style='plain')
plt.xticks(rotation='vertical')
plt.tight_layout()
plt.savefig('histo_followers.pdf')
plt.clf()
print(rp.correlation.corr_pair(df[['popularity', 'followers']]))
plt.scatter(df['popularity'], df['followers'])
plt.xlabel('popularity')
plt.ylabel('Number of followers')
plt.ticklabel_format(style='plain')
plt.xticks(rotation='vertical')
plt.tight_layout()
plt.savefig('scatterplot.pdf')
plt.clf()
The separator on line 5 (the pd.read_csv call) is wrong: it should be a comma, not a semicolon. Try changing it to sep=','.
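To see why the separator matters, here is a small sketch that runs without the network. The sample text and its `name` column are made up for illustration and are only assumed to resemble the real file; the point is what `pd.read_csv` does when `sep` does not match the delimiter actually used in the data.

```python
import io
import pandas as pd

# Hypothetical comma-delimited sample; the real file's columns and values may differ.
sample = "name,popularity,followers\nA,80,1200000\nB,45,3500\n"

# Wrong separator: the text contains no ';', so each line is read as a single field.
wrong = pd.read_csv(io.StringIO(sample), sep=';')
print(wrong.columns.tolist())          # one column named after the whole header line
print('popularity' in wrong.columns)   # False, so df['popularity'] would raise KeyError

# Matching separator: ',' is also the pandas default, so sep could simply be omitted.
right = pd.read_csv(io.StringIO(sample), sep=',')
print(right.columns.tolist())
print(right['popularity'].describe())
```

If the real file is comma-delimited as suggested above, the same change in the original read_csv call lets the later df['popularity'] and df['followers'] lookups find their columns by name.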