Question: Read the data from https://www.digitalanalytics.id.au/static/files/artists-spotify.csv. Create a new variable named 'Popularity_cat' based on popularity: a popularity of 0-50 should be labelled "low popularity" and a popularity of 51-100 "high popularity". Save the cleaned and processed data frame as 'artists-spotify-clean.csv'.
The original wording of the question is:
This is what I have written so far; the code for this question starts at line 19.
import pandas as pd
df = pd.read_csv('https://www.digitalanalytics.id.au/static/files/artists-spotify.csv',sep=';')
print(df.info())
print(df.duplicated().sum())
print(df[df.duplicated()])
df = df.drop_duplicates()
print(df.isnull().sum())
isolatemissing = pd.isnull(df['x'])
print(df[isolatemissing])
df = df.dropna()  # dropna() returns a new DataFrame, so assign the result back
df = df.sort_values(by=['popularity'], ascending=False)
print(df[['popularity']].head(20))
#def['x'] = df ['popularity'].str[-1:]
# Map a single popularity score to its category label
def Popularity_cat(x):
    if x <= 50:
        return 'low popularity'
    else:
        return 'high popularity'
df['Popularity_cat'] = df['popularity'].apply(Popularity_cat)
print(df[['popularity','Popularity_cat']])
df.to_csv('artists-spotify-clean.csv',sep=';',index=False)
You don't need to define a function; it is simpler to write this with np.where() or a list comprehension.
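A minimal sketch of both suggestions follows. To keep it self-contained it uses a small hypothetical DataFrame in place of the downloaded CSV (the 'artist' column and the sample values are made up); only the 'popularity' column from the question is assumed. With the real data you would keep the pd.read_csv(..., sep=';') and cleaning steps above and replace only the Popularity_cat function and apply call.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the Spotify CSV; only 'popularity' matters here.
df = pd.DataFrame({
    "artist": ["A", "B", "C", "D"],
    "popularity": [12, 50, 51, 98],
})

# Option 1: np.where returns the first label where the condition is True and the
# second label everywhere else, with no Python-level loop.
# Note: a NaN popularity compares False, so it would be labelled "high popularity";
# drop or fill missing values before this step.
df["Popularity_cat"] = np.where(df["popularity"] <= 50, "low popularity", "high popularity")

# Option 2: a list comprehension that builds the same labels element by element.
labels_lc = ["low popularity" if p <= 50 else "high popularity" for p in df["popularity"]]

print(df[["popularity", "Popularity_cat"]])

# Saving with the same settings as in the question (semicolon-separated, no index):
# df.to_csv("artists-spotify-clean.csv", sep=";", index=False)
```

np.where is usually the better choice on a whole column because it is vectorised; the list comprehension is easier to extend if you later need more than two categories written as plain if/else logic.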