Head teacher A at a university wants to compile and analyze this semester's final exam results for the 6 students in the class (xia is on leave of absence). The data is as follows:
import pandas as pd
import numpy as np
data = {'names': ['xu', 'zhang', 'lai', 'guo', 'xia', 'lin', 'lin'],
        'finance': [75, 68, 83, 92, np.nan, 57, 57],
        'financial engineering': [71, 63, 86, 90, np.nan, 50, 50],
        'data mining': [78, 71, 88, 95, np.nan, 63, 63],
        'portfolio management': [71, 63, 86, 90, np.nan, np.nan, np.nan],
        'college english': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
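1. Convert data into a DataFrame named scores (you can use pd.DataFrame()).
2. Set the 'names' column of scores as the index and assign the result back to the variable scores (you can use set_index()).
3. Delete xia, who is on leave of absence, then delete the 'college english' column, which is not a subject this semester (you can use scores.drop() or dropna(), setting the how and axis parameters).
4. Remove duplicate rows and fill the missing score of lin, who was absent from an exam, with 0 (you can use drop_duplicates() and fillna()). Finally, produce the data shown in the figure below.
5. Compute each student's average score and assign it to scores['mean'] (you can use scores.mean()).
6. Using bins = [0, 60, 75, 90, 100] and group_names = ['failed', 'moderate', 'great', 'excellent'], bin scores['mean'] to determine each student's grade (you can use pd.cut()).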
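Step 3 also hints at `dropna()` with the `how` and `axis` parameters. A minimal sketch of that route, assuming the question's data dict: `how='all'` drops only rows (or, with `axis=1`, only columns) in which every value is NaN, which here matches xia's row and the 'college english' column. Unlike `drop('xia')`, this would also remove any other student whose scores were all missing.

```python
import numpy as np
import pandas as pd

# The question's data: xia's row and the 'college english' column are entirely NaN.
data = {'names': ['xu', 'zhang', 'lai', 'guo', 'xia', 'lin', 'lin'],
        'finance': [75, 68, 83, 92, np.nan, 57, 57],
        'financial engineering': [71, 63, 86, 90, np.nan, 50, 50],
        'data mining': [78, 71, 88, 95, np.nan, 63, 63],
        'portfolio management': [71, 63, 86, 90, np.nan, np.nan, np.nan],
        'college english': [np.nan] * 7}
scores = pd.DataFrame(data).set_index('names')

# Drop every column whose values are all NaN ('college english').
scores = scores.dropna(how='all', axis=1)
# Drop every row whose values are all NaN (xia); lin still has three real scores.
scores = scores.dropna(how='all')
print(scores)
```

The order of the two calls does not matter for this data: xia's row is all NaN with or without the 'college english' column.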
The following answer comes from ChatGPT-4:
import pandas as pd
import numpy as np
data = {'names':['xu','zhang', 'lai', 'guo', 'xia', 'lin', 'lin'],
'finance':[75, 68, 83, 92, np.nan, 57, 57],
'financial engineering':[71, 63, 86, 90, np.nan, 50, 50],
'data mining':[78, 71, 88, 95, np.nan, 63, 63],
'portfolio management': [71, 63, 86, 90, np.nan, np.nan, np.nan],
'college english':[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
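# 1. Convert the dict to a DataFrame named scores
scores = pd.DataFrame(data)
# 2. Use the 'names' column as the row index
scores = scores.set_index('names')
# 3. Drop xia (on leave) by index label, then the 'college english' column
scores = scores.drop('xia')
scores = scores.drop('college english', axis=1)
# Drop any row that is entirely NaN (a no-op here, since xia is already gone)
scores = scores.dropna(how='all')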
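# 4. Drop duplicate rows (drop_duplicates compares column values only and
#    treats NaN as equal to NaN), then fill the remaining missing scores with 0
scores = scores.drop_duplicates()
scores = scores.fillna(0)
print(scores)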
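# 5. Row-wise average over the subject columns
scores['mean'] = scores.mean(axis=1)
# 6. Four labels need five bin edges; intervals are right-closed: (0, 60], (60, 75], ...
bins = [0, 60, 75, 90, 100]
group_names = ['failed', 'moderate', 'great', 'excellent']
scores['grade'] = pd.cut(scores['mean'], bins=bins, labels=group_names)
print(scores)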
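`drop_duplicates()` compares only column values and ignores the index, so after `set_index('names')` two different students with identical marks would also be collapsed into one row. A minimal sketch, assuming the question's data, that keys duplicates on name and scores together by moving the index back into a column first; the helper name `clean_scores` is hypothetical.

```python
import numpy as np
import pandas as pd

def clean_scores(data):
    """Hypothetical helper: steps 1-4, keying duplicates on name + scores."""
    scores = pd.DataFrame(data).set_index('names')
    scores = scores.drop(index='xia').drop(columns='college english')
    # reset_index() turns 'names' into an ordinary column, so drop_duplicates()
    # only removes rows whose name AND scores all match (NaN counts as equal to NaN).
    scores = scores.reset_index().drop_duplicates().set_index('names')
    return scores.fillna(0)

data = {'names': ['xu', 'zhang', 'lai', 'guo', 'xia', 'lin', 'lin'],
        'finance': [75, 68, 83, 92, np.nan, 57, 57],
        'financial engineering': [71, 63, 86, 90, np.nan, 50, 50],
        'data mining': [78, 71, 88, 95, np.nan, 63, 63],
        'portfolio management': [71, 63, 86, 90, np.nan, np.nan, np.nan],
        'college english': [np.nan] * 7}

scores = clean_scores(data)
# Steps 5-6: average the four subjects, then bin with right-closed intervals.
scores['mean'] = scores.mean(axis=1)
scores['grade'] = pd.cut(scores['mean'], bins=[0, 60, 75, 90, 100],
                         labels=['failed', 'moderate', 'great', 'excellent'])
print(scores)
```

For the question's data the result is the same as dropping duplicates on scores alone (means 73.75, 66.25, 85.75, 91.75 and 42.5 for xu, zhang, lai, guo and lin); the difference only shows when two students share identical marks.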
# Dataset: name, sex, and scores in each subject
import pandas as pd
# 1. Convert the data into a DataFrame named scores.
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank'],
'Sex': ['F', 'M', 'M', 'M', 'F', 'M'],
'Math': [92, 70, 89, 78, 95, 85],
'English': [88, 67, 81, 84, 93, 87],
'Physics': [85, 76, 83, 85, 88, 82],
'Chemistry': [78, 80, 83, 91, 87, 86],
'Biology': [90, 68, 82, 84, 78, 77],
'History': [91, 84, 72, 79, 75, 81]
}
scores = pd.DataFrame(data)
scores = scores.set_index('Name')  # 2. Use the student names as the index.
# 3. Remove the records of the student on leave and the subjects not taken this semester.
scores = scores.drop(['Eva'])
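scores = scores[['Math', 'English', 'Chemistry']]  # Keep only the Math, English, and Chemistry scores
# 4. Remove duplicates, fill absent students' missing scores with 0, and plot the score distribution.
scores = scores.fillna(0)  # Fill absent students' missing scores with 0
scores = scores.drop_duplicates()  # Remove duplicate rows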
import matplotlib.pyplot as plt
scores.plot(kind='bar')
plt.show()
# 5. Compute each student's average score.
scores['Avg'] = scores.mean(axis=1)
# 6. Bin the averages into score ranges to determine each student's grade.
bins = [0, 59, 69, 79, 89, 100]
labels = ['E', 'D', 'C', 'B', 'A']
scores['Grade'] = pd.cut(scores['Avg'], bins=bins, labels=labels)
print(scores)
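Both answers bin with `pd.cut`, whose intervals are closed on the right by default, so a value equal to the lowest edge (for example an average of 0 after `fillna(0)`) falls outside every bin and gets NaN. A small sketch using the second answer's edges and labels on a few made-up boundary values; `include_lowest=True` is a `pd.cut` parameter that closes the first interval on the left.

```python
import pandas as pd

# Illustrative averages that sit on or just past the bin edges.
avgs = pd.Series([0, 59, 59.5, 89, 89.5, 100])
bins = [0, 59, 69, 79, 89, 100]
labels = ['E', 'D', 'C', 'B', 'A']

# Default right=True: intervals are (0, 59], (59, 69], ..., (89, 100].
default = pd.cut(avgs, bins=bins, labels=labels)
# -> [NaN, 'E', 'D', 'B', 'A', 'A']  (0 is outside (0, 59])

# include_lowest=True also puts a value equal to the first edge in the first bin.
inclusive = pd.cut(avgs, bins=bins, labels=labels, include_lowest=True)
# -> ['E', 'E', 'D', 'B', 'A', 'A']

print(pd.DataFrame({'avg': avgs, 'default': default, 'inclusive': inclusive}))
```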