本人正自学《Python数据分析基础》这本书,但是书中3.5.3章节的内容有误导致我现在学习进度卡死,希望大家能帮忙看看该如何纠错。
该章节内容为使用Python的pandas模块为每个工作簿和工作表计算总数和均值。
但是根据书本中提供的代码运行的结果总是报错提示 ValueError: Data must be 1-dimensional
需要计算的3张Excel表格:
表格内容如下:
希望最终实现的结果如下:
书本中的错误代码为:
#!/usr/bin/env python3
import pandas as pd
import glob
import os
input_path = "E:\python_work\Foundations_for_Analytics_with_Python\Excel"
output_file = 'pandas_output.xls'
all_workbooks = glob.glob(os.path.join(input_path,'*.xlsx*'))
data_frames = []
for workbook in all_workbooks:
all_worksheets = pd.read_excel(workbook, sheet_name=None, index_col=None)
workbook_total_sales = []
workbook_number_of_sales = []
worksheet_data_frames = []
worksheets_data_frame = None
workbook_data_frame = None
for worksheet_name, data in all_worksheets.items():
total_sales = pd.DataFrame([float(str(value).strip('$').replace(',','')) for value in data.loc[:, 'Sale Amount']]).sum()
number_of_sales = len(data.loc[:, 'Sale Amount'])
average_sales = pd.DataFrame(total_sales / number_of_sales)
workbook_total_sales.append(total_sales)
workbook_number_of_sales.append(number_of_sales)
data = {'workbook': os.path.basename(workbook),
'worksheet': worksheet_name,
'worksheet_total': total_sales,
'worksheet_average': average_sales}
worksheet_data_frames.append(pd.DataFrame(data, columns=['workbook', 'worksheet', 'worksheet_total', 'worksheet_average']))
worksheets_data_frame = pd.concat(worksheet_data_frames, axis=0, ignore_index=True)
workbook_total = pd.DataFrame(workbook_total_sales).sum()
workbook_total_number_of_sales = pd.DataFrame(workbook_number_of_sales).sum()
workbook_average = pd.DataFrame(workbook_total / workbook_total_number_of_sales)
workbook_stats = {'workbook': os.path.basename(workbook),
'workbook_total': workbook_total,
'workbook_average': workbook_average}
workbook_stats = pd.DataFrame(workbook_stats, columns=['workbook', 'workbook_total', 'workbook_average'])
workbook_data_frame = pd.merge(worksheets_data_frame, workbook_stats, on='workbook', how='left')
data_frames.append(workbook_data_frame)
all_data_concatenated = pd.concat(data_frames, axis=0, ignore_index=True)
writer = pd.ExcelWriter(output_file)
all_data_concatenated.to_excel(writer, sheet_name='sums_and_averages', index=False)
writer.save()
这个是pandas版本问题,你将你电脑版本降到书本的版本试试
可能是读取和写入格式不匹配,读取的表格是xlsx格式,写入文件是xls格式。试试将这句改成如下:
output_file = 'pandas_output.xlsx'