Read multiple files and aggregate them by the first column

Requirement: read file 1 through file 4 and output the final merged file.

import pandas as pd
a
{'abc': 3, 'adb': 4, 'aer': 5}
b
{'abc': 2, 'adb': 3, 'sdf': 4}
c
{'abc': 1, 'qwe': 4, 'aer': 3}
d
{'adc': 4, 'aer': 5, 'add': 3}
df = pd.DataFrame([a,b,c,d]).fillna(0)
df
   abc  adb  aer  sdf  qwe  adc  add
0  3.0  4.0  5.0  0.0  0.0  0.0  0.0
1  2.0  3.0  0.0  4.0  0.0  0.0  0.0
2  1.0  0.0  3.0  0.0  4.0  0.0  0.0
3  0.0  0.0  5.0  0.0  0.0  4.0  3.0
df.T
       0    1    2    3
abc  3.0  2.0  1.0  0.0
adb  4.0  3.0  0.0  0.0
aer  5.0  0.0  3.0  5.0
sdf  0.0  4.0  0.0  0.0
qwe  0.0  0.0  4.0  0.0
adc  0.0  0.0  0.0  4.0
add  0.0  0.0  0.0  3.0

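Finally, call to_csv on the resulting DataFrame (e.g. df.T) to write the output file. Note that to_csv is a DataFrame method; there is no pd.to_csv function.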
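The transcript above starts from dicts that are already in memory and stops before the file is written. A minimal sketch of the full round trip might look like the following. It assumes each input file holds one whitespace-separated "key value" pair per line with numeric values; the helper names read_counts and aggregate_to_csv, the "file1", "file2", ... column labels, and the "key" index label are all made up for illustration.

```python
import pandas as pd


def read_counts(path):
    """Parse a whitespace-separated "key value" file into a dict."""
    counts = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            counts[parts[0]] = float(parts[1])
    return counts


def aggregate_to_csv(paths, out_path):
    """Build one row per file, fill gaps with 0, transpose, write a CSV."""
    df = pd.DataFrame([read_counts(p) for p in paths]).fillna(0)
    table = df.T  # keys become the index, one column per input file
    table.columns = ["file" + str(i + 1) for i in range(len(paths))]
    table.to_csv(out_path, index_label="key")
    return table
```

Calling aggregate_to_csv with the four input paths and an output path of your choice returns the transposed table and writes it to disk. If a key appears twice in the same file, this sketch keeps only the last value.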

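import pandas as pd

result = []
# Assumes the files are named file1.txt through file4.txt
for i in range(4):
    with open("file" + str(i + 1) + ".txt") as f:
        # Each non-empty line is "key value"; split() handles any run of whitespace
        rows = [line.split() for line in f if line.strip()]
    result.append(pd.DataFrame(rows, columns=["key", "data" + str(i + 1)]))
# Outer-merge on "key" so keys missing from some files are kept and filled with 0
data = result[0]
for i in range(3):
    data = data.merge(result[i + 1], on="key", how="outer").fillna(0)
data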
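As a possible variation on the merge loop above, pandas can parse the files itself and align them on the key in one step. This is only a sketch under a few assumptions: the files have no header row, fields are separated by whitespace, and each key appears at most once per file (duplicate keys would make the alignment fail). The function name merge_files is hypothetical.

```python
import pandas as pd


def merge_files(paths):
    """Outer-join "key value" files on key; missing entries become 0."""
    columns = []
    for i, path in enumerate(paths, start=1):
        col = pd.read_csv(
            path,
            sep=r"\s+",    # any run of whitespace separates the fields
            header=None,   # the files are assumed to have no header row
            names=["key", "data" + str(i)],
            index_col="key",
        )
        columns.append(col)
    # axis=1 aligns on the index (the keys), keeping the union of all keys
    return pd.concat(columns, axis=1).fillna(0)
```

Unlike the line-splitting version, read_csv converts the value column to numbers, so the merged table can be summed or sorted directly before calling to_csv on it.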
