Split the local file parthive.txt by year into three files: part2014, part2015 and part2016.
The contents of parthive.txt:
2015-01-01 aaa 2015
2015-02-03 bbb 2015
2014-01-01 aaa 2014
2014-02-02 ccc 2014
2015-03-01 ddd 2015
2015-04-02 eee 2015
2014-03-02 fff 2014
2015-05-05 ggg 2015
2016-04-04 ggg 2016
2014-06-06 fdf 2014
2015-06-07 ggh 2015
2015-07-08 jjj 2015
2015-08-09 lll 2015
2014-09-10 qqq 2014
2015-09-02 ppp 2015
2015-07-04 poo 2015
2015-10-11 ggf 2015
2015-11-12 bnn 2015
2015-11-20 ldf 2015
2015-12-01 ohg 2015
2015-11-29 ggg 2015
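For comparison, the same split can be sketched in plain Python with only the standard library. This is a minimal sketch that assumes every non-blank line has exactly the form "<date> <name> <year>", with the year as the third space-separated field. `split_by_year` is a hypothetical helper name, and the part<year>.txt naming simply mirrors the pandas code below.

```python
import os
from collections import defaultdict


def split_by_year(src_path, out_dir):
    """Split a space-separated file into part<year>.txt files keyed on the third field.

    Assumes each non-blank line looks like "<date> <name> <year>".
    Returns a dict mapping each year string to the number of lines written.
    """
    rows_by_year = defaultdict(list)
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            year = line.split()[2]  # third field is the year
            rows_by_year[year].append(line)

    os.makedirs(out_dir, exist_ok=True)
    for year, rows in rows_by_year.items():
        out_path = os.path.join(out_dir, "part{}.txt".format(year))
        with open(out_path, "w", encoding="utf-8") as out:
            out.write("\n".join(rows) + "\n")  # keep the original line order
    return {year: len(rows) for year, rows in rows_by_year.items()}
```

Calling split_by_year("data/parthive.txt", "data") would write data/part2014.txt, data/part2015.txt and data/part2016.txt, each holding the matching lines unchanged.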
With pandas this is quite convenient:
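import pandas as pd

# The file has no header row, so name the three space-separated columns explicitly
df1 = pd.read_csv("data/parthive.txt", sep=" ", header=None, names=['a', 'b', 'year'])
print(df1)

# Group by the column name 'year', not ['year']: with a one-element list, recent pandas
# versions yield one-element tuple keys such as (2014,), which would leak into the file name
for k1, group in df1.groupby('year'):
    print(k1)
    # Omit the header and the index so each output line matches the input layout
    group.to_csv("data/part{}.txt".format(k1), header=False, index=False, sep=" ")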
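To check that the pandas approach produces files in the same format as the input, here is a self-contained sketch that inlines the sample rows from parthive.txt (via io.StringIO) and writes into a temporary directory instead of data/. The helper name `split_with_pandas` and the SAMPLE constant are hypothetical, introduced only for this illustration.

```python
import io
import os
import tempfile

import pandas as pd

# The sample rows from parthive.txt, inlined so the sketch runs without the data/ directory
SAMPLE = """2015-01-01 aaa 2015
2015-02-03 bbb 2015
2014-01-01 aaa 2014
2014-02-02 ccc 2014
2015-03-01 ddd 2015
2015-04-02 eee 2015
2014-03-02 fff 2014
2015-05-05 ggg 2015
2016-04-04 ggg 2016
2014-06-06 fdf 2014
2015-06-07 ggh 2015
2015-07-08 jjj 2015
2015-08-09 lll 2015
2014-09-10 qqq 2014
2015-09-02 ppp 2015
2015-07-04 poo 2015
2015-10-11 ggf 2015
2015-11-12 bnn 2015
2015-11-20 ldf 2015
2015-12-01 ohg 2015
2015-11-29 ggg 2015
"""


def split_with_pandas(text, out_dir):
    """Write one part<year>.txt per year into out_dir; return {year: row_count}."""
    df = pd.read_csv(io.StringIO(text), sep=" ", header=None, names=["a", "b", "year"])
    counts = {}
    for year, group in df.groupby("year"):  # scalar key, so year is a plain integer
        path = os.path.join(out_dir, "part{}.txt".format(year))
        group.to_csv(path, header=False, index=False, sep=" ")
        counts[int(year)] = len(group)
    return counts


with tempfile.TemporaryDirectory() as out_dir:
    print(split_with_pandas(SAMPLE, out_dir))  # {2014: 5, 2015: 15, 2016: 1}
    with open(os.path.join(out_dir, "part2016.txt")) as f:
        print(f.read(), end="")  # 2016-04-04 ggg 2016
```

Because groupby keeps the original row order inside each group, part2014.txt lists the 2014 lines in the same order they appear in parthive.txt.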