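Time differences computed in a loop become NaT from the second ID onward

Symptoms and background

I compute the time difference between consecutive invoices in a loop over IDs, but from the second ID onward every result is NaT.

Relevant code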

data['InvoiceDate'] = pd.to_datetime(data['InvoiceDate'])
data['time']=data['InvoiceDate'].groupby(data['ID']).rank(ascending=1, method='dense')
data=data.sort_values(by=['ID','time'],ascending=(1,1))
print(data)
abc = pd.DataFrame()
originData= pd.DataFrame()
CID = data['ID'].unique()
for i in CID:
    res=data[data['ID']==i]
    originData['Time1'] = res['InvoiceDate'] - res['InvoiceDate'].fillna(0).shift(1)
    originData['ID'] = i
    originData['time2'] = res['time']
    abc = pd.concat([abc, originData], ignore_index=True)

print('Result:\n',abc.head(50))

Output and error messages
   Time1     ID  time2

0 NaT 12346 1.0
1 4 days 12346 2.0
2 17 days 12346 3.0
3 10 days 12346 4.0
4 8 days 12346 5.0
5 39 days 12346 6.0
6 118 days 12346 7.0
7 NaT 12347 NaN
8 NaT 12347 NaN
9 NaT 12347 NaN
10 NaT 12347 NaN
11 NaT 12347 NaN
12 NaT 12347 NaN
13 NaT 12347 NaN
14 NaT 12348 NaN
15 NaT 12348 NaN
16 NaT 12348 NaN
17 NaT 12348 NaN
18 NaT 12348 NaN
19 NaT 12348 NaN
20 NaT 12348 NaN
21 NaT 12349 NaN
22 NaT 12349 NaN
23 NaT 12349 NaN
24 NaT 12349 NaN
25 NaT 12349 NaN
26 NaT 12349 NaN
27 NaT 12349 NaN
28 NaT 12350 NaN
29 NaT 12350 NaN
30 NaT 12350 NaN
31 NaT 12350 NaN
32 NaT 12350 NaN
33 NaT 12350 NaN
34 NaT 12350 NaN
35 NaT 12351 NaN
36 NaT 12351 NaN
37 NaT 12351 NaN
38 NaT 12351 NaN
39 NaT 12351 NaN
40 NaT 12351 NaN
41 NaT 12351 NaN
42 NaT 12352 NaN
43 NaT 12352 NaN
44 NaT 12352 NaN
45 NaT 12352 NaN
46 NaT 12352 NaN

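First, preprocess the loaded data (cast ID to int, parse InvoiceDate, sort, and reset the index). Second, originData = pd.DataFrame() has to be created inside the loop. When it is created once outside the loop, it keeps the row index of the first ID after the first pass. Every later column assignment is aligned against that old index, the row labels of the next ID do not match it, and the columns fill up with NaT and NaN. Only the ID column looks right, because a scalar is broadcast to every row. Change it like this: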

import pandas as pd
import numpy as np

data=pd.read_csv('sjcl.csv', index_col=[0], encoding='utf-8',low_memory=False).reset_index()
data['ID']=data['ID'].astype(int)
data['InvoiceDate'] = pd.to_datetime(data['InvoiceDate'])
date1=data.sort_values(by=['ID','InvoiceDate'],ascending=(1,1)).reset_index(drop=True)
#print(date1.head(10))
abc = pd.DataFrame()
CID = data['ID'].unique().tolist()
for i in CID:
    originData = pd.DataFrame()
    locData = date1[date1['ID'] == i]
    originData['Time'] = locData['InvoiceDate'].diff()  # gap to the previous invoice of the same ID
    originData['ID'] = locData['ID']
    abc = pd.concat([abc, originData], ignore_index=True)

print('Result:\n',abc.head(50))
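
To see why reusing one DataFrame across iterations produces NaT, here is a minimal sketch on made-up data (the two IDs and four dates are assumptions for illustration, not the asker's file). It reproduces the pattern from the question: the ID column is correct, while the time difference is NaT for every ID after the first.

```python
import pandas as pd

# Hypothetical toy data: two IDs whose rows carry different index labels.
data = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "InvoiceDate": pd.to_datetime(
        ["2021-01-01", "2021-01-05", "2021-02-01", "2021-02-11"]
    ),
})

shared = pd.DataFrame()  # created once, outside the loop (the problematic pattern)
parts = []
for cid in data["ID"].unique():
    res = data[data["ID"] == cid]
    # First pass: `shared` is empty, so it adopts the index of res (labels 0, 1).
    # Second pass: labels 2, 3 are not in that index, so alignment yields NaT.
    shared["Time1"] = res["InvoiceDate"].diff()
    shared["ID"] = cid  # a scalar is broadcast, so the ID still looks right
    parts.append(shared.copy())

result = pd.concat(parts, ignore_index=True)
print(result)
```

Creating a fresh DataFrame inside the loop avoids this, because each new frame takes its index from the rows of the current ID.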

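As an alternative to the loop, the same per-ID differences can probably be computed in one step with groupby and diff. This is only a sketch on made-up data: the small frame below stands in for the real file, and the column names ID and InvoiceDate are taken from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the real file.
data = pd.DataFrame({
    "ID": [12347, 12346, 12346, 12347, 12346],
    "InvoiceDate": ["2021-03-01", "2021-01-05", "2021-01-01",
                    "2021-03-11", "2021-01-22"],
})
data["InvoiceDate"] = pd.to_datetime(data["InvoiceDate"])

# Sort so that consecutive rows within an ID are consecutive invoices.
out = data.sort_values(["ID", "InvoiceDate"]).reset_index(drop=True)

# diff() restarts for each ID, so the first invoice of every ID gets NaT.
out["Time"] = out.groupby("ID")["InvoiceDate"].diff()
print(out)
```

This form needs no temporary DataFrame and no concat in a loop, so the index alignment problem cannot occur.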

