我自己是想用窗口函数得到下一次登录时间之间的间隔,但是我很快发现我的思路与题目不符合,题目要求的是最大的连续月份数,用count的话,有一些中间没有登录的月份也算进去了(interval>1)的。我在网上也没有搜索到什么符合这个题目的函数,恳请帮助和指点!
df1['lead'] = df1.groupby('uid')['mon'].shift(-1)
df1['mon']=pd.to_datetime(arg=df1['mon'],format='%Y-%m')
df1['lead']=pd.to_datetime(arg=df1['lead'],format='%Y-%m')
df1['interval']=df1['lead']-df1['mon']
df1['interval'] =df1['interval'].astype('timedelta64[D]').astype(float)
def get_month(x):
return x//30
df1['interval'] = df1['interval'].apply(get_month)
上述代码运行结果:
如果是mysql的话,可以试下,当然首先要支持over这个函数啊:
select uid, count(*) as num_days
from (
select *, date_sub(days, interval rn day) as results
from(
select uid, days, row_number() over(partition by uid order by days) as rn
from (
select distinct uid, date(mon) as days from 表) t1
) t2) t3
group by uid, results;