表
code amount date
1001 100 2023-01-10
1001 101 2023-01-09
1001 99 2023-01-08
1002 110 2023-01-10
1002 109 2023-01-09
1002 102 2023-01-08
.
.
.
通过hive怎么获取到连续n天是增长状态的编码,比如我查询2023-01-10这天的数据 也就是从2023-01-10这天往前数三天,看这三天的数据中amount是否连续增长,表中1001就不是,1002则符合。
如果连续天数是固定的比如三天 我勉强能写出来,但是要是非固定的或者非常多的情况下就不好实现了,各位能否给提供下思路
我现在写了一个三天的,但是如果是30天就显得代码非常臃肿
```bash
select b.code,b.`date`,b.dd
from (select a.id,
a.`date`,
a.code,
a.dd,
rank() over ( partition by code order by `date` desc ) as rank_alias
from (select id,
code,
`date`,
amount,
3 as dd,
nvl(lead(amount, 1) over (partition by code order by `date` desc ), 0) l1amount,
nvl(lead(amount, 2) over (partition by code order by `date` desc ), 0) l2amount
from kd_stock_history_info
where day <= '2023-07-24'
and day > date_add('2023-07-24', -10)) a
where a.l1amount > a.amount
and a.l2amount > a.l1amount
order by a.code) b
where b.rank_alias = 1;
```
SELECT code,min(sales_diff) min_sales_diff from (
SELECT
code,
`date`,
amount,
LAG(amount,1,amount) OVER (PARTITION by code ORDER BY `date`) AS prev_sales,
amount - LAG(amount,1,amount) OVER (PARTITION by code ORDER BY `date`) AS sales_diff
FROM
kd_stock_history_info
where
`date` BETWEEN DATE_SUB(TO_DATE('2023-01-10'), n) AND TO_DATE('2023-01-10')
) a
group by code
having min_sales_diff >= 0;
n代表你想要的连续天数,时间你自己定
我觉得最简单的做法,供参考:
1、先对amount列和date列分别取row_number:row_number()over(partition by code order by amount) rk1、row_number()over(partition by code order by date) rk2
2、比较两个排序列是否相等,若随时间增长,则必满足两个排序列相等:计算rk2-rk1,用rk2-rk1比较,找出随时间增长的数据,但此时不知道时间是不是连续
3、判断时间是否连续:对date列row_number,date-row_number列的差值若一样,则连续
完整代码为:
select
a.code
,a.iflx --这个字段可以不要
,count(a.iflx) as 连续n天
from
(
SELECT
code
,ROW_NUMBER()over(partition by code order by date) as rk_date
,row_number()over(partition by code order by amount) as rk_amount
,DATE_SUB(date,ROW_NUMBER()over(partition by code order by date)) as iflx --用来判断是否连续
from
table
)a
group by a.code
,a.iflx
having count(a.iflx) >1