Find consecutive values in Impala

I have the data set below with ID, Date and Value columns. I want to flag each ID that has a value of 0 on three consecutive days.

id date value
1 8/10/2021 1
1 8/11/2021 0
1 8/12/2021 0
1 8/13/2021 0
1 8/14/2021 5
2 8/10/2021 2
2 8/11/2021 3
2 8/12/2021 0
2 8/13/2021 0
2 8/14/2021 6
3 8/10/2021 3
3 8/11/2021 4
3 8/12/2021 0
3 8/13/2021 0
3 8/14/2021 0

Desired output:

id date value Flag
1 8/10/2021 1 Y
1 8/11/2021 0 Y
1 8/12/2021 0 Y
1 8/13/2021 0 Y
1 8/14/2021 5 Y
2 8/10/2021 2 N
2 8/11/2021 3 N
2 8/12/2021 0 N
2 8/13/2021 0 N
2 8/14/2021 6 N
3 8/10/2021 3 Y
3 8/11/2021 4 Y
3 8/12/2021 0 Y
3 8/13/2021 0 Y
3 8/14/2021 0 Y

Thank you.

Answer

Using the window function count() you can count the 0s in the frame [current row, 2 following], ordered by date. For each row, this frame covers that row and the next two, i.e. three consecutive rows:

count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt

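If the count equals 3, three consecutive 0s start at that row. The case expression produces 'Y' for each row with cnt=3: case when cnt=3 then 'Y' else 'N' end. Note that the frame counts rows, not calendar days, so this assumes one row per id per day with no missing dates.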

To propagate the 'Y' flag to the whole id group, use max(...) over (partition by id). This works because 'Y' sorts after 'N', so max() returns 'Y' if any row in the group has it.
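
As a quick illustration of the per-row counts, the inner window expression can be run on id 1 alone. This is a minimal sketch that reuses the question's rows for id 1, with dates rewritten as yyyy-MM-dd strings. The cnt values in the trailing comment were worked out by hand from those rows.

```sql
with mydata as (--id 1 from the question only
select 1 id,'2021-08-10' date_, 1 value union all
select 1,'2021-08-11',0 union all
select 1,'2021-08-12',0 union all
select 1,'2021-08-13',0 union all
select 1,'2021-08-14',5
)

select id, date_, value,
 --number of 0s among this row and the next two rows
 count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt
from mydata
order by date_

--cnt by hand:
--2021-08-10  frame values 1,0,0  cnt=2
--2021-08-11  frame values 0,0,0  cnt=3
--2021-08-12  frame values 0,0,5  cnt=2
--2021-08-13  frame values 0,5    cnt=1
--2021-08-14  frame values 5      cnt=0
```

Only the 2021-08-11 row reaches cnt=3, which is enough for max(...) over (partition by id) to mark every row of id 1 with 'Y'.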

Demo with your data example (tested on Hive):

with mydata as (--Data example, dates converted to sortable format yyyy-MM-dd
select 1 id,'2021-08-10' date_, 1 value union all
select 1,'2021-08-11',0 union all
select 1,'2021-08-12',0 union all
select 1,'2021-08-13',0 union all
select 1,'2021-08-14',5 union all
select 2,'2021-08-10',2 union all
select 2,'2021-08-11',3 union all
select 2,'2021-08-12',0 union all
select 2,'2021-08-13',0 union all
select 2,'2021-08-14',6 union all
select 3,'2021-08-10',3 union all
select 3,'2021-08-11',4 union all
select 3,'2021-08-12',0 union all
select 3,'2021-08-13',0 union all
select 3,'2021-08-14',0
) --End of data example, use your table instead of this CTE

select id, date_, value, 
       max(case when cnt=3 then 'Y' else 'N' end) over (partition by id) flag
from
(
select id, date_, value, 
 count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt
from mydata
)s
  order by id, date_  --remove ordering if not necessary
                      --added it to get result in the same order

Result:

id  date_       value   flag    
1   2021-08-10  1       Y
1   2021-08-11  0       Y
1   2021-08-12  0       Y
1   2021-08-13  0       Y
1   2021-08-14  5       Y
2   2021-08-10  2       N
2   2021-08-11  3       N
2   2021-08-12  0       N
2   2021-08-13  0       N
2   2021-08-14  6       N
3   2021-08-10  3       Y
3   2021-08-11  4       Y
3   2021-08-12  0       Y
3   2021-08-13  0       Y
3   2021-08-14  0       Y
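
The frame in the query counts rows rather than calendar days, so a run of 0s with a missing date in the middle would still be flagged. If the table can have gaps, one possible variant also checks that the third row of the frame is exactly 2 days after the first, using lead() and datediff(). This is an untested sketch written in the same Hive style as the demo. It assumes at most one row per id per date and dates stored as yyyy-MM-dd strings. The id 4 rows and the span_days column are made up for illustration, and on Impala the datediff() arguments may need an explicit cast to timestamp.

```sql
with mydata as (--id 1 from the question plus a hypothetical id 4 with 2021-08-12 missing
select 1 id,'2021-08-10' date_, 1 value union all
select 1,'2021-08-11',0 union all
select 1,'2021-08-12',0 union all
select 1,'2021-08-13',0 union all
select 1,'2021-08-14',5 union all
select 4,'2021-08-10',0 union all
select 4,'2021-08-11',0 union all
select 4,'2021-08-13',0
)

select id, date_, value,
       --flag only when the 3 rows are all 0 and cover exactly 3 calendar days
       max(case when cnt=3 and span_days=2 then 'Y' else 'N' end) over (partition by id) flag
from
(
select id, date_, value,
 count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt,
 --days between this row and the row 2 positions later (null if there is no such row)
 datediff(lead(date_, 2) over(partition by id order by date_), date_) span_days
from mydata
)s
order by id, date_
```

With these rows, id 1 should come out as 'Y' and id 4 as 'N', whereas the row-only version would flag both, because for id 4 the three 0 rows span 3 days of difference instead of 2.
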
User contributions licensed under: CC BY-SA