Skip to content
Advertisement

find the consecutive values in impala

I have a data set below with ID, Date and Value. I want to flag the ID where three consecutive days has value 0.

id date value
1 8/10/2021 1
1 8/11/2021 0
1 8/12/2021 0
1 8/13/2021 0
1 8/14/2021 5
2 8/10/2021 2
2 8/11/2021 3
2 8/12/2021 0
2 8/13/2021 0
2 8/14/2021 6
3 8/10/2021 3
3 8/11/2021 4
3 8/12/2021 0
3 8/13/2021 0
3 8/14/2021 0

output

id date value Flag
1 8/10/2021 1 Y
1 8/11/2021 0 Y
1 8/12/2021 0 Y
1 8/13/2021 0 Y
1 8/14/2021 5 Y
2 8/10/2021 2 N
2 8/11/2021 3 N
2 8/12/2021 0 N
2 8/13/2021 0 N
2 8/14/2021 6 N
3 8/10/2021 3 Y
3 8/11/2021 4 Y
3 8/12/2021 0 Y
3 8/13/2021 0 Y
3 8/14/2021 0 Y

Thank you.

Advertisement

Answer

Using window count() function you can count 0’s in the frame [current row, 2 following] (ordered by date) – three consecutive rows frame calculated for each row:

count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt.

If count happens to equal 3 then it means 3 consecutive 0’s found, case expression produces Y for each row with cnt=3 : case when cnt=3 then 'Y' else 'N' end.

To propagate ‘Y’ flag to the whole id group use max(...) over (partition by id)

Demo with your data example (tested on Hive):

Result:

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement