Skip to content
Advertisement

Min and max value per group keeping order

I have a small problem in Redshift with with grouping; I have a table like following:

INPUT

VALUE       CREATED       UPDATED
------------------------------------
1        '2020-09-10'   '2020-09-11'
1        '2020-09-11'   '2020-09-13'
2        '2020-09-15'   '2020-09-16'
1        '2020-09-17'   '2020-09-18'

I want to obtain this output:

VALUE       CREATED       UPDATED
------------------------------------
1        '2020-09-10'   '2020-09-13'
2        '2020-09-15'   '2020-09-16'
1        '2020-09-17'   '2020-09-18'

If I do a simple Min and Max date grouping by the value, it doesn’t work.

Advertisement

Answer

This is an example of a gap-and-islands problem. If there are no time gaps in the data, then a difference of row numbers is a simple solution:

select value, min(created), max(updated)
from (select t.*,
             row_number() over (order by created) as seqnum,
             row_number() over (partition by value order by created) as seqnum_2
      from t
     ) t
group by value, (seqnum - seqnum_2)
order by min(created);

Why this works is a little tricky to explain. But if you look at the results of the subquery, you will see how the difference between the row numbers identifies adjacent rows with the same value.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement