I have a small problem in Redshift with with grouping; I have a table like following:
INPUT
VALUE CREATED UPDATED ------------------------------------ 1 '2020-09-10' '2020-09-11' 1 '2020-09-11' '2020-09-13' 2 '2020-09-15' '2020-09-16' 1 '2020-09-17' '2020-09-18'
I want to obtain this output:
VALUE CREATED UPDATED ------------------------------------ 1 '2020-09-10' '2020-09-13' 2 '2020-09-15' '2020-09-16' 1 '2020-09-17' '2020-09-18'
If I do a simple Min and Max date grouping by the value, it doesn’t work.
Advertisement
Answer
This is an example of a gap-and-islands problem. If there are no time gaps in the data, then a difference of row numbers is a simple solution:
select value, min(created), max(updated) from (select t.*, row_number() over (order by created) as seqnum, row_number() over (partition by value order by created) as seqnum_2 from t ) t group by value, (seqnum - seqnum_2) order by min(created);
Why this works is a little tricky to explain. But if you look at the results of the subquery, you will see how the difference between the row numbers identifies adjacent rows with the same value
.