Is it possible to group records based on a predefined date range differences (e.g. 30 days) based on the start_date
of a row and the end_date
of the previous row for non-consecutive dates? I want to take the min(start_date)
and max(end_date)
of each group. I tried the lead
and lag
function with partition by in Oracle but couldn’t come up with a proper solution. A related but unanswered post related to my question can be found here.
E.g.
ROW_NUM PROJECT_ID START_DATE END_DATE 1 1 2016-01-14 2016-08-15 2 1 2016-08-16 2016-09-10 --- Date diff Row 1&2 = 1 Day 3 1 2016-11-15 2017-01-10 --- Date diff Row 2&3 = 66 Days 4 1 2016-01-17 2017-04-10 --- Date diff Row 3&4 = 7 Days 5 2 2018-04-28 2018-06-01 --- Other Project 6 2 2019-02-01 2019-04-05 --- Diff > 30 Days 7 2 2019-04-08 2019-07-28 --- Diff 3 Days
Expected Result:
ROW_NUM PROJECT_ID START_DATE END_DATE 1 1 2016-01-14 2016-09-10 3 1 2016-11-15 2017-04-10 5 2 2018-04-28 2018-06-01 6 2 2019-02-01 2019-07-28
Advertisement
Answer
Use lag()
and a cumulative sum to define where the groups begin. Then aggregate:
select project_id, min(start_date), max(end_date) from (select t.*, sum(case when prev_end_date > start_date - interval '30' day then 0 else 1 end) over (partition by project_id order by start_date) as grp from (select t.*, lag(end_date) over (partition by project_id order by start_date) as prev_end_date from t ) t ) t group by project_id, grp;