Group Records based on predefined date range in SQL (Oracle)

Is it possible to group records based on a predefined date range differences (e.g. 30 days) based on the start_date of a row and the end_date of the previous row for non-consecutive dates? I want to take the min(start_date) and max(end_date) of each group. I tried the lead and lag function with partition by in Oracle but couldn’t come up with a proper solution. A related but unanswered post related to my question can be found here.

E.g.

ROW_NUM        PROJECT_ID       START_DATE       END_DATE          
1              1                2016-01-14       2016-08-15
2              1                2016-08-16       2016-09-10     --- Date diff Row 1&2 = 1 Day
3              1                2016-11-15       2017-01-10     --- Date diff Row 2&3  = 66 Days
4              1                2016-01-17       2017-04-10     --- Date diff Row 3&4 = 7 Days
5              2                2018-04-28       2018-06-01     --- Other Project
6              2                2019-02-01       2019-04-05     --- Diff > 30 Days
7              2                2019-04-08       2019-07-28     --- Diff 3 Days

​x
 
ROW_NUM        PROJECT_ID       START_DATE       END_DATE          1              1                2016-01-14       2016-08-152              1                2016-08-16       2016-09-10     --- Date diff Row 1&2 = 1 Day3              1                2016-11-15       2017-01-10     --- Date diff Row 2&3  = 66 Days4              1                2016-01-17       2017-04-10     --- Date diff Row 3&4 = 7 Days5              2                2018-04-28       2018-06-01     --- Other Project6              2                2019-02-01       2019-04-05     --- Diff > 30 Days7              2                2019-04-08       2019-07-28     --- Diff 3 Days​

Expected Result:

ROW_NUM        PROJECT_ID       START_DATE       END_DATE          
1              1                2016-01-14       2016-09-10
3              1                2016-11-15       2017-04-10     
5              2                2018-04-28       2018-06-01   
6              2                2019-02-01       2019-07-28

 
ROW_NUM        PROJECT_ID       START_DATE       END_DATE          1              1                2016-01-14       2016-09-103              1                2016-11-15       2017-04-10     5              2                2018-04-28       2018-06-01   6              2                2019-02-01       2019-07-28​     ​

Answer

Use lag() and a cumulative sum to define where the groups begin. Then aggregate:

select project_id, min(start_date), max(end_date)
from (select t.*,
             sum(case when prev_end_date > start_date - interval '30' day then 0 else 1 end) over
                  (partition by project_id order by start_date) as grp
      from (select t.*,
                   lag(end_date) over (partition by project_id order by start_date) as prev_end_date
            from t
           ) t
     ) t
group by project_id, grp;

 
select project_id, min(start_date), max(end_date)from (select t.*,             sum(case when prev_end_date > start_date - interval '30' day then 0 else 1 end) over                  (partition by project_id order by start_date) as grp      from (select t.*,                   lag(end_date) over (partition by project_id order by start_date) as prev_end_date            from t           ) t     ) tgroup by project_id, grp;​

Advertisement

Answer