The goal is to stack dates if DT and VAR are same, and END_DATE matches the START_DATE of next highest START_DATE In the example, row 2 and 3 are combined because END_DATE of row 2 matches START_DATE of row 3.
Original
|DT | START_DATE | END_DATE | VAR | |---------- |----------- |------------|---- | |15-JUN-21 |01-JAN-20 | 01-JAN-21 | A | |15-JUN-21 |02-JAN-20 | 10-JAN-20 | A | |15-JUN-21 |10-JAN-20 | 01-JAN-21 | A | |22-JUN-21 |01-JAN-21 | 31-DEC-99 | A |
Expected
|DT | START_DATE | END_DATE | VAR | |---------- |-----------|----------- |---- | |15-JUN-21 |01-JAN-20 | 01-JAN-21 | A | |15-JUN-21 |02-JAN-20 | 01-JAN-21 | A | |22-JUN-21 |01-JAN-21 | 31-DEC-99 | A |
I have tried the query below but the solution won’t work when there is additional rows.
WITH T1 as ( SELECT t.*, row_number() over (partition by VAR,DT order by start_date -- anything can go here ) rnum FROM T_STACK t) , T2 as ( SELECT A.DT ,A.START_DATE ,CASE WHEN A.END_DATE = B.START_DATE THEN B.END_DATE ELSE A.END_DATE END END_DATE ,A.VAR ,A.RNUM r1 ,B.RNUM r2 FROM T1 A LEFT JOIN T1 B ON A.VAR = B.VAR AND A.END_DATE = B.START_DATE AND A.DT = B.DT AND B.rnum = A.rnum+1 ) SELECT * FROM T2 WHERE r1 < 3 ORDER BY START_DATE;
Advertisement
Answer
This is a gaps-and-islands problem. I think the simplest way is to use lag()
and date comparisons to define where a group begins. Then use a cumulative sum to define the groups and aggregate:
select dt, var, min(start_date), max(end_date) from (select t.*, sum(case when prev_end_date = start_date then 0 else 1 end) over (partition by dt, var order by start_date) as grp from (select t.*, lag(end_date) over (partition by dt, var order by start_date) as prev_end_date from t ) t ) t group by dt, var, grp;