The goal is to stack dates if DT and VAR are same, and END_DATE matches the START_DATE of next highest START_DATE In the example, row 2 and 3 are combined because END_DATE of row 2 matches START_DATE of row 3.
Original
x
|DT | START_DATE | END_DATE | VAR |
|---------- |----------- |------------|---- |
|15-JUN-21 |01-JAN-20 | 01-JAN-21 | A |
|15-JUN-21 |02-JAN-20 | 10-JAN-20 | A |
|15-JUN-21 |10-JAN-20 | 01-JAN-21 | A |
|22-JUN-21 |01-JAN-21 | 31-DEC-99 | A |
Expected
|DT | START_DATE | END_DATE | VAR |
|---------- |-----------|----------- |---- |
|15-JUN-21 |01-JAN-20 | 01-JAN-21 | A |
|15-JUN-21 |02-JAN-20 | 01-JAN-21 | A |
|22-JUN-21 |01-JAN-21 | 31-DEC-99 | A |
I have tried the query below but the solution won’t work when there is additional rows.
WITH T1 as (
SELECT
t.*, row_number() over (partition by VAR,DT
order by start_date -- anything can go here
) rnum
FROM T_STACK t)
, T2 as (
SELECT
A.DT
,A.START_DATE
,CASE WHEN A.END_DATE = B.START_DATE THEN B.END_DATE
ELSE A.END_DATE END END_DATE
,A.VAR
,A.RNUM r1
,B.RNUM r2
FROM T1 A
LEFT JOIN T1 B
ON A.VAR = B.VAR
AND A.END_DATE = B.START_DATE
AND A.DT = B.DT
AND B.rnum = A.rnum+1
)
SELECT * FROM T2
WHERE r1 < 3
ORDER BY START_DATE;
Advertisement
Answer
This is a gaps-and-islands problem. I think the simplest way is to use lag()
and date comparisons to define where a group begins. Then use a cumulative sum to define the groups and aggregate:
select dt, var, min(start_date), max(end_date)
from (select t.*,
sum(case when prev_end_date = start_date then 0 else 1 end) over (partition by dt, var order by start_date) as grp
from (select t.*,
lag(end_date) over (partition by dt, var order by start_date) as prev_end_date
from t
) t
) t
group by dt, var, grp;