I’m trying to get the GAP (In days) between records in one Table, I’m using SQL Server.
So far, I was able to get the gap (in days) between dates, but there was an issue with the overlapping dates.
Here is the setup script:
CREATE TABLE #Temp ( RoomTypeId INT, BedId INT, StartDate DATE, EndDate DATE ) INSERT INTO #Temp (RoomTypeId, BedId, StartDate, EndDate) VALUES (39, 229, '2019-08-03', '2019-08-04'), (39, 229, '2019-08-05', '2019-08-08'), (40, 231, '2019-08-05', '2019-08-08'), (41, 238, '2019-08-02', '2019-08-03'), (42, 233, '2019-08-02', '2019-08-03'), (42, 233, '2019-08-04', '2019-08-05'), (42, 234, '2019-08-05', '2019-08-05'), (43, 241, '2019-08-01', '2019-08-01'), (43, 241, '2019-08-15', '2019-08-16'), (43, 241, '2019-08-29', '2019-08-30'), (45, 245, '2019-08-03', '2019-08-06'), (45, 244, '2019-08-05', '2019-08-05'), (45, 244, '2019-08-08', '2019-08-10'), (45, 244, '2019-08-17', '2019-08-19'), (46, 239, '2019-08-03', '2019-08-04'), (46, 239, '2019-08-05', '2019-08-05'), (48, 250, '2019-08-16', '2019-08-18'), (49, 274, '2019-08-06', '2019-08-11'), (49, 275, '2019-08-08', '2019-08-10');
Here is my query:
SELECT p.RoomTypeId, p.EndDateSeq1, p.StartDateSeq2, SUM(p.GAP) AS [GAP] FROM (SELECT t.RoomTypeId, t.EndDate AS [EndDateSeq1], t2.StartDate AS [StartDateSeq2], DATEDIFF(d, t.EndDate,t2.StartDate) AS [GAP] FROM #Temp t LEFT JOIN #Temp t2 ON t2.RoomTypeId = t.RoomTypeId AND (t2.StartDate > t.EndDate OR t2.StartDate = t.EndDate)) p GROUP BY p.RoomTypeId, p.EndDateSeq1, p.StartDateSeq2 ORDER BY p.RoomTypeId, p.EndDateSeq1 ASC
Result:
╔════════════╦═════════════╦═══════════════╦══════╗ ║ RoomTypeId ║ EndDateSeq1 ║ StartDateSeq2 ║ GAP ║ ╠════════════╬═════════════╬═══════════════╬══════╣ ║ 39 ║ 2019-08-04 ║ 2019-08-05 ║ 1 ║ ║ 39 ║ 2019-08-08 ║ NULL ║ NULL ║ ║ 40 ║ 2019-08-08 ║ NULL ║ NULL ║ ║ 41 ║ 2019-08-03 ║ NULL ║ NULL ║ ║ 42 ║ 2019-08-03 ║ 2019-08-04 ║ 1 ║ ║ 42 ║ 2019-08-03 ║ 2019-08-05 ║ 2 ║ ║ 42 ║ 2019-08-05 ║ 2019-08-05 ║ 0 ║ ║ 43 ║ 2019-08-01 ║ 2019-08-01 ║ 0 ║ ║ 43 ║ 2019-08-01 ║ 2019-08-15 ║ 14 ║ ║ 43 ║ 2019-08-01 ║ 2019-08-29 ║ 28 ║ ║ 43 ║ 2019-08-16 ║ 2019-08-29 ║ 13 ║ ║ 43 ║ 2019-08-30 ║ NULL ║ NULL ║ ║ 45 ║ 2019-08-05 ║ 2019-08-05 ║ 0 ║ ║ 45 ║ 2019-08-05 ║ 2019-08-08 ║ 3 ║ ║ 45 ║ 2019-08-05 ║ 2019-08-17 ║ 12 ║ ║ 45 ║ 2019-08-06 ║ 2019-08-08 ║ 2 ║ ║ 45 ║ 2019-08-06 ║ 2019-08-17 ║ 11 ║ ║ 45 ║ 2019-08-10 ║ 2019-08-17 ║ 7 ║ ║ 45 ║ 2019-08-19 ║ NULL ║ NULL ║ ║ 46 ║ 2019-08-04 ║ 2019-08-05 ║ 1 ║ ║ 46 ║ 2019-08-05 ║ 2019-08-05 ║ 0 ║ ║ 48 ║ 2019-08-18 ║ NULL ║ NULL ║ ║ 49 ║ 2019-08-10 ║ NULL ║ NULL ║ ║ 49 ║ 2019-08-11 ║ NULL ║ NULL ║ ╚════════════╩═════════════╩═══════════════╩══════╝
Expected Results:
╔════════════╦═════════════╦═══════════════╦══════╗ ║ RoomTypeId ║ EndDateSeq1 ║ StartDateSeq2 ║ GAP ║ ╠════════════╬═════════════╬═══════════════╬══════╣ ║ 39 ║ 2019-08-04 ║ 2019-08-05 ║ 1 ║ ║ 39 ║ 2019-08-08 ║ NULL ║ NULL ║ ║ 40 ║ 2019-08-08 ║ NULL ║ NULL ║ ║ 41 ║ 2019-08-03 ║ NULL ║ NULL ║ ║ 42 ║ 2019-08-03 ║ 2019-08-04 ║ 1 ║ ║ 42 ║ 2019-08-05 ║ NULL ║ 0 ║ ║ 43 ║ 2019-08-01 ║ 2019-08-15 ║ 14 ║ ║ 43 ║ 2019-08-16 ║ 2019-08-29 ║ 13 ║ ║ 43 ║ 2019-08-30 ║ NULL ║ NULL ║ ║ 45 ║ 2019-08-05 ║ 2019-08-06 ║ 1 ║ ║ 45 ║ 2019-08-06 ║ 2019-08-08 ║ 2 ║ ║ 45 ║ 2019-08-10 ║ 2019-08-17 ║ 7 ║ ║ 45 ║ 2019-08-19 ║ NULL ║ NULL ║ ║ 46 ║ 2019-08-04 ║ 2019-08-05 ║ 1 ║ ║ 48 ║ 2019-08-18 ║ NULL ║ NULL ║ ║ 49 ║ 2019-08-11 ║ NULL ║ NULL ║ ╚════════════╩═════════════╩═══════════════╩══════╝
The goal was to get the gap with respect to the overlapping dates
Advertisement
Answer
I don’t quite follow your results.
I think the problem is a combination of gaps-and-islands along with “fill-in-gaps”. That is, you need to combine the existing data that overlaps. Then you need to get the pieces in-between.
You can identify the overlaps by using a cumulative max of the end date from preceding rows. When it is less than or equal to the current start, then you have a new “island”. The result is aggregation and lead()
.
The code looks like this:
select roomTypeId, grp, max(enddate), lead(min(startdate)) over (partition by roomtypeid order by min(startdate)), datediff(day, max(enddate), lead(min(startdate)) over (partition by roomtypeid order by min(startdate)) ) from (select t.*, sum(case when prev_enddate > startdate then 0 else 1 end) over (partition by RoomTypeId order by startdate) as grp from (select t.*, max(enddate) over (partition by RoomTypeId order by startdate rows between unbounded preceding and 1 preceding ) as prev_enddate from temp t ) t ) t group by roomtypeid, grp order by roomtypeid, min(startdate);
Here is the db<>fiddle.
The results are subtly different, but they make more sense to me.