SQL: Capturing first row in WHILE loop

Question

I'm thinking my issue is based on how I've written the loop, but how it is written is how I've come to understand loops. So I'm really wondering if there is a way to consolidate this? As it sits, I have 2 sections: the first which captures the very first row, and it exists solely because my second section will

Accepted Answer

One option is moving the SET @I = @I + 1 line to after the rest of the loop body (and then also run for one iteration longer). In this way, the first adjustment for the dates is still 0. But don’t do this.I’m thinking my issue is based on how I’ve written the loopIt’s not in how you’ve written the loop, but that a loop was written at all. Nearly every case where you want to use a loop in SQL there is a set-based alternative that is vastly more efficient… usually multiple orders of magnitude. This is no exception. Six-ten seconds is an eternity for a process like this; no reason it shouldn’t finish almost instantly.The code for that will look something like this:WITH -- generate numbers L0 AS(SELECT 1 AS c UNION ALL SELECT 1), -- 2^1 L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), -- 2^2 L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), -- 2^4 L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), -- 2^8 Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3), -- project numbers as start and end dates Dates As (SELECT TOP 17 DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate, DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate FROM Nums ORDER BY n)SELECT d.StartDate, COUNT(DISTINCT value) as [count]FROM [TABLE] t-- use the dates to filter the tableINNER JOIN Dates d on t.[Date] > = d.StartDate and t.[Date] < d.EndDateGROUP BY d.StartDateOr I can show this as actually runnable code:WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3), Dates As (SELECT top 17 DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate, DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate FROM Nums ORDER BY n)SELECT StartDate, EndDateFROM Dates dWe can see this gives 17 results with same start and end values as a modified version of the original code in the question.Update:Now that we have the full original code, I can adapt my answer to use it:WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), -- 2^1 L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), -- 2^2 L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), -- 2^4 L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), -- 2^8 Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3), Dates As (SELECT TOP 17 DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate, DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate FROM Nums ORDER BY n)SELECT d.StartDate, COUNT(DISTINCT o.ClinicLocationId) As [count]FROM [order].package pINNER JOIN [order].[order] o ON o.packageid = p.packageidINNER JOIN Profile.ClinicLocationInfo cli ON cli.LocationId = o.ClinicLocationId AND cli.FacilityType IN ('CLINIC', 'HOSPITAL')-- PLEASE tell me ShipDTM is a datetime value and not a varcharINNER JOIN Dates d ON d.StartDate <= p.ShipDTM and p.ShipDTM < d.EndDateWHERE p.IsShipped = 1 and o.IsShipped = 1 o.IsCanceled IS NULLGROUP BY d.StartDateAlternatively, if this still somehow gives you the wrong results (I think the GROUP BY will have fixed it), you can use an APPLY instead, like so (the JOIN/GROUP BY should still be faster):WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), -- 2^1 L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), -- 2^2 L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), -- 2^4 L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), -- 2^8 Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS n FROM L3), Dates As (SELECT TOP 17 DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n-12, 0) as StartDate, DATEADD(month, DATEDIFF(month, 0, current_timestamp) -n, 0) as EndDate FROM Nums ORDER BY n)SELECT d.StartDate, counts.[count]FROM Dates dCROSS APPLY ( SELECT count = COUNT(DISTINCT o.ClinicLocationId) FROM [order].package p INNER JOIN [order].[order] o ON o.packageid = p.packageid INNER JOIN Profile.ClinicLocationInfo cli ON cli.LocationId = o.ClinicLocationId AND cli.FacilityType IN('CLINIC', 'HOSPITAL') WHERE p.ShipDTM >= d.StartDate AND p.ShipDTM < d.EndDate AND p.isshipped = 1 AND o.IsShipped = 1 AND IsCanceled IS NOT NULL) countsOne final note here, regarding the ShipDTM column. I know you may not have any control over this, but the CAST() around that column makes it look like it’s a varchar or similar. If it is, you should see if you can fix it, and I say “fix” because the schema really is considered broken.As it is, you’re likely converting every row in the table to a Date value — even rows you don’t need. Thanks to internationalization issues, these conversion are not the simple or fast process you might expect; in fact converting between either date or numeric and string values is always something to avoid as much as possible. They also invalidate any index you might have on the column. Even worse, you are repeating these conversions for each iteration! No wonder the query runs for multiple seconds!Almost much as the loop, these conversions are likely the source of the slowness. The good news is the the JOIN + GROUP BY version of my solution should at least get you back to only needing to convert these values once. Fixing the column (because again: it is broken) will get yet another speed boost. I do understand this is likely to be either above your pay grade or a vendor system you can’t change, but you should at least bring up the issue with someone who can influence this: either an architect/senior dev or the vendor directly.

Advertisement

Answer