Skip to content
Advertisement

Rolling sum based on date (when dates are missing)

You may be aware of rolling the results of an aggregate over a specific number of preceding rows. I.e.: how many hot dogs did I eat over the last 7 days

SELECT HotDogCount, 
       DateKey, 
       SUM(HotDogCount) OVER (ORDER BY DateKey ROWS 6 PRECEDING) AS HotDogsLast7Days
FROM dbo.HotDogConsumption

Results:

+-------------+------------+------------------+
| HotDogCount |  DateKey   | HotDogsLast7Days |
+-------------+------------+------------------+
|           3 | 09/21/2020 |                3 |
|           2 | 9/22/2020  |                5 |
|           1 | 09/23/2020 |                6 |
|           1 | 09/24/2020 |                7 |
|           1 | 09/25/2020 |                8 |
|           4 | 09/26/2020 |               12 |
|           1 | 09/27/2020 |               13 |
|           3 | 09/28/2020 |               13 |
|           2 | 09/29/2020 |               13 |
|           1 | 09/30/2020 |               13 |
+-------------+------------+------------------+

Now, the problem I am having is when there are gaps in the dates. So, basically, one day my intestines and circulatory system are screaming at me: “What the heck are you doing, you’re going to kill us all!!!” So, I decide to give my body a break for a day and now there is no record for that day. When I use the “ROWS 6 PRECEDING” method, I will now be reaching back 8 days, rather than 7, because one day was missed.

So, the question is, do any of you know how I could use the OVER clause to truly use a date value (something like “DATEADD(day,-7,DateKey)”) to determine how many previous rows should be summed up for a true 7 day rolling sum, regardless of whether I only ate hot dogs on one day or on all 7 days?

Side note, to have a record of 0 for the days I didn’t eat any hotdogs is not an option. I understand that I could use an array of dates and left join to it and do a

CASE WHEN Datekey IS NULL THEN 0 END

type of deal, but I would like to find out if there is a different way where the rows preceding value can somehow be determined dynamically based on the date.

Advertisement

Answer

Window functions are the right approach in theory. But to look back at the 7 preceding days (not rows), we need a range frame specification – which, unfornately, SQL Server does not support.

I am going to recommend a subquery, or a lateral join:

select hdc.*, hdc1.*
from dbo.HotDogConsumption hdc
cross apply (
    select coalesce(sum(HotDogCount), 0) HotDogsLast7Days 
    from dbo.HotDogConsumption hdc1
    where hdc1.datekey >= dateadd(day, -7, hdc.datekey)
      and hdc1.datekey < hdc.datekey
) hdc1

You might want to adjust the conditions in the where clause of the subquery to the precise frame that you want. The above code computes over the last 7 days, not including today. Something equivalent to your current attempt would be like:

    where hdc1.datekey >= dateadd(day, -6, hdc.datekey)
      and hdc1.datekey <= hdc.datekey
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement