Skip to content
Advertisement

Data aggregation by sliding time periods

[Query and question edited and fixed thanks to comments from @Gordon Linoff and @shawnt00]

I recently inherited a SQL query that calculates the number of some events in time windows of 30 days from a log database. It uses a CTE (Common Table Expression) to generate the 30 days ranges since ‘2019-01-01’ to now. And then it counts the cases in those 30/60/90 days intervals. I am not sure this is the best method. All I know is that it takes a long time to run and I do not understand 100% how exactly it works. So I am trying to rebuild it in an efficient way (maybe as it is now is the most efficient way, I do not know).

I have several questions:

  1. One of the things I notice is that instead of using DATEDIFF the query simply substracts a number of days from the date.Is that a good practice at all?
  2. Is there a better way of doing the time comparisons?
  3. Is there a better way to do the whole thing? The bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.

Note: LogDate original format is like 2019-04-01 18:30:12.000.

DECLARE @dt1 Datetime='2019-01-01'
DECLARE @dt2 Datetime=getDate();

WITH ctedaterange 
     AS (SELECT [Dates]=@dt1 
         UNION ALL
         SELECT [dates] + 30 
         FROM   ctedaterange 
         WHERE  [dates] + 30<= @dt2) 

SELECT 
[dates], 
lt.Activity, COUNT(*) as Total,
    SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 90 THEN 1 ELSE 0 END) AS Activity90days,
    SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 60 THEN 1 ELSE 0 END) AS Activity60days,
    SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 30 THEN 1 ELSE 0 END) AS Activity30days
FROM   ctedaterange AS cte
JOIN (SELECT Activity, CONVERT(DATE, LogDate) as LogDate FROM LogTable) AS lt
ON cte.[dates] = lt.LogDate
group by [dates], lt.Activity
OPTION (maxrecursion 0)

Sample dataset (LogTable):

LogDate, Activity
2020-02-25 01:10:10.000, Activity01
2020-04-14 01:12:10.000, Activity02
2020-08-18 02:03:53.000, Activity02
2019-10-29 12:25:55.000, Activity01
2019-12-24 18:11:11.000, Activity03
2019-04-02 03:33:09.000, Activity01

Expected Output (the output does not reflect the data shown above for I would need too many lines in the sample set to be shown in this post)

As I said above, the bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.

Activity, Activity90days, Activity60days, Activity30days
Activity01, 3, 0, 1
Activity02, 1, 10, 2
Activity03, 5, 1, 3

Thank you for any suggestion.

Advertisement

Answer

SQL Server doesn’t yet have the option to range over values of the window frame of an analytic function. Since you’ve generated all possible dates though and you’ve already got the counts by date, it’s very easy to look back a specific number of (aggregated) rows to get the right totals. Here is my suggested expression for 90 days:

sum(count(LogDate)) over (
    partition by Activity order by [dates]
    with rows between 89 preceding and current row
)
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement