Skip to content
Advertisement

SQL Server : getting sum of values in “calendar” table without joining

Is it possible to get a the sum of value from the calendar_table to the main_table without joining like below?

select 
    date, sum(value)
from 
    main_table
inner join 
    calendar_table on start_date <= date and end_date >= date
group by 
    date

I am trying to avoid a join like this because main_table is a very large table with rows that have very large start and end dates, and it is absolutely killing my performance. And I’ve already indexed both tables.

Sample desired results:

+-----------+-------+
|   date    | total |
+-----------+-------+
| 7-24-2010 |    11 |
+-----------+-------+

Sample tables

calendar_table:

+-----------+-------+
|   date    | value |
+-----------+-------+
| 7-24-2010 |     5 |
| 7-25-2010 |     6 |
| ...       |   ... |
| 7-23-2020 |     2 |
| 7-24-2020 |    10 |
+-----------+-------+

main_table:

+------------+-----------+
| start_date | end_date  |
+------------+-----------+
| 7-24-2010  | 7-25-2010 |
| 8-1-2011   | 8-5-2011  |
+------------+-----------+

Advertisement

Answer

You want the sum in the calendar table. So, I would recommend an “incremental” approach. This starts by unpivoting the data and putting the value as an increment and decrement in the results:

select c.date, c.value as inc
from main_table m join
     calendar_table t
     on m.start_date = c.date
union all
select dateadd(day, 1, c.date), - c.value as inc
from main_table m join
     calendar_table t
     on m.end_date = c.date;

The final step is to aggregate and do a cumulative sum:

select date, sum(inc) as value_on_date,
       sum(sum(inc)) over (order by date) as net_value
from ((select c.date, c.value as inc
       from main_table m join
            calendar_table t
            on m.start_date = c.date
      ) union all
      (select dateadd(day, 1, c.date), - c.value as inc
       from main_table m join
            calendar_table t
            on m.end_date = c.date
      )
     ) c
group by date
order by date;

This is processing two rows of data for each row in the master table. Assuming that your time spans are longer than two days typically for each master row, the resulting data processed should be much smaller. And smaller data implies a faster query.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement