Is it possible to get a the sum of value
from the calendar_table
to the main_table
without joining like below?
select date, sum(value) from main_table inner join calendar_table on start_date <= date and end_date >= date group by date
I am trying to avoid a join like this because main_table
is a very large table with rows that have very large start and end dates, and it is absolutely killing my performance. And I’ve already indexed both tables.
Sample desired results:
+-----------+-------+ | date | total | +-----------+-------+ | 7-24-2010 | 11 | +-----------+-------+
Sample tables
calendar_table:
+-----------+-------+ | date | value | +-----------+-------+ | 7-24-2010 | 5 | | 7-25-2010 | 6 | | ... | ... | | 7-23-2020 | 2 | | 7-24-2020 | 10 | +-----------+-------+
main_table:
+------------+-----------+ | start_date | end_date | +------------+-----------+ | 7-24-2010 | 7-25-2010 | | 8-1-2011 | 8-5-2011 | +------------+-----------+
Advertisement
Answer
You want the sum in the calendar table. So, I would recommend an “incremental” approach. This starts by unpivoting the data and putting the value as an increment and decrement in the results:
select c.date, c.value as inc from main_table m join calendar_table t on m.start_date = c.date union all select dateadd(day, 1, c.date), - c.value as inc from main_table m join calendar_table t on m.end_date = c.date;
The final step is to aggregate and do a cumulative sum:
select date, sum(inc) as value_on_date, sum(sum(inc)) over (order by date) as net_value from ((select c.date, c.value as inc from main_table m join calendar_table t on m.start_date = c.date ) union all (select dateadd(day, 1, c.date), - c.value as inc from main_table m join calendar_table t on m.end_date = c.date ) ) c group by date order by date;
This is processing two rows of data for each row in the master table. Assuming that your time spans are longer than two days typically for each master row, the resulting data processed should be much smaller. And smaller data implies a faster query.