Is it possible to get a the sum of value
from the calendar_table
to the main_table
without joining like below?
select
date, sum(value)
from
main_table
inner join
calendar_table on start_date <= date and end_date >= date
group by
date
I am trying to avoid a join like this because main_table
is a very large table with rows that have very large start and end dates, and it is absolutely killing my performance. And I’ve already indexed both tables.
Sample desired results:
+-----------+-------+
| date | total |
+-----------+-------+
| 7-24-2010 | 11 |
+-----------+-------+
Sample tables
calendar_table:
+-----------+-------+
| date | value |
+-----------+-------+
| 7-24-2010 | 5 |
| 7-25-2010 | 6 |
| | |
| 7-23-2020 | 2 |
| 7-24-2020 | 10 |
+-----------+-------+
main_table:
+------------+-----------+
| start_date | end_date |
+------------+-----------+
| 7-24-2010 | 7-25-2010 |
| 8-1-2011 | 8-5-2011 |
+------------+-----------+
Advertisement
Answer
You want the sum in the calendar table. So, I would recommend an “incremental” approach. This starts by unpivoting the data and putting the value as an increment and decrement in the results:
select c.date, c.value as inc
from main_table m join
calendar_table t
on m.start_date = c.date
union all
select dateadd(day, 1, c.date), - c.value as inc
from main_table m join
calendar_table t
on m.end_date = c.date;
The final step is to aggregate and do a cumulative sum:
select date, sum(inc) as value_on_date,
sum(sum(inc)) over (order by date) as net_value
from ((select c.date, c.value as inc
from main_table m join
calendar_table t
on m.start_date = c.date
) union all
(select dateadd(day, 1, c.date), - c.value as inc
from main_table m join
calendar_table t
on m.end_date = c.date
)
) c
group by date
order by date;
This is processing two rows of data for each row in the master table. Assuming that your time spans are longer than two days typically for each master row, the resulting data processed should be much smaller. And smaller data implies a faster query.