I have a table which holds daily metrics:
| DateID   | Metric1 | Metric2 |
|----------|---------|---------|
| 20190501 | 24      | 34      |
| 20190502 | 25      | 56      |
| ...      | ...     | ...     |
The table has more than 150 million rows. It is partitioned on DateID, one partition per month (each partition holds the rows whose DateID runs from the first of the month up to, but not including, the first of the next month). Some of my partitions:
| rows    | pages  | comparison | value    |
|---------|--------|------------|----------|
| 4205460 | 174009 | less than  | 20180801 |
| 4205460 | 174097 | less than  | 20180901 |
| 4069800 | 168449 | less than  | 20181001 |
| 4205460 | 174009 | less than  | 20181101 |
| 4069800 | 168433 | less than  | 20181201 |
| 4205460 | 174097 | less than  | 20190101 |
| 4205460 | 174009 | less than  | 20190201 |
| 3798480 | 157201 | less than  | 20190301 |
| 4205460 | 174097 | less than  | 20190401 |
| 4069800 | 168449 | less than  | 20190501 |
| 2984520 | 123545 | less than  | 20190601 |
I would like to select sum(metric1) and sum(metric2) per month.
What I am doing so far is creating a temp table that holds YYYYMM as MonthID, joining it with my table on substring(dateID,1,6) = MonthID, and then selecting the sums grouped by MonthID. However, this is too slow. I think it would be faster if I could somehow group directly by the table's partitions. Is there any way to do that? If not, is there any way to improve the performance of the sums?
Answer
You can try two levels of aggregation:
select left(dateid, 6) as yyyymm,
       sum(metric1) as metric1, sum(metric2) as metric2
from (select dateid, sum(metric1) as metric1, sum(metric2) as metric2
      from t
      group by dateid
     ) t
group by left(dateid, 6)
order by yyyymm;
The inner aggregation groups only on the partitioning key, DateID; the outer one then rolls the daily totals up to months.
I should note that I do not know whether SQL Server optimizes group by on partitioned tables when the grouping key is the partitioning key. However, it might, so this is worth trying.
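If DateID happens to be stored as an integer (an assumption; the question does not state the column type), the same two-level shape can be written with integer division instead of left(), which avoids converting every DateID to a string. The sketch below is only an illustration: @DailyMetrics is a hypothetical stand-in for the real partitioned table, loaded with the two sample rows from the question, and it makes no claim about how the optimizer will treat the real table.

```sql
-- Sketch only: assumes DateID is an INT in YYYYMMDD form.
-- @DailyMetrics is a hypothetical stand-in for the real partitioned table.
DECLARE @DailyMetrics TABLE (DateID INT, Metric1 INT, Metric2 INT);

INSERT INTO @DailyMetrics (DateID, Metric1, Metric2)
VALUES (20190501, 24, 34),
       (20190502, 25, 56);

-- Inner query: one row per DateID (the partitioning key).
-- Outer query: integer division by 100 drops the day part (20190501 -> 201905).
SELECT d.DateID / 100 AS yyyymm,
       SUM(d.metric1) AS metric1,
       SUM(d.metric2) AS metric2
FROM (SELECT DateID,
             SUM(Metric1) AS metric1,
             SUM(Metric2) AS metric2
      FROM @DailyMetrics
      GROUP BY DateID
     ) AS d
GROUP BY d.DateID / 100
ORDER BY yyyymm;
```

Whichever form you use, it is worth comparing the actual execution plans and timings against the original temp-table join before settling on one.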