I have a table which holds daily metrics:
| DateID   | Metric1 | Metric2 |
| 20190501 | 24      | 34      |
| 20190502 | 25      | 56      |
| .....    | ....    | ....    |
The table has more than 150 million rows. It is partitioned on DateID, per month (each partition holds the rows whose DateID runs from the first of a month up to, but not including, the first of the next month). Some of my partitions:
| rows    | pages  | comparison | value    |
| 4205460 | 174009 | less than  | 20180801 |
| 4205460 | 174097 | less than  | 20180901 |
| 4069800 | 168449 | less than  | 20181001 |
| 4205460 | 174009 | less than  | 20181101 |
| 4069800 | 168433 | less than  | 20181201 |
| 4205460 | 174097 | less than  | 20190101 |
| 4205460 | 174009 | less than  | 20190201 |
| 3798480 | 157201 | less than  | 20190301 |
| 4205460 | 174097 | less than  | 20190401 |
| 4069800 | 168449 | less than  | 20190501 |
| 2984520 | 123545 | less than  | 20190601 |
I would like to select sum(metric1), sum(metric2) per month.
What I am doing so far is creating a temp table which holds YYYYMM as MonthID, joining it with my table on substring(dateID,1,6) = MonthID, and then selecting the sums grouped by MonthID. However, this is too slow. I think it would be faster if I could somehow group directly by the table’s partitions. Is there any way to do that? If not, is there any way to improve the sum performance?
Answer
You can try two levels of aggregation:
    select left(dateid, 6) as yyyymm, sum(metric1), sum(metric2)
    from (select dateid, sum(metric1) as metric1, sum(metric2) as metric2
          from t
          group by dateid
         ) t
    group by left(dateid, 6)
    order by yyyymm;
The inner aggregation is explicitly on only the partitioning key.
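If dateid is stored as an integer (an assumption; the question does not state the column type), the month key can also be derived with integer division instead of a string function. The following is only a sketch of the same two-level aggregation under that assumption, reusing the table name t from the query above; the derived-table alias d is illustrative.

```sql
-- Assumption: dateid is an integer column holding yyyymmdd values.
-- If it is a character type, keep left(dateid, 6) instead.
select dateid / 100 as yyyymm,        -- integer division, e.g. 20190501 / 100 = 201905
       sum(metric1) as metric1,
       sum(metric2) as metric2
from (select dateid,
             sum(metric1) as metric1,
             sum(metric2) as metric2
      from t
      group by dateid                 -- inner aggregation on the partitioning key only
     ) d
group by dateid / 100
order by yyyymm;
```

The outer query then only has to regroup one row per day rather than every row in the table, and no implicit conversion to a string is needed for the month key.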
I should note that I do not know whether SQL Server optimizes group by on partitioned tables when the partitioning key is an aggregation key. However, it might, so this is worth trying.
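One way to check whether this actually helps is to run each candidate query with I/O and timing statistics switched on and compare the numbers. This is a minimal sketch rather than a benchmark; it assumes the table is called t as above, and the alias d is illustrative.

```sql
-- Report logical reads and CPU/elapsed time for each statement
-- in the Messages output of the session.
set statistics io on;
set statistics time on;

-- Candidate query: the two-level aggregation.
-- Run the original temp-table join the same way and compare the output.
select left(dateid, 6) as yyyymm,
       sum(metric1) as metric1,
       sum(metric2) as metric2
from (select dateid,
             sum(metric1) as metric1,
             sum(metric2) as metric2
      from t
      group by dateid
     ) d
group by left(dateid, 6)
order by yyyymm;

set statistics io off;
set statistics time off;
```

Looking at the actual execution plan for the same statements should also show how the aggregation is carried out over the partitions.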