sum columns of a table, grouping by table partition

I have a table which holds daily metrics:

| DateID   | Metric1 | Metric2 |
| 20190501 | 24      | 34      |
| 20190502 | 25      | 56      |
| .....    | ....    | ....    |

 

The table has more than 150 million rows. It is partitioned on DateID by month: each partition holds the rows whose DateID runs from the first of the month up to, but not including, the first of the next month. Some of my partitions:

rows    pages   comparison  value
4205460 174009  less than   20180801
4205460 174097  less than   20180901
4069800 168449  less than   20181001
4205460 174009  less than   20181101
4069800 168433  less than   20181201
4205460 174097  less than   20190101
4205460 174009  less than   20190201
3798480 157201  less than   20190301
4205460 174097  less than   20190401
4069800 168449  less than   20190501
2984520 123545  less than   20190601

 

I would like to select sum(metric1), sum(metric2), per month.

What I am doing so far is creating a temp table that holds YYYYMM as MonthID, joining it to my table on substring(dateID,1,6) = MonthID, and then selecting the sums grouped by MonthID. However, this is too slow. I think it would be faster if I could somehow group directly by the table's partitions. Is there any way to do that? If not, is there any way to improve the performance of the sums?

Answer

You can try two levels of aggregation:

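select left(dateid, 6) as yyyymm,
       sum(metric1) as metric1, sum(metric2) as metric2
from (-- inner level: one row per dateid, the partitioning key
      select dateid, sum(metric1) as metric1, sum(metric2) as metric2
      from t
      group by dateid
     ) t
-- outer level: roll the daily sums up to months
group by left(dateid, 6)
order by yyyymm;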

 

The inner aggregation is explicitly on only the partitioning key.

I should note that I do not know whether SQL Server optimizes group by on partitioned tables when the partitioning key is an aggregation key. However, it might, so this is worth trying.
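If the goal is to group on the partitions themselves, as the question asks, a minimal sketch could use SQL Server's $PARTITION function, which returns the partition number a value maps to. The partition function name pfMonthly is hypothetical (the question does not give it), and t stands in for the real table as above. Whether this is actually faster is untested here.

```sql
-- Sketch only: pfMonthly is an assumed name for the partition function
-- that the table t is partitioned with; substitute the real one.
select $PARTITION.pfMonthly(dateid) as partition_number,
       left(min(dateid), 6)         as yyyymm,   -- month of the earliest row in the partition
       sum(metric1)                 as metric1,
       sum(metric2)                 as metric2
from t
group by $PARTITION.pfMonthly(dateid)
order by partition_number;
```

This yields monthly sums only where each partition covers exactly one calendar month. The first and last partitions of a range are open-ended and may span more than one month, so those rows would need checking.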
