
Sum columns of a table, grouping by table partition

I have a table which holds daily metrics:

DateID    | Metric1 | Metric2
20190501  | 24      | 34
20190502  | 25      | 56
...       | ...     | ...

The table has more than 150 million rows. It is partitioned on DateID, by month (each partition holds the rows whose DateID runs from the first of a month up to, but not including, the first of the next month). Some of my partitions:

rows    pages   comparison  value
4205460 174009  less than   20180801
4205460 174097  less than   20180901
4069800 168449  less than   20181001
4205460 174009  less than   20181101
4069800 168433  less than   20181201
4205460 174097  less than   20190101
4205460 174009  less than   20190201
3798480 157201  less than   20190301
4205460 174097  less than   20190401
4069800 168449  less than   20190501
2984520 123545  less than   20190601

I would like to select sum(metric1), sum(metric2), per month.

What I am doing so far is creating a temp table that holds YYYYMM as MonthID, joining it with my table on substring(dateID,1,6) = MonthID, and then selecting the sums grouped by MonthID. However, this is too slow. I think it would be faster if I could somehow group directly by the table's partitions. Is there any way to do that? If not, is there any way to improve the performance of the sums?


Answer

You can try two levels of aggregation:

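-- outer level: roll the daily totals up to one row per month
select left(dateid, 6) as yyyymm,
       sum(metric1) as metric1, sum(metric2) as metric2
from (-- inner level: one row per DateID, the partitioning key
      select dateid, sum(metric1) as metric1, sum(metric2) as metric2
      from t
      group by dateid
     ) d
group by left(dateid, 6)
order by yyyymm;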

The inner aggregation groups explicitly on the partitioning key alone, so each DateID group falls entirely within a single partition.

I should note that I do not know whether SQL Server optimizes group by on partitioned tables when the partitioning key is an aggregation key. It might, though, so this is worth trying.
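
If the goal is to group by the partitions themselves, SQL Server's $PARTITION function may also be worth trying, since it returns the partition number a value maps to. The following is only a sketch: pf_DateID is a hypothetical name standing in for the table's actual partition function, t is the table name used in the query above, and there is no guarantee that the optimizer handles this any faster.

```sql
-- Assumption: pf_DateID is a placeholder for the real partition function name.
-- $PARTITION.<function>(value) returns the number of the partition
-- that the given value falls into.
select $PARTITION.pf_DateID(dateid) as partition_number,
       min(dateid) as first_dateid,   -- earliest day present in the partition
       sum(metric1) as metric1,
       sum(metric2) as metric2
from t
group by $PARTITION.pf_DateID(dateid)
order by partition_number;
```

Because each partition here holds one month, this gives one row per month, and first_dateid identifies which month the row belongs to. Comparing the execution plan and timings against the two-level query should show whether it helps in practice.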

User contributions licensed under: CC BY-SA