
Optimization of GROUP BY CUBE in SQL Server

I would like to run a GROUP BY CUBE on a table with 9 columns and more than 107 million rows. Here is an example of my code:

 select     id
            ,case when grouping(cod_01) = 0 then cod_01 else 0   end cod_01
            ,case when grouping(cod_02) = 0 then cod_02 else 0   end cod_02
            ,case when grouping(cod_03) = 0 then cod_03 else 0   end cod_03
            ,case when grouping(cod_04) = 0 then cod_04 else 0   end cod_04
            ,case when grouping(cod_05) = 0 then cod_05 else 0   end cod_05
            ,case when grouping(input)  = 0 then input  else '0' end cod_input
            ,date
            ,historical
            ,COUNT(distinct pp) value
     from tmp.test
     where final_state in ('A','B')
     group by id
              ,cod_01
              ,cod_02
              ,cod_03
              ,cod_04
              ,cod_05
              ,input
              ,date 
              ,historical
               with cube 
     having GROUPING(id) = 0
            and GROUPING(cod_02) = 0
            and GROUPING(cod_03) = 0
            and GROUPING(date) = 0
            and GROUPING(historical) = 0

This is running in SQL Server.

For 10K rows it takes 7 seconds, but for the full 107 million rows it took more than 24 hours.

How can I improve my query? Is there a better way to do it?


Answer

Apart from something obvious, like adding an index on the final_state column (if it is selective enough) or even creating a covering index (heavy on storage, and it will degrade insert/update performance), you could use Grouping Sets instead of Cube + Having.
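For illustration, a covering index for this query might look like the sketch below. The index name is hypothetical, and the INCLUDE list is an assumption based on the columns the query above reads; as noted, this costs storage and slows down writes:

     -- Hypothetical covering index: seek on final_state, include every
     -- column the SELECT and GROUP BY touch so the scan never hits the table.
     create nonclustered index ix_test_final_state
         on tmp.test (final_state)
         include (id, cod_01, cod_02, cod_03, cod_04, cod_05,
                  input, date, historical, pp);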

It will aggregate the data by column combinations you actually need, instead of first calculating all possible combinations with Cube and then filtering them with Having. This might be faster, but if the result of this query also has tens of millions of rows, don’t expect any fireworks.
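In the query above, the HAVING clause forces id, cod_02, cod_03, date and historical to always be grouped, so only cod_01, cod_04, cod_05 and input actually vary. That is a partial cube, which SQL Server (2008+) lets you write directly: CUBE over just those four columns is shorthand for the 16 grouping sets you actually need, and the HAVING filter disappears. A sketch of the rewrite (untested against your data):

     select     id
                ,case when grouping(cod_01) = 0 then cod_01 else 0   end cod_01
                ,cod_02
                ,cod_03
                ,case when grouping(cod_04) = 0 then cod_04 else 0   end cod_04
                ,case when grouping(cod_05) = 0 then cod_05 else 0   end cod_05
                ,case when grouping(input)  = 0 then input  else '0' end cod_input
                ,date
                ,historical
                ,COUNT(distinct pp) value
      from tmp.test
      where final_state in ('A','B')
      group by id
               ,cod_02
               ,cod_03
               ,date
               ,historical
               ,cube (cod_01, cod_04, cod_05, input)

The five always-grouped columns no longer need their CASE/GROUPING wrappers, because they are never rolled up.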

I tested this on my server (MSSQL 2012), and it turned out that the query with Cube + Having performed 6 separate index scans and then concatenated the streams, while the query with Grouping Sets that yields the same result performed only one scan and was a few times faster.

User contributions licensed under: CC BY-SA