Skip to content
Advertisement

How inaccurate can the sys.dm_db_partition_stats.row_count be in getting an Azure SQL DB row count for each table?

I have seen a number of general statements on how sys.dm_db_partition_stats.row_count can produce inaccurate results due to providing objects’ statistics instead of actually doing a COUNT(). However, I have never been able to find any deeper reasons behind those statements or validate the hypothesis on my Azure SQL DB.

So I would like to learn –

  1. How inaccurate this method can actually be?
  2. Why exactly the results might be skewed?
    (e.g. stats are only recalculated once per day / on specific object operation).

Any related insight is much appreciated !



Several things I was able to find out on my own — mostly by running various queries containing sys.dm_db_partition_stats.row_count, while knowing actual row counts in each table.

Here’s a final query I came up with
This gets fast and (in my case) accurate row count for each table, sorted from high count to low.

SELECT 
    (SCHEMA_NAME(A.schema_id) + '.' + A.Name) as table_name,  
    B.object_id, B.index_id, B.row_count 
FROM  
    sys.dm_db_partition_stats B 
LEFT JOIN 
    sys.objects A 
    ON A.object_id = B.object_id 
WHERE 
    SCHEMA_NAME(A.schema_id) <> 'sys' 
    AND (B.index_id = '0' OR B.index_id = '1') 
ORDER BY 
    B.row_count DESC 

First line of WHERE clause is used to exclude system tables, e.g. sys.plan_persist_wait_stats and many others.

Second line takes care of non-unique non-clustered indexes (which are objects and apparently have their own stats) -> if you don’t filter them out, you get double row count for indexed tables when using GROUP BY A.schema_id, A.Name or two records with the same table_name in the query output (if you don’t use GROUP BY)

Advertisement

Answer

We’re glad that you found the solution and solved it by yourself. Your new edition should be an answer. I just help you post it as answer and this can be beneficial to other community members:

Several things I was able to find out on my own — mostly by running various queries containing sys.dm_db_partition_stats.row_count, while knowing actual row counts in each table.

Here’s a final query I came up with This gets fast and (in my case) accurate row count for each table, sorted from high count to low.

SELECT 
    (SCHEMA_NAME(A.schema_id) + '.' + A.Name) as table_name,  
    B.object_id, B.index_id, B.row_count 
FROM  
    sys.dm_db_partition_stats B 
LEFT JOIN 
    sys.objects A 
    ON A.object_id = B.object_id 
WHERE 
    SCHEMA_NAME(A.schema_id) <> 'sys' 
    AND (B.index_id = '0' OR B.index_id = '1') 
ORDER BY 
    B.row_count DESC 

First line of WHERE clause is used to exclude system tables, e.g. sys.plan_persist_wait_stats and many others.

Second line takes care of non-unique non-clustered indexes (which are objects and apparently have their own stats) -> if you don’t filter them out, you get double row count for indexed tables when using GROUP BY A.schema_id, A.Name or two records with the same table_name in the query output (if you don’t use GROUP BY)

Thanks for your sharing again.

And thanks for @conor’s commnet: “If you want to see how far off the numbers can be, I suggest you try doing user transactions, inserting a bunch of rows, then roll back the transaction.”

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement