I have seen a number of general statements on how sys.dm_db_partition_stats.row_count
can produce inaccurate results due to providing objects’ statistics instead of actually doing a COUNT()
. However, I have never been able to find any deeper reasons behind those statements or validate the hypothesis on my Azure SQL DB.
So I would like to learn –
- How inaccurate this method can actually be?
- Why exactly the results might be skewed?
(e.g. stats are only recalculated once per day / on specific object operation).
Any related insight is much appreciated !
Several things I was able to find out on my own — mostly by running various queries containing sys.dm_db_partition_stats.row_count
, while knowing actual row counts in each table.
Here’s a final query I came up with
This gets fast and (in my case) accurate row count for each table, sorted from high count to low.
SELECT (SCHEMA_NAME(A.schema_id) + '.' + A.Name) as table_name, B.object_id, B.index_id, B.row_count FROM sys.dm_db_partition_stats B LEFT JOIN sys.objects A ON A.object_id = B.object_id WHERE SCHEMA_NAME(A.schema_id) <> 'sys' AND (B.index_id = '0' OR B.index_id = '1') ORDER BY B.row_count DESC
First line of WHERE
clause is used to exclude system tables, e.g. sys.plan_persist_wait_stats
and many others.
Second line takes care of non-unique non-clustered indexes (which are objects and apparently have their own stats) -> if you don’t filter them out, you get double row count for indexed tables when using GROUP BY A.schema_id, A.Name
or two records with the same table_name
in the query output (if you don’t use GROUP BY
)
Advertisement
Answer
We’re glad that you found the solution and solved it by yourself. Your new edition should be an answer. I just help you post it as answer and this can be beneficial to other community members:
Several things I was able to find out on my own — mostly by running various queries containing sys.dm_db_partition_stats.row_count
, while knowing actual row counts in each table.
Here’s a final query I came up with This gets fast and (in my case) accurate row count for each table, sorted from high count to low.
SELECT (SCHEMA_NAME(A.schema_id) + '.' + A.Name) as table_name, B.object_id, B.index_id, B.row_count FROM sys.dm_db_partition_stats B LEFT JOIN sys.objects A ON A.object_id = B.object_id WHERE SCHEMA_NAME(A.schema_id) <> 'sys' AND (B.index_id = '0' OR B.index_id = '1') ORDER BY B.row_count DESC
First line of WHERE
clause is used to exclude system tables, e.g. sys.plan_persist_wait_stats and many others.
Second line takes care of non-unique non-clustered indexes (which are objects and apparently have their own stats) -> if you don’t filter them out, you get double row count for indexed tables when using GROUP BY A.schema_id, A.Name
or two records with the same table_name
in the query output (if you don’t use GROUP BY
)
Thanks for your sharing again.
And thanks for @conor’s commnet: “If you want to see how far off the numbers can be, I suggest you try doing user transactions, inserting a bunch of rows, then roll back the transaction.”