Faster counts with MySQL by sampling the table

I’m looking for a way to get a count of records meeting a condition, but my problem is that the table is billions of rows long and a basic COUNT(*) is not possible because it times out.

I thought that maybe it would be possible to sample the table by doing something like selecting a quarter of the records. I believe that older records are more likely to match, so I’d need a method that accounts for this (perhaps random sorting).

Is it possible or reasonable to query a certain percentage of rows in MySQL? And is this the smartest way to solve this problem?
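
The kind of sampling I have in mind would look something like the sketch below. (This assumes an AUTO_INCREMENT id and that the sampled id range is representative, which, given that older records match more often, probably isn’t true; table and column names are placeholders.)

-- Count a contiguous quarter of the id range, then extrapolate
SELECT COUNT(*) * 4 AS estimated_count
FROM table_name
WHERE id BETWEEN 1 AND (SELECT MAX(id) DIV 4 FROM table_name)
  AND deleted_at IS NOT NULL;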

The query I currently have which doesn’t work is pretty simple:

SELECT count(*) FROM table_name WHERE deleted_at IS NOT NULL


Answer

SHOW TABLE STATUS will ‘instantly’ give an approximate Row count. (There is an equivalent SELECT ... FROM information_schema.tables.) However, this may be significantly far off.
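
For example (table and schema names are placeholders; for InnoDB the Rows / TABLE_ROWS value is an estimate, not an exact count):

SHOW TABLE STATUS LIKE 'table_name';

SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
  AND TABLE_NAME = 'table_name';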

A COUNT(*) that can be satisfied from a secondary index (an index on a column other than the PRIMARY KEY) will be faster, because that index is smaller than the table’s data, which in InnoDB is clustered with the PRIMARY KEY. But this still may not be fast enough.
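
For the query in the question, a secondary index on deleted_at would let the count run entirely in that index rather than over the whole table. A rough sketch (the index name is illustrative, and building an index on billions of rows is itself a long operation):

ALTER TABLE table_name ADD INDEX idx_deleted_at (deleted_at);

SELECT COUNT(*) FROM table_name WHERE deleted_at IS NOT NULL;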

There is no way to “sample”. Or at least no way that is reliably better than SHOW TABLE STATUS. EXPLAIN SELECT ... with some simple query will do an estimate; again, not necessarily any better.
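
For instance, this returns the optimizer’s row estimate for the filter without scanning the whole table (the rows column is derived from index statistics and can be well off):

EXPLAIN SELECT COUNT(*) FROM table_name WHERE deleted_at IS NOT NULL;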

Please describe what kind of data you have; there may be some other tricks we can use.

See also Random. There may be a technique there that will help you “sample”. Be aware that all such techniques are subject to how the data was generated and whether there has been “churn” on the table.

Can you periodically run the full COUNT(*) and save it somewhere? And then maintain the count after that?
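
A minimal sketch of that idea, assuming a small summary table that your application keeps up to date after the one expensive full count (all names are placeholders):

CREATE TABLE row_counts (
  counted_table VARCHAR(64) PRIMARY KEY,
  deleted_count BIGINT NOT NULL
);

-- Seed it once with the expensive full count (run off-peak)
INSERT INTO row_counts (counted_table, deleted_count)
SELECT 'table_name', COUNT(*) FROM table_name WHERE deleted_at IS NOT NULL;

-- Then bump the counter whenever a row is soft-deleted, in the same transaction
UPDATE row_counts SET deleted_count = deleted_count + 1
WHERE counted_table = 'table_name';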

I assume you don’t have the following case. (Else the solution is trivial, as sketched below.)

  • AUTO_INCREMENT id
  • Never DELETEd or REPLACEd or INSERT IGNOREd or ROLLBACKd any rows
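
In that case the latest id is itself the row count, so something like this (assuming the table really does meet both conditions above) would be enough:

SELECT MAX(id) AS approximate_row_count FROM table_name;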