I have a set of SQLITE tables that adds up to 7 GB on disk. The particular table I am querying (FQ) has approximately 50 columns, and 3 million rows.
I am querying from the sqlite3 interactive shell (sqlite3.exe). The query I am running is: “select count(Date) from FQ;”. The approximately 3 million rows take 10+ minutes to count. After the first time, it seems to be cached and the result is almost instant. I am running on a Windows 10 PC with 8 GB RAM and nothing else running.
Date is one of two primary keys (it is Date and ID). There are 360 unique dates, and ~8-10k IDs, and the table has one entry for each date/ID combination.
Here are some things I have already done:
- I have a covering index on the entire table.
- I have run ANALYZE on this database.
- When I do an “EXPLAIN QUERY PLAN” it says it is doing a table scan (as expected for counting) using the covering index.
How can a simple scan through a table of 3 million rows could take so long?
[EDIT: I should clarify that I’m not interested in alternate ways of counting – I am hoping that scans don’t have to be so slow (it is also slow, for example, with using sum()+”group by”)]
[UPDATE: Today I tried two additional things – first I tried using “WITHOUT ROWID” and the results were similar either way. Then I removed the indices for all my tables altogether. Now the count of a few million rows finishes in 4 seconds. The database file is naturally smaller (2 GB vs 7 GB) now that all the indices are gone, but that shouldn’t explain a 10 minutes to 4 seconds kind of difference! What makes the covering index slow down a table scan? Is there something where scanning an index is slower, and if so, why doesn’t SQLITE just scan the original table itself?]
Advertisement
Answer
I finally figured out the problem. Running a VACUUM command on the database solved the issue. I had run .dbinfo to confirm that the page_size multiplied by the number of pages added up to approximately the file size. That, combined with the fact that I hadn’t deleted anything from the database (only inserted), led me to assume that I didn’t need to vacuum (or de-fragment).
But it looks like the re-organization that vacuum does also makes a dramatic difference in the speed of the count queries (which now finish in milliseconds as I’ve seen reported elsewhere).