Skip to content
Advertisement

Postgres changing the query from index Only scan to bit map scan when data set increases

I have two same queries but with different where condition values

explain analyse select survey_contact_id, relation_id, count(survey_contact_id), count(relation_id) from nomination where survey_id = 1565 and account_id = 225 and deleted_at is NULL group by survey_contact_id, relation_id;
explain analyse select survey_contact_id, relation_id, count(survey_contact_id), count(relation_id) from nomination where survey_id = 888 and account_id = 12 and deleted_at is NULL group by survey_contact_id, relation_id;

When I ran this two queries they both producing different result

for first query the result

GroupAggregate  (cost=0.28..8.32 rows=1 width=24) (actual time=0.016..0.021 rows=4 loops=1)
  Group Key: survey_contact_id, relation_id
  ->  Index Only Scan using test on nomination  (cost=0.28..8.30 rows=1 width=8) (actual time=0.010..0.012 rows=5 loops=1)
        Index Cond: ((account_id = 225) AND (survey_id = 1565))
        Heap Fetches: 5
Planning time: 0.148 ms
Execution time: 0.058 ms

and for the 2nd one

GroupAggregate  (cost=11.08..11.12 rows=2 width=24) (actual time=0.015..0.015 rows=0 loops=1)
  Group Key: survey_contact_id, relation_id
  ->  Sort  (cost=11.08..11.08 rows=2 width=8) (actual time=0.013..0.013 rows=0 loops=1)
        Sort Key: survey_contact_id, relation_id
        Sort Method: quicksort  Memory: 25kB
        ->  Bitmap Heap Scan on nomination  (cost=4.30..11.07 rows=2 width=8) (actual time=0.008..0.008 rows=0 loops=1)
              Recheck Cond: ((account_id = 12) AND (survey_id = 888) AND (deleted_at IS NULL))
              ->  Bitmap Index Scan on test  (cost=0.00..4.30 rows=2 width=0) (actual time=0.006..0.006 rows=0 loops=1)
                    Index Cond: ((account_id = 12) AND (survey_id = 888))
Planning time: 0.149 ms
Execution time: 0.052 ms

Can anyone explain Me why Postgres is making a BitMap Scan instead of Index Only scan ?

Advertisement

Answer

The short version is that Postgres has a cost-based approach, so it has estimated that the cost of doing so is less in the second case, based on the statistics it has.

In your case, the total cost (estimated) of each of these queries is 8.32 and 11.12 respectively. You may be able to see the cost of the index-only scan for the second query by running set enable_bitmapscan = off.

Note that, based on its statistics, Postgres estimated that the first query would return 1 row (actually 4), and that the second would return 2 rows (actually 0).

There are several ways to get better statistics, but if analyze (or autovacuum) hasn’t been run on that table for a while, that is a common cause of bad estimates. Another tell-tale that vacuum may not have been run recently (at least on this table) is the Heap Fetches: 5 you can see in the first query plan.

I’m confused by the “when data set increases” part of your question, please do add more context on that front if relevant.

Finally, if you’re not already planning a PostgreSQL upgrade, I highly recommend doing so soon. 9.6 is nearly out of support, and versions 10, 11, 12, and 13 each contained a host of performance focussed features.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement