Skip to content
Advertisement

Amazon RDS – Postgres not using index for SELECT queries

I have a feeling I am doing something terribly wrong but I can’t seem to figure it out.

I have the following query which I am trying to execute:

Also, I have an index on day, month, year which I had set up using the following command

Now I figured the setting to set sequential scan on/off and tried playing around with the query.

So when the ran the previous query with SET enable_seqscan TO on; (which by the way is the default behaviour) and with EXPLAIN (analyze,buffers,timing), I get the following output:

and then when I set SET enable_seqscan TO off; and run the same query with Explain, I get the following:

I don’t seem to understand why I am getting this behaviour or what I am doing wrong because I would expect Postgres to optimize the query automatically but that’s not happening.

Any help would be really appreciated.


Edit 1:

Bit more information about the version of RDS postgres:

SELECT version();

PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit

Edit 2:

Running with SET max_parallel_workers_per_gather TO 0 default was 2 (as shown by SHOW max_parallel_workers_per_gather)

Subsequently,

EDIT 3:

I checked out the number of inserts, updates, deletes, live and dead tuples using the following

Got the following result

Ran the following command

VACUUM (VERBOSE, ANALYZE) test_table

Got the following result:

After that, the result of the same query looks like this:

Advertisement

Answer

That is a lot of buffers to have been dirtied in a seq scan. I am guessing you have not vacuumed your table recently enough. Or autovac is falling way behind because you have accepted the default settings, which are way too slow (until v12) for most modern dedicated systems.

Also, 24027044 / 8416561 = roughly 2.85 rows per page. That is an extremely low number. Are your tuples extremely wide? Is your table extremely bloated? But neither of those answers your question, as the planner should know about them and take them into account. But we might need to know to figure out where the planner is going wrong. (These calculations may be off, because I don’t know which numbers are prorated for the number of workers and which are not–but I don’t think a factor of 3 changes the conclusion that something here is weird).

8416561 * 1024 * 8 / 3178.066 /1024 /1024 = 20 MB/S. That seems pretty low. What IO settings do you have configured on your RDS “hardware”? Your settings for seq_page_cost and random_page_cost may be wrong for your actual IO capacity. (Although this might not be very effective, see below)

For your Bitmap heap scan:

It looks like all the qualifying tuples are concentrated in a very small number of blocks (compared to the overall table size as revealed by the seq scan). I think the planner does not take this adequately into account for bitmap scans. There is a patch out there for this, but it has missed the deadline for v13. (If no one gets around to reviewing it, it might miss the deadline for v14 as well–nudge nudge.) Basically the planner knows that column “day” has a high correlation with the physical order of the table, and it uses this knowledge to say that the bitmap heap scan will be almost all sequential IO. But it fails to also deduce that it will scan only a small portion of the table. This issue makes the bitmap scan look just like the seq scan, but with an extra layer of overhead (consulting the index) so in light of that it is not surprising it does use it.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement