Amazon RDS – Postgres not using index for SELECT queries

I have a feeling I am doing something terribly wrong, but I can't figure out what.

I have the following query which I am trying to execute:

SELECT col1, col2, col3, col4, col5, day, month, year,
       sum(num1) AS sum_num1,
       sum(num2) AS sum_num2,
       count(*) AS count_items
FROM test_table
WHERE day = 10 AND month = 5 AND year = 2020
GROUP BY col1, col2, col3, col4, col5, day, month, year;

Also, I have an index on (day, month, year), which I set up using the following command:

CREATE INDEX CONCURRENTLY testtable_dmy_idx on test_table (day, month, year);
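The index can be confirmed with a standard catalog query (nothing RDS-specific here):

-- List indexes on the table, including the one created above.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'test_table';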

Then I found the enable_seqscan setting, which toggles the planner's use of sequential scans, and tried playing around with the query.
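For reference, each run below followed this pattern (reconstructed from the commands described in this post; the second run only swaps on for off):

-- Session used for each EXPLAIN run below.
SET enable_seqscan TO on;

EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT col1, col2, col3, col4, col5, day, month, year,
       sum(num1) AS sum_num1,
       sum(num2) AS sum_num2,
       count(*) AS count_items
FROM test_table
WHERE day = 10 AND month = 5 AND year = 2020
GROUP BY col1, col2, col3, col4, col5, day, month, year;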

So when I ran the previous query with SET enable_seqscan TO on; (which, by the way, is the default behaviour) and with EXPLAIN (ANALYZE, BUFFERS, TIMING), I got the following output:

-- Select Query with Sequential scan on 

QUERY PLAN
Finalize GroupAggregate  (cost=9733303.39..10836008.34 rows=5102790 width=89) (actual time=1100914.091..1110820.480 rows=491640 loops=1)
"  Group Key: col1, col2, col3, col4, col5, day, month, year"
"  Buffers: shared hit=25020 read=2793049 dirtied=10040, temp read=74932 written=75039"
  I/O Timings: read=1059425.134
  ->  Gather Merge  (cost=9733303.39..10607468.38 rows=6454984 width=89) (actual time=1100911.426..1110193.876 rows=795097 loops=1)
        Workers Planned: 2
        Workers Launched: 2
"        Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956"
        I/O Timings: read=3178066.529
        ->  Partial GroupAggregate  (cost=9732303.36..9861403.04 rows=3227492 width=89) (actual time=1100791.915..1107668.495 rows=265032 loops=3)
"              Group Key: col1, col2, col3, col4, col5, day, month, year"
"              Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956"
              I/O Timings: read=3178066.529
              ->  Sort  (cost=9732303.36..9740372.09 rows=3227492 width=81) (actual time=1100788.479..1105630.411 rows=2630708 loops=3)
"                    Sort Key: col1, col2, col3, col4, col5"
                    Sort Method: external merge  Disk: 241320kB
                    Worker 0:  Sort Method: external merge  Disk: 246776kB
                    Worker 1:  Sort Method: external merge  Disk: 246336kB
"                    Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956"
                    I/O Timings: read=3178066.529
                    ->  Parallel Seq Scan on test_table  (cost=0.00..9074497.49 rows=3227492 width=81) (actual time=656277.982..1073808.146 rows=2630708 loops=3)
                          Filter: ((day = 10) AND (month = 5) AND (year = 2020))
                          Rows Removed by Filter: 24027044
                          Buffers: shared hit=76855 read=8416561 dirtied=33686
                          I/O Timings: read=3178066.180
Planning Time: 4.017 ms
Execution Time: 1111033.041 ms
Total time: around 18 minutes

and then when I ran SET enable_seqscan TO off; and re-ran the same query with EXPLAIN, I got the following:

-- Select Query with Sequential scan off

QUERY PLAN
Finalize GroupAggregate  (cost=10413126.05..11515831.01 rows=5102790 width=89) (actual time=59211.363..66579.750 rows=491640 loops=1)
"  Group Key: col1, col2, col3, col4, col5, day, month, year"
"  Buffers: shared hit=3 read=104091, temp read=77942 written=78052"
  I/O Timings: read=28662.857
  ->  Gather Merge  (cost=10413126.05..11287291.05 rows=6454984 width=89) (actual time=59211.262..65973.857 rows=795178 loops=1)
        Workers Planned: 2
        Workers Launched: 2
"        Buffers: shared hit=33 read=218096, temp read=230092 written=230418"
        I/O Timings: read=51560.508
        ->  Partial GroupAggregate  (cost=10412126.03..10541225.71 rows=3227492 width=89) (actual time=57013.922..62453.555 rows=265059 loops=3)
"              Group Key: col1, col2, col3, col4, col5, day, month, year"
"              Buffers: shared hit=33 read=218096, temp read=230092 written=230418"
              I/O Timings: read=51560.508
              ->  Sort  (cost=10412126.03..10420194.76 rows=3227492 width=81) (actual time=57013.423..60368.530 rows=2630708 loops=3)
"                    Sort Key: col1, col2, col3, col4, col5"
                    Sort Method: external merge  Disk: 246944kB
                    Worker 0:  Sort Method: external merge  Disk: 246120kB
                    Worker 1:  Sort Method: external merge  Disk: 241408kB
"                    Buffers: shared hit=33 read=218096, temp read=230092 written=230418"
                    I/O Timings: read=51560.508
                    ->  Parallel Bitmap Heap Scan on test_table  (cost=527733.84..9754320.16 rows=3227492 width=81) (actual time=18155.864..30957.312 rows=2630708 loops=3)
                          Recheck Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                          Rows Removed by Index Recheck: 1423
                          Heap Blocks: exact=13374 lossy=44328
                          Buffers: shared hit=3 read=218096
                          I/O Timings: read=51560.508
                          ->  Bitmap Index Scan on testtable_dmy_idx  (cost=0.00..525797.34 rows=7745982 width=0) (actual time=18148.218..18148.228 rows=7892123 loops=1)
                                Index Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                                Buffers: shared hit=3 read=46389
                                I/O Timings: read=17368.250
Planning Time: 2.787 ms
Execution Time: 66783.481 ms
Total time: around 1 minute

I don't understand why I am getting this behaviour or what I am doing wrong. I would expect Postgres to pick the better plan automatically, but that's not happening.

Any help would be really appreciated.


Edit 1:

A bit more information about the RDS Postgres version:

SELECT version();

PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit

Edit 2:

Running with SET max_parallel_workers_per_gather TO 0; (the default was 2, as shown by SHOW max_parallel_workers_per_gather):
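Concretely, the session was:

SHOW max_parallel_workers_per_gather;     -- returned 2 (the default)
SET max_parallel_workers_per_gather TO 0;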

-- Select Query with Sequential scan ON
QUERY PLAN
GroupAggregate  (cost=11515667.22..11799074.58 rows=5102790 width=89) (actual time=1120868.377..1133231.165 rows=491640 loops=1)
"  Group Key: col1, col2, col3, col4, col5, day, month, year"
"  Buffers: shared hit=92456 read=8400966, temp read=295993 written=296321"
  I/O Timings: read=1041723.362
  ->  Sort  (cost=11515667.22..11535032.17 rows=7745982 width=81) (actual time=1120865.304..1129419.809 rows=7892123 loops=1)
"        Sort Key: col1, col2, col3, col4, col5"
        Sort Method: external merge  Disk: 734304kB
"        Buffers: shared hit=92456 read=8400966, temp read=295993 written=296321"
        I/O Timings: read=1041723.362
        ->  Seq Scan on test_table  (cost=0.00..9888011.58 rows=7745982 width=81) (actual time=663266.269..1070560.993 rows=7892123 loops=1)
              Filter: ((day = 10) AND (month = 5) AND (year = 2020))
              Rows Removed by Filter: 72081131
              Buffers: shared hit=92450 read=8400966
              I/O Timings: read=1041723.362
Planning Time: 5.829 ms
Execution Time: 1133422.968 ms
Total time: around 18 minutes

Subsequently,

-- Select Query with Sequential scan OFF
QUERY PLAN
GroupAggregate  (cost=12190966.21..12474373.57 rows=5102790 width=89) (actual time=109048.306..119255.079 rows=491640 loops=1)
"  Group Key: col1, col2, col3, col4, col5, day, month, year"
"  Buffers: shared hit=3 read=218096, temp read=295993 written=296321"
  I/O Timings: read=55697.723
  ->  Sort  (cost=12190966.21..12210331.17 rows=7745982 width=81) (actual time=109047.621..115468.268 rows=7892123 loops=1)
"        Sort Key: col1, col2, col3, col4, col5"
        Sort Method: external merge  Disk: 734304kB
"        Buffers: shared hit=3 read=218096, temp read=295993 written=296321"
        I/O Timings: read=55697.723
        ->  Bitmap Heap Scan on test_table  (cost=527733.84..10563310.57 rows=7745982 width=81) (actual time=16941.764..62203.367 rows=7892123 loops=1)
              Recheck Cond: ((day = 10) AND (month = 5) AND (year = 2020))
              Rows Removed by Index Recheck: 4270
              Heap Blocks: exact=39970 lossy=131737
              Buffers: shared hit=3 read=218096
              I/O Timings: read=55697.723
              ->  Bitmap Index Scan on testtable_dmy_idx  (cost=0.00..525797.34 rows=7745982 width=0) (actual time=16933.964..16933.964 rows=7892123 loops=1)
                    Index Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                    Buffers: shared hit=3 read=46389
                    I/O Timings: read=16154.294
Planning Time: 3.684 ms
Execution Time: 119440.147 ms
Total time: around 2 minutes

Edit 3:

I checked the number of inserts, updates, deletes, and live and dead tuples using the following query:

SELECT n_tup_ins AS "inserts", n_tup_upd AS "updates", n_tup_del AS "deletes",
       n_live_tup AS "live_tuples", n_dead_tup AS "dead_tuples"
FROM pg_stat_user_tables
WHERE relname = 'test_table';

and got the following result:

| inserts     | updates | deletes   | live_tuples | dead_tuples |
|-------------|---------|-----------|-------------|-------------|
| 296590964   | 0       | 412400995 | 79717032    | 7589442     |

Then I ran the following command:

VACUUM (VERBOSE, ANALYZE) test_table;

Got the following result:

[2020-05-15 18:34:08] [00000] vacuuming "public.test_table"
[2020-05-15 18:37:13] [00000] scanned index "testtable_dmy_idx" to remove 7573896 row versions
[2020-05-15 18:37:56] [00000] scanned index "testtable_unixts_idx" to remove 7573896 row versions
[2020-05-15 18:38:16] [00000] "test_table": removed 7573896 row versions in 166450 pages
[2020-05-15 18:38:16] [00000] index "testtable_dmy_idx" now contains 79973254 row versions in 1103313 pages
[2020-05-15 18:38:16] [00000] index "testtable_unixts_idx" now contains 79973254 row versions in 318288 pages
[2020-05-15 18:38:16] [00000] "test_table": found 99 removable, 2196653 nonremovable row versions in 212987 out of 8493416 pages
[2020-05-15 18:38:16] [00000] vacuuming "pg_toast.pg_toast_25023"
[2020-05-15 18:38:16] [00000] index "pg_toast_25023_index" now contains 0 row versions in 1 pages
[2020-05-15 18:38:16] [00000] "pg_toast_25023": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
[2020-05-15 18:38:16] [00000] analyzing "public.test_table"
[2020-05-15 18:38:27] [00000] "test_table": scanned 30000 of 8493416 pages, containing 282611 live rows and 0 dead rows; 30000 rows in sample, 80011093 estimated total rows
[2020-05-15 18:38:27] completed in 4 m 19 s 58 ms

After that, the result of the same query looks like this:

| inserts   | updates | deletes   | live_tuples | dead_tuples |
|-----------|---------|-----------|-------------|-------------|
| 296590964 | 0       | 412400995 | 80011093    | 0           |


Answer

   Rows Removed by Filter: 24027044
   Buffers: shared hit=76855 read=8416561 dirtied=33686
   I/O Timings: read=3178066.180

That is a lot of buffers to have been dirtied by a seq scan. I am guessing you have not vacuumed your table recently enough, or autovacuum is falling way behind because you have kept the default settings, which (until v12) are far too slow for most modern dedicated systems.
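If autovacuum is the culprit, this is a sketch of the kind of tuning I mean. The values are illustrative, not recommendations, and on RDS these are changed through the instance's DB parameter group rather than ALTER SYSTEM:

-- Illustrative values only: let autovacuum do more work between sleeps.
-- On RDS, set the equivalents in the DB parameter group instead.
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
SELECT pg_reload_conf();

-- Or make just this table more aggressive via a storage parameter:
ALTER TABLE test_table SET (autovacuum_vacuum_cost_delay = 2);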

Also, 24027044 / 8416561 is roughly 2.85 rows per page. That is an extremely low number. Are your tuples extremely wide? Is your table extremely bloated? Neither of those answers your question, as the planner should know about them and take them into account, but we might need to know in order to figure out where the planner is going wrong. (These calculations may be off, because I don't know which numbers are prorated across the parallel workers and which are not, but I don't think a factor of 3 changes the conclusion that something here is weird.)
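A quick sanity check on both questions, using the statistics the planner itself relies on (pgstattuple would give exact figures if that extension is installed):

-- Rows per page according to the planner's statistics; a value near
-- 2.85 would corroborate very wide tuples and/or heavy bloat.
SELECT relname, relpages, reltuples,
       reltuples / NULLIF(relpages, 0) AS rows_per_page
FROM pg_class
WHERE relname = 'test_table';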

8416561 blocks * 8 kB per block / 3178.066 seconds of read time ≈ 20 MB/s. That seems pretty low. What I/O settings do you have configured on your RDS "hardware"? Your settings for seq_page_cost and random_page_cost may be wrong for your actual I/O capacity. (Although changing them might not be very effective here; see below.)
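If the storage really does handle random I/O well (common for SSD-backed volumes), one experiment is to lower random_page_cost for a session and re-plan; 1.1 below is a commonly used SSD value, not a measured one:

-- Current values (defaults: seq_page_cost = 1, random_page_cost = 4).
SHOW seq_page_cost;
SHOW random_page_cost;

-- Lower the random-I/O penalty for this session only, then re-run
-- EXPLAIN on the original query to see whether the plan changes.
SET random_page_cost = 1.1;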

For your Bitmap heap scan:

Heap Blocks: exact=13374 lossy=44328
Buffers: shared hit=3 read=218096

It looks like all the qualifying tuples are concentrated in a very small number of blocks (compared to the overall table size revealed by the seq scan). I think the planner does not take this adequately into account for bitmap scans. There is a patch out there for this, but it has missed the deadline for v13. (If no one gets around to reviewing it, it might miss the deadline for v14 as well; nudge nudge.) Basically, the planner knows that the column "day" has a high correlation with the physical order of the table, and it uses this knowledge to conclude that the bitmap heap scan will be almost all sequential I/O. But it fails to also deduce that the scan will touch only a small portion of the table. This makes the bitmap scan look just like the seq scan, only with an extra layer of overhead (consulting the index), so in that light it is not surprising that the planner does not choose it.
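You can see the correlation estimate the planner is working from; this is the standard pg_stats view, with the column names from the question:

-- Physical-order correlation (-1..1) of the filter columns; values
-- near 1 or -1 mean the column tracks the table's physical order.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'test_table'
  AND attname IN ('day', 'month', 'year');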
