I have a feeling I am doing something terribly wrong but I can’t seem to figure it out.
I have the following query which I am trying to execute:
```sql
SELECT col1, col2, col3, col4, col5, day, month, year,
       sum(num1) AS sum_num1, sum(num2) AS sum_num2, count(*) AS count_items
FROM test_table
WHERE day = 10 AND month = 5 AND year = 2020
GROUP BY col1, col2, col3, col4, col5, day, month, year;
```
Also, I have an index on `(day, month, year)`, which I set up with the following command:

```sql
CREATE INDEX CONCURRENTLY testtable_dmy_idx ON test_table (day, month, year);
```
Now, I found the setting for turning sequential scans on and off and tried playing around with the query. When I ran the query above with `SET enable_seqscan TO on;` (which, by the way, is the default behaviour) and with `EXPLAIN (ANALYZE, BUFFERS, TIMING)`, I got the following output:
```
-- Select Query with Sequential scan on
QUERY PLAN
Finalize GroupAggregate  (cost=9733303.39..10836008.34 rows=5102790 width=89) (actual time=1100914.091..1110820.480 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=25020 read=2793049 dirtied=10040, temp read=74932 written=75039
  I/O Timings: read=1059425.134
  ->  Gather Merge  (cost=9733303.39..10607468.38 rows=6454984 width=89) (actual time=1100911.426..1110193.876 rows=795097 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956
        I/O Timings: read=3178066.529
        ->  Partial GroupAggregate  (cost=9732303.36..9861403.04 rows=3227492 width=89) (actual time=1100791.915..1107668.495 rows=265032 loops=3)
              Group Key: col1, col2, col3, col4, col5, day, month, year
              Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956
              I/O Timings: read=3178066.529
              ->  Sort  (cost=9732303.36..9740372.09 rows=3227492 width=81) (actual time=1100788.479..1105630.411 rows=2630708 loops=3)
                    Sort Key: col1, col2, col3, col4, col5
                    Sort Method: external merge  Disk: 241320kB
                    Worker 0:  Sort Method: external merge  Disk: 246776kB
                    Worker 1:  Sort Method: external merge  Disk: 246336kB
                    Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956
                    I/O Timings: read=3178066.529
                    ->  Parallel Seq Scan on test_table  (cost=0.00..9074497.49 rows=3227492 width=81) (actual time=656277.982..1073808.146 rows=2630708 loops=3)
                          Filter: ((day = 10) AND (month = 5) AND (year = 2020))
                          Rows Removed by Filter: 24027044
                          Buffers: shared hit=76855 read=8416561 dirtied=33686
                          I/O Timings: read=3178066.180
Planning Time: 4.017 ms
Execution Time: 1111033.041 ms
```

Total time - Around 18 minutes
Then, when I set `SET enable_seqscan TO off;` and ran the same query with `EXPLAIN`, I got the following:
```
-- Select Query with Sequential scan off
QUERY PLAN
Finalize GroupAggregate  (cost=10413126.05..11515831.01 rows=5102790 width=89) (actual time=59211.363..66579.750 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=3 read=104091, temp read=77942 written=78052
  I/O Timings: read=28662.857
  ->  Gather Merge  (cost=10413126.05..11287291.05 rows=6454984 width=89) (actual time=59211.262..65973.857 rows=795178 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=33 read=218096, temp read=230092 written=230418
        I/O Timings: read=51560.508
        ->  Partial GroupAggregate  (cost=10412126.03..10541225.71 rows=3227492 width=89) (actual time=57013.922..62453.555 rows=265059 loops=3)
              Group Key: col1, col2, col3, col4, col5, day, month, year
              Buffers: shared hit=33 read=218096, temp read=230092 written=230418
              I/O Timings: read=51560.508
              ->  Sort  (cost=10412126.03..10420194.76 rows=3227492 width=81) (actual time=57013.423..60368.530 rows=2630708 loops=3)
                    Sort Key: col1, col2, col3, col4, col5
                    Sort Method: external merge  Disk: 246944kB
                    Worker 0:  Sort Method: external merge  Disk: 246120kB
                    Worker 1:  Sort Method: external merge  Disk: 241408kB
                    Buffers: shared hit=33 read=218096, temp read=230092 written=230418
                    I/O Timings: read=51560.508
                    ->  Parallel Bitmap Heap Scan on test_table  (cost=527733.84..9754320.16 rows=3227492 width=81) (actual time=18155.864..30957.312 rows=2630708 loops=3)
                          Recheck Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                          Rows Removed by Index Recheck: 1423
                          Heap Blocks: exact=13374 lossy=44328
                          Buffers: shared hit=3 read=218096
                          I/O Timings: read=51560.508
                          ->  Bitmap Index Scan on testtable_dmy_idx  (cost=0.00..525797.34 rows=7745982 width=0) (actual time=18148.218..18148.228 rows=7892123 loops=1)
                                Index Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                                Buffers: shared hit=3 read=46389
                                I/O Timings: read=17368.250
Planning Time: 2.787 ms
Execution Time: 66783.481 ms
```

Total Time - Around 1 min
I don’t understand why I am getting this behaviour, or what I am doing wrong. I would expect Postgres to pick the better plan automatically, but that isn’t happening.
Any help would be really appreciated.
Edit 1:
A bit more information about the version of RDS Postgres:

```sql
SELECT version();
-- PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
```
Edit 2:
Running with `SET max_parallel_workers_per_gather TO 0;` (the default was 2, as shown by `SHOW max_parallel_workers_per_gather;`):
```
-- Select Query with Sequential scan ON
QUERY PLAN
GroupAggregate  (cost=11515667.22..11799074.58 rows=5102790 width=89) (actual time=1120868.377..1133231.165 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=92456 read=8400966, temp read=295993 written=296321
  I/O Timings: read=1041723.362
  ->  Sort  (cost=11515667.22..11535032.17 rows=7745982 width=81) (actual time=1120865.304..1129419.809 rows=7892123 loops=1)
        Sort Key: col1, col2, col3, col4, col5
        Sort Method: external merge  Disk: 734304kB
        Buffers: shared hit=92456 read=8400966, temp read=295993 written=296321
        I/O Timings: read=1041723.362
        ->  Seq Scan on test_table  (cost=0.00..9888011.58 rows=7745982 width=81) (actual time=663266.269..1070560.993 rows=7892123 loops=1)
              Filter: ((day = 10) AND (month = 5) AND (year = 2020))
              Rows Removed by Filter: 72081131
              Buffers: shared hit=92450 read=8400966
              I/O Timings: read=1041723.362
Planning Time: 5.829 ms
Execution Time: 1133422.968 ms
```

Total Time - Around 18 mins
Subsequently,
```
-- Select Query with Sequential scan OFF
QUERY PLAN
GroupAggregate  (cost=12190966.21..12474373.57 rows=5102790 width=89) (actual time=109048.306..119255.079 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=3 read=218096, temp read=295993 written=296321
  I/O Timings: read=55697.723
  ->  Sort  (cost=12190966.21..12210331.17 rows=7745982 width=81) (actual time=109047.621..115468.268 rows=7892123 loops=1)
        Sort Key: col1, col2, col3, col4, col5
        Sort Method: external merge  Disk: 734304kB
        Buffers: shared hit=3 read=218096, temp read=295993 written=296321
        I/O Timings: read=55697.723
        ->  Bitmap Heap Scan on test_table  (cost=527733.84..10563310.57 rows=7745982 width=81) (actual time=16941.764..62203.367 rows=7892123 loops=1)
              Recheck Cond: ((day = 10) AND (month = 5) AND (year = 2020))
              Rows Removed by Index Recheck: 4270
              Heap Blocks: exact=39970 lossy=131737
              Buffers: shared hit=3 read=218096
              I/O Timings: read=55697.723
              ->  Bitmap Index Scan on testtable_dmy_idx  (cost=0.00..525797.34 rows=7745982 width=0) (actual time=16933.964..16933.964 rows=7892123 loops=1)
                    Index Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                    Buffers: shared hit=3 read=46389
                    I/O Timings: read=16154.294
Planning Time: 3.684 ms
Execution Time: 119440.147 ms
```

Total Time - Around 2 mins
Edit 3:
I checked the number of inserts, updates, deletes, and live and dead tuples using the following query:

```sql
SELECT n_tup_ins AS "inserts", n_tup_upd AS "updates", n_tup_del AS "deletes",
       n_live_tup AS "live_tuples", n_dead_tup AS "dead_tuples"
FROM pg_stat_user_tables
WHERE relname = 'test_table';
```
Got the following result
| inserts   | updates | deletes   | live_tuples | dead_tuples |
|-----------|---------|-----------|-------------|-------------|
| 296590964 | 0       | 412400995 | 79717032    | 7589442     |
Ran the following command:

```sql
VACUUM (VERBOSE, ANALYZE) test_table;
```
Got the following result:
```
[2020-05-15 18:34:08] [00000] vacuuming "public.test_table"
[2020-05-15 18:37:13] [00000] scanned index "testtable_dmy_idx" to remove 7573896 row versions
[2020-05-15 18:37:56] [00000] scanned index "testtable_unixts_idx" to remove 7573896 row versions
[2020-05-15 18:38:16] [00000] "test_table": removed 7573896 row versions in 166450 pages
[2020-05-15 18:38:16] [00000] index "testtable_dmy_idx" now contains 79973254 row versions in 1103313 pages
[2020-05-15 18:38:16] [00000] index "testtable_unixts_idx" now contains 79973254 row versions in 318288 pages
[2020-05-15 18:38:16] [00000] "test_table": found 99 removable, 2196653 nonremovable row versions in 212987 out of 8493416 pages
[2020-05-15 18:38:16] [00000] vacuuming "pg_toast.pg_toast_25023"
[2020-05-15 18:38:16] [00000] index "pg_toast_25023_index" now contains 0 row versions in 1 pages
[2020-05-15 18:38:16] [00000] "pg_toast_25023": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
[2020-05-15 18:38:16] [00000] analyzing "public.test_table"
[2020-05-15 18:38:27] [00000] "test_table": scanned 30000 of 8493416 pages, containing 282611 live rows and 0 dead rows; 30000 rows in sample, 80011093 estimated total rows
[2020-05-15 18:38:27] completed in 4 m 19 s 58 ms
```
After that, the result of the same query looks like this:
| inserts   | updates | deletes   | live_tuples | dead_tuples |
|-----------|---------|-----------|-------------|-------------|
| 296590964 | 0       | 412400995 | 80011093    | 0           |
Answer
```
Rows Removed by Filter: 24027044
Buffers: shared hit=76855 read=8416561 dirtied=33686
I/O Timings: read=3178066.180
```
That is a lot of buffers to have been dirtied in a seq scan. I am guessing you have not vacuumed your table recently enough, or autovacuum is falling way behind because you have accepted the default settings, which are way too slow (until v12) for most modern dedicated systems.
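If autovacuum is the culprit, one option is to tune it per table rather than relying on the global defaults. A minimal sketch (the values here are illustrative, not tuned for this workload):

```sql
-- Illustrative per-table autovacuum tuning; adjust to the actual write rate.
ALTER TABLE test_table SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- trigger at ~1% dead tuples instead of the 20% default
    autovacuum_vacuum_cost_delay   = 2      -- throttle vacuum less (the pre-v12 default is 20ms)
);
```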
Also, 24027044 / 8416561 is roughly 2.85 rows per page. That is an extremely low number. Are your tuples extremely wide? Is your table extremely bloated? Neither of those answers your question by itself, as the planner should know about them and take them into account, but we might need that information to figure out where the planner is going wrong. (These calculations may be off, because I don't know which numbers are prorated across the parallel workers and which are not, but I don't think a factor of 3 changes the conclusion that something here is weird.)
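To check tuple width and bloat directly, something like the following should work (`pgstattuple` is an optional contrib extension, so this assumes it is installed or installable on your instance):

```sql
-- Estimated row width in bytes, from the statistics ANALYZE collected:
SELECT sum(avg_width) AS est_row_bytes
FROM pg_stats
WHERE tablename = 'test_table';

-- Exact dead-tuple and free-space percentages (note: scans the whole table):
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('test_table');
```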
8416561 blocks * 8 kB / 3178.066 s ≈ 20 MB/s. That seems pretty low. What IO settings have you configured on your RDS "hardware"? Your settings for seq_page_cost and random_page_cost may be wrong for your actual IO capacity. (Although this might not be very effective; see below.)
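If the storage is SSD-backed and the defaults (seq_page_cost = 1, random_page_cost = 4) overstate the penalty for random reads, lowering random_page_cost for a session is a cheap experiment. The value below is a common SSD guess, not a recommendation:

```sql
SET random_page_cost = 1.1;  -- default is 4.0; random reads are far cheaper on SSDs
EXPLAIN (ANALYZE, BUFFERS)
SELECT col1, col2, col3, col4, col5, day, month, year,
       sum(num1), sum(num2), count(*)
FROM test_table
WHERE day = 10 AND month = 5 AND year = 2020
GROUP BY col1, col2, col3, col4, col5, day, month, year;
```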
For your Bitmap heap scan:
```
Heap Blocks: exact=13374 lossy=44328
Buffers: shared hit=3 read=218096
```
It looks like all the qualifying tuples are concentrated in a very small number of blocks (compared to the overall table size revealed by the seq scan). I think the planner does not take this adequately into account for bitmap scans. There is a patch out there for this, but it has missed the deadline for v13. (If no one gets around to reviewing it, it might miss the deadline for v14 as well, nudge nudge.) Basically, the planner knows that the column "day" has a high correlation with the physical order of the table, and it uses this knowledge to conclude that the bitmap heap scan will be almost all sequential IO. But it fails to also deduce that the scan will touch only a small portion of the table. This issue makes the bitmap scan look just like the seq scan but with an extra layer of overhead (consulting the index), so in light of that it is not surprising that it doesn't choose it.
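As a side note, the `lossy=44328` heap blocks mean the bitmap itself overflowed work_mem and was degraded to one bit per page, forcing a recheck of every tuple on those pages. Raising work_mem for the session (the value below is illustrative) would keep the bitmap exact and also shrink the external merge sorts visible in your plans:

```sql
SET work_mem = '256MB';  -- illustrative; the default is only 4MB
```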