I’ve discovered some very strange and counter-intuitive behaviour in PostgreSQL.
I have a query structured as follows: I select both the IDs and the count from a subquery. The subquery does the filtering, joining, and counting, but orders only by the IDs, since I’m using DISTINCT ON() to get only unique IDs.
The outer query then does the proper ordering, plus any limits and offsets as needed. Here is an example of what the query structure looks like:
SELECT s.id, s.item_count
FROM (
    SELECT DISTINCT ON (work_items.id)
           work_items.id,
           work_item_states.disposition AS disposition,
           COUNT(work_items.id) OVER () AS item_count
    FROM work_items
    JOIN work_item_states
      ON work_item_states.work_item_refer = work_items.id
    WHERE work_item_states.disposition = 'cancelled'
    ORDER BY work_items.id
) AS s
ORDER BY s.disposition
LIMIT 50 OFFSET 0
Here is the strange part. My database has several million entries, so queries aren’t the fastest overall. But when I remove the ORDER BY clause from the outer query, the query gets drastically slower.
However, if I also remove the LIMIT clause, it becomes fast again, even though this example query returns 800 000+ results.
In summary, for the outer query:
ORDER BY AND LIMIT – Fast
... ) AS s ORDER BY s.disposition LIMIT 50 OFFSET 0
ONLY LIMIT – Very slow
... ) AS s LIMIT 50 OFFSET 0
ONLY ORDER BY – Fast, despite 800 000 results
... ) AS s ORDER BY s.disposition OFFSET 0
NEITHER – Fast, despite 800 000 results
... ) AS s OFFSET 0
To give an idea of how much slower only having the LIMIT clause is: with both clauses, with neither, or with just ORDER BY, the queries take no more than about 10 seconds.
With only the LIMIT clause, however, the query takes about 1 minute 15 seconds, over 7 times as long!
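For what it’s worth, actual node-by-node timings (rather than the planner’s estimates) can be gathered with EXPLAIN (ANALYZE, BUFFERS); a sketch for the slow variant:

EXPLAIN (ANALYZE, BUFFERS)
SELECT s.id, s.item_count FROM (
    ... same subquery as above ...
) AS s LIMIT 50 OFFSET 0;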
You’d think that ORDER BY would instead slow things down, as it has to sort the results of the subquery, but it seems that isn’t the case. It’s very counter-intuitive.
If someone knows what’s going on behind the scenes here, I’d greatly appreciate them shedding some light on this.
Thanks
EDIT – Added execution plans for statements:
ORDER BY and LIMIT execution plan
Limit  (cost=518486.52..518486.65 rows=50 width=53)
  ->  Sort  (cost=518486.52..520495.59 rows=803628 width=53)
        Sort Key: s.disposition
        ->  Subquery Scan on s  (cost=479736.16..491790.58 rows=803628 width=53)
              ->  Unique  (cost=479736.16..483754.30 rows=803628 width=53)
                    ->  Sort  (cost=479736.16..481745.23 rows=803628 width=53)
                          Sort Key: work_items.id
                          ->  WindowAgg  (cost=136262.98..345979.65 rows=803628 width=53)
                                ->  Hash Join  (cost=136262.98..335934.30 rows=803628 width=45)
                                      Hash Cond: (work_items.id = work_item_states.work_item_refer)
                                      ->  Seq Scan on work_items  (cost=0.00..106679.48 rows=4020148 width=37)
                                      ->  Hash  (cost=119152.97..119152.97 rows=803681 width=45)
                                            ->  Bitmap Heap Scan on work_item_states  (cost=18968.96..119152.97 rows=803681 width=45)
                                                  Recheck Cond: (disposition = 'cancelled'::text)
                                                  ->  Bitmap Index Scan on idx_work_item_states_disposition  (cost=0.00..18768.04 rows=803681 width=0)
                                                        Index Cond: (disposition = 'cancelled'::text)
Only LIMIT execution plan
Limit  (cost=1.11..69.52 rows=50 width=45)
  ->  Subquery Scan on s  (cost=1.11..1099599.17 rows=803628 width=45)
        ->  Unique  (cost=1.11..1091562.89 rows=803628 width=77)
              ->  WindowAgg  (cost=1.11..1089553.82 rows=803628 width=77)
                    ->  Merge Join  (cost=1.11..1079508.47 rows=803628 width=37)
                          Merge Cond: (work_items.id = work_item_states.work_item_refer)
                          ->  Index Only Scan using idx_work_items_id on work_items  (cost=0.56..477365.14 rows=4020148 width=37)
                          ->  Index Scan using idx_work_item_states_work_item_refer on work_item_states  (cost=0.56..582047.48 rows=803681 width=37)
                                Filter: (disposition = 'cancelled'::text)
Only ORDER BY execution plan
Sort  (cost=625547.09..627556.16 rows=803628 width=53)
  Sort Key: s.disposition
  ->  Subquery Scan on s  (cost=479736.16..491790.58 rows=803628 width=53)
        ->  Unique  (cost=479736.16..483754.30 rows=803628 width=53)
              ->  Sort  (cost=479736.16..481745.23 rows=803628 width=53)
                    Sort Key: work_items.id
                    ->  WindowAgg  (cost=136262.98..345979.65 rows=803628 width=53)
                          ->  Hash Join  (cost=136262.98..335934.30 rows=803628 width=45)
                                Hash Cond: (work_items.id = work_item_states.work_item_refer)
                                ->  Seq Scan on work_items  (cost=0.00..106679.48 rows=4020148 width=37)
                                ->  Hash  (cost=119152.97..119152.97 rows=803681 width=45)
                                      ->  Bitmap Heap Scan on work_item_states  (cost=18968.96..119152.97 rows=803681 width=45)
                                            Recheck Cond: (disposition = 'cancelled'::text)
                                            ->  Bitmap Index Scan on idx_work_item_states_disposition  (cost=0.00..18768.04 rows=803681 width=0)
                                                  Index Cond: (disposition = 'cancelled'::text)
Answer
You didn’t post your execution plans, but I have my crystal ball ready, so I’ll have a guess at what’s going on.
In your second, very slow query, the optimizer has a bright idea for how to make it fast: it scans work_items using the index on id, fetches all the matching rows from work_item_states in a nested loop, and filters out everything that does not match work_item_states.disposition = 'cancelled', until it has found 50 distinct results.
This is a good idea, but the optimizer does not know that all the rows with work_item_states.disposition = 'cancelled' match work_items with a high id, so it has to scan forever until it has found its 50 rows.
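You can see how cheap that strategy looks on paper: the planner prices a LIMIT roughly by interpolating linearly between the startup and total cost of its input, so fetching 50 of an estimated 803628 rows appears almost free. Using the numbers from the “Only LIMIT” plan above:

limit cost ≈ startup + (50 / 803628) × (total − startup)
           = 1.11 + (50 / 803628) × (1099599.17 − 1.11)
           ≈ 69.52

That is exactly the Limit (cost=1.11..69.52 ...) node, a tiny fraction of the 518486.52 startup cost of the sort-based plan, so the cheap-startup plan wins.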
All the other queries don’t allow the planner to choose that strategy, because it is only promising if a few rows in work_items.id order will do.
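If you need the LIMIT without an outer ORDER BY, one way to take that strategy away from the planner is to materialize the subquery so that the LIMIT cannot be pushed down into it. A sketch (the MATERIALIZED keyword exists from PostgreSQL 12 on; before v12 every CTE is materialized automatically):

WITH s AS MATERIALIZED (
    SELECT DISTINCT ON (work_items.id)
           work_items.id,
           work_item_states.disposition AS disposition,
           COUNT(work_items.id) OVER () AS item_count
    FROM work_items
    JOIN work_item_states
      ON work_item_states.work_item_refer = work_items.id
    WHERE work_item_states.disposition = 'cancelled'
    ORDER BY work_items.id
)
SELECT s.id, s.item_count
FROM s
LIMIT 50 OFFSET 0;

This forces the whole subquery result to be produced before the LIMIT is applied, much like your fast variants effectively do.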