Using PostgreSQL 13, I ran into a performance issue selecting the highest id from a view that joins two tables, depending on the select statement I execute.
Here’s a sample setup:
CREATE TABLE test1 ( id BIGSERIAL PRIMARY KEY, joincol VARCHAR ); CREATE TABLE test2 ( joincol VARCHAR ); CREATE INDEX ON test1 (id); CREATE INDEX ON test1 (joincol); CREATE INDEX ON test2 (joincol); CREATE VIEW testview AS ( SELECT test1.id, test1.joincol AS t1charcol, test2.joincol AS t2charcol FROM test1, test2 WHERE test1.joincol = test2.joincol );
What I found out
I’m executing two statements which result in completely different execution plans and runtimes. The following statement executes in less than 100ms. As far as I understand the execution plan, the runtime is independent of the rowcount, since Postgres iterates the rows one by one (starting at the highest id, using the index) until a join on a row is possible and immediately returns.
SELECT id FROM testview ORDER BY ID DESC LIMIT 1;
However, this one takes over 1 second on average (depending on rowcount), since the two tables are “joined completely”, before Postgres uses the index to select the highest id.
SELECT MAX(id) FROM testview;
Please refer to this sample on dbfiddle to check the explain plans:
My real environment
On my real environment
test1 contains only a hand full of rows (< 100), having unique values in
test2 contains up to ~10M rows, where
joincol always matches a value of
joincol is not nullable.
The actual question
Why does Postgres not recognize that it could use an Index Scan Backward on row basis for the second select? Is there anything I could improve on the tables/indexes?
Queries not strictly equivalent
why does Postgres not recognize that it could use a Index Scan Backward on row basis for the second select?
To make the context clear:
ORDER BY ... LIMIT 1does not.
NULLvalues sort last in ascending sort order, and first in descending. So an
Index Scan Backwardmight not find the greatest value (according to
max()) first, but any number of
The formal equivalent of:
SELECT max(id) FROM testview;
SELECT id FROM testview ORDER BY id DESC LIMIT 1;
SELECT id FROM testview ORDER BY id DESC NULLS LAST LIMIT 1;
The latter query doesn’t get the fast query plan. But it would with an index with matching sort order:
(id DESC NULLS LAST).
That’s different for the aggregate functions
max(). Those get a fast plan when targeting table
test1 directly using the plain PK index on
(id). But not when based on the view (or the underlying join-query directly – the view is not the blocker). An index sorting NULL values in the right place has hardly any effect.
We know that
id in this query can never be
NULL. The column is defined
NOT NULL. And the join in the view is effectively an
INNER JOIN which cannot introduce
NULL values for
We also know that the index on
test.id cannot contain NULL values.
But the Postgres query planner is not an AI. (Nor does it try to be, that could get out of hands quickly.) I see two shortcomings:
max()get the fast plan only when targeting the table, regardless of index sort order, an index condition is added:
Index Cond: (id IS NOT NULL)
ORDER BY ... LIMIT 1gets the fast plan only with the exactly matching index sort order.
Not sure, whether that might be improved (easily).
db<>fiddle here – demonstrating all of the above
Is there anything I could improve on the tables/indexes?
This index is completely useless:
CREATE INDEX ON "test" ("id");
The PK on
test.id is implemented with a unique index on the column, that already covers everything the additional index might do for you.
There may be more, waiting for the question to clear up.
Distorted test case
The test case is too far away from actual use case to be meaningful.
In the test setup, each table has 100k rows, there is no guarantee that every value in
joincol has a match on the other side, and both columns can be NULL
Your real case has 10M rows in
table1 and < 100 rows in
table2, every value in
table1.joincol has a match in
table2.joincol, both are defined
NOT NULL, and
table2.joincol is unique. A classical one-to-many relationship. There should be a
UNIQUE constraint on
table2.joincol and a FK constraint
t1.joincol --> t2.joincol.
But that’s currently all twisted in the question. Standing by till that’s cleaned up.