Performance impact of view on aggregate function vs result set limiting

Question

The problem Using PostgreSQL 13, I ran into a performance issue selecting the highest id from a view that joins two tables, depending on the select statement I execute. Here's a sample setup: What I found out I'm executing two statements which result in completely different execution plans and runtimes. The following statement executes in less than 100ms. As far

Accepted Answer

Queries not strictly equivalentwhy does Postgres not recognize that it could use a Index Scan Backward on row basis for the second select?To make the context clear:max(id) excludes NULL values. But ORDER BY ... LIMIT 1 does not.NULL values sort last in ascending sort order, and first in descending. So an Index Scan Backward might not find the greatest value (according to max()) first, but any number of NULL values.The formal equivalent of:SELECT max(id) FROM testview;is not:SELECT id FROM testview ORDER BY id DESC LIMIT 1;but:SELECT id FROM testview ORDER BY id DESC NULLS LAST LIMIT 1;The latter query doesn’t get the fast query plan. But it would with an index with matching sort order: (id DESC NULLS LAST).That’s different for the aggregate functions min() and max(). Those get a fast plan when targeting table test1 directly using the plain PK index on (id). But not when based on the view (or the underlying join-query directly – the view is not the blocker). An index sorting NULL values in the right place has hardly any effect.We know that id in this query can never be NULL. The column is defined NOT NULL. And the join in the view is effectively an INNER JOIN which cannot introduce NULL values for id.We also know that the index on test.id cannot contain NULL values.But the Postgres query planner is not an AI. (Nor does it try to be, that could get out of hands quickly.) I see two shortcomings:min() and max() get the fast plan only when targeting the table, regardless of index sort order, an index condition is added: Index Cond: (id IS NOT NULL)ORDER BY ... LIMIT 1 gets the fast plan only with the exactly matching index sort order.Not sure, whether that might be improved (easily).db<>fiddle here – demonstrating all of the aboveIndexesIs there anything I could improve on the tables/indexes?This index is completely useless:CREATE INDEX ON "test" ("id");The PK on test.id is implemented with a unique index on the column, that already covers everything the additional index might do for you.There may be more, waiting for the question to clear up.Distorted test caseThe test case is too far away from actual use case to be meaningful.In the test setup, each table has 100k rows, there is no guarantee that every value in joincol has a match on the other side, and both columns can be NULLYour real case has 10M rows in table1 and < 100 rows in table2, every value in table1.joincol has a match in table2.joincol, both are defined NOT NULL, and table2.joincol is unique. A classical one-to-many relationship. There should be a UNIQUE constraint on table2.joincol and a FK constraint t1.joincol --> t2.joincol.But that’s currently all twisted in the question. Standing by till that’s cleaned up.

Performance impact of view on aggregate function vs result set limiting

The problem

What I found out

My real environment

The actual question

Advertisement

Answer

Queries not strictly equivalent

Indexes

Distorted test case