Since Postgres came out with the ability to do LATERAL
joins, I’ve been reading up on it, since I currently do complex data dumps for my team with lots of inefficient subqueries that make the overall query take four minutes or more.
I understand that LATERAL
joins may be able to help me, but even after reading articles like this one from Heap Analytics, I still don’t quite follow.
What is the use case for a LATERAL
join? What is the difference between a LATERAL
join and a subquery?
Advertisement
Answer
What is a LATERAL
join?
The feature was introduced with PostgreSQL 9.3.
Quoting the manual:
Subqueries appearing in
FROM
can be preceded by the key wordLATERAL
. This allows them to reference columns provided by precedingFROM
items. (WithoutLATERAL
, each subquery is evaluated independently and so cannot cross-reference any otherFROM
item.)Table functions appearing in
FROM
can also be preceded by the key wordLATERAL
, but for functions the key word is optional; the function’s arguments can contain references to columns provided by precedingFROM
items in any case.
Basic code examples are given there.
More like a correlated subquery
A LATERAL
join is more like a correlated subquery, not a plain subquery, in that expressions to the right of a LATERAL
join are evaluated once for each row left of it – just like a correlated subquery – while a plain subquery (table expression) is evaluated once only. (The query planner has ways to optimize performance for either, though.)
Related answer with code examples for both side by side, solving the same problem:
For returning more than one column, a LATERAL
join is typically simpler, cleaner and faster.
Also, remember that the equivalent of a correlated subquery is LEFT JOIN LATERAL ... ON true
:
Things a subquery can’t do
There are things that a LATERAL
join can do, but a (correlated) subquery cannot (easily). A correlated subquery can only return a single value, not multiple columns and not multiple rows – with the exception of bare function calls (which multiply result rows if they return multiple rows). But even certain set‑returning functions are only allowed in the FROM
clause. Like unnest()
with multiple parameters in Postgres 9.4 or later. The manual:
This is only allowed in the
FROM
clause;
So this works, but cannot (easily) be replaced with a subquery:
CREATE TABLE tbl (a1 int[], a2 int[]); SELECT * FROM tbl, unnest(a1, a2) u(elem1, elem2); -- implicit LATERAL
The comma (,
) in the FROM
clause is short notation for CROSS JOIN
.
LATERAL
is assumed automatically for table functions.
About the special case of UNNEST( array_expression [, ... ] )
:
Set-returning functions in the SELECT
list
You can also use set-returning functions like unnest()
in the SELECT
list directly. This used to exhibit surprising behavior with more than one such function in the same SELECT
list up to Postgres 9.6. But it has finally been sanitized with Postgres 10 and is a valid alternative now (even if not standard SQL). See:
Building on above example:
SELECT *, unnest(a1) AS elem1, unnest(a2) AS elem2 FROM tbl;
Comparison:
dbfiddle for pg 9.6 here
dbfiddle for pg 10 here
Clarify misinformation
For the
INNER
andOUTER
join types, a join condition must be specified, namely exactly one ofNATURAL
,ON
join_condition, orUSING
(join_column [, …]). See below for the meaning.
ForCROSS JOIN
, none of these clauses can appear.
So these two queries are valid (even if not particularly useful):
SELECT * FROM tbl t LEFT JOIN LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t ON TRUE; SELECT * FROM tbl t, LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t;
While this one is not:
SELECT * FROM tbl t LEFT JOIN LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t;
That’s why Andomar’s code example is correct (the CROSS JOIN
does not require a join condition) and Attila’s is was not.