Suppose I have three tables in PostgreSQL:
table1 - id1, a_id, updated_by_id table2 - id2, a_id, updated_by_id Users - id, display_name
Suppose I am using the using the following query:
select count(t1.id1) from table1 t1 left join table2 t2 on (t1.a_id=t2.a_id) full outer join users u1 t1.updated_by_id=u1.id) full outer join users u2 t2.updated_by_id=u2.id) where u1.id=100;
I get 50
as count.
Whereas with:
select count(t1.id1) from table1 t1 left join table2 t2 on (t1.a_id=t2.a_id) full outer join users u1 t1.updated_by_id=u1.id) full outer join users u2 t2.updated_by_id=u2.id) where u2.id=100;
I get only 25
as count.
What is my mistake in the second query? What can I do to get the same count?
My requirement is that there is a single user table, referenced by multiple tables. I want to take the complete list of users and get the count of ids from different tables.
But the table on which I have joined alone returns the proper count but rest of them don’t return the proper count. Can anybody suggest a way to modify my second query to get the proper count?
Advertisement
Answer
To simplify your logic, aggregate first, join later.
Guessing missing details, this query would give you the exact count, how many times each user was referenced in table1
and table2
respectively for all users:
SELECT * FROM users u LEFT JOIN ( SELECT updated_by_id AS id, count(*) AS t1_ct FROM table1 GROUP BY 1 ) t1 USING (id) LEFT JOIN ( SELECT updated_by_id AS id, count(*) AS t2_ct FROM table2 GROUP BY 1 ) t2 USING (id);
In particular, avoid multiple 1-n relationships multiplying each other when joined together:
To retrieve a single or few users only, LATERAL
joins will be faster (Postgres 9.3+):
SELECT * FROM users u LEFT JOIN LATERAL ( SELECT count(*) AS t1_ct FROM table1 WHERE updated_by_id = u.id ) ON true LEFT JOIN LATERAL ( SELECT count(*) AS t2_ct FROM table2 WHERE updated_by_id = u.id ) ON true WHERE u.id = 100;
Explain perceived difference
The particular mismatch you report is due to the specifics of a FULL OUTER JOIN
:
First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.
So you get NULL values appended on the respective other side for missing matches. count()
does not count NULL values. So you can get a different result depending on whether you filter on u1.id=100
or u2.id=100
.
This is just to explain, you don’t need a FULL JOIN
here. Use the presented alternatives instead.