Skip to content
Advertisement

Count rows after joining three tables in PostgreSQL

Suppose I have three tables in PostgreSQL:

table1 - id1, a_id, updated_by_id
table2 - id2, a_id, updated_by_id
Users  - id, display_name

Suppose I am using the using the following query:

select count(t1.id1) from table1 t1 
left join table2 t2 on (t1.a_id=t2.a_id) 
full outer join users u1 t1.updated_by_id=u1.id)
full outer join users u2 t2.updated_by_id=u2.id) 
where u1.id=100;

I get 50 as count.

Whereas with:

select count(t1.id1) from table1 t1 
left join table2 t2 on (t1.a_id=t2.a_id) 
full outer join users u1 t1.updated_by_id=u1.id)
full outer join users u2 t2.updated_by_id=u2.id) 
where u2.id=100;

I get only 25 as count.

What is my mistake in the second query? What can I do to get the same count?

My requirement is that there is a single user table, referenced by multiple tables. I want to take the complete list of users and get the count of ids from different tables.

But the table on which I have joined alone returns the proper count but rest of them don’t return the proper count. Can anybody suggest a way to modify my second query to get the proper count?

Advertisement

Answer

To simplify your logic, aggregate first, join later.

Guessing missing details, this query would give you the exact count, how many times each user was referenced in table1 and table2 respectively for all users:

SELECT *
FROM   users u
LEFT   JOIN (
   SELECT updated_by_id AS id, count(*) AS t1_ct
   FROM   table1
   GROUP  BY 1
   ) t1 USING (id)
LEFT   JOIN (
   SELECT updated_by_id AS id, count(*) AS t2_ct
   FROM   table2
   GROUP  BY 1
   ) t2 USING (id);

In particular, avoid multiple 1-n relationships multiplying each other when joined together:

To retrieve a single or few users only, LATERAL joins will be faster (Postgres 9.3+):

SELECT *
FROM   users u
LEFT   JOIN  LATERAL (
   SELECT count(*) AS t1_ct
   FROM   table1
   WHERE  updated_by_id = u.id
   ) ON true
LEFT   JOIN  LATERAL (
   SELECT count(*) AS t2_ct
   FROM   table2
   WHERE  updated_by_id = u.id
   ) ON true
WHERE  u.id = 100;

Explain perceived difference

The particular mismatch you report is due to the specifics of a FULL OUTER JOIN:

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.

So you get NULL values appended on the respective other side for missing matches. count() does not count NULL values. So you can get a different result depending on whether you filter on u1.id=100 or u2.id=100.

This is just to explain, you don’t need a FULL JOIN here. Use the presented alternatives instead.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement