Tables:
x
Posts Comments Likes Users
ID | User_id ID | Post_id ID | Post_id ID | Name
------- -------- -------- --------
1 | 101 1 | 1 1 | 3 101 | 'Michael'
2 | 101 2 | 3 2 | 3 202 | 'Daniel'
3 | 303 3 | 3 3 | 3 303 | 'Scott'
What I am expecting to query is this:
ID | Total_comments | Total_likes | Name
1 | 1 | 0 | 'Michael'
2 | 0 | 0 | 'Michael'
3 | 2 | 3 | 'Scott'
However this is what I get:
ID | Total_comments | Total_likes | Name
1 | 1 | 0 | 'Michael'
2 | 0 | 0 | 'Michael'
3 | 6 | 6 | 'Scott'
With this query:
SELECT Posts.ID,
COUNT(Comments.Post_id) as Total_comments,
COUNT(Likes.Post_id) as Total_likes,
Users.Name
FROM Posts INNER JOIN Users
ON Users.ID = Posts.User_id
LEFT JOIN Comments
ON Comments.Post_id = Posts.ID
LEFT JOIN Likes
ON Likes.Post_id = Posts.ID
GROUP BY Posts.ID
I wonder, in what way the query above could be modified to return the expected results? What would be a good approach to achieve this?
Advertisement
Answer
You are joining along two dimensions, so you are getting a Cartesian products. That is simply what happens.
A hacky solution — which works quite well if there are not too many comments and likes for a given post — is to use COUNT(DISTINCT)
:
SELECT Posts.ID,
COUNT(DISTINCT Comments.Post_id) as Total_comments,
COUNT(DISTINCT Likes.Post_id) as Total_likes,
Users.Name
The most efficient solution, though, is probably correlated subqueries:
SELECT p.ID,
(SELECT COUNT(*) FROM Comments c WHERE c.Post_Id = p.ID) as Total_comments,
(SELECT COUNT(*) FROM Likes l WHERE l..Post_id = p.ID) as Total_likes,
u.Name
FROM Posts p INNER JOIN
Users u
ON u.ID = p.User_id ;
This is faster because it avoids the outer aggregation. However, it requires indexes on comments(post_id)
and likes(post_id)
.