Tables:
Posts Comments Likes Users ID | User_id ID | Post_id ID | Post_id ID | Name ------- -------- -------- -------- 1 | 101 1 | 1 1 | 3 101 | 'Michael' 2 | 101 2 | 3 2 | 3 202 | 'Daniel' 3 | 303 3 | 3 3 | 3 303 | 'Scott'
What I am expecting to query is this:
ID | Total_comments | Total_likes | Name 1 | 1 | 0 | 'Michael' 2 | 0 | 0 | 'Michael' 3 | 2 | 3 | 'Scott'
However this is what I get:
ID | Total_comments | Total_likes | Name 1 | 1 | 0 | 'Michael' 2 | 0 | 0 | 'Michael' 3 | 6 | 6 | 'Scott'
With this query:
SELECT Posts.ID, COUNT(Comments.Post_id) as Total_comments, COUNT(Likes.Post_id) as Total_likes, Users.Name FROM Posts INNER JOIN Users ON Users.ID = Posts.User_id LEFT JOIN Comments ON Comments.Post_id = Posts.ID LEFT JOIN Likes ON Likes.Post_id = Posts.ID GROUP BY Posts.ID
I wonder, in what way the query above could be modified to return the expected results? What would be a good approach to achieve this?
Advertisement
Answer
You are joining along two dimensions, so you are getting a Cartesian products. That is simply what happens.
A hacky solution — which works quite well if there are not too many comments and likes for a given post — is to use COUNT(DISTINCT)
:
SELECT Posts.ID, COUNT(DISTINCT Comments.Post_id) as Total_comments, COUNT(DISTINCT Likes.Post_id) as Total_likes, Users.Name
The most efficient solution, though, is probably correlated subqueries:
SELECT p.ID, (SELECT COUNT(*) FROM Comments c WHERE c.Post_Id = p.ID) as Total_comments, (SELECT COUNT(*) FROM Likes l WHERE l..Post_id = p.ID) as Total_likes, u.Name FROM Posts p INNER JOIN Users u ON u.ID = p.User_id ;
This is faster because it avoids the outer aggregation. However, it requires indexes on comments(post_id)
and likes(post_id)
.