The query below is showing some duplicate and wrong values:
SELECT c.contest_id, c.hacker_id, c.name, SUM(s.total_submissions) as total_submissions, SUM(s.total_accepted_submissions) as total_accepted_submissions, SUM(v.total_views) as total_views, SUM(v.total_unique_views) as total_unique_views FROM concursos c JOIN faculdades f ON f.contest_id = c.contest_id JOIN desafios d ON d.college_id = f.college_id LEFT JOIN view_stats v ON v.challenge_id = d.challenge_id LEFT JOIN submission_stats s ON s.challenge_id = d.challenge_id GROUP BY c.contest_id;
The output should look like this:
contest_id | hacker_id | name | total_submissions | total_accepted_submissions | total_views | total_unique_views | +------------+-----------+--------+-------------------+----------------------------+-------------+--------------------+ | 66406 | 17973 | Rose | 111 | 39 | 156 | 56 | | 66556 | 79153 | Angela | 0 | 0 | 11 | 10 | | 94828 | 80275 | Frank | 150 | 38 | 41 | 15
But it’s coming out like this:
contest_id | hacker_id | name | total_submissions | total_accepted_submissions | total_views | total_unique_views | +------------+-----------+--------+-------------------+----------------------------+-------------+--------------------+ | 66406 | 17973 | Rose | 222 | 78 | 238 | 122 | | 66556 | 79153 | Angela | NULL | NULL | 11 | 10 | | 94828 | 80275 | Frank | 150 | 38 | 82 | 30
Tables schema:
Advertisement
Answer
What happens
The problem is that view_stats
and submission_stats
have multiple rows per challenge_id
.
The JOIN
s in the query happen before the GROUP BY
and the SUM
. So imagine, the result set of your query without GROUP BY
and SUM
.
A simplified example would be:
ids table: id -- 1 x table: id|vx ------ 1|11 1|22 y table: id|vy ------ 1| 1
The result of
SELECT ids.id, x.vx, y.vy FROM ids LEFT JOIN x on x.id = ids.id LEFT JOIN y on y.id = ids.id;
would be
| id | vx | vy | | --- | --- | --- | | 1 | 11 | 1 | | 1 | 22 | 1 |
Mind the duplicate 1
in the vy
column, although, in the original y
table there is only one row. This happens because, for id=1
there are two rows in table x
. These are joined first, thereby also duplicating the rows of the ids
table. Then y
is joined to these already duplicated rows which duplicates the rows of y
too. When SUM
‘ing and grouping, we end up with.
| id | SUM(vy) | | --- | ------- | | 1 | 2 |
You can find a dbfiddle with the simplified example to play around here.
Solution
There are multiple ways to solve this. The most intuitive is to GROUP
and SUM
the rows of view_stats
and submission_stats
before joining them.
SELECT c.contest_id, c.hacker_id, c.name, SUM(s.total_submissions) as total_submissions, SUM(s.total_accepted_submissions) as total_accepted_submissions, SUM(v.total_views) as total_views, SUM(v.total_unique_views) as total_unique_views FROM concursos c JOIN faculdades f ON f.contest_id = c.contest_id JOIN desafios d ON d.college_id = f.college_id LEFT JOIN ( SELECT challenge_id, SUM(total_views) as total_views, SUM(total_unique_views) as total_unique_views FROM view_stats GROUP BY challenge_id ) v ON v.challenge_id = d.challenge_id LEFT JOIN ( SELECT challenge_id, SUM(total_submissions) as total_submissions, SUM(total_accepted_submissions) as total_accepted_submissions FROM submission_stats GROUP BY challenge_id ) s ON s.challenge_id = d.challenge_id GROUP BY c.contest_id # to output only rows with non zero sums HAVING IFNULL(SUM(s.total_submissions), 0) <> 0 OR IFNULL(SUM(s.total_accepted_submissions), 0) <> 0 OR IFNULL(SUM(v.total_views), 0) <> 0 OR IFNULL(SUM(v.total_unique_views), 0) <> 0;