The query below is showing some duplicate and wrong values:
SELECT c.contest_id, c.hacker_id, c.name, SUM(s.total_submissions) as total_submissions, SUM(s.total_accepted_submissions) as total_accepted_submissions, SUM(v.total_views) as total_views, SUM(v.total_unique_views) as total_unique_views FROM concursos c JOIN faculdades f ON f.contest_id = c.contest_id JOIN desafios d ON d.college_id = f.college_id LEFT JOIN view_stats v ON v.challenge_id = d.challenge_id LEFT JOIN submission_stats s ON s.challenge_id = d.challenge_id GROUP BY c.contest_id;
The output should look like this:
contest_id | hacker_id | name | total_submissions | total_accepted_submissions | total_views | total_unique_views | +------------+-----------+--------+-------------------+----------------------------+-------------+--------------------+ | 66406 | 17973 | Rose | 111 | 39 | 156 | 56 | | 66556 | 79153 | Angela | 0 | 0 | 11 | 10 | | 94828 | 80275 | Frank | 150 | 38 | 41 | 15
But it’s coming out like this:
contest_id | hacker_id | name | total_submissions | total_accepted_submissions | total_views | total_unique_views | +------------+-----------+--------+-------------------+----------------------------+-------------+--------------------+ | 66406 | 17973 | Rose | 222 | 78 | 238 | 122 | | 66556 | 79153 | Angela | NULL | NULL | 11 | 10 | | 94828 | 80275 | Frank | 150 | 38 | 82 | 30
Tables schema:
Advertisement
Answer
What happens
The problem is that view_stats and submission_stats have multiple rows per challenge_id.
The JOINs in the query happen before the GROUP BY and the SUM. So imagine, the result set of your query without GROUP BY and SUM.
A simplified example would be:
ids table: id -- 1 x table: id|vx ------ 1|11 1|22 y table: id|vy ------ 1| 1
The result of
SELECT ids.id, x.vx, y.vy FROM ids LEFT JOIN x on x.id = ids.id LEFT JOIN y on y.id = ids.id;
would be
| id | vx | vy | | --- | --- | --- | | 1 | 11 | 1 | | 1 | 22 | 1 |
Mind the duplicate 1 in the vy column, although, in the original y table there is only one row. This happens because, for id=1 there are two rows in table x. These are joined first, thereby also duplicating the rows of the ids table. Then y is joined to these already duplicated rows which duplicates the rows of y too. When SUM‘ing and grouping, we end up with.
| id | SUM(vy) | | --- | ------- | | 1 | 2 |
You can find a dbfiddle with the simplified example to play around here.
Solution
There are multiple ways to solve this. The most intuitive is to GROUP and SUM the rows of view_stats and submission_stats before joining them.
SELECT c.contest_id, c.hacker_id, c.name,
SUM(s.total_submissions) as total_submissions,
SUM(s.total_accepted_submissions) as total_accepted_submissions,
SUM(v.total_views) as total_views,
SUM(v.total_unique_views) as total_unique_views
FROM concursos c
JOIN faculdades f ON f.contest_id = c.contest_id
JOIN desafios d ON d.college_id = f.college_id
LEFT JOIN (
SELECT
challenge_id,
SUM(total_views) as total_views,
SUM(total_unique_views) as total_unique_views
FROM view_stats
GROUP BY challenge_id
) v ON v.challenge_id = d.challenge_id
LEFT JOIN (
SELECT
challenge_id,
SUM(total_submissions) as total_submissions,
SUM(total_accepted_submissions) as total_accepted_submissions
FROM submission_stats
GROUP BY challenge_id
) s ON s.challenge_id = d.challenge_id
GROUP BY c.contest_id
# to output only rows with non zero sums
HAVING
IFNULL(SUM(s.total_submissions), 0) <> 0
OR IFNULL(SUM(s.total_accepted_submissions), 0) <> 0
OR IFNULL(SUM(v.total_views), 0) <> 0
OR IFNULL(SUM(v.total_unique_views), 0) <> 0;
