The query below is showing some duplicate and wrong values:
SELECT c.contest_id, c.hacker_id, c.name,
SUM(s.total_submissions) as total_submissions,
SUM(s.total_accepted_submissions) as total_accepted_submissions,
SUM(v.total_views) as total_views,
SUM(v.total_unique_views) as total_unique_views
FROM concursos c
JOIN faculdades f ON f.contest_id = c.contest_id
JOIN desafios d ON d.college_id = f.college_id
LEFT JOIN view_stats v ON v.challenge_id = d.challenge_id
LEFT JOIN submission_stats s ON s.challenge_id = d.challenge_id
GROUP BY c.contest_id;
The output should look like this:
contest_id | hacker_id | name | total_submissions | total_accepted_submissions | total_views | total_unique_views |
+------------+-----------+--------+-------------------+----------------------------+-------------+--------------------+
| 66406 | 17973 | Rose | 111 | 39 | 156 | 56 |
| 66556 | 79153 | Angela | 0 | 0 | 11 | 10 |
| 94828 | 80275 | Frank | 150 | 38 | 41 | 15
But it’s coming out like this:
contest_id | hacker_id | name | total_submissions | total_accepted_submissions | total_views | total_unique_views |
+------------+-----------+--------+-------------------+----------------------------+-------------+--------------------+
| 66406 | 17973 | Rose | 222 | 78 | 238 | 122 |
| 66556 | 79153 | Angela | NULL | NULL | 11 | 10 |
| 94828 | 80275 | Frank | 150 | 38 | 82 | 30
Tables schema:
Advertisement
Answer
What happens
The problem is that view_stats
and submission_stats
have multiple rows per challenge_id
.
The JOIN
s in the query happen before the GROUP BY
and the SUM
. So imagine, the result set of your query without GROUP BY
and SUM
.
A simplified example would be:
ids table:
id
--
1
x table:
id|vx
------
1|11
1|22
y table:
id|vy
------
1| 1
The result of
SELECT ids.id, x.vx, y.vy
FROM ids
LEFT JOIN x on x.id = ids.id
LEFT JOIN y on y.id = ids.id;
would be
| id | vx | vy |
| --- | --- | --- |
| 1 | 11 | 1 |
| 1 | 22 | 1 |
Mind the duplicate 1
in the vy
column, although, in the original y
table there is only one row. This happens because, for id=1
there are two rows in table x
. These are joined first, thereby also duplicating the rows of the ids
table. Then y
is joined to these already duplicated rows which duplicates the rows of y
too. When SUM
‘ing and grouping, we end up with.
| id | SUM(vy) |
| --- | ------- |
| 1 | 2 |
You can find a dbfiddle with the simplified example to play around here.
Solution
There are multiple ways to solve this. The most intuitive is to GROUP
and SUM
the rows of view_stats
and submission_stats
before joining them.
SELECT c.contest_id, c.hacker_id, c.name,
SUM(s.total_submissions) as total_submissions,
SUM(s.total_accepted_submissions) as total_accepted_submissions,
SUM(v.total_views) as total_views,
SUM(v.total_unique_views) as total_unique_views
FROM concursos c
JOIN faculdades f ON f.contest_id = c.contest_id
JOIN desafios d ON d.college_id = f.college_id
LEFT JOIN (
SELECT
challenge_id,
SUM(total_views) as total_views,
SUM(total_unique_views) as total_unique_views
FROM view_stats
GROUP BY challenge_id
) v ON v.challenge_id = d.challenge_id
LEFT JOIN (
SELECT
challenge_id,
SUM(total_submissions) as total_submissions,
SUM(total_accepted_submissions) as total_accepted_submissions
FROM submission_stats
GROUP BY challenge_id
) s ON s.challenge_id = d.challenge_id
GROUP BY c.contest_id
# to output only rows with non zero sums
HAVING
IFNULL(SUM(s.total_submissions), 0) <> 0
OR IFNULL(SUM(s.total_accepted_submissions), 0) <> 0
OR IFNULL(SUM(v.total_views), 0) <> 0
OR IFNULL(SUM(v.total_unique_views), 0) <> 0;