Skip to content
Advertisement

Query returns duplicate values [closed]

The query below is showing some duplicate and wrong values:

SELECT c.contest_id, c.hacker_id, c.name,
SUM(s.total_submissions) as total_submissions,
SUM(s.total_accepted_submissions) as total_accepted_submissions,
SUM(v.total_views) as total_views,
SUM(v.total_unique_views) as total_unique_views
FROM concursos c
JOIN faculdades f ON f.contest_id = c.contest_id
JOIN desafios d ON d.college_id = f.college_id
LEFT JOIN view_stats v ON v.challenge_id = d.challenge_id
LEFT JOIN submission_stats s ON s.challenge_id = d.challenge_id
GROUP BY c.contest_id;

The output should look like this:

contest_id | hacker_id | name   | total_submissions | total_accepted_submissions | total_views | total_unique_views |
+------------+-----------+--------+-------------------+----------------------------+-------------+--------------------+
|      66406 |     17973 | Rose   |               111 |                         39 |         156 |                56 |
|      66556 |     79153 | Angela |                 0 |                          0 |          11 |                 10 |
|      94828 |     80275 | Frank  |               150 |                         38 |          41 |                15

But it’s coming out like this:

contest_id | hacker_id | name   | total_submissions | total_accepted_submissions | total_views | total_unique_views |
+------------+-----------+--------+-------------------+----------------------------+-------------+--------------------+
|      66406 |     17973 | Rose   |               222 |                         78 |         238 |                122 |
|      66556 |     79153 | Angela |              NULL |                       NULL |          11 |                 10 |
|      94828 |     80275 | Frank  |               150 |                         38 |          82 |                 30

Tables schema:

enter image description here

Advertisement

Answer

What happens

The problem is that view_stats and submission_stats have multiple rows per challenge_id.

The JOINs in the query happen before the GROUP BY and the SUM. So imagine, the result set of your query without GROUP BY and SUM.

A simplified example would be:

ids table:

id
--
 1


x table:

id|vx
------
 1|11
 1|22


y table:

id|vy
------
 1| 1

The result of

SELECT ids.id, x.vx, y.vy
FROM ids
LEFT JOIN x on x.id = ids.id
LEFT JOIN y on y.id = ids.id;

would be

| id  | vx  | vy  |
| --- | --- | --- |
| 1   | 11  | 1   |
| 1   | 22  | 1   |

Mind the duplicate 1 in the vy column, although, in the original y table there is only one row. This happens because, for id=1 there are two rows in table x. These are joined first, thereby also duplicating the rows of the ids table. Then y is joined to these already duplicated rows which duplicates the rows of y too. When SUM‘ing and grouping, we end up with.

| id  | SUM(vy) |
| --- | ------- |
| 1   |       2 |

You can find a dbfiddle with the simplified example to play around here.

Solution

There are multiple ways to solve this. The most intuitive is to GROUP and SUM the rows of view_stats and submission_stats before joining them.

SELECT c.contest_id, c.hacker_id, c.name,
   SUM(s.total_submissions) as total_submissions,
   SUM(s.total_accepted_submissions) as total_accepted_submissions,
   SUM(v.total_views) as total_views,
   SUM(v.total_unique_views) as total_unique_views
FROM concursos c
JOIN faculdades f ON f.contest_id = c.contest_id
JOIN desafios d ON d.college_id = f.college_id
LEFT JOIN (
   SELECT 
      challenge_id, 
      SUM(total_views) as total_views,
      SUM(total_unique_views) as total_unique_views
   FROM view_stats
   GROUP BY challenge_id
) v ON v.challenge_id = d.challenge_id
LEFT JOIN (
   SELECT 
      challenge_id,
      SUM(total_submissions) as total_submissions,
      SUM(total_accepted_submissions) as total_accepted_submissions
   FROM submission_stats
   GROUP BY challenge_id
) s ON s.challenge_id = d.challenge_id
GROUP BY c.contest_id
# to output only rows with non zero sums
HAVING
   IFNULL(SUM(s.total_submissions), 0) <> 0
   OR IFNULL(SUM(s.total_accepted_submissions), 0) <> 0
   OR IFNULL(SUM(v.total_views), 0) <> 0
   OR IFNULL(SUM(v.total_unique_views), 0) <> 0;
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement