Query returns duplicate values [closed]

Question

Closed. This question needs details or clarity. It is not currently accepting answers. Closed 1 year ago.

The query below is showing some duplicate and wrong values:

The output should look like this:

But it's coming out like this:

Tables schema:

Accepted Answer

What happens

The problem is that view_stats and submission_stats have multiple rows per challenge_id. The JOINs in the query happen before the GROUP BY and the SUM. So imagine the result set of your query without GROUP BY and SUM.

A simplified example would be:

ids table:

    id
    --
     1

x table:

    id|vx
    ------
     1|11
     1|22

y table:

    id|vy
    ------
     1| 1

The result of

    SELECT ids.id, x.vx, y.vy
    FROM ids
    LEFT JOIN x on x.id = ids.id
    LEFT JOIN y on y.id = ids.id;

would be

| id | vx | vy |
| --- | --- | --- |
| 1 | 11 | 1 |
| 1 | 22 | 1 |

Mind the duplicate 1 in the vy column, although the original y table has only one row. This happens because there are two rows in table x for id=1. These are joined first, thereby also duplicating the row of the ids table. Then y is joined to these already duplicated rows, which duplicates the row of y too. After grouping and applying SUM, we end up with:

| id | SUM(vy) |
| --- | ------- |
| 1 | 2 |

You can find a dbfiddle with the simplified example to play around with here.

Solution

There are multiple ways to solve this. The most intuitive is to GROUP and SUM the rows of view_stats and submission_stats before joining them.

    SELECT
        c.contest_id,
        c.hacker_id,
        c.name,
        SUM(s.total_submissions) as total_submissions,
        SUM(s.total_accepted_submissions) as total_accepted_submissions,
        SUM(v.total_views) as total_views,
        SUM(v.total_unique_views) as total_unique_views
    FROM concursos c
    JOIN faculdades f ON f.contest_id = c.contest_id
    JOIN desafios d ON d.college_id = f.college_id
    LEFT JOIN (
        SELECT
            challenge_id,
            SUM(total_views) as total_views,
            SUM(total_unique_views) as total_unique_views
        FROM view_stats
        GROUP BY challenge_id
    ) v ON v.challenge_id = d.challenge_id
    LEFT JOIN (
        SELECT
            challenge_id,
            SUM(total_submissions) as total_submissions,
            SUM(total_accepted_submissions) as total_accepted_submissions
        FROM submission_stats
        GROUP BY challenge_id
    ) s ON s.challenge_id = d.challenge_id
    GROUP BY c.contest_id
    # to output only rows with non zero sums
    HAVING IFNULL(SUM(s.total_submissions), 0) <> 0
        OR IFNULL(SUM(s.total_accepted_submissions), 0) <> 0
        OR IFNULL(SUM(v.total_views), 0) <> 0
        OR IFNULL(SUM(v.total_unique_views), 0) <> 0;
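To see the effect without a MySQL server, the simplified ids/x/y example can be reproduced with Python's built-in sqlite3 module. This is only a sketch: it uses SQLite rather than MySQL, and the subquery aliases xs and ys are made up for illustration. It compares joining before aggregating with aggregating before joining.

```python
import sqlite3

# In-memory SQLite database holding the simplified tables from the answer.
# Assumption: SQLite stands in for MySQL here; the join behaviour shown is the same.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ids (id INTEGER);
    CREATE TABLE x (id INTEGER, vx INTEGER);
    CREATE TABLE y (id INTEGER, vy INTEGER);
    INSERT INTO ids VALUES (1);
    INSERT INTO x VALUES (1, 11), (1, 22);
    INSERT INTO y VALUES (1, 1);
""")

# Join first, aggregate afterwards: the single y row is repeated once per
# matching x row, so SUM(vy) counts it twice.
naive = conn.execute("""
    SELECT ids.id, SUM(x.vx), SUM(y.vy)
    FROM ids
    LEFT JOIN x ON x.id = ids.id
    LEFT JOIN y ON y.id = ids.id
    GROUP BY ids.id
""").fetchall()

# Aggregate each table per id first, then join: every subquery yields at most
# one row per id, so nothing is duplicated by the joins.
fixed = conn.execute("""
    SELECT ids.id, xs.vx, ys.vy
    FROM ids
    LEFT JOIN (SELECT id, SUM(vx) AS vx FROM x GROUP BY id) xs ON xs.id = ids.id
    LEFT JOIN (SELECT id, SUM(vy) AS vy FROM y GROUP BY id) ys ON ys.id = ids.id
""").fetchall()

print("join, then aggregate:", naive)  # [(1, 33, 2)]
print("aggregate, then join:", fixed)  # [(1, 33, 1)]
```

The sum over vx is 33 in both cases because x is the table that causes the fan-out; only the sum over vy is inflated, which matches the duplicated 1 in the vy column described above.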
