Skip to content
Advertisement

Sum unequal and removing duplicates from SQL query results

My base query:

SELECT project_id
       name
       stories_produced
       on_date
      FROM project_prod
      WHERE on_date IN ('2017-03-01', '2017-06-10')
      ORDER BY project_id

It can get me these outputs:

Output example:

id       name                   stories_produced        on_date
1042    project 1               1001 (wanted)           2017-03-01
1042    project 1               1801 (wanted)           2017-06-10
1568    project 2               355  (wanted)           2017-06-10
1405    project 3               1    (not wanted)       2017-03-10
1405    project 3               1    (not wanted)       2017-06-10

Obs: There is a constraint on (id, on_date) meaning there can always be only one record of a project production on a specific date.

Duplicate records, that have the same id, and exist in both dates and have different production values (wanted)

Single records, that exists on only one of the dates (wanted)

The problem:*

Duplicate records, that have the same id, and exist in both dates and have equal production values (not wanted)

My current query, that need change

      select project_id 
        name 
        CASE
          WHEN max(stories_produced) - min(stories_produced) = 0    
          THEN max(stories_produced)
          ELSE max(stories_produced) - min(stories_produced)
        END AS 'stories_produced'
        from project_prod
      WHERE on_date IN ('2017-03-01', '2017-06-10')
      group by project_id;

output example:

id       name                   stories_produced        
1042    project 1               800 (wanted)
1568    project 2               355  (wanted)
1405    project 3               1    (not wanted)

The CASE is currently not taking care of the third constraint (Duplicate records, that have the same id, and exist in both dates and have EQUAL production values (not wanted))

Is there any possible condition that can accommodate this?

Advertisement

Answer

One option uses not exists to drop rows that have the same id, and exist in both dates and have equal production values:

select
    p.project_id,
    p.name,
    p.stories_produced,
    p.on_date,
  from project_prod p
  where 
    on_date in ('2017-03-01', '2017-06-10')
    and not exists (
        select 1
        from project_prod p1
        where 
            p1.on_date in ('2017-03-01', '2017-06-10')
            and p1.on_date <> p.date
            and p1.id = p.id 
            and p1.stories_produced = p.stories_produced

    )
  order by project_id 

In MySQL 8.0, you can use window functions:

select
    project_id,
    name,
    stories_produced,
    on_date,
from (
    select 
        p.*,
        min(stories_produced) over(partition by project_id) min_stories_produced,
        max(stories_produced) over(partition by project_id) max_stories_produced,
        count(*) over(partition by project_id) max_stories_produced cnt
    from project_prod p
    where on_date in ('2017-03-01', '2017-06-10')
) t
where not (cnt = 2 and min_stories_produced = max_stories_produced)
oder by project_id
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement