My base query:
SELECT project_id name stories_produced on_date FROM project_prod WHERE on_date IN ('2017-03-01', '2017-06-10') ORDER BY project_id
It can get me these outputs:
Output example:
id name stories_produced on_date 1042 project 1 1001 (wanted) 2017-03-01 1042 project 1 1801 (wanted) 2017-06-10 1568 project 2 355 (wanted) 2017-06-10 1405 project 3 1 (not wanted) 2017-03-10 1405 project 3 1 (not wanted) 2017-06-10
Obs: There is a constraint on (id, on_date) meaning there can always be only one record of a project production on a specific date.
Duplicate records, that have the same id, and exist in both dates and have different production values (wanted)
Single records, that exists on only one of the dates (wanted)
The problem:*
Duplicate records, that have the same id, and exist in both dates and have equal production values (not wanted)
My current query, that need change
select project_id name CASE WHEN max(stories_produced) - min(stories_produced) = 0 THEN max(stories_produced) ELSE max(stories_produced) - min(stories_produced) END AS 'stories_produced' from project_prod WHERE on_date IN ('2017-03-01', '2017-06-10') group by project_id;
output example:
id name stories_produced 1042 project 1 800 (wanted) 1568 project 2 355 (wanted) 1405 project 3 1 (not wanted)
The CASE is currently not taking care of the third constraint (Duplicate records, that have the same id, and exist in both dates and have EQUAL production values (not wanted))
Is there any possible condition that can accommodate this?
Advertisement
Answer
One option uses not exists
to drop rows that have the same id, and exist in both dates and have equal production values:
select p.project_id, p.name, p.stories_produced, p.on_date, from project_prod p where on_date in ('2017-03-01', '2017-06-10') and not exists ( select 1 from project_prod p1 where p1.on_date in ('2017-03-01', '2017-06-10') and p1.on_date <> p.date and p1.id = p.id and p1.stories_produced = p.stories_produced ) order by project_id
In MySQL 8.0, you can use window functions:
select project_id, name, stories_produced, on_date, from ( select p.*, min(stories_produced) over(partition by project_id) min_stories_produced, max(stories_produced) over(partition by project_id) max_stories_produced, count(*) over(partition by project_id) max_stories_produced cnt from project_prod p where on_date in ('2017-03-01', '2017-06-10') ) t where not (cnt = 2 and min_stories_produced = max_stories_produced) oder by project_id