I have these records in the table – employee_projects
id | employee_id | project_id | status |
---|---|---|---|
1 | emp1 | proj1 | VERIFIED |
2 | emp2 | proj2 | REJECTED |
3 | emp1 | proj1 | VERIFIED |
4 | emp1 | proj3 | REJECTED |
5 | emp2 | proj2 | REQUIRED |
6 | emp3 | proj4 | SUBMITTED |
7 | emp4 | proj5 | VERIFIED |
8 | emp4 | proj6 | VERIFIED |
9 | emp3 | proj4 | REQUIRED |
Here are the criteria for determining duplicates:
- Same employee ID, same project ID under the same status (Example: rows 1 and 3 are duplicates)
- Same employee ID, same project ID but in different status (Example: rows 6 and 9 are duplicates). An exception to duplication criteria#2 is if one project is REQUIRED and the same project is also REJECTED under the same employee, this is NOT considered a duplicate. For example, rows 2 and 5 are NOT duplicates.
I have a query for the first criterion:
select emp_id, proj_id, status, COUNT(*) from employee_projects group by emp_id, proj_id, status having COUNT(*) > 1
What I’m struggling to construct is the SQL for the second criterion.
Advertisement
Answer
maybe a self join can help you.
with t (employee_id ,project_id,status) as ( select 'emp1', 'proj1' , 'VERIFIED' Union all select 'emp2', 'proj2' , 'REJECTED' Union all select 'emp1', 'proj1' , 'VERIFIED' Union all select 'emp1', 'proj3' , 'REJECTED' Union all select 'emp2', 'proj2' , 'REQUIRED' Union all select 'emp3', 'proj4' , 'SUBMITTED' Union all select 'emp4', 'proj5' , 'VERIFIED' Union all select 'emp4', 'proj6' , 'VERIFIED' Union all select 'emp3', 'proj4' , 'REQUIRED' ) select t.employee_id, t.project_id, t.status, '' as status, 'criteria#1' as SQL from t group by t.employee_id, t.project_id, t.status having COUNT(*) > 1 union all SELECT t.employee_id, t.project_id, t.status, a.status, 'criteria#2' as SQL FROM t left join t as a on t.employee_id = a.employee_id and t.project_id = a.project_id where t.status != a.status and concat(t.status,a.status) != 'REQUIREDREJECTED' and concat(t.status,a.status) != 'REJECTEDREQUIRED'