
Filtering Duplicates after UNION

I have two tables, activities and edited. They are nearly identical, but edited contains a few additional columns that allow me to process data offline. A record may exist in either table or in both.

I need to fetch a set of rows in which each activityKey appears only once, keeping the latest row for each key based on editDate. For example:

SELECT activityKey, editDate, deleted
FROM activities
WHERE projectID = '4'  
AND deleted = 0
UNION
SELECT activityKey, editDate, deleted
FROM edited
WHERE projectID = '4' 
ORDER BY editDate DESC
LIMIT 20 OFFSET 0

(screenshot of the query results)

But this only combines the rows from both tables, so I get duplicates like WO-981 and WO-924 in the screenshot. Question: how can I remove these duplicates and keep only the newest record per activityKey, based on editDate? Also, if a record has deleted = 1 in the edited table and deleted = 0 in the activities table, as with WO-924, it should be removed from the result.


Answer

You can use aggregation for this, if deleted takes on only the values of 0 and 1:

SELECT activityKey, MAX(editDate) AS editDate,
       (CASE WHEN MAX(CASE WHEN deleted = 0 THEN editDate END) = MAX(editDate)
             THEN 0 ELSE 1
        END) as deleted
FROM (SELECT activityKey, editDate, deleted
      FROM activities
      WHERE projectID = '4' AND deleted = 0
      UNION ALL
      SELECT activityKey, editDate, deleted
      FROM edited
      WHERE projectID = '4' 
     ) ae
GROUP BY activityKey
ORDER BY MAX(editDate) DESC
LIMIT 20 OFFSET 0;

The more general solution is to use ROW_NUMBER(), but this works for this case.
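
For reference, the ROW_NUMBER() approach might look roughly like the sketch below. It assumes a database that supports window functions (for example MySQL 8+, PostgreSQL, or SQLite 3.25+). The aliases ranked and seqnum are illustrative names, and the deleted DESC tie-breaker (prefer the deleted row when two rows share the same editDate) is an assumption, not something stated in the question.

```sql
SELECT activityKey, editDate, deleted
FROM (SELECT ae.*,
             -- Number the rows of each activityKey, newest first.
             -- Assumption: on equal editDate, the deleted row wins.
             ROW_NUMBER() OVER (PARTITION BY activityKey
                                ORDER BY editDate DESC, deleted DESC) AS seqnum
      FROM (SELECT activityKey, editDate, deleted
            FROM activities
            WHERE projectID = '4' AND deleted = 0
            UNION ALL
            SELECT activityKey, editDate, deleted
            FROM edited
            WHERE projectID = '4'
           ) ae
     ) ranked
-- seqnum = 1 keeps only the newest row per activityKey;
-- deleted = 0 then drops keys whose newest row is marked deleted.
WHERE seqnum = 1 AND deleted = 0
ORDER BY editDate DESC
LIMIT 20 OFFSET 0;
```

Unlike the aggregation, this returns the newest row itself, so it keeps working if more columns are selected or if deleted can take values other than 0 and 1. Drop the deleted = 0 condition in the outer WHERE to keep keys whose newest row is deleted.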

EDIT:

If you want to skip activity keys whose latest row has deleted = 1, add a HAVING clause:

SELECT activityKey, MAX(editDate) AS editDate,
       (CASE WHEN MAX(CASE WHEN deleted = 0 THEN editDate END) = MAX(editDate)
             THEN 0 ELSE 1
        END) as deleted
FROM (SELECT activityKey, editDate, deleted
      FROM activities
      WHERE projectID = '4' AND deleted = 0
      UNION ALL
      SELECT activityKey, editDate, deleted
      FROM edited
      WHERE projectID = '4' 
     ) ae
GROUP BY activityKey
HAVING MAX(editDate) = MAX(CASE WHEN deleted = 0 THEN editDate END)
ORDER BY MAX(editDate) DESC
LIMIT 20 OFFSET 0;
User contributions licensed under: CC BY-SA