I have 2 tables: Activities and Edited. They are pretty much the same, but Edited contains a few additional columns that allow me to process data offline. A record may be in one of the tables or in both.
I need to fetch a set of rows in which each activityKey appears only once. The row that is kept should be the latest one based on editDate. For example:
SELECT activityKey, editDate, deleted
FROM activities
WHERE projectID = '4'
AND deleted = 0
UNION
SELECT activityKey, editDate, deleted
FROM edited
WHERE projectID = '4'
ORDER BY editDate DESC
LIMIT 20 OFFSET 0
But it only combines the data from both tables, and I get duplicates, as in the screenshot: WO-981 and WO-924.
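A minimal sketch using Python's built-in sqlite3 module reproduces the effect. The schema and rows below are hypothetical, since the question does not show them: WO-981 and WO-924 appear in both tables with different editDate values. Because UNION only removes rows that are identical in every selected column, both versions of each key survive.

```python
import sqlite3

# Hypothetical schema and rows (assumed; the real tables are not shown).
# WO-981 and WO-924 exist in both tables with different editDate values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activities (projectID TEXT, activityKey TEXT, editDate TEXT, deleted INTEGER);
    CREATE TABLE edited     (projectID TEXT, activityKey TEXT, editDate TEXT, deleted INTEGER);
    INSERT INTO activities VALUES ('4', 'WO-981', '2021-03-01', 0),
                                  ('4', 'WO-924', '2021-03-02', 0),
                                  ('4', 'WO-100', '2021-03-03', 0);
    INSERT INTO edited VALUES ('4', 'WO-981', '2021-03-05', 0),
                              ('4', 'WO-924', '2021-03-06', 1);
""")

# The query from the question: UNION drops only rows that match in all
# three selected columns, so every version of a key with a distinct editDate stays.
rows = conn.execute("""
    SELECT activityKey, editDate, deleted FROM activities
    WHERE projectID = '4' AND deleted = 0
    UNION
    SELECT activityKey, editDate, deleted FROM edited
    WHERE projectID = '4'
    ORDER BY editDate DESC
    LIMIT 20 OFFSET 0
""").fetchall()

for row in rows:
    print(row)
```

It prints five rows, two each for WO-981 and WO-924.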
Question:
How can I remove these duplicates and keep only the newest record for each activityKey, based on editDate?
Also, I need to remove a record whose latest version (by editDate) has deleted = 1 while its row in the Activities table has deleted = 0. WO-924 is such a case and should be removed.
Answer
You can use aggregation for this, if deleted only takes the values 0 and 1:
SELECT activityKey,
       MAX(editDate) AS editDate,  -- newest edit across both tables
       -- 0 if the newest row is not deleted, otherwise 1
       (CASE WHEN MAX(CASE WHEN deleted = 0 THEN editDate END) = MAX(editDate)
             THEN 0 ELSE 1
        END) AS deleted
FROM (SELECT activityKey, editDate, deleted
      FROM activities
      WHERE projectID = '4' AND deleted = 0
      UNION ALL  -- keep every version; GROUP BY collapses them per key
      SELECT activityKey, editDate, deleted
      FROM edited
      WHERE projectID = '4'
     ) ae
GROUP BY activityKey
ORDER BY editDate DESC
LIMIT 20 OFFSET 0;
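A minimal sketch of this aggregation running against hypothetical sample data in Python's sqlite3 module (the schema and rows are assumptions, not the asker's). It selects MAX(editDate) rather than a bare editDate: a non-aggregated column in a GROUP BY query is rejected by MySQL under ONLY_FULL_GROUP_BY and otherwise yields an indeterminate value from the group.

```python
import sqlite3

# Hypothetical schema and rows (assumed; the real tables have more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activities (projectID TEXT, activityKey TEXT, editDate TEXT, deleted INTEGER);
    CREATE TABLE edited     (projectID TEXT, activityKey TEXT, editDate TEXT, deleted INTEGER);
    INSERT INTO activities VALUES ('4', 'WO-981', '2021-03-01', 0),
                                  ('4', 'WO-924', '2021-03-02', 0),
                                  ('4', 'WO-100', '2021-03-03', 0);
    INSERT INTO edited VALUES ('4', 'WO-981', '2021-03-05', 0),
                              ('4', 'WO-924', '2021-03-06', 1);
""")

# One row per activityKey: the newest editDate plus a flag telling whether
# the newest version is deleted (relies on deleted being only 0 or 1).
rows = conn.execute("""
    SELECT activityKey,
           MAX(editDate) AS editDate,
           (CASE WHEN MAX(CASE WHEN deleted = 0 THEN editDate END) = MAX(editDate)
                 THEN 0 ELSE 1
            END) AS deleted
    FROM (SELECT activityKey, editDate, deleted
          FROM activities
          WHERE projectID = '4' AND deleted = 0
          UNION ALL
          SELECT activityKey, editDate, deleted
          FROM edited
          WHERE projectID = '4'
         ) ae
    GROUP BY activityKey
    ORDER BY editDate DESC
    LIMIT 20 OFFSET 0
""").fetchall()

print(rows)
```

For this data the result is WO-924 (2021-03-06, deleted 1), WO-981 (2021-03-05, deleted 0) and WO-100 (2021-03-03, deleted 0).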
The more general solution is to use ROW_NUMBER(), but aggregation works for this case.
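For comparison, a sketch of the ROW_NUMBER() approach, again on hypothetical data in Python's sqlite3 (window functions need SQLite 3.25 or newer). Each key's rows are numbered newest first and only row 1 is kept, so it also works when you need other columns from the winning row or when deleted can hold values other than 0 and 1. The src column is an assumed tie-breaker that prefers the Edited copy when both tables have the same editDate, and the helper name latest_rows is hypothetical.

```python
import sqlite3

# Hypothetical schema and rows (assumed); needs SQLite >= 3.25 for ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activities (projectID TEXT, activityKey TEXT, editDate TEXT, deleted INTEGER);
    CREATE TABLE edited     (projectID TEXT, activityKey TEXT, editDate TEXT, deleted INTEGER);
    INSERT INTO activities VALUES ('4', 'WO-981', '2021-03-01', 0),
                                  ('4', 'WO-924', '2021-03-02', 0),
                                  ('4', 'WO-100', '2021-03-03', 0);
    INSERT INTO edited VALUES ('4', 'WO-981', '2021-03-05', 0),
                              ('4', 'WO-924', '2021-03-06', 1);
""")

# Hypothetical helper: newest row per activityKey, optionally skipping deleted ones.
def latest_rows(conn, skip_deleted=False):
    deleted_filter = "AND deleted = 0" if skip_deleted else ""
    query = f"""
        SELECT activityKey, editDate, deleted
        FROM (SELECT activityKey, editDate, deleted,
                     -- number each key's versions newest first; src is an assumed
                     -- tie-breaker that prefers the edited copy on equal editDate
                     ROW_NUMBER() OVER (PARTITION BY activityKey
                                        ORDER BY editDate DESC, src DESC) AS rn
              FROM (SELECT activityKey, editDate, deleted, 0 AS src
                    FROM activities
                    WHERE projectID = '4' AND deleted = 0
                    UNION ALL
                    SELECT activityKey, editDate, deleted, 1 AS src
                    FROM edited
                    WHERE projectID = '4'
                   ) ae
             ) ranked
        WHERE rn = 1 {deleted_filter}
        ORDER BY editDate DESC
        LIMIT 20 OFFSET 0
    """
    return conn.execute(query).fetchall()

print(latest_rows(conn))
print(latest_rows(conn, skip_deleted=True))
```

Filtering on deleted after ranking (in the outer WHERE) matters: filtering inside the UNION would let an older, non-deleted version of WO-924 win instead of removing it.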
EDIT:
If you want to skip keys whose newest row has deleted = 1, add a HAVING clause:
SELECT activityKey,
       MAX(editDate) AS editDate,  -- newest edit across both tables
       -- 0 if the newest row is not deleted, otherwise 1
       (CASE WHEN MAX(CASE WHEN deleted = 0 THEN editDate END) = MAX(editDate)
             THEN 0 ELSE 1
        END) AS deleted
FROM (SELECT activityKey, editDate, deleted
      FROM activities
      WHERE projectID = '4' AND deleted = 0
      UNION ALL
      SELECT activityKey, editDate, deleted
      FROM edited
      WHERE projectID = '4'
     ) ae
GROUP BY activityKey
-- keep a key only if its newest row is a non-deleted one
HAVING MAX(editDate) = MAX(CASE WHEN deleted = 0 THEN editDate END)
ORDER BY editDate DESC
LIMIT 20 OFFSET 0;
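A minimal sketch of the HAVING variant on hypothetical sample data in Python's sqlite3 (schema and rows are assumptions). The condition keeps a key only if its newest editDate belongs to a row with deleted = 0; comparing against deleted = 1 instead would keep only the deleted keys, the opposite of what is wanted. With the data below, WO-924 is dropped.

```python
import sqlite3

# Hypothetical schema and rows (assumed; the real tables are not shown).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activities (projectID TEXT, activityKey TEXT, editDate TEXT, deleted INTEGER);
    CREATE TABLE edited     (projectID TEXT, activityKey TEXT, editDate TEXT, deleted INTEGER);
    INSERT INTO activities VALUES ('4', 'WO-981', '2021-03-01', 0),
                                  ('4', 'WO-924', '2021-03-02', 0),
                                  ('4', 'WO-100', '2021-03-03', 0);
    INSERT INTO edited VALUES ('4', 'WO-981', '2021-03-05', 0),
                              ('4', 'WO-924', '2021-03-06', 1);
""")

# Newest version per key, skipping keys whose newest version is deleted.
rows = conn.execute("""
    SELECT activityKey,
           MAX(editDate) AS editDate,
           (CASE WHEN MAX(CASE WHEN deleted = 0 THEN editDate END) = MAX(editDate)
                 THEN 0 ELSE 1
            END) AS deleted
    FROM (SELECT activityKey, editDate, deleted
          FROM activities
          WHERE projectID = '4' AND deleted = 0
          UNION ALL
          SELECT activityKey, editDate, deleted
          FROM edited
          WHERE projectID = '4'
         ) ae
    GROUP BY activityKey
    HAVING MAX(editDate) = MAX(CASE WHEN deleted = 0 THEN editDate END)
    ORDER BY editDate DESC
    LIMIT 20 OFFSET 0
""").fetchall()

print(rows)
```

Because HAVING only lets non-deleted keys through, the deleted column is always 0 in this variant.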