Let’s say I had a table as follows:
ID TIMESTAMP GROUP 001 2021-04-01 12:51:12.063 AppleA 001 2021-04-04 12:51:12.063 Apple-Direct 001 2021-04-14 10:47:03.022 AppleA 002 2021-01-13 09:46:23.012 BananaA 003 2021-09-10 03:32:53.043 Banana-Direct 004 2021-04-13 01:12:54.056 Grape-Direct 004 2021-04-13 11:12:26.054 AppleA 004 2021-04-13 21:53:36.023 GrapeA 005 2021-04-01 13:53:13.023 BananaO 005 2021-04-11 13:53:13.023 Banana-Direct 003 2022-04-13 20:32:11.011 Banana-Direct 006 2021-08-13 20:32:11.011 GrapeO 006 2021-08-13 20:32:11.011 GrapeA 007 2021-08-13 20:32:11.011 Grape-Direct 007 2021-08-13 20:32:11.011 BananaA
And what I want to do here is if an ID
has a group and the same ID
has the group-Direct, then the Direct row is removed.
So what It should look like is:
ID TIMESTAMP GROUP 001 2021-04-01 12:51:12.063 AppleA 001 2021-04-14 10:47:03.022 AppleA 002 2021-01-13 09:46:23.012 BananaA 003 2021-09-10 03:32:53.043 Banana-Direct 004 2021-04-13 01:12:54.056 Grape-Direct 004 2021-04-13 11:12:26.054 AppleA 004 2021-04-13 21:53:36.023 GrapeA 005 2021-04-01 13:53:13.023 BananaO 003 2022-04-13 20:32:11.011 Banana-Direct 006 2021-08-13 20:32:11.011 GrapeO 006 2021-08-13 20:32:11.011 GrapeA 007 2021-08-13 20:32:11.011 Grape-Direct 007 2021-08-13 20:32:11.011 BananaA
What happened?
The only ones to change were ID = 001
and ID = 005
.
ID = 001
Here the group values for this ID are only AppleA and Apple-Direct. They both have Apple. And because these are the only options, we remove the Apple-Direct
because given the rules, its not necessary.
ID = 005 Same goes for ID 005, where the only options were BananaO and Banana-Direct. Thus we remove the direct one.
In Case002, there is only one option so we keep it.
In Case 003, both options are Banana-Direct, so 1 unique, we keep it.
In case 004, We have 3 distinct so we keep it.
In Case 006, we have two distinct but none are direct, so we keep both.
In Case 006, we have two distinct,1 is direct but the other is not a match. (Grape and Banana). So even though there is a Direct and even though there are 2 distinct, we keep both bc the string did not match, so we keep both.
So in conclusion:
- 1 Distinct : Keep All
- ( > 2 Distinct): Keep All
- 2 Distinct: If 1 is Direct AND they match the string, we toss the ones that has direct.
- 2 Distinct: If 1 is Direct AND the other doesnt match the string, we keep all.
- 2 Distinct: If none are Direct we keep both.
The combinations are always Apple Banana Grape so
AppleA Apple-Direct AppleO Apple-Direct GrapeA Grape-Direct GrapeO Grape-Direct BananaA Banana-Direct BananaO Banana-Direct
Advertisement
Answer
So you want to group by ID, and if there is a token that does not end with ‘-Direct’ AND there is also the same token ending with -Direct
, the direct token is dropped.
So that requires a DELETE command that is matching sub clauses.
So the sub-clause want to look like:
SELECT a.ID ,a.GROUP FROM table AS a JOIN table AS b ON a.ID = b.ID AND a.group = b.group || '-Direct' AND a.group LIKE '%-Direct' AND b.group NOT LIKE '%-Direct'
Thus the DELETE becomes:
DELETE FROM table AS d using ( SELECT a.ID ,a.GROUP FROM table AS a JOIN table AS b ON a.ID = b.ID AND a.group = b.group || '-Direct' AND a.group LIKE '%-Direct' AND b.group NOT LIKE '%-Direct' ) AS m WHERE d.id = m.id AND d.GROUP = m.GROUP;
I am think this can be rewriten to be simpler, but without running it, I think it should show you how to achieve the task.