Skip to content
Advertisement

Removing rows based on a string in Snowflake (SQL)

Let’s say I had a table as follows:

ID      TIMESTAMP                  GROUP
001     2021-04-01 12:51:12.063    AppleA    
001     2021-04-04 12:51:12.063    Apple-Direct
001     2021-04-14 10:47:03.022    AppleA
002     2021-01-13 09:46:23.012    BananaA
003     2021-09-10 03:32:53.043    Banana-Direct
004     2021-04-13 01:12:54.056    Grape-Direct
004     2021-04-13 11:12:26.054    AppleA
004     2021-04-13 21:53:36.023    GrapeA
005     2021-04-01 13:53:13.023    BananaO
005     2021-04-11 13:53:13.023    Banana-Direct
003     2022-04-13 20:32:11.011    Banana-Direct
006     2021-08-13 20:32:11.011    GrapeO
006     2021-08-13 20:32:11.011    GrapeA
007     2021-08-13 20:32:11.011    Grape-Direct
007     2021-08-13 20:32:11.011    BananaA

And what I want to do here is if an ID has a group and the same ID has the group-Direct, then the Direct row is removed.

So what It should look like is:

ID      TIMESTAMP                  GROUP
001     2021-04-01 12:51:12.063    AppleA    
001     2021-04-14 10:47:03.022    AppleA
002     2021-01-13 09:46:23.012    BananaA
003     2021-09-10 03:32:53.043    Banana-Direct
004     2021-04-13 01:12:54.056    Grape-Direct
004     2021-04-13 11:12:26.054    AppleA
004     2021-04-13 21:53:36.023    GrapeA
005     2021-04-01 13:53:13.023    BananaO
003     2022-04-13 20:32:11.011    Banana-Direct
006     2021-08-13 20:32:11.011    GrapeO
006     2021-08-13 20:32:11.011    GrapeA
007     2021-08-13 20:32:11.011    Grape-Direct
007     2021-08-13 20:32:11.011    BananaA

What happened? The only ones to change were ID = 001 and ID = 005.

ID = 001

Here the group values for this ID are only AppleA and Apple-Direct. They both have Apple. And because these are the only options, we remove the Apple-Direct because given the rules, its not necessary.

ID = 005 Same goes for ID 005, where the only options were BananaO and Banana-Direct. Thus we remove the direct one.

In Case002, there is only one option so we keep it.

In Case 003, both options are Banana-Direct, so 1 unique, we keep it.

In case 004, We have 3 distinct so we keep it.

In Case 006, we have two distinct but none are direct, so we keep both.

In Case 006, we have two distinct,1 is direct but the other is not a match. (Grape and Banana). So even though there is a Direct and even though there are 2 distinct, we keep both bc the string did not match, so we keep both.

So in conclusion:

  • 1 Distinct : Keep All
  • ( > 2 Distinct): Keep All
  • 2 Distinct: If 1 is Direct AND they match the string, we toss the ones that has direct.
  • 2 Distinct: If 1 is Direct AND the other doesnt match the string, we keep all.
  • 2 Distinct: If none are Direct we keep both.

The combinations are always Apple Banana Grape so

AppleA
Apple-Direct

AppleO
Apple-Direct

GrapeA
Grape-Direct

GrapeO
Grape-Direct

BananaA
Banana-Direct

BananaO
Banana-Direct

Advertisement

Answer

So you want to group by ID, and if there is a token that does not end with ‘-Direct’ AND there is also the same token ending with -Direct, the direct token is dropped.

So that requires a DELETE command that is matching sub clauses.

So the sub-clause want to look like:

SELECT a.ID
    ,a.GROUP
FROM table AS a
JOIN table AS b
    ON a.ID = b.ID 
        AND a.group = b.group || '-Direct'
        AND a.group LIKE '%-Direct'
        AND b.group NOT LIKE '%-Direct'

Thus the DELETE becomes:

DELETE FROM table AS d
    using (
    SELECT a.ID
    ,a.GROUP
FROM table AS a
JOIN table AS b
    ON a.ID = b.ID 
        AND a.group = b.group || '-Direct'
        AND a.group LIKE '%-Direct'
        AND b.group NOT LIKE '%-Direct'

) AS m
WHERE d.id = m.id AND d.GROUP = m.GROUP;

I am think this can be rewriten to be simpler, but without running it, I think it should show you how to achieve the task.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement