Skip to content
Advertisement

Compare the values in the successive rows and display newly added or deleted items in a different columns

I have a database with date and list of items. The problem this I do have an event date captured whenever an item is added or deleted. So what I am trying to do is comparing the items for a user ‘A’ on different days (successive rows); here the newly added or deleted items for user ‘A’ should be displayed in new columns ‘Items added’ and ‘Items Removed’

   Date   |         Items List                              
-------------------------------------------------------------
24th July | Item A, Item B, Item C, Item D                                     
27th July | Item A, Item B, Item C, Item D, Item E, Item F   
28th July | Item A, Item D, Item E, Item F, Item G         

Expected output for user ‘A’:

   Date   |         Items List                             |   Items Added      |  Items Removed 
-----------------------------------------------------------------------------------------
24th July | Item A, Item B, Item C, Item D                 |                    |
27th July | Item A, Item B, Item C, Item D, Item E, Item F |    Item E, Item F  |    
28th July | Item A, Item D, Item E, Item F, Item G         |      Item G        |   Item B, Item C

I’d really appreciate if someone help me do this. Thanks in advance

Advertisement

Answer

Although this is a really bad data model, it is something of a fun question. Because Postgres was one of the original tags, there is a Postgres version:

select date, items,
       string_agg( case when prev_date = prev_item_date then null else item end, ', ') as new_items,
       string_agg( case when next_date = next_item_date then null else item end, ', ') as removed_items
from (select t.*, item,
             lag(date) over (partition by item order by date) as prev_item_date,
             lead(date) over (partition by item order by date) as next_item_date
      from (select t.*, 
                   lag(date) over (order by date) as prev_date,
                   lead(date) over (order by date) as next_date
            from t
           ) t cross join
           regexp_split_to_table(t.items, ', ') item
     ) ti
group by date, items;

Here is a db<>fiddle.

In BigQuery, the logic is basically the same, just some nuances for unnesting the string:

select date, items,
       string_agg( case when prev_date = prev_item_date then null else item end, ', ') as new_items,
       string_agg( case when next_date = next_item_date then null else item end, ', ') as removed_items
from (select t.*, item,
             lag(date) over (partition by item order by date) as prev_item_date,
             lead(date) over (partition by item order by date) as next_item_date
      from (select t.*, 
                   lag(date) over (order by date) as prev_date,
                   lead(date) over (order by date) as next_date
            from t
           ) t cross join
           unnest(regexp_extract_all(replace(t.items, ', ', ','), ',')) item
     ) ti
group by date, items;
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement