I have a database with date and list of items. The problem this I do have an event date captured whenever an item is added or deleted. So what I am trying to do is comparing the items for a user ‘A’ on different days (successive rows); here the newly added or deleted items for user ‘A’ should be displayed in new columns ‘Items added’ and ‘Items Removed’
x
Date | Items List
-------------------------------------------------------------
24th July | Item A, Item B, Item C, Item D
27th July | Item A, Item B, Item C, Item D, Item E, Item F
28th July | Item A, Item D, Item E, Item F, Item G
Expected output for user ‘A’:
Date | Items List | Items Added | Items Removed
-----------------------------------------------------------------------------------------
24th July | Item A, Item B, Item C, Item D | |
27th July | Item A, Item B, Item C, Item D, Item E, Item F | Item E, Item F |
28th July | Item A, Item D, Item E, Item F, Item G | Item G | Item B, Item C
I’d really appreciate if someone help me do this. Thanks in advance
Advertisement
Answer
Although this is a really bad data model, it is something of a fun question. Because Postgres was one of the original tags, there is a Postgres version:
select date, items,
string_agg( case when prev_date = prev_item_date then null else item end, ', ') as new_items,
string_agg( case when next_date = next_item_date then null else item end, ', ') as removed_items
from (select t.*, item,
lag(date) over (partition by item order by date) as prev_item_date,
lead(date) over (partition by item order by date) as next_item_date
from (select t.*,
lag(date) over (order by date) as prev_date,
lead(date) over (order by date) as next_date
from t
) t cross join
regexp_split_to_table(t.items, ', ') item
) ti
group by date, items;
Here is a db<>fiddle.
In BigQuery, the logic is basically the same, just some nuances for unnesting the string:
select date, items,
string_agg( case when prev_date = prev_item_date then null else item end, ', ') as new_items,
string_agg( case when next_date = next_item_date then null else item end, ', ') as removed_items
from (select t.*, item,
lag(date) over (partition by item order by date) as prev_item_date,
lead(date) over (partition by item order by date) as next_item_date
from (select t.*,
lag(date) over (order by date) as prev_date,
lead(date) over (order by date) as next_date
from t
) t cross join
unnest(regexp_extract_all(replace(t.items, ', ', ','), ',')) item
) ti
group by date, items;