I have a database with date and list of items. The problem this I do have an event date captured whenever an item is added or deleted. So what I am trying to do is comparing the items for a user ‘A’ on different days (successive rows); here the newly added or deleted items for user ‘A’ should be displayed in new columns ‘Items added’ and ‘Items Removed’
Date | Items List ------------------------------------------------------------- 24th July | Item A, Item B, Item C, Item D 27th July | Item A, Item B, Item C, Item D, Item E, Item F 28th July | Item A, Item D, Item E, Item F, Item G
Expected output for user ‘A’:
Date | Items List | Items Added | Items Removed ----------------------------------------------------------------------------------------- 24th July | Item A, Item B, Item C, Item D | | 27th July | Item A, Item B, Item C, Item D, Item E, Item F | Item E, Item F | 28th July | Item A, Item D, Item E, Item F, Item G | Item G | Item B, Item C
I’d really appreciate if someone help me do this. Thanks in advance
Advertisement
Answer
Although this is a really bad data model, it is something of a fun question. Because Postgres was one of the original tags, there is a Postgres version:
select date, items, string_agg( case when prev_date = prev_item_date then null else item end, ', ') as new_items, string_agg( case when next_date = next_item_date then null else item end, ', ') as removed_items from (select t.*, item, lag(date) over (partition by item order by date) as prev_item_date, lead(date) over (partition by item order by date) as next_item_date from (select t.*, lag(date) over (order by date) as prev_date, lead(date) over (order by date) as next_date from t ) t cross join regexp_split_to_table(t.items, ', ') item ) ti group by date, items;
Here is a db<>fiddle.
In BigQuery, the logic is basically the same, just some nuances for unnesting the string:
select date, items, string_agg( case when prev_date = prev_item_date then null else item end, ', ') as new_items, string_agg( case when next_date = next_item_date then null else item end, ', ') as removed_items from (select t.*, item, lag(date) over (partition by item order by date) as prev_item_date, lead(date) over (partition by item order by date) as next_item_date from (select t.*, lag(date) over (order by date) as prev_date, lead(date) over (order by date) as next_date from t ) t cross join unnest(regexp_extract_all(replace(t.items, ', ', ','), ',')) item ) ti group by date, items;