I have a question that a little bit similar with question#66044663 but more complicated.
Here’s my dummy data.
I want to get 3 adjacent actions(no duplicate) from the flag by each user.
Here’s the chart to describe my thought.
Here’s what I want:
How can I implement SQL(I use Google Bigquery)? I know the function LAG could be a solution but I have no idea how to avoid the duplicate actions.
Hope someone can light me up. Thanks a million!
Here’s the code for generating the dataset.
WITH src_table AS ( SELECT 'Jack' AS User, 1 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL SELECT 'Jack' AS User, 2 AS Sequence, 'Work' AS Action, '' AS Flag UNION ALL SELECT 'Jack' AS User, 3 AS Sequence, 'Sleep' AS Action, 'Flag A' AS Flag UNION ALL SELECT 'Jack' AS User, 4 AS Sequence, 'Exercise' AS Action, 'Flag B' AS Flag UNION ALL SELECT 'Kenny' AS User, 1 AS Sequence, 'Run' AS Action, '' AS Flag UNION ALL SELECT 'Kenny' AS User, 2 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL SELECT 'Kenny' AS User, 3 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL SELECT 'Kenny' AS User, 4 AS Sequence, 'Work' AS Action, 'Flag C' AS Flag UNION ALL SELECT 'Kenny' AS User, 5 AS Sequence, 'Work' AS Action, 'Flag D' AS Flag UNION ALL SELECT 'May' AS User, 1 AS Sequence, 'Work' AS Action, 'Flag A' AS Flag )
Advertisement
Answer
Consider below
select user, actions.action_sequence, flag from ( select *, ( select as struct count(1) actions_count, string_agg(action, ' >> ' order by grp) action_sequence from ( select action, grp from t.arr group by action, grp )) actions from ( select *, array_agg(struct(action, grp)) over(partition by user order by grp desc range between current row and 2 following) arr from ( select *, countif(change) over(partition by user order by sequence) grp from ( select *, action != lag(action) over(partition by user order by sequence) change from src_table ) ) ) t ) where flag != '' and actions.actions_count = 3 # order by user, sequence
If to apply to sample data in your question – output is
NOTE: above solution works for any number of adjacent actions (no duplicate) – you just need to change it (2 and 3) in two respective places
over(partition by user order by grp desc range between current row and 2 following) arr
and
and actions.actions_count = 3