I have a question that a little bit similar with question#66044663 but more complicated.
Here’s my dummy data.
I want to get 3 adjacent actions(no duplicate) from the flag by each user.
Here’s the chart to describe my thought.
Here’s what I want:
How can I implement SQL(I use Google Bigquery)? I know the function LAG could be a solution but I have no idea how to avoid the duplicate actions.
Hope someone can light me up. Thanks a million!
Here’s the code for generating the dataset.
x
WITH
src_table AS (
SELECT 'Jack' AS User, 1 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL
SELECT 'Jack' AS User, 2 AS Sequence, 'Work' AS Action, '' AS Flag UNION ALL
SELECT 'Jack' AS User, 3 AS Sequence, 'Sleep' AS Action, 'Flag A' AS Flag UNION ALL
SELECT 'Jack' AS User, 4 AS Sequence, 'Exercise' AS Action, 'Flag B' AS Flag UNION ALL
SELECT 'Kenny' AS User, 1 AS Sequence, 'Run' AS Action, '' AS Flag UNION ALL
SELECT 'Kenny' AS User, 2 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL
SELECT 'Kenny' AS User, 3 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL
SELECT 'Kenny' AS User, 4 AS Sequence, 'Work' AS Action, 'Flag C' AS Flag UNION ALL
SELECT 'Kenny' AS User, 5 AS Sequence, 'Work' AS Action, 'Flag D' AS Flag UNION ALL
SELECT 'May' AS User, 1 AS Sequence, 'Work' AS Action, 'Flag A' AS Flag
)
Advertisement
Answer
Consider below
select user, actions.action_sequence, flag from (
select *, (
select as struct count(1) actions_count,
string_agg(action, ' >> ' order by grp) action_sequence
from (
select action, grp from t.arr group by action, grp
)) actions
from (
select *, array_agg(struct(action, grp))
over(partition by user order by grp desc range between current row and 2 following) arr
from (
select *, countif(change) over(partition by user order by sequence) grp
from (
select *, action != lag(action) over(partition by user order by sequence) change
from src_table
)
)
) t
)
where flag != ''
and actions.actions_count = 3
# order by user, sequence
If to apply to sample data in your question – output is
NOTE: above solution works for any number of adjacent actions (no duplicate) – you just need to change it (2 and 3) in two respective places
over(partition by user order by grp desc range between current row and 2 following) arr
and
and actions.actions_count = 3