Skip to content
Advertisement

SQL to get 3 adjacent actions without duplicate from the flags

I have a question that a little bit similar with question#66044663 but more complicated.

Here’s my dummy data.

enter image description here

I want to get 3 adjacent actions(no duplicate) from the flag by each user.

Here’s the chart to describe my thought.

enter image description here

Here’s what I want:

enter image description here

How can I implement SQL(I use Google Bigquery)? I know the function LAG could be a solution but I have no idea how to avoid the duplicate actions.

Hope someone can light me up. Thanks a million!

Here’s the code for generating the dataset.

WITH
src_table AS (
SELECT 'Jack' AS User, 1 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL
SELECT 'Jack' AS User, 2 AS Sequence, 'Work' AS Action, '' AS Flag UNION ALL
SELECT 'Jack' AS User, 3 AS Sequence, 'Sleep' AS Action, 'Flag A' AS Flag UNION ALL
SELECT 'Jack' AS User, 4 AS Sequence, 'Exercise' AS Action, 'Flag B' AS Flag UNION ALL
SELECT 'Kenny' AS User, 1 AS Sequence, 'Run' AS Action, '' AS Flag UNION ALL
SELECT 'Kenny' AS User, 2 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL
SELECT 'Kenny' AS User, 3 AS Sequence, 'Eat' AS Action, '' AS Flag UNION ALL
SELECT 'Kenny' AS User, 4 AS Sequence, 'Work' AS Action, 'Flag C' AS Flag UNION ALL
SELECT 'Kenny' AS User, 5 AS Sequence, 'Work' AS Action, 'Flag D' AS Flag UNION ALL
SELECT 'May' AS User, 1 AS Sequence, 'Work' AS Action, 'Flag A' AS Flag
)

Advertisement

Answer

Consider below

select user, actions.action_sequence, flag  from (
  select *, (
    select as struct count(1) actions_count,
      string_agg(action, ' >> ' order by grp) action_sequence
    from (
      select action, grp from t.arr group by action, grp
    )) actions
  from (
    select *, array_agg(struct(action, grp)) 
      over(partition by user order by grp desc range between current row and 2 following) arr
    from (
      select *, countif(change) over(partition by user order by sequence) grp
      from (
        select *, action != lag(action) over(partition by user order by sequence) change
        from src_table
      )
    )
  ) t
)
where flag != '' 
and actions.actions_count = 3
# order by user, sequence

If to apply to sample data in your question – output is

enter image description here

NOTE: above solution works for any number of adjacent actions (no duplicate) – you just need to change it (2 and 3) in two respective places

over(partition by user order by grp desc range between current row and 2 following) arr    

and

and actions.actions_count = 3   
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement