Skip to content

“Flattening” complex sequences of related rows in SQL

I have a table with events describing user navigation through flows, with this general shape (disregard the timestamps, I smashed some keys):

user timestamp event_name event_choice
1    12345678  start
1    45678912  choose     option_a
1    45678912  end

After the option_a event, users can go back and choose another option:

2    83475656  start
2    45678912  choose     option_a
2    93847398  back
2    45678912  choose     option_b
2    23487383  end

Users can also go through a flow multiple times:

3    83475656  start
3    45678912  choose     option_a
3    93847398  end
3    93847398  start
3    45678912  choose     option_b
3    23487383  end

I want to produce a table where each row represents a flow, from start to end, including the selected option along the way. For example:

user start_at  end_at   choice
1    83475656  23487383 option_a
2    83475656  23487383 option_b
3    83475656  23487383 option_a
3    83475656  23487383 option_b

I need to avoid:

  1. False results from the back event changing the option mid-flow.
  2. False results from mixing up two different instances of the flow.
  3. False results from mixing up two different users.

I have no idea how to approach this. If it’s any difference, I’m using BigQuery, which has some differences with standard SQL and may have some specific functions.



You can add up the starts to assign a grouping. Then you want the last option in each group:

select user, min(timestamp) as start_at, max(timestamp) as stop_at,
       array_agg( event_choice ignore nulls order by timestamp desc limit 1)[ordinal(1)] as choice
from (select t.*,
             countif(event_name = 'start') over (partition by user order by timestamp) as grp
      from t
     ) t
group by user, grp;

This makes some assumptions about the data, notably:

  • 'start' and 'stop' are lined up.
  • you want the most recent non-null option_choice.
6 People found this is helpful