I have a dataset like the above. Basically it’s like a website session which I have a start + end activity, and in between there could be other unrelated records.
I need to pull the records based on:
- I need to report all the “end” activity from the table
- In each “end” record, I need to find its nearest previous “start” row’s “Content_ID” value. The record must also match the “Profile_ID” as well
- The result I expected is as follows:
May I ask how could I construct the SQL to do this?
Thanks so much!
Advertisement
Answer
You might consider below in BigQuery.
SELECT * EXCEPT(Content_ID), LAST_VALUE(IF(Activity = 'start', Content_ID, NULL) IGNORE NULLS) OVER w AS Content_ID FROM sample_table QUALIFY Activity = 'end' WINDOW w AS (PARTITION BY Profile_ID ORDER BY Created_Time);