Let’s say your web application tracks the activities of its users using a tracking system. While a user hasn’t logged in or signed up, all the user’s actions are tracked using anonymous_id and the user_id is null, and afterward, they are tracked using the same anonymous_id and user_id. It is known that after a user logs in or signs up, the user_id is no longer null.
I am trying to write a query to find the last event where the user was tracked only by anonymous_id (column last_null) and the first event that was tracked by user_id (column first_notnull). The resulting table should be sorted by anonym_id.
For given table tracks;
received_at | event_name | anonymous_id | user_id |
---|---|---|---|
2016-01-01 12:13:12 | buttonClicked | 1 | NULL |
2016-01-02 12:14:15 | pageReloaded | 3 | NULL |
2016-02-02 13:15:13 | pageRendered | 2 | NULL |
2016-02-03 13:15:23 | commentWritten | 3 | NULL |
2016-03-03 14:15:15 | avatarUpdated | 2 | 2 |
2016-03-04 14:15:24 | statusUpdated | 1 | 1 |
Output should be;
anonym_id | last_null | first_notnull |
---|---|---|
1 | buttonClicked | statusUpdated |
2 | pageRendered | avatarUpdated |
3 | commentWritten | NULL |
Here is what I came up with, but I am not sure if there is a better solution. Can you please help me?
SELECT distinct anonymous_id as anonym_id ,CASE WHEN user_id is not null THEN prev_event WHEN user_id is null THEN event_name END AS last_null ,CASE WHEN user_id is not null THEN event_name WHEN user_id is null THEN next_event END AS first_notnull FROM( SELECT * ,max(rnk) FROM( SELECT * ,lag(event_name) OVER(PARTITION BY anonymous_id ORDER BY received_at ASC) as prev_event ,lead(event_name) OVER(PARTITION BY anonymous_id ORDER BY received_at ASC) as next_event ,RANK() OVER(PARTITION BY anonymous_id ORDER BY received_at ASC) as rnk FROM tracks ) as temp GROUP BY anonymous_id )as temp2 WHERE prev_event is not NULL
Advertisement
Answer
I can think of a few ways to achieve this, this is one of it:
SELECT anonymous_id, SUBSTRING_INDEX( GROUP_CONCAT( CASE WHEN user_id IS NULL THEN event_name END ORDER BY received_at DESC),',',1) AS last_null, SUBSTRING_INDEX( GROUP_CONCAT( CASE WHEN user_id IS NOT NULL THEN event_name END ORDER BY received_at ASC),',',1) AS first_notnull FROM tracks GROUP BY anonymous_id;
Using CASE
expression then wrap it in GROUP_CONCAT()
with addition of ORDER BY received_at DESC
. Then use SUBSTRING_INDEX()
to get the value from SUBSTRING_INDEX()
. See this fiddle to understand more
This is the quickest I can think of right now but I’ll try a few more ways (probably shorter and more efficient query) and will update the answer if necessary.