MySQL: Find the last event where the user was tracked

Let’s say your web application tracks the activities of its users using a tracking system. Until a user logs in or signs up, all of that user’s actions are tracked by anonymous_id alone and user_id is NULL. After the user logs in or signs up, user_id is no longer NULL, and events are tracked using the same anonymous_id together with user_id.

I am trying to write a query to find the last event where the user was tracked only by anonymous_id (column last_null) and the first event that was tracked by user_id (column first_notnull). The resulting table should be sorted by anonym_id.

Given the table tracks:

received_at          event_name      anonymous_id  user_id
2016-01-01 12:13:12  buttonClicked   1             NULL
2016-01-02 12:14:15  pageReloaded    3             NULL
2016-02-02 13:15:13  pageRendered    2             NULL
2016-02-03 13:15:23  commentWritten  3             NULL
2016-03-03 14:15:15  avatarUpdated   2             2
2016-03-04 14:15:24  statusUpdated   1             1

The output should be:

anonym_id  last_null       first_notnull
1          buttonClicked   statusUpdated
2          pageRendered    avatarUpdated
3          commentWritten  NULL

Here is what I came up with, but I am not sure if there is a better solution. Can you please help me?

   SELECT
        distinct anonymous_id as anonym_id
       ,CASE
           WHEN user_id is not null THEN prev_event
           WHEN user_id is null THEN event_name
        END AS last_null 
       ,CASE
           WHEN user_id is not null THEN event_name
           WHEN user_id is null THEN next_event
        END AS first_notnull
       FROM(
           SELECT
               *
               ,max(rnk)
           FROM(    
                   SELECT
                       *
                       ,lag(event_name) OVER(PARTITION BY anonymous_id ORDER BY received_at ASC) as prev_event          
                       ,lead(event_name) OVER(PARTITION BY anonymous_id ORDER BY received_at ASC) as next_event
                       ,RANK() OVER(PARTITION BY anonymous_id ORDER BY received_at ASC) as rnk
                   FROM tracks
                   ) as temp
           GROUP BY anonymous_id
       )as temp2
WHERE prev_event is not NULL             

Answer

I can think of a few ways to achieve this; here is one of them:

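SELECT anonymous_id AS anonym_id,
       -- Anonymous events, newest first; keep only the first list item.
       SUBSTRING_INDEX(
         GROUP_CONCAT(
           CASE WHEN user_id IS NULL
                THEN event_name END
           ORDER BY received_at DESC), ',', 1) AS last_null,
       -- Identified events, oldest first; keep only the first list item.
       SUBSTRING_INDEX(
         GROUP_CONCAT(
           CASE WHEN user_id IS NOT NULL
                THEN event_name END
           ORDER BY received_at ASC), ',', 1) AS first_notnull
FROM tracks
GROUP BY anonymous_id
ORDER BY anonym_id;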

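Each CASE expression returns event_name only for the rows of interest and NULL otherwise. GROUP_CONCAT() skips NULLs, so each concatenated list contains only the matching events. ORDER BY received_at DESC puts the last anonymous event at the front of the first list, and ORDER BY received_at ASC puts the first identified event at the front of the second. SUBSTRING_INDEX(..., ',', 1) then extracts that first value from the GROUP_CONCAT() result. Note that this relies on event names not containing commas. See this fiddle for a working example.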
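To see what SUBSTRING_INDEX() is trimming, it can help to look at the intermediate GROUP_CONCAT() lists on their own. The following is a minimal sketch that recreates the sample data from the question; the column types in the CREATE TABLE statement are assumptions, since the question does not give the schema.

```sql
-- Hypothetical schema: column types are assumed, not taken from the question.
CREATE TABLE tracks (
  received_at  DATETIME    NOT NULL,
  event_name   VARCHAR(64) NOT NULL,
  anonymous_id INT         NOT NULL,
  user_id      INT         NULL
);

-- Sample rows copied from the question.
INSERT INTO tracks (received_at, event_name, anonymous_id, user_id) VALUES
  ('2016-01-01 12:13:12', 'buttonClicked',  1, NULL),
  ('2016-01-02 12:14:15', 'pageReloaded',   3, NULL),
  ('2016-02-02 13:15:13', 'pageRendered',   2, NULL),
  ('2016-02-03 13:15:23', 'commentWritten', 3, NULL),
  ('2016-03-03 14:15:15', 'avatarUpdated',  2, 2),
  ('2016-03-04 14:15:24', 'statusUpdated',  1, 1);

-- Intermediate step: the full comma-separated lists, before trimming.
SELECT anonymous_id,
       GROUP_CONCAT(CASE WHEN user_id IS NULL THEN event_name END
                    ORDER BY received_at DESC) AS null_events,
       GROUP_CONCAT(CASE WHEN user_id IS NOT NULL THEN event_name END
                    ORDER BY received_at ASC)  AS notnull_events
FROM tracks
GROUP BY anonymous_id
ORDER BY anonymous_id;
```

For anonymous_id 3, null_events is 'commentWritten,pageReloaded' and notnull_events is NULL, because GROUP_CONCAT() returns NULL when every input value is NULL. That is why first_notnull comes out as NULL for a user who never logged in.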

This is the quickest approach I can think of right now. I’ll try a few more ways (probably a shorter and more efficient query) and will update the answer if necessary.
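One possible alternative, offered as a sketch and not as the answer author's follow-up, is to use correlated subqueries with ORDER BY ... LIMIT 1. It runs against the tracks table from the question, does no string concatenation, and therefore does not depend on event names being free of commas. The aliases ids and t are made up for illustration.

```sql
SELECT ids.anonymous_id AS anonym_id,
       -- Latest event recorded while the user was still anonymous.
       (SELECT t.event_name
          FROM tracks AS t
         WHERE t.anonymous_id = ids.anonymous_id
           AND t.user_id IS NULL
         ORDER BY t.received_at DESC
         LIMIT 1) AS last_null,
       -- Earliest event recorded after the user was identified.
       (SELECT t.event_name
          FROM tracks AS t
         WHERE t.anonymous_id = ids.anonymous_id
           AND t.user_id IS NOT NULL
         ORDER BY t.received_at ASC
         LIMIT 1) AS first_notnull
FROM (SELECT DISTINCT anonymous_id FROM tracks) AS ids
ORDER BY anonym_id;
```

A scalar subquery that matches no rows evaluates to NULL, so users who never logged in still get a row with first_notnull set to NULL. If two events for the same anonymous_id share the same received_at, which one is picked is not deterministic; adding a tie-breaking column to each ORDER BY would fix that, if the table has one.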

User contributions licensed under: CC BY-SA