Create a stored procedure that converts rows with the same key into columns

You are given the table tracks, which contains the following columns:

  1. received_at: the unique timestamp of the action
  2. event_name: the name of the action that was performed at this time
  3. anonymous_id: the anonymous ID of a user
  4. user_id: the user ID, which can be null

This is the DDL given for the problem:

CREATE TABLE IF NOT EXISTS `tracks` 
(
    `received_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
    `event_name` varchar(300) NOT NULL,
    `anonymous_id` int(2) NOT NULL,
    `user_id` int(2)
) DEFAULT CHARSET=utf8;

INSERT INTO `tracks` (`received_at`, `event_name`, `anonymous_id`,`user_id`) 
VALUES
  ('2016-01-01 12:13:12', 'buttonClicked', 1,NULL),
  ('2016-01-02 12:14:15', 'pageReloaded', 3,NULL),
  ('2016-02-02 13:15:13', 'pageRendered', 2,NULL),
  ('2016-02-03 13:15:23', 'commentWritten', 3,NULL),
  ('2016-03-03 14:15:15', 'avatarUpdated', 2,2),
  ('2016-03-04 14:15:24', 'statusUpdated', 1,1);

The task is to find two events for each anonymous_id, which becomes the column anonym_id in the returned table: the last event where the user was tracked only by anonymous_id (column last_null), and the first event where the user was tracked by user_id (column first_notnull). The resulting table should be sorted by anonym_id.

According to the problem statement, this is the desired outcome:

anonym_id | last_null      | first_notnull
----------+----------------+---------------
1         | buttonClicked  | statusUpdated
2         | pageRendered   | avatarUpdated
3         | commentWritten | NULL

We tried to get this result by taking a UNION of two query result sets, as shown below, but we still cannot get the desired outcome.

select a.anonymous_id,a.event_name,b.event_name from (
(select anonymous_id,event_name from tracks where user_id is null group by anonymous_id) as a 
union 
(select anonymous_id,event_name from tracks where user_id is not null group by anonymous_id) as b  

Can someone shed light on how to get the desired outcome, and how to do so in the most efficient, least time-consuming way?

Answer

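For each anonymous_id, first find the latest received_at with a null user_id and the earliest received_at with a non-null user_id (subquery s), then use conditional aggregation to pick the event_name at each of those timestamps: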

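select anonymous_id as anonym_id,
         -- event_name of the row whose timestamp is the latest anonymous-only one
         max(case when received_at = maxnull then event_name else null end) as last_null,
         -- event_name of the row whose timestamp is the earliest one with a user_id
         max(case when received_at = minnotnull then event_name else null end) as first_notnull
from
(
select anonymous_id, received_at, event_name,
         -- latest timestamp for this anonymous_id while user_id is still null
         (select max(received_at) from tracks t1 where t1.anonymous_id = t.anonymous_id and t1.user_id is null) maxnull,
         -- earliest timestamp for this anonymous_id once user_id is set (null if never set)
         (select min(received_at) from tracks t1 where t1.anonymous_id = t.anonymous_id and t1.user_id is not null) minnotnull
from tracks t
) s
group by anonymous_id
order by anonymous_id;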
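
As a quick sanity check, the same subquery-plus-conditional-aggregation approach can be run against the sample rows from the question. The sketch below is an assumption-laden stand-in, not the original setup: it uses Python's built-in sqlite3 module with an in-memory database instead of MySQL, stores received_at as text, and the helper names build_tracks and pivot_events are made up for illustration. The output columns are named anonym_id, last_null and first_notnull, as in the problem statement.

```python
import sqlite3

# Sample rows copied from the question's INSERT statement.
SAMPLE_ROWS = [
    ("2016-01-01 12:13:12", "buttonClicked", 1, None),
    ("2016-01-02 12:14:15", "pageReloaded", 3, None),
    ("2016-02-02 13:15:13", "pageRendered", 2, None),
    ("2016-02-03 13:15:23", "commentWritten", 3, None),
    ("2016-03-03 14:15:15", "avatarUpdated", 2, 2),
    ("2016-03-04 14:15:24", "statusUpdated", 1, 1),
]

# Subquery s attaches the two boundary timestamps to every row;
# the outer query keeps the event_name found at each boundary.
PIVOT_SQL = """
SELECT anonymous_id AS anonym_id,
       MAX(CASE WHEN received_at = maxnull THEN event_name END) AS last_null,
       MAX(CASE WHEN received_at = minnotnull THEN event_name END) AS first_notnull
FROM (
    SELECT anonymous_id, received_at, event_name,
           (SELECT MAX(received_at) FROM tracks t1
             WHERE t1.anonymous_id = t.anonymous_id
               AND t1.user_id IS NULL) AS maxnull,
           (SELECT MIN(received_at) FROM tracks t1
             WHERE t1.anonymous_id = t.anonymous_id
               AND t1.user_id IS NOT NULL) AS minnotnull
    FROM tracks t
) s
GROUP BY anonymous_id
ORDER BY anonymous_id
"""


def build_tracks():
    """Create an in-memory SQLite copy of the tracks table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracks ("
        " received_at TEXT NOT NULL,"      # ISO-formatted text sorts chronologically
        " event_name TEXT NOT NULL,"
        " anonymous_id INTEGER NOT NULL,"
        " user_id INTEGER)"
    )
    conn.executemany("INSERT INTO tracks VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def pivot_events(conn):
    """Return (anonym_id, last_null, first_notnull) tuples sorted by anonym_id."""
    return conn.execute(PIVOT_SQL).fetchall()


if __name__ == "__main__":
    for row in pivot_events(build_tracks()):
        print(row)
    # (1, 'buttonClicked', 'statusUpdated')
    # (2, 'pageRendered', 'avatarUpdated')
    # (3, 'commentWritten', None)
```

For anonymous_id 3 there is no row with a user_id, so minnotnull is null, the second CASE never matches, and first_notnull comes back as NULL, which is what the desired outcome shows.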
User contributions licensed under: CC BY-SA