Let’s say I have data like this:
x
USER_ID TIMESTAMP data data2
0001 2021-05-09 12:13:03.445 44
0001 2021-05-09 13:13:03.445 rob
0001 2021-05-09 11:13:03.445
0002 2021-05-09 09:13:03.445 perry 333
0002 2021-05-09 12:13:03.445 carl 333
0003 2021-05-09 16:13:03.445 mitch 1
0003 2021-05-09 17:13:03.445
0002 2021-05-09 16:13:03.445 mitch 5
All I want to do is collect the most recent non-null value from each column and condense them into a table with each row being an entry.
Final result:
USER_ID data data2
0001 rob 44
0003 mitch 1
0002 mitch 5
Here’s what I have but it’s not complete:
WITH form AS (
select b.*,
rank() over (
partition by user_id
order by timestamp DESC
) as num
FROM b
SELECT *
FROM b
WHERE num = 1
Advertisement
Answer
Related: Equivalent for Keep in Snowflake:
It could be achieved with:
WITH cte(user_id, timestamp, "data", data2) AS (
SELECT *
FROM (VALUES ('0001','2021-05-09 12:13:03.445'::timestamp,NULL,44),
('0001','2021-05-09 13:13:03.445'::timestamp,'rob',NULL),
('0001','2021-05-09 11:13:03.445'::timestamp,NULL,NULL),
('0002','2021-05-09 09:13:03.445'::timestamp,'perry',333),
('0002','2021-05-09 12:13:03.445'::timestamp,'carl',333),
('0003','2021-05-09 16:13:03.445'::timestamp,'mitch',1),
('0003','2021-05-09 17:13:03.445'::timestamp,NULL,NULL),
('0002','2021-05-09 16:13:03.445'::timestamp,'mitch',5)
)
)
SELECT user_id,
(ARRAY_AGG("data") WITHIN GROUP (ORDER BY timestamp DESC))[0]::STRING AS "data",
(ARRAY_AGG(data2) WITHIN GROUP (ORDER BY timestamp DESC))[0] AS data2
FROM cte
GROUP BY user_id
ORDER BY user_id;
Output:
+---------+----------+-------+
| USER_ID | data | data2 |
+---------+----------+-------+
| 0001 | rob | 44 |
| 0002 | mitch | 5 |
| 0003 | mitch | 1 |
+---------+----------+-------+
ARRAY_AGG
by default omits NULLs and it is sorted by timestamp descending. Once array per user_id
is created it is a matter of accesing first element(element with index [0]).