I want to fill NULL values in device column for each session_id with an associated non-NULL value. How can I achieve that?
Here is the sample data:
+------------+-------+---------+ | session_id | step | device | +------------+-------+---------+ | 351acc | step1 | | | 351acc | step2 | | | 351acc | step3 | mobile | | 351acc | step4 | mobile | | 350bca | step1 | desktop | | 350bca | step2 | | | 350bca | step3 | | | 350bca | step4 | desktop | +------------+-------+---------+
Desired output:
+------------+-------+---------+ | session_id | step | device | +------------+-------+---------+ | 351acc | step1 | mobile | | 351acc | step2 | mobile | | 351acc | step3 | mobile | | 351acc | step4 | mobile | | 350bca | step1 | desktop | | 350bca | step2 | desktop | | 350bca | step3 | desktop | | 350bca | step4 | desktop | +------------+-------+---------+
Advertisement
Answer
The window function first_value() with the right ordering is probably cheapest:
SELECT session_id, step
, COALESCE(device
, first_value(device) OVER (PARTITION BY session_id ORDER BY device IS NULL, step)
) AS device
FROM tbl
ORDER BY session_id DESC, step;
db<>fiddle here
ORDER BY device IS NULL, step sorts NULL values last, so the earliest step with a notnull value is picked. See:
If notnull devices per session_id are always the same, you can simplify to just ORDER BY device IS NULL. And you don’t need COALESCE.