Replace NULL values per partition

Tags: , , , ,



I want to fill NULL values in device column for each session_id with an associated non-NULL value. How can I achieve that?

Here is the sample data:

+------------+-------+---------+
| session_id | step  | device  |
+------------+-------+---------+
| 351acc     | step1 |         |
| 351acc     | step2 |         |
| 351acc     | step3 | mobile  |
| 351acc     | step4 | mobile  |
| 350bca     | step1 | desktop |
| 350bca     | step2 |         |
| 350bca     | step3 |         |
| 350bca     | step4 | desktop |
+------------+-------+---------+

Desired output:

+------------+-------+---------+
| session_id | step  | device  |
+------------+-------+---------+
| 351acc     | step1 | mobile  |
| 351acc     | step2 | mobile  |
| 351acc     | step3 | mobile  |
| 351acc     | step4 | mobile  |
| 350bca     | step1 | desktop |
| 350bca     | step2 | desktop |
| 350bca     | step3 | desktop |
| 350bca     | step4 | desktop |
+------------+-------+---------+

Answer

The window function first_value() with the right ordering is probably cheapest:

SELECT session_id, step
     , COALESCE(device
              , first_value(device) OVER (PARTITION BY session_id ORDER BY device IS NULL, step)
               ) AS device
FROM   tbl
ORDER  BY session_id DESC, step;

db<>fiddle here

ORDER BY device IS NULL, step sorts NULL values last, so the earliest step with a notnull value is picked. See:

If notnull devices per session_id are always the same, you can simplify to just ORDER BY device IS NULL. And you don’t need COALESCE.



Source: stackoverflow