I have a table similar to this small example:
I want to manipulate it to this format:
Here’s a sample SQL script to create an example input table:
CREATE TABLE sample_table ( id INT, hr INT, tm DATETIME, score INT, ) INSERT INTO sample_table VALUES (1, 0, '2021-01-21 00:26:45', 2765), (1, 0, '2021-01-21 00:49:00', 2765), (1, 5, '2021-01-21 07:47:03', 1593), (1, 7, '2021-01-21 11:50:48', 1604), (1, 7, '2021-01-21 12:00:32', 1604), (2, 0, '2021-01-21 00:50:45', 3500), (2, 2, '2021-01-21 01:49:00', 2897), (2, 2, '2021-01-21 05:47:03', 2897), (2, 4, '2021-01-21 09:30:48', 2400), (2, 6, '2021-01-21 12:00:32', 1647);
I tried using combination of LAG and CASE WHEN, not successful so far. Looking for some ideas on how to manipulate (what functions etc). Would be awesome to see example script for the manipulation.
Where there is multiple values per id & hr, then earliest values to be used. E.g. id=1 & hr=7, then hr_7=uses value from 11:50. Although in this example, it’s the same values for both records, it can differ.
Advertisement
Answer
I would suggest this logic:
with u as ( -- get unique values select id, hr, tm, score, lead(hr) over (partition by id order by hr) as next_hr from (select t.*, row_number() over (partition by id, hr order by tm asc) as seqnum from t ) where seqnum = 1 ) select id, max(case when hr <= 1 and (next_hr > 1 or next_hr is null) then score end) as hr_1, max(case when hr <= 2 and (next_hr > 2 or next_hr is null) then score end) as hr_2, max(case when hr <= 3 and (next_hr > 3 or next_hr is null) then score end) as hr_3, max(case when hr <= 4 and (next_hr > 4 or next_hr is null) then score end) as hr_4, max(case when hr <= 5 and (next_hr > 5 or next_hr is null) then score end) as hr_5, max(case when hr <= 6 and (next_hr > 6 or next_hr is null) then score end) as hr_6, max(case when hr <= 7 and (next_hr > 7 or next_hr is null) then score end) as hr_7, max(case when hr <= 8 and (next_hr > 8 or next_hr is null) then score end) as hr_8 from t group by id;
This first removes the duplicates and then adds a range for when the score is valid. The conditional aggregation then uses this information.