I have a table similar to this small example:
I want to manipulate it to this format:
Here’s a sample SQL script to create an example input table:
x
CREATE TABLE sample_table
(
id INT,
hr INT,
tm DATETIME,
score INT,
)
INSERT INTO sample_table
VALUES (1, 0, '2021-01-21 00:26:45', 2765),
(1, 0, '2021-01-21 00:49:00', 2765),
(1, 5, '2021-01-21 07:47:03', 1593),
(1, 7, '2021-01-21 11:50:48', 1604),
(1, 7, '2021-01-21 12:00:32', 1604),
(2, 0, '2021-01-21 00:50:45', 3500),
(2, 2, '2021-01-21 01:49:00', 2897),
(2, 2, '2021-01-21 05:47:03', 2897),
(2, 4, '2021-01-21 09:30:48', 2400),
(2, 6, '2021-01-21 12:00:32', 1647);
I tried using combination of LAG and CASE WHEN, not successful so far. Looking for some ideas on how to manipulate (what functions etc). Would be awesome to see example script for the manipulation.
Where there is multiple values per id & hr, then earliest values to be used. E.g. id=1 & hr=7, then hr_7=uses value from 11:50. Although in this example, it’s the same values for both records, it can differ.
Advertisement
Answer
I would suggest this logic:
with u as ( -- get unique values
select id, hr, tm, score,
lead(hr) over (partition by id order by hr) as next_hr
from (select t.*,
row_number() over (partition by id, hr order by tm asc) as seqnum
from t
)
where seqnum = 1
)
select id,
max(case when hr <= 1 and (next_hr > 1 or next_hr is null) then score end) as hr_1,
max(case when hr <= 2 and (next_hr > 2 or next_hr is null) then score end) as hr_2,
max(case when hr <= 3 and (next_hr > 3 or next_hr is null) then score end) as hr_3,
max(case when hr <= 4 and (next_hr > 4 or next_hr is null) then score end) as hr_4,
max(case when hr <= 5 and (next_hr > 5 or next_hr is null) then score end) as hr_5,
max(case when hr <= 6 and (next_hr > 6 or next_hr is null) then score end) as hr_6,
max(case when hr <= 7 and (next_hr > 7 or next_hr is null) then score end) as hr_7,
max(case when hr <= 8 and (next_hr > 8 or next_hr is null) then score end) as hr_8
from t
group by id;
This first removes the duplicates and then adds a range for when the score is valid. The conditional aggregation then uses this information.