Skip to content
Advertisement

Manipulate data in SQL (backfilling, pivoting)

I have a table similar to this small example:

![enter image description here

I want to manipulate it to this format:

enter image description here

Here’s a sample SQL script to create an example input table:

CREATE TABLE sample_table
(
    id INT,
    hr INT,
    tm DATETIME,
    score INT, 
 )

INSERT INTO sample_table
VALUES (1, 0, '2021-01-21 00:26:45', 2765),
(1, 0, '2021-01-21 00:49:00', 2765),
(1, 5, '2021-01-21 07:47:03', 1593),
(1, 7, '2021-01-21 11:50:48', 1604),
(1, 7, '2021-01-21 12:00:32', 1604),
(2, 0, '2021-01-21 00:50:45', 3500),
(2, 2, '2021-01-21 01:49:00', 2897),
(2, 2, '2021-01-21 05:47:03', 2897),
(2, 4, '2021-01-21 09:30:48', 2400),
(2, 6, '2021-01-21 12:00:32', 1647);

I tried using combination of LAG and CASE WHEN, not successful so far. Looking for some ideas on how to manipulate (what functions etc). Would be awesome to see example script for the manipulation.

Where there is multiple values per id & hr, then earliest values to be used. E.g. id=1 & hr=7, then hr_7=uses value from 11:50. Although in this example, it’s the same values for both records, it can differ.

Advertisement

Answer

I would suggest this logic:

with u as (   -- get unique values
      select id, hr, tm, score,
             lead(hr) over (partition by id order by hr) as next_hr
      from (select t.*,
                   row_number() over (partition by id, hr order by tm asc) as seqnum
            from t
           )
      where seqnum = 1
     )
select id,
       max(case when hr <= 1 and (next_hr > 1 or next_hr is null) then score end) as hr_1,
       max(case when hr <= 2 and (next_hr > 2 or next_hr is null) then score end) as hr_2,
       max(case when hr <= 3 and (next_hr > 3 or next_hr is null) then score end) as hr_3,
       max(case when hr <= 4 and (next_hr > 4 or next_hr is null) then score end) as hr_4,
       max(case when hr <= 5 and (next_hr > 5 or next_hr is null) then score end) as hr_5,
       max(case when hr <= 6 and (next_hr > 6 or next_hr is null) then score end) as hr_6,
       max(case when hr <= 7 and (next_hr > 7 or next_hr is null) then score end) as hr_7,
       max(case when hr <= 8 and (next_hr > 8 or next_hr is null) then score end) as hr_8
from t
group by id;

This first removes the duplicates and then adds a range for when the score is valid. The conditional aggregation then uses this information.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement