Skip to content
Advertisement

How to get interpolation value in SQL Server?

I want to get interpolation value for NULL. Interpolation is a statistical method by which related known values are used to estimate an unknown price or potential yield of a security. Interpolation is achieved by using other established values that are located in sequence with the unknown value.

Here is my sample table and code.

https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=673fcd5bc250bd272e8b6da3d0eddb90

I want to get this result:

    | SEQ | cat01 | cat02 |   dt_day   | price |  coeff |
    +-----+-------+-------+------------+-------+--------+
    |  1  |  230  |   1   | 2019-01-01 | 16000 |    0   |
    |  2  |  230  |   1   | 2019-01-02 |  NULL |    1   |
    |  3  |  230  |   1   | 2019-01-03 | 13000 |    0   |
    |  4  |  230  |   1   | 2019-01-04 |  NULL |    1   |
    |  5  |  230  |   1   | 2019-01-05 |  NULL |    2   |    
    |  6  |  230  |   1   | 2019-01-06 |  NULL |    3   | 
    |  7  |  230  |   1   | 2019-01-07 | 19000 |    0   |
    |  8  |  230  |   1   | 2019-01-08 | 20000 |    0   |
    |  9  |  230  |   1   | 2019-01-09 | 21500 |    0   |
    | 10  |  230  |   1   | 2019-01-10 | 21500 |    0   |
    | 11  |  230  |   1   | 2019-01-11 |  NULL |    1   |
    | 12  |  230  |   1   | 2019-01-12 |  NULL |    2   |
    | 13  |  230  |   1   | 2019-01-13 | 23000 |    0   |     
    |  1  |  230  |   2   | 2019-01-01 |  NULL |    1   |
    |  2  |  230  |   2   | 2019-01-02 |  NULL |    2   |
    |  3  |  230  |   2   | 2019-01-03 | 12000 |    0   |
    |  4  |  230  |   2   | 2019-01-04 | 17000 |    0   |
    |  5  |  230  |   2   | 2019-01-05 | 22000 |    0   |    
    |  6  |  230  |   2   | 2019-01-06 |  NULL |    1   | 
    |  7  |  230  |   2   | 2019-01-07 | 23000 |    0   |
    |  8  |  230  |   2   | 2019-01-08 | 23200 |    0   |
    |  9  |  230  |   2   | 2019-01-09 |  NULL |    1   |
    | 10  |  230  |   2   | 2019-01-10 |  NULL |    2   |
    | 11  |  230  |   2   | 2019-01-11 |  NULL |    3   |
    | 12  |  230  |   2   | 2019-01-12 |  NULL |    4   |
    | 13  |  230  |   2   | 2019-01-13 | 23000 |    0   | 

I use this code. I think this code incorrect.

coeff is the NULL is in order set.

This code is for implementing interpolation.

I tried to find out between the empty values and divide them by the number of spaces.

But, this code is incorrect.

WITH ROW_VALUE AS
(
    SELECT SEQ
    , dt_day
    , cat01
    , cat02
    , price
    , ROW_NUMBER() OVER (ORDER BY dt_day) AS sub_seq
FROM (
    SELECT SEQ
        , cat01
        , cat02
        , dt_day
        , dt_week
        , dt_month
        , price
    FROM temp01
    WHERE price IS NOT NULL
        )val
)
,STEP_CHANGE AS(
SELECT RV1.SEQ AS id_Start
    , RV1.SEQ - 1 AS id_End
    , RV1.cat01
    , RV1.cat02
    , RV1.dt_day
    , RV1.price      
    , (RV2.price - RV1.price)/(RV2.SEQ - RV1.SEQ) AS change1
FROM ROW_VALUE RV1
LEFT JOIN ROW_VALUE RV2 ON RV1.cat01 = RV2.cat01
AND RV1.cat02 = RV2.cat02
AND RV1.SEQ = RV2.SEQ - 1
)
SELECT *
FROM STEP_CHANGE
ORDER BY cat01, cat02, dt_day

Please, let me know what a good way to fill NULL using linear relationships.

If there is another good way, please recommend it.

Advertisement

Answer

If I assume that you mean linear interpolation between the previous price and the next price based on the number of days that passed, then you can use the following method:

  • Use window functions to get the next and previous days with prices for each row.
  • Use window functions or joins to get the prices on those days as well.
  • Use arithmetic to calculate the linear interpolation.

You SQL Fiddle uses SQL Server, so I assume that is the database you are using. The code looks like this:

select t.*,
       coalesce(t.price, 
                (tprev.price +
                 (tnext.price - tprev.price) / datediff(day, prev_price_day, next_price_day) *
                 datediff(day, t.prev_price_day, t.dt_day)
                )
               ) as imputed_price
from (select t.*,
             max(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day asc) as prev_price_day,
             min(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day desc) as next_price_day
      from temp01 t
     ) t left join 
     temp01 tprev
     on tprev.cat01 = t.cat01 and
        tprev.cat02 = t.cat02 and
        tprev.dt_day = t.prev_price_day left join
     temp01 tnext
     on tnext.cat01 = t.cat01 and
        tnext.cat02 = t.cat02 and
        tnext.dt_day = t.next_price_day 
order by cat01, cat02, dt_day;

Here is a db<>fiddle.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement