I want to get interpolation value for NULL. Interpolation is a statistical method by which related known values are used to estimate an unknown price or potential yield of a security. Interpolation is achieved by using other established values that are located in sequence with the unknown value.
Here is my sample table and code.
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=673fcd5bc250bd272e8b6da3d0eddb90
I want to get this result:
| SEQ | cat01 | cat02 | dt_day | price | coeff | +-----+-------+-------+------------+-------+--------+ | 1 | 230 | 1 | 2019-01-01 | 16000 | 0 | | 2 | 230 | 1 | 2019-01-02 | NULL | 1 | | 3 | 230 | 1 | 2019-01-03 | 13000 | 0 | | 4 | 230 | 1 | 2019-01-04 | NULL | 1 | | 5 | 230 | 1 | 2019-01-05 | NULL | 2 | | 6 | 230 | 1 | 2019-01-06 | NULL | 3 | | 7 | 230 | 1 | 2019-01-07 | 19000 | 0 | | 8 | 230 | 1 | 2019-01-08 | 20000 | 0 | | 9 | 230 | 1 | 2019-01-09 | 21500 | 0 | | 10 | 230 | 1 | 2019-01-10 | 21500 | 0 | | 11 | 230 | 1 | 2019-01-11 | NULL | 1 | | 12 | 230 | 1 | 2019-01-12 | NULL | 2 | | 13 | 230 | 1 | 2019-01-13 | 23000 | 0 | | 1 | 230 | 2 | 2019-01-01 | NULL | 1 | | 2 | 230 | 2 | 2019-01-02 | NULL | 2 | | 3 | 230 | 2 | 2019-01-03 | 12000 | 0 | | 4 | 230 | 2 | 2019-01-04 | 17000 | 0 | | 5 | 230 | 2 | 2019-01-05 | 22000 | 0 | | 6 | 230 | 2 | 2019-01-06 | NULL | 1 | | 7 | 230 | 2 | 2019-01-07 | 23000 | 0 | | 8 | 230 | 2 | 2019-01-08 | 23200 | 0 | | 9 | 230 | 2 | 2019-01-09 | NULL | 1 | | 10 | 230 | 2 | 2019-01-10 | NULL | 2 | | 11 | 230 | 2 | 2019-01-11 | NULL | 3 | | 12 | 230 | 2 | 2019-01-12 | NULL | 4 | | 13 | 230 | 2 | 2019-01-13 | 23000 | 0 |
I use this code. I think this code incorrect.
coeff
is the NULL is in order set.
This code is for implementing interpolation.
I tried to find out between the empty values and divide them by the number of spaces.
But, this code is incorrect.
WITH ROW_VALUE AS ( SELECT SEQ , dt_day , cat01 , cat02 , price , ROW_NUMBER() OVER (ORDER BY dt_day) AS sub_seq FROM ( SELECT SEQ , cat01 , cat02 , dt_day , dt_week , dt_month , price FROM temp01 WHERE price IS NOT NULL )val ) ,STEP_CHANGE AS( SELECT RV1.SEQ AS id_Start , RV1.SEQ - 1 AS id_End , RV1.cat01 , RV1.cat02 , RV1.dt_day , RV1.price , (RV2.price - RV1.price)/(RV2.SEQ - RV1.SEQ) AS change1 FROM ROW_VALUE RV1 LEFT JOIN ROW_VALUE RV2 ON RV1.cat01 = RV2.cat01 AND RV1.cat02 = RV2.cat02 AND RV1.SEQ = RV2.SEQ - 1 ) SELECT * FROM STEP_CHANGE ORDER BY cat01, cat02, dt_day
Please, let me know what a good way to fill NULL using linear relationships.
If there is another good way, please recommend it.
Advertisement
Answer
If I assume that you mean linear interpolation between the previous price and the next price based on the number of days that passed, then you can use the following method:
- Use window functions to get the next and previous days with prices for each row.
- Use window functions or joins to get the prices on those days as well.
- Use arithmetic to calculate the linear interpolation.
You SQL Fiddle uses SQL Server, so I assume that is the database you are using. The code looks like this:
select t.*, coalesce(t.price, (tprev.price + (tnext.price - tprev.price) / datediff(day, prev_price_day, next_price_day) * datediff(day, t.prev_price_day, t.dt_day) ) ) as imputed_price from (select t.*, max(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day asc) as prev_price_day, min(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day desc) as next_price_day from temp01 t ) t left join temp01 tprev on tprev.cat01 = t.cat01 and tprev.cat02 = t.cat02 and tprev.dt_day = t.prev_price_day left join temp01 tnext on tnext.cat01 = t.cat01 and tnext.cat02 = t.cat02 and tnext.dt_day = t.next_price_day order by cat01, cat02, dt_day;
Here is a db<>fiddle.