I want to get interpolation value for NULL. Interpolation is a statistical method by which related known values are used to estimate an unknown price or potential yield of a security. Interpolation is achieved by using other established values that are located in sequence with the unknown value.
Here is my sample table and code.
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=673fcd5bc250bd272e8b6da3d0eddb90
I want to get this result:
| SEQ | cat01 | cat02 | dt_day | price | coeff |
+-----+-------+-------+------------+-------+--------+
| 1 | 230 | 1 | 2019-01-01 | 16000 | 0 |
| 2 | 230 | 1 | 2019-01-02 | NULL | 1 |
| 3 | 230 | 1 | 2019-01-03 | 13000 | 0 |
| 4 | 230 | 1 | 2019-01-04 | NULL | 1 |
| 5 | 230 | 1 | 2019-01-05 | NULL | 2 |
| 6 | 230 | 1 | 2019-01-06 | NULL | 3 |
| 7 | 230 | 1 | 2019-01-07 | 19000 | 0 |
| 8 | 230 | 1 | 2019-01-08 | 20000 | 0 |
| 9 | 230 | 1 | 2019-01-09 | 21500 | 0 |
| 10 | 230 | 1 | 2019-01-10 | 21500 | 0 |
| 11 | 230 | 1 | 2019-01-11 | NULL | 1 |
| 12 | 230 | 1 | 2019-01-12 | NULL | 2 |
| 13 | 230 | 1 | 2019-01-13 | 23000 | 0 |
| 1 | 230 | 2 | 2019-01-01 | NULL | 1 |
| 2 | 230 | 2 | 2019-01-02 | NULL | 2 |
| 3 | 230 | 2 | 2019-01-03 | 12000 | 0 |
| 4 | 230 | 2 | 2019-01-04 | 17000 | 0 |
| 5 | 230 | 2 | 2019-01-05 | 22000 | 0 |
| 6 | 230 | 2 | 2019-01-06 | NULL | 1 |
| 7 | 230 | 2 | 2019-01-07 | 23000 | 0 |
| 8 | 230 | 2 | 2019-01-08 | 23200 | 0 |
| 9 | 230 | 2 | 2019-01-09 | NULL | 1 |
| 10 | 230 | 2 | 2019-01-10 | NULL | 2 |
| 11 | 230 | 2 | 2019-01-11 | NULL | 3 |
| 12 | 230 | 2 | 2019-01-12 | NULL | 4 |
| 13 | 230 | 2 | 2019-01-13 | 23000 | 0 |
I use this code. I think this code incorrect.
coeff
is the NULL is in order set.
This code is for implementing interpolation.
I tried to find out between the empty values and divide them by the number of spaces.
But, this code is incorrect.
WITH ROW_VALUE AS
(
SELECT SEQ
, dt_day
, cat01
, cat02
, price
, ROW_NUMBER() OVER (ORDER BY dt_day) AS sub_seq
FROM (
SELECT SEQ
, cat01
, cat02
, dt_day
, dt_week
, dt_month
, price
FROM temp01
WHERE price IS NOT NULL
)val
)
,STEP_CHANGE AS(
SELECT RV1.SEQ AS id_Start
, RV1.SEQ - 1 AS id_End
, RV1.cat01
, RV1.cat02
, RV1.dt_day
, RV1.price
, (RV2.price - RV1.price)/(RV2.SEQ - RV1.SEQ) AS change1
FROM ROW_VALUE RV1
LEFT JOIN ROW_VALUE RV2 ON RV1.cat01 = RV2.cat01
AND RV1.cat02 = RV2.cat02
AND RV1.SEQ = RV2.SEQ - 1
)
SELECT *
FROM STEP_CHANGE
ORDER BY cat01, cat02, dt_day
Please, let me know what a good way to fill NULL using linear relationships.
If there is another good way, please recommend it.
Advertisement
Answer
If I assume that you mean linear interpolation between the previous price and the next price based on the number of days that passed, then you can use the following method:
- Use window functions to get the next and previous days with prices for each row.
- Use window functions or joins to get the prices on those days as well.
- Use arithmetic to calculate the linear interpolation.
You SQL Fiddle uses SQL Server, so I assume that is the database you are using. The code looks like this:
select t.*,
coalesce(t.price,
(tprev.price +
(tnext.price - tprev.price) / datediff(day, prev_price_day, next_price_day) *
datediff(day, t.prev_price_day, t.dt_day)
)
) as imputed_price
from (select t.*,
max(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day asc) as prev_price_day,
min(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day desc) as next_price_day
from temp01 t
) t left join
temp01 tprev
on tprev.cat01 = t.cat01 and
tprev.cat02 = t.cat02 and
tprev.dt_day = t.prev_price_day left join
temp01 tnext
on tnext.cat01 = t.cat01 and
tnext.cat02 = t.cat02 and
tnext.dt_day = t.next_price_day
order by cat01, cat02, dt_day;
Here is a db<>fiddle.