Skip to content
Advertisement

SQL query to interpolate between values

I intend to interpolate (linear interpolation) between values in a column and insert that into a new column using a SQL query. Based on my search online, I suspect LEAD analytic function could be useful. I am new to writing SQL queries. So, any insights on how it can be achieved will be quite helpful.

The sample data set is as described:

Emp  Test_date  Value
---  ---------  -----
A    1/1/2001   null
A    1/2/2001   100
A    1/3/2001   null
A    1/4/2001   80
A    1/5/2001   null
A    1/6/2001   null
A    1/7/2001   75

The idea is to get a fourth column that has values as:

null
100
interpolatedValue1
80
interpolatedValue2
interpolatedValue3
75

Interpolatedvalue1 would be an interpolated value between 100 and 80,

Interpolatedvalue2 would be a linearly interpolated value between 80 and 75.

InterpolatedValue3 would be a linearly interpolated value between Interpolatedvalue2 and 75

Here’s how a simple linear interpolation works:

Given two points (V1 at D1), (V3 at D3). What would be the value V2 at D2?

(V3-V1)/(D3-D1) * (D2-D1) + V1

Advertisement

Answer

This can probably be simplified a bit but gets the answer you wanted, I believe. The slightly tricky bit is getting both the number of days between not-null values (i.e. the size of the gap you’re filling) and then the position within that gap:

-- CTE for sample data
with your_table (emp, test_date, value) as (
            select 'A', date '2001-01-01', null from dual
  union all select 'A', date '2001-01-02', 100 from dual
  union all select 'A', date '2001-01-03', null from dual
  union all select 'A', date '2001-01-04', 80 from dual
  union all select 'A', date '2001-01-05', null from dual
  union all select 'A', date '2001-01-06', null from dual
  union all select 'A', date '2001-01-07', 75 from dual
)
-- actual query
select emp, test_date, value,
  coalesce(value,
    (next_value - prev_value) -- v3-v1
    / (count(*) over (partition by grp) + 1) -- d3-d1
    * row_number() over (partition by grp order by test_date desc) -- d2-d1, indirectly
    + prev_value -- v1
  ) as interpolated
from (
  select emp, test_date, value,
    last_value(value ignore nulls)
      over (partition by emp order by test_date) as prev_value,
    first_value(value ignore nulls)
      over (partition by emp order by test_date range between current row and unbounded following) as next_value,
    row_number() over (partition by emp order by test_date) -
      row_number() over (partition by emp order by case when value is null then 1 else 0 end, test_date) as grp
  from your_table
)
order by test_date;
E TEST_DATE       VALUE INTERPOLATED
- ---------- ---------- ------------
A 2001-01-01                        
A 2001-01-02        100          100
A 2001-01-03                      90
A 2001-01-04         80           80
A 2001-01-05              76.6666667
A 2001-01-06              78.3333333
A 2001-01-07         75           75

I’ve used last_value and first_value instead of lead and lag, but either works. (Lead/lag might be faster on a large data set I suppose). The grp calculation is Tabibitosan.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement