I have data which has missing values irregulaly, and I’d like to convert it with a certain interval with liner interpolation using BigQuery Standard SQL.
Specifically, I have data like this:
# data is missing irregulary +------+-------+ | time | value | +------+-------+ | 1 | 3.0 | | 5 | 5.0 | | 7 | 1.0 | | 9 | 8.0 | | 10 | 4.0 | +------+-------+
and I’d like to convert this table as follows:
# interpolated with interval of 1 +------+--------------------+ | time | value_interpolated | +------+--------------------+ | 1 | 3.0 | | 2 | 3.5 | | 3 | 4.0 | | 4 | 4.5 | | 5 | 5.0 | | 6 | 3.0 | | 7 | 1.0 | | 8 | 4.5 | | 9 | 8.0 | | 10 | 4.0 | +------+--------------------+
Any smart soluton for this?
Supplement: this question is similar to this question in stackoverflow but different in that the data is missing irregulaly.
Thank you.
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL select time, ifnull(value, start_value + (end_value - start_value) / (end_tick - start_tick) * (time - start_tick)) as value_interpolated from ( select time, value, first_value(tick ignore nulls) over win1 as start_tick, first_value(value ignore nulls) over win1 as start_value, first_value(tick ignore nulls) over win2 as end_tick, first_value(value ignore nulls) over win2 as end_value, from ( select time, t.time as tick, value from ( select generate_array(min(time), max(time)) times from `project.dataset.table` ), unnest(times) time left join `project.dataset.table` t using(time) ) window win1 as (order by time desc rows between current row and unbounded following), win2 as (order by time rows between current row and unbounded following) )
if to apply to sample data from your question – output is