How to carrying over values for missing dates postcode/indicator_category to create full monthly time series. Im trying to use last_value to carry over values but not able to make it. Is my approach correct? Any help would by highly appreciated.
Example given a table:
create table test.indicator_data( postcode text, month_ts date, indicator_cat integer, measure double precision);
INSERT INTO value to indicator_data table
INSERT INTO test.indicator_data VALUES ('sw5', '2017-07-01', 2, 99212.231), ('sw5', '2018-02-01', 2, 232.215), ('sw5', '2017-11-01', 3, 1523.2576), ('sw5', '2017-12-01', 3, 152.16), ('sw5', '2018-02-01', 3, 142.981), ('sw5', '2018-07-01', 3 , 142.1361), ('sw5 9', '2018-03-01', 2, 821.21), ('sw5 9', '2018-02-01', 2, 1182.19);
INPUT:
postcode | month_ts | indicator_cat | measure |
---|---|---|---|
sw5 | 2017-07-01 | 2 | 99212.231 |
sw5 | 2018-02-01 | 2 | 232.215 |
sw5 | 2017-11-01 | 3 | 1523.2576 |
sw5 | 2017-12-01 | 3 | 152.16 |
sw5 | 2018-02-01 | 3 | 142.981 |
sw5 | 2018-07-01 | 3 | 142.1361 |
sw59 | 2018-03-01 | 2 | 821.21 |
sw59 | 2018-02-01 | 2 | 1182.19 |
EXPECTED OUTPUT:
postcode | month_ts | indicator_cat | measure |
---|---|---|---|
sw5 | 2017-07-01 | 2 | 99212.231 |
sw5 | 2017-08-01 | 2 | 99212.231 |
sw5 | 2017-09-01 | 2 | 99212.231 |
sw5 | 2017-10-01 | 2 | 99212.231 |
sw5 | 2017-11-01 | 2 | 99212.231 |
sw5 | 2017-12-01 | 2 | 99212.231 |
sw5 | 2018-01-01 | 2 | 99212.231 |
sw5 | 2018-02-01 | 2 | 232.215 |
sw5 | 2017-11-01 | 3 | 1523.2576 |
sw5 | 2017-12-01 | 3 | 152.16 |
sw5 | 2018-01-01 | 3 | 152.16 |
sw5 | 2018-02-01 | 3 | 142.981 |
sw5 | 2018-03-01 | 3 | 142.981 |
sw5 | 2018-04-01 | 3 | 142.981 |
sw5 | 2018-05-01 | 3 | 142.981 |
sw5 | 2018-06-01 | 3 | 142.981 |
sw5 | 2018-07-01 | 3 | 142.1361 |
sw59 | 2018-02-01 | 2 | 1182.19 |
sw59 | 2018-03-01 | 2 | 821.21 |
Tried SOLUTION:
Created Calender table using below procedue:
DELIMITER | CREATE PROCEDURE test.fill_calendar(`start_date` DATE, `end_date` DATE) BEGIN DECLARE `crt_date` DATE; SET `crt_date`=start_date; WHILE `crt_date` < `end_date` DO INSERT INTO calendar VALUES(`crt_date`); SET `crt_date` = ADDDATE(`crt_date`, INTERVAL 1 MONTH); END WHILE; END | DELIMITER ;
Calling Procedure
CALL test.fill_calendar('2017-07-01', '2018-07-01');
Query tried to missing dates but its failing to get last_values
select postcode,last_value(postcode) over (order by datefield) from ( SELECT indicator_data.postcode,calendar.datefield AS date,indicator_data.indicator_cat,indicator_data.measure FROM indicator_data RIGHT join calendar ON (DATE(indicator_data.month_ts) = calendar.datefield) WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(month_ts)) FROM indicator_data) AND (SELECT MAX(DATE(month_ts)) FROM indicator_data)) GROUP BY date) as test;
Advertisement
Answer
You can create the calendar table’s resultset with a recursive CTE.
Then do a LEFT
join of the calendar to the table and use SUM()
window function to create groups of rows with null
in measure
.
Finally use MAX()
window function to pick the last non-null measure
:
WITH RECURSIVE dates AS ( SELECT postcode, indicator_cat, MIN(month_ts) month_ts, MAX(month_ts) max_month_ts FROM indicator_data GROUP BY postcode, indicator_cat UNION ALL SELECT postcode, indicator_cat, month_ts + INTERVAL 1 MONTH, max_month_ts FROM dates WHERE month_ts + INTERVAL 1 MONTH <= max_month_ts ), cte AS ( SELECT d.postcode, d.month_ts, d.indicator_cat, i.measure, SUM(i.measure IS NOT NULL) OVER (PARTITION BY d.postcode, d.indicator_cat ORDER BY d.month_ts) grp FROM dates d LEFT JOIN indicator_data i ON (i.postcode, i.indicator_cat, i.month_ts) = (d.postcode, d.indicator_cat, d.month_ts) ) SELECT postcode, month_ts, indicator_cat, MAX(measure) OVER (PARTITION BY postcode, indicator_cat, grp) measure FROM cte ORDER BY postcode, indicator_cat, month_ts;
See the demo.