How to carrying over values for missing dates postcode/indicator_category to create full monthly time series. Im trying to use last_value to carry over values but not able to make it. Is my approach correct? Any help would by highly appreciated.
Example given a table:
create table test.indicator_data(
postcode text,
month_ts date,
indicator_cat integer,
measure double precision);
INSERT INTO value to indicator_data table
INSERT INTO test.indicator_data
VALUES
('sw5', '2017-07-01', 2, 99212.231),
('sw5', '2018-02-01', 2, 232.215),
('sw5', '2017-11-01', 3, 1523.2576),
('sw5', '2017-12-01', 3, 152.16),
('sw5', '2018-02-01', 3, 142.981),
('sw5', '2018-07-01', 3 , 142.1361),
('sw5 9', '2018-03-01', 2, 821.21),
('sw5 9', '2018-02-01', 2, 1182.19);
INPUT:
| postcode | month_ts | indicator_cat | measure |
|---|---|---|---|
| sw5 | 2017-07-01 | 2 | 99212.231 |
| sw5 | 2018-02-01 | 2 | 232.215 |
| sw5 | 2017-11-01 | 3 | 1523.2576 |
| sw5 | 2017-12-01 | 3 | 152.16 |
| sw5 | 2018-02-01 | 3 | 142.981 |
| sw5 | 2018-07-01 | 3 | 142.1361 |
| sw59 | 2018-03-01 | 2 | 821.21 |
| sw59 | 2018-02-01 | 2 | 1182.19 |
EXPECTED OUTPUT:
| postcode | month_ts | indicator_cat | measure |
|---|---|---|---|
| sw5 | 2017-07-01 | 2 | 99212.231 |
| sw5 | 2017-08-01 | 2 | 99212.231 |
| sw5 | 2017-09-01 | 2 | 99212.231 |
| sw5 | 2017-10-01 | 2 | 99212.231 |
| sw5 | 2017-11-01 | 2 | 99212.231 |
| sw5 | 2017-12-01 | 2 | 99212.231 |
| sw5 | 2018-01-01 | 2 | 99212.231 |
| sw5 | 2018-02-01 | 2 | 232.215 |
| sw5 | 2017-11-01 | 3 | 1523.2576 |
| sw5 | 2017-12-01 | 3 | 152.16 |
| sw5 | 2018-01-01 | 3 | 152.16 |
| sw5 | 2018-02-01 | 3 | 142.981 |
| sw5 | 2018-03-01 | 3 | 142.981 |
| sw5 | 2018-04-01 | 3 | 142.981 |
| sw5 | 2018-05-01 | 3 | 142.981 |
| sw5 | 2018-06-01 | 3 | 142.981 |
| sw5 | 2018-07-01 | 3 | 142.1361 |
| sw59 | 2018-02-01 | 2 | 1182.19 |
| sw59 | 2018-03-01 | 2 | 821.21 |
Tried SOLUTION:
Created Calender table using below procedue:
DELIMITER |
CREATE PROCEDURE test.fill_calendar(`start_date` DATE, `end_date` DATE)
BEGIN
DECLARE `crt_date` DATE;
SET `crt_date`=start_date;
WHILE `crt_date` < `end_date` DO
INSERT INTO calendar VALUES(`crt_date`);
SET `crt_date` = ADDDATE(`crt_date`, INTERVAL 1 MONTH);
END WHILE;
END |
DELIMITER ;
Calling Procedure
CALL test.fill_calendar('2017-07-01', '2018-07-01');
Query tried to missing dates but its failing to get last_values
select postcode,last_value(postcode) over (order by datefield) from (
SELECT indicator_data.postcode,calendar.datefield AS date,indicator_data.indicator_cat,indicator_data.measure
FROM indicator_data RIGHT join calendar ON (DATE(indicator_data.month_ts) = calendar.datefield)
WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(month_ts)) FROM indicator_data) AND (SELECT MAX(DATE(month_ts)) FROM indicator_data))
GROUP BY date) as test;
Advertisement
Answer
You can create the calendar table’s resultset with a recursive CTE.
Then do a LEFT join of the calendar to the table and use SUM() window function to create groups of rows with null in measure.
Finally use MAX() window function to pick the last non-null measure:
WITH
RECURSIVE dates AS (
SELECT postcode, indicator_cat, MIN(month_ts) month_ts, MAX(month_ts) max_month_ts
FROM indicator_data
GROUP BY postcode, indicator_cat
UNION ALL
SELECT postcode, indicator_cat, month_ts + INTERVAL 1 MONTH, max_month_ts
FROM dates
WHERE month_ts + INTERVAL 1 MONTH <= max_month_ts
),
cte AS (
SELECT d.postcode, d.month_ts, d.indicator_cat, i.measure,
SUM(i.measure IS NOT NULL) OVER (PARTITION BY d.postcode, d.indicator_cat ORDER BY d.month_ts) grp
FROM dates d LEFT JOIN indicator_data i
ON (i.postcode, i.indicator_cat, i.month_ts) = (d.postcode, d.indicator_cat, d.month_ts)
)
SELECT postcode, month_ts, indicator_cat,
MAX(measure) OVER (PARTITION BY postcode, indicator_cat, grp) measure
FROM cte
ORDER BY postcode, indicator_cat, month_ts;
See the demo.