Skip to content
Advertisement

How to carrying over values for missing dates in time series using last value windows analytical functions in mysql

How to carrying over values for missing dates postcode/indicator_category to create full monthly time series. Im trying to use last_value to carry over values but not able to make it. Is my approach correct? Any help would by highly appreciated.

Example given a table:

    create table test.indicator_data(
      postcode text, 
      month_ts date, 
      indicator_cat integer,
      measure double precision);

INSERT INTO value to indicator_data table

    INSERT INTO test.indicator_data 
    VALUES
    ('sw5', '2017-07-01', 2, 99212.231),
    ('sw5', '2018-02-01', 2, 232.215),
    ('sw5', '2017-11-01', 3, 1523.2576),
    ('sw5', '2017-12-01', 3, 152.16),
    ('sw5', '2018-02-01', 3, 142.981),
    ('sw5', '2018-07-01', 3 , 142.1361),
    ('sw5 9', '2018-03-01', 2, 821.21),
    ('sw5 9', '2018-02-01', 2, 1182.19);

INPUT:

postcode month_ts indicator_cat measure
sw5 2017-07-01 2 99212.231
sw5 2018-02-01 2 232.215
sw5 2017-11-01 3 1523.2576
sw5 2017-12-01 3 152.16
sw5 2018-02-01 3 142.981
sw5 2018-07-01 3 142.1361
sw59 2018-03-01 2 821.21
sw59 2018-02-01 2 1182.19

EXPECTED OUTPUT:

postcode month_ts indicator_cat measure
sw5 2017-07-01 2 99212.231
sw5 2017-08-01 2 99212.231
sw5 2017-09-01 2 99212.231
sw5 2017-10-01 2 99212.231
sw5 2017-11-01 2 99212.231
sw5 2017-12-01 2 99212.231
sw5 2018-01-01 2 99212.231
sw5 2018-02-01 2 232.215
sw5 2017-11-01 3 1523.2576
sw5 2017-12-01 3 152.16
sw5 2018-01-01 3 152.16
sw5 2018-02-01 3 142.981
sw5 2018-03-01 3 142.981
sw5 2018-04-01 3 142.981
sw5 2018-05-01 3 142.981
sw5 2018-06-01 3 142.981
sw5 2018-07-01 3 142.1361
sw59 2018-02-01 2 1182.19
sw59 2018-03-01 2 821.21

Tried SOLUTION:

Created Calender table using below procedue:

    DELIMITER |
    CREATE PROCEDURE test.fill_calendar(`start_date` DATE, `end_date` DATE)
    BEGIN
      DECLARE `crt_date` DATE;
      SET `crt_date`=start_date;
      WHILE `crt_date` < `end_date` DO
        INSERT INTO calendar VALUES(`crt_date`);
        SET `crt_date` = ADDDATE(`crt_date`, INTERVAL 1 MONTH);
      END WHILE;
    END |
    DELIMITER ;

Calling Procedure

CALL test.fill_calendar('2017-07-01', '2018-07-01');

Query tried to missing dates but its failing to get last_values

    select postcode,last_value(postcode) over (order by datefield) from (
    SELECT indicator_data.postcode,calendar.datefield AS date,indicator_data.indicator_cat,indicator_data.measure
    FROM indicator_data RIGHT join calendar ON (DATE(indicator_data.month_ts) = calendar.datefield)
    WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(month_ts)) FROM indicator_data) AND (SELECT MAX(DATE(month_ts)) FROM indicator_data))
    GROUP BY date) as test;

Advertisement

Answer

You can create the calendar table’s resultset with a recursive CTE.
Then do a LEFT join of the calendar to the table and use SUM() window function to create groups of rows with null in measure.
Finally use MAX() window function to pick the last non-null measure:

WITH  
  RECURSIVE dates AS (
    SELECT postcode, indicator_cat, MIN(month_ts) month_ts, MAX(month_ts) max_month_ts 
    FROM indicator_data
    GROUP BY postcode, indicator_cat
    UNION ALL
    SELECT postcode, indicator_cat, month_ts + INTERVAL 1 MONTH, max_month_ts 
    FROM dates 
    WHERE month_ts + INTERVAL 1 MONTH <= max_month_ts
  ),
  cte AS (
    SELECT d.postcode, d.month_ts, d.indicator_cat, i.measure,
           SUM(i.measure IS NOT NULL) OVER (PARTITION BY d.postcode, d.indicator_cat ORDER BY d.month_ts) grp
    FROM dates d LEFT JOIN indicator_data i
    ON (i.postcode, i.indicator_cat, i.month_ts) = (d.postcode, d.indicator_cat, d.month_ts)
  )
SELECT postcode, month_ts, indicator_cat,  
       MAX(measure) OVER (PARTITION BY postcode, indicator_cat, grp) measure   
FROM cte
ORDER BY postcode, indicator_cat, month_ts;

See the demo.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement