I have a table:
id | start_date | end_date
----------------------------
01 | 2016-02-19 | 2017-03-02
02 | 2017-06-19 | 2018-09-11
03 | 2015-03-19 | 2018-05-02
04 | 2018-02-19 | 2018-01-05
05 | 2014-06-19 | 2018-07-25
and I would like to repeat rows based on the time between start_date and end_date, in this case by years extracted from those two date columns. My desired result would resemble:
id | year
=========
01 | 2016
01 | 2017
02 | 2017
02 | 2018
03 | 2015
03 | 2016
03 | 2017
03 | 2018
04 | 2018
05 | 2014
05 | 2015
05 | 2016
05 | 2017
05 | 2018
How can I achieve this in Redshift?
Answer
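One option is to join against a calendar table containing every year that can appear in your table. Each row then matches once for every year between the year of its start_date and the year of its end_date, inclusive: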
WITH years AS (
    SELECT 2014 AS year UNION ALL
    SELECT 2015 UNION ALL
    SELECT 2016 UNION ALL
    SELECT 2017 UNION ALL
    SELECT 2018
)
SELECT
    t2.id,
    t1.year
FROM years t1
INNER JOIN yourTable t2
    ON t1.year BETWEEN DATE_PART('year', t2.start_date) AND DATE_PART('year', t2.end_date)
ORDER BY
    t2.id,
    t1.year;
Note: On Redshift, the date part can be written as an unquoted keyword, i.e. DATE_PART(year, t2.start_date); use that form if the quoted 'year' version is rejected.
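If you would rather not hard-code each year, the calendar table can be generated instead. The following is only a sketch, assuming your Redshift cluster supports recursive CTEs (WITH RECURSIVE) and that the bounds 2014 and 2018 cover your data; both literals are assumptions taken from the sample rows, and yourTable is the same placeholder name used above. It uses EXTRACT(YEAR FROM ...) rather than DATE_PART, which avoids the quoting question entirely.

```sql
-- Sketch: build the list of years with a recursive CTE instead of UNION ALL.
-- The bounds 2014 and 2018 are assumptions based on the sample data.
WITH RECURSIVE years (year) AS (
    SELECT 2014                 -- seed: first year to generate
    UNION ALL
    SELECT year + 1             -- step: add one year at a time
    FROM years
    WHERE year < 2018           -- stop once the last year has been produced
)
SELECT
    t2.id,
    t1.year
FROM years t1
INNER JOIN yourTable t2
    -- keep every generated year that falls within the row's year range
    ON t1.year BETWEEN EXTRACT(YEAR FROM t2.start_date)
                   AND EXTRACT(YEAR FROM t2.end_date)
ORDER BY
    t2.id,
    t1.year;
```

With the sample data this should return the same rows as the hard-coded version. If the range of years changes over time, only the two bounds in the CTE need to be adjusted.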
