Skip to content
Advertisement

How to select cumulative counts by group over time across in postgres

I have cumulative counts for two groups over time in this format:

Date Group Cumulative Count
1/1/2020 A 1
1/2/2020 A 3
1/2/2020 B 1
1/3/2020 B 2

And I’d like to reshape this data into this format:

Date Group Cumulative Count
1/1/2020 A 1
1/1/2020 B 0
1/2/2020 A 3
1/2/2020 B 1
1/3/2020 A 3
1/3/2020 B 2

So that I can get it to display accurately in a stacked area chart in metabase – any advice?

Advertisement

Answer

You may generate all possible pairs of dates and groups with a cross join before using a left join to get the combined dataset eg

Since your dataset already has the cumulative counts, the missing values identified by null have been replaced using the most recent cumulative count with MAX and COALESCE.

SELECT
    d."Date"::text,
    d."Group",
    COALESCE(m."CumulativeCount",COALESCE(MAX(m."CumulativeCount") OVER (
        PARTITION BY d."Group"
        ORDER BY d."Date"
    ),0)) as CumulativeCount
FROM (
    SELECT "Date", "Group" FROM (
        SELECT DISTINCT
            "Date"
        FROM
            my_data
    ) t1
    CROSS JOIN (
        SELECT DISTINCT
            "Group"
        FROM
            my_data
    ) t2
) d 
LEFT JOIN my_data m ON m."Date"=d."Date" AND
                       m."Group" = d."Group"
ORDER BY 1,2;
Date Group cumulativecount
2020-01-01 A 1
2020-01-01 B 0
2020-01-02 A 3
2020-01-02 B 1
2020-01-03 A 3
2020-01-03 B 2

View working demo on DB Fiddle

Update 1

If it is that you would like to generate values for dates between missing dates eg you had the next date to be 1/7/2020 and you wanted to fill the gaps for 1/3/2020 you could use generate_series to generate the possible dates and MAX to get the most recent value. I have included a fiddle below with additional sample data eg

Schema (PostgreSQL v13)

CREATE TABLE my_data (
  "Date" DATE,
  "Group" VARCHAR(1),
  "CumulativeCount" INTEGER
);

INSERT INTO my_data
  ("Date", "Group", "CumulativeCount")
VALUES
  ('1/1/2020', 'A', '1'),
  ('1/2/2020', 'A', '3'),
  ('1/2/2020', 'B', '1'),
  ('1/3/2020', 'B', '2'),
  ('1/1/2020', 'C', '2'),
  ('1/7/2020', 'C', '3');

Query #1

SELECT
    d."Date"::text,
    d."Group",
    COALESCE(
        m."CumulativeCount",
        COALESCE(MAX(m."CumulativeCount") OVER (
            PARTITION BY d."Group"
            ORDER BY d."Date"
        ),0)
    ) as CumulativeCount
FROM (
    SELECT "Date", "Group" FROM (
        SELECT
            GENERATE_SERIES(
                MIN("Date"),
                MAX("Date"),
                INTERVAL '1' DAY
            ) as "Date"
        FROM
            my_data
    ) t1
    CROSS JOIN (
        SELECT DISTINCT
            "Group"
        FROM
            my_data
    ) t2
) d 
LEFT JOIN my_data m ON m."Date"=d."Date" AND
                       m."Group" = d."Group"
ORDER BY 1,2;
Date Group cumulativecount
2020-01-01 00:00:00+00 A 1
2020-01-01 00:00:00+00 B 0
2020-01-01 00:00:00+00 C 2
2020-01-02 00:00:00+00 A 3
2020-01-02 00:00:00+00 B 1
2020-01-02 00:00:00+00 C 2
2020-01-03 00:00:00+00 A 3
2020-01-03 00:00:00+00 B 2
2020-01-03 00:00:00+00 C 2
2020-01-04 00:00:00+00 A 3
2020-01-04 00:00:00+00 B 2
2020-01-04 00:00:00+00 C 2
2020-01-05 00:00:00+00 A 3
2020-01-05 00:00:00+00 B 2
2020-01-05 00:00:00+00 C 2
2020-01-06 00:00:00+00 A 3
2020-01-06 00:00:00+00 B 2
2020-01-06 00:00:00+00 C 2
2020-01-07 00:00:00+00 A 3
2020-01-07 00:00:00+00 B 2
2020-01-07 00:00:00+00 C 3

View working demo on DB Fiddle

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement