Skip to content
Advertisement

Aggregating two values in same select statement. Second aggregation is decreasing in value for each row for some reason

I’m currently trying to aggregate two values simultaneously in one select statement; however, the second aggregated value is decreasing for some reason. I know what I’m doing is wrong, but I don’t understand why it’s wrong (assuming it’s the very last code block). Mainly just trying to better understand what’s going on, and why it’s happening.

I already have a corrected query that works (at the bottom)

Note: Query and outputs are simplified, please ignore any syntax issues. Additionally, in real query, I need to keep subscription_start_date field in until the end.

Query with issue (very last block):

WITH max_product_user_count AS (
-- The total count is obtained when "days" = 0
SELECT
    subscription_start_date,
    datediff('days', subscription_start_date, subscription_date) AS days,
    product,
    num_users AS total_user_count
FROM users
WHERE days = 0
),

daily_product_user_count AS (
-- As "days" go up, the number of subscribers for each start date/product type decreases
SELECT
    subscription_start_date,
    datediff('days', subscription_start_date, subscription_date) AS days,
    product,
    num_users AS daily_user_count
FROM users
WHERE days IN (0,5,14,21,30,33,60)
) 

-- Trying to aggregate by product and day, across all subscription start dates
SELECT
    d.product,
    d.days,
    SUM(daily_user_count) AS daily_count,
    SUM(total_user_count) AS total_count
FROM daily_product_user_count d 
INNER JOIN max_product_user_count m ON d.subscription_start_date = m.subscription_start_date
AND d.product = m.product
GROUP BY 1,2
ORDER BY 1,2

Current Output:

PRODUCT       DAYS  DAILY_COUNT TOTAL_COUNT
product_1     0     10000       10000
product_1     5     99231       99781
product_1     14    96124       98123
product_1     21    85123       96441
product_1     30    23412       94142
product_1     33    12931       92111
product_1     60    10231       90123

Expected Output:

PRODUCT       DAYS  DAILY_COUNT TOTAL_COUNT
product_1     0     10000       10000
product_1     5     99231       10000
product_1     14    96124       10000
product_1     21    85123       10000
product_1     30    23412       10000
product_1     33    12931       10000
product_1     60    10231       10000

Updated correct query:

WITH max_product_user_count AS (

SELECT
    subscription_start_date,
    datediff('days', subscription_start_date, subscription_date) AS days,
    product,
    num_users AS total_user_count
FROM users
WHERE days = 0
),

max_user_count_aggregation AS (

SELECT
    product,
    SUM(total_user_count) AS total_count 
FROM max_product_user_count
GROUP BY  1
),

daily_product_user_count AS (

SELECT
    subscription_start_date,
    datediff('days', subscription_start_date, subscription_date) AS days,
    product,
    num_users AS daily_user_count
FROM users
WHERE days IN (0,5,14,21,30,33,60)
) 

daily_user_count_aggregation AS (

SELECT
    product,
    days,
    SUM(daily_user_count) AS daily_count 
FROM daily_product_user_count
GROUP BY  1
)

SELECT
    d.product,
    d.days,
    daily_count,
    total_count
FROM daily_user_count_aggregation d 
INNER JOIN max_user_count_aggregation m ON d.product = m.product
ORDER BY 1,2

Advertisement

Answer

If I understand what you are trying to do, the query is way more complicated than necessary. I think this does what you want:

SELECT datediff('days', subscription_start_date, subscription_date) AS days,
       product,
       SUM(num_users) FILTER (WHERE days IN (0, 5, 14, 21, 30, 33, 60)) AS daily_user_count,
       SUM(num_users) FILTER (WHERE days = 0) AS total_user_count
FROM users
GROUP BY days, product;

I would advise you to ask a new question, explaining the logic you want to implement and providing reasonable sample data and desired results.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement