I’m currently trying to aggregate two values simultaneously in one select statement; however, the second aggregated value is decreasing for some reason. I know what I’m doing is wrong, but I don’t understand why it’s wrong (assuming it’s the very last code block). Mainly just trying to better understand what’s going on, and why it’s happening.
I already have a corrected query that works (at the bottom)
Note: Query and outputs are simplified, please ignore any syntax issues. Additionally, in real query, I need to keep subscription_start_date
field in until the end.
Query with issue (very last block):
WITH max_product_user_count AS ( -- The total count is obtained when "days" = 0 SELECT subscription_start_date, datediff('days', subscription_start_date, subscription_date) AS days, product, num_users AS total_user_count FROM users WHERE days = 0 ), daily_product_user_count AS ( -- As "days" go up, the number of subscribers for each start date/product type decreases SELECT subscription_start_date, datediff('days', subscription_start_date, subscription_date) AS days, product, num_users AS daily_user_count FROM users WHERE days IN (0,5,14,21,30,33,60) ) -- Trying to aggregate by product and day, across all subscription start dates SELECT d.product, d.days, SUM(daily_user_count) AS daily_count, SUM(total_user_count) AS total_count FROM daily_product_user_count d INNER JOIN max_product_user_count m ON d.subscription_start_date = m.subscription_start_date AND d.product = m.product GROUP BY 1,2 ORDER BY 1,2
Current Output:
PRODUCT DAYS DAILY_COUNT TOTAL_COUNT product_1 0 10000 10000 product_1 5 99231 99781 product_1 14 96124 98123 product_1 21 85123 96441 product_1 30 23412 94142 product_1 33 12931 92111 product_1 60 10231 90123
Expected Output:
PRODUCT DAYS DAILY_COUNT TOTAL_COUNT product_1 0 10000 10000 product_1 5 99231 10000 product_1 14 96124 10000 product_1 21 85123 10000 product_1 30 23412 10000 product_1 33 12931 10000 product_1 60 10231 10000
Updated correct query:
WITH max_product_user_count AS ( SELECT subscription_start_date, datediff('days', subscription_start_date, subscription_date) AS days, product, num_users AS total_user_count FROM users WHERE days = 0 ), max_user_count_aggregation AS ( SELECT product, SUM(total_user_count) AS total_count FROM max_product_user_count GROUP BY 1 ), daily_product_user_count AS ( SELECT subscription_start_date, datediff('days', subscription_start_date, subscription_date) AS days, product, num_users AS daily_user_count FROM users WHERE days IN (0,5,14,21,30,33,60) ) daily_user_count_aggregation AS ( SELECT product, days, SUM(daily_user_count) AS daily_count FROM daily_product_user_count GROUP BY 1 ) SELECT d.product, d.days, daily_count, total_count FROM daily_user_count_aggregation d INNER JOIN max_user_count_aggregation m ON d.product = m.product ORDER BY 1,2
Advertisement
Answer
If I understand what you are trying to do, the query is way more complicated than necessary. I think this does what you want:
SELECT datediff('days', subscription_start_date, subscription_date) AS days, product, SUM(num_users) FILTER (WHERE days IN (0, 5, 14, 21, 30, 33, 60)) AS daily_user_count, SUM(num_users) FILTER (WHERE days = 0) AS total_user_count FROM users GROUP BY days, product;
I would advise you to ask a new question, explaining the logic you want to implement and providing reasonable sample data and desired results.