I’m currently trying to aggregate two values simultaneously in one select statement; however, the second aggregated value is decreasing for some reason. I know what I’m doing is wrong, but I don’t understand why it’s wrong (assuming it’s the very last code block). Mainly just trying to better understand what’s going on, and why it’s happening.
I already have a corrected query that works (at the bottom)
Note: Query and outputs are simplified, please ignore any syntax issues. Additionally, in real query, I need to keep subscription_start_date
field in until the end.
Query with issue (very last block):
WITH max_product_user_count AS (
-- The total count is obtained when "days" = 0
SELECT
subscription_start_date,
datediff('days', subscription_start_date, subscription_date) AS days,
product,
num_users AS total_user_count
FROM users
WHERE days = 0
),
daily_product_user_count AS (
-- As "days" go up, the number of subscribers for each start date/product type decreases
SELECT
subscription_start_date,
datediff('days', subscription_start_date, subscription_date) AS days,
product,
num_users AS daily_user_count
FROM users
WHERE days IN (0,5,14,21,30,33,60)
)
-- Trying to aggregate by product and day, across all subscription start dates
SELECT
d.product,
d.days,
SUM(daily_user_count) AS daily_count,
SUM(total_user_count) AS total_count
FROM daily_product_user_count d
INNER JOIN max_product_user_count m ON d.subscription_start_date = m.subscription_start_date
AND d.product = m.product
GROUP BY 1,2
ORDER BY 1,2
Current Output:
PRODUCT DAYS DAILY_COUNT TOTAL_COUNT
product_1 0 10000 10000
product_1 5 99231 99781
product_1 14 96124 98123
product_1 21 85123 96441
product_1 30 23412 94142
product_1 33 12931 92111
product_1 60 10231 90123
Expected Output:
PRODUCT DAYS DAILY_COUNT TOTAL_COUNT
product_1 0 10000 10000
product_1 5 99231 10000
product_1 14 96124 10000
product_1 21 85123 10000
product_1 30 23412 10000
product_1 33 12931 10000
product_1 60 10231 10000
Updated correct query:
WITH max_product_user_count AS (
SELECT
subscription_start_date,
datediff('days', subscription_start_date, subscription_date) AS days,
product,
num_users AS total_user_count
FROM users
WHERE days = 0
),
max_user_count_aggregation AS (
SELECT
product,
SUM(total_user_count) AS total_count
FROM max_product_user_count
GROUP BY 1
),
daily_product_user_count AS (
SELECT
subscription_start_date,
datediff('days', subscription_start_date, subscription_date) AS days,
product,
num_users AS daily_user_count
FROM users
WHERE days IN (0,5,14,21,30,33,60)
)
daily_user_count_aggregation AS (
SELECT
product,
days,
SUM(daily_user_count) AS daily_count
FROM daily_product_user_count
GROUP BY 1
)
SELECT
d.product,
d.days,
daily_count,
total_count
FROM daily_user_count_aggregation d
INNER JOIN max_user_count_aggregation m ON d.product = m.product
ORDER BY 1,2
Advertisement
Answer
If I understand what you are trying to do, the query is way more complicated than necessary. I think this does what you want:
SELECT datediff('days', subscription_start_date, subscription_date) AS days,
product,
SUM(num_users) FILTER (WHERE days IN (0, 5, 14, 21, 30, 33, 60)) AS daily_user_count,
SUM(num_users) FILTER (WHERE days = 0) AS total_user_count
FROM users
GROUP BY days, product;
I would advise you to ask a new question, explaining the logic you want to implement and providing reasonable sample data and desired results.