So I want to calculate cumulative users per day but if the users exist is previous days they will not counted.
date_key user_id 2022-01-01 001 2022-01-01 002 2022-01-02 001 2022-01-02 003 2022-01-03 002 2022-01-03 003 2022-01-04 002 2022-01-04 004
on daily basis we can get
date_key total_user 2022-01-01 2 2022-01-02 2 2022-01-03 2 2022-01-04 2
if we simply calculate cumulative we can get 2,4,6,8 for each day the goal is to get the table like this
date_key total_user 2022-01-01 2 2022-01-02 3 2022-01-03 3 2022-01-04 4
im using this query to get the result, since the data is really2 huge. the query takes forever to complete.
select b.date_key,count(distinct a.user_id) total_user from t1 a join t1 b on b.date_key >= a.date_key and date_trunc(a.date_key,month) = date_trunc(b.date_key,month) group by 1 order by 1
and yes the calculation should be on reset when the month is changing.
and btw I’m using google bigquery
Advertisement
Answer
Number each user’s appearance by order of date. Count only the ones seen for the first time:
with data as ( select *, row_number() over (partition by date_trunc(date_key, month), userid order by date_key) as rn from T ) select date_key, sum(count(case when rn = 1 then 1 end)) -- or countif(rn = 1) over (partition by date_trunc(date_key, month) order by date_key) as cum_monthly_users from data group by date_key;
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=dc426d79a7786fc8a5b25a22f0755e27