Skip to content
Advertisement

Track customer loss in redshift

I have a data base that looks like this. It has:

acct_open_dt, or the day the account was opened.

acct_close_dt, or the day the account was closed.

dayssinceopendt, which is TODAY – acct_open_dt if acct_close_dt is null or acct_close_dtacct_open_dt if acct_close_dt is not null.

enter image description here

What I want:

for a day sequence along 1990-01-01 to TODAY I want to find how many customers were active for each day.

My intial thought would be to do

case when
dateadd(day,dayssinceopenddt,acct_open_date) > '1990-01-01'
then 1
else 0 end as here_on_19900101

for ALL the days, then sum up each new column, but that would be super manual and super inefficient.

How could I do this better?

(of course I was planning on changing the class from numeric to date, you don’t have to include that in your answer.)

Advertisement

Answer

This may suffice:

with t as (
      select acct_open_dt, count(*) as ins, 0 as outs
      from t
      group by acct_open_dt
      union all
      select acct_close_dt, 0 as ins, count(*) as outs
      from t
      group by acct_close_dt
     )
select dt,
       (sum(sum(ins)) over (order by dt) -
        sum(sum(outs)) over (order by dt)
       ) as actives_on_day
from t
group by dt;

This will return only dates that are in your data, but that might be sufficient.

Note that the end date is not considered an active date in this logic.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement