Skip to content
Advertisement

Count of difference in values by date

I have a data set that contain two columns [date, cust_id].

date         cust_id
2019-12-08   123
2019-12-08   321
2019-12-09   123
2019-12-09   456

There is a high churn rate for my customers and I am trying to create two additional columns [new_cust, left_cust] by counting the numbers of cust_id that are new and have left by day respectively.

In the case I have two tables broken out by day, I have no issues by querying:

count of new customers

SELECT DISTINCT cust_id
FROM 2019-12-09
WHERE cust_id NOT IN (SELECT DISTINCT cust_id FROM 2019-12-08)

count of customers who churned

SELECT DISTINCT cust_id
FROM 2019-12-08
WHERE cust_id NOT IN (SELECT DISTINCT cust_id FROM 2019-12-09)

I’m not sure how I would query a single table and compare these values by date. What would be the best approach to getting the correct results? I am using AWS Athena.

Expected results:

date        new_cust cust_left
2019-12-08  2        0
2019-12-09  1        1

Explanation: Assuming 2019-12-08 is the very first date, I have 2 new customers and 0 customers who have churned. 2019-12-09, I have gained 1 new customer “456”, but have 1 customer “321” who has churned. I would have to apply this to a longer range of dates and cust_id.

Advertisement

Answer

Hmmm. I think you want:

select date,
       sum(case when prev_date is null then 1 else 0 end) as new_cust,
       sum(case when next_date = date + interval '1' day then 0 else 1 end) as left_cust
from (select t.*,
             lag(date) over (partition by cust_id order by date) as prev_date,
             lead(date) over (partition by cust_id order by date) as next_date
      from t
     ) t
group by date;
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement