Skip to content
Advertisement

Count distinct customers who bought in previous period and not in next period Bigquery

I have a dataset in bigquery which contains order_date: DATE and customer_id.

order_date |    CustomerID
2019-01-01 |    111
2019-02-01 |    112
2020-01-01 |    111
2020-02-01 |    113
2021-01-01 |    115
2021-02-01 |    119

I try to count distinct customer_id between the months of the previous year and the same months of the current year. For example, from 2019-01-01 to 2020-01-01, then from 2019-02-01 to 2020-02-01, and then who not bought in the same period of next year 2020-01-01 to 2021-01-01, then 2020-02-01 to 2021-02-01.
The output I am expect

order_date| count distinct CustomerID|who not buy in the next period
2020-01-01| 5191                     |250
2020-02-01| 4859                     |500
2020-03-01| 3567                     |349
..........| ....                     |......

and the next periods shouldn’t include the previous.

I tried the code below but it works in another way

with customers as (
                  select distinct date_trunc(date(order_date),month) as dates,    
                         CUSTOMER_WID
                  from t 
                  where date(order_date) between '2018-01-01' and current_date()-1
            )
            
            select 
            dates,
            customers_previous,
            customers_next_period
            from 
            (
            select dates,
            count(CUSTOMER_WID) as customers_previous,
            count(case when customer_wid_next is null then 1 end) as customers_next_period,
            from (
            select prev.dates,
            prev.CUSTOMER_WID,
            next.dates as next_dates,
            next.CUSTOMER_WID as customer_wid_next
            from customers as prev
            left join  customers
            as next on next.dates=date_add(prev.dates,interval 1 year) 
            and prev.CUSTOMER_WID=next.CUSTOMER_WID 
            ) as t2
            group by dates
            )
            order by 1,2

Thanks in advance.

Advertisement

Answer

You can generate the periods using unnest(generate_date_array()). Then use joins to bring in the customers from the previous 12 months and the next 12 months. Finally, aggregate and count the customers:

select period,
       count(distinct c_prev.customer_wid),
       count(distinct c_next.customer_wid)
from unnest(generate_date_array(date '2020-01-01', date '2021-01-01', interval '1 month')) period join
     customers c_prev
     on c_prev.order_date <= period and
        c_prev.order_date > date_add(period, interval -12 month) left join
     customers c_next
     on c_next.customer_wid = c_prev.customer_wid and
        c_next.order_date > period and
        c_next.order_date <= date_add(period, interval 12 month)
group by period;
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement