Imagine we have the following data:
ID,State,start_date,end_date,Product
S1,Trial,2020/01/01,2020/01/07,Hulu
S1,Paid,2020/01/08,2020/01/31,Hulu
S1,Expired,2020/02/01,null,Hulu
S1,Paid,2020/03/01,2020/03/30,Hulu
S2,Paid,2020/01/08,2020/01/31,Hulu
S3,Paid,2020/01/09,2020/02/01,Hulu
create table test
(
ID varchar(10),
State varchar(10),
start_date datetime,
end_date datetime,
Product varchar(10)
);
insert into test
VALUES
('S1','Trial','2020-01-01','2020-01-07','Hulu'),
('S1','Paid','2020-01-08','2020-01-31','Hulu'),
('S1','Expired','2020-02-01',null,'Hulu'),
('S1','Paid','2020-03-01','2020-03-30','Hulu'),
('S2','Paid','2020-01-08','2020-01-31','Hulu'),
('S3','Paid','2020-01-09','2020-02-01','Hulu')
;
The task is to get the 2020 monthly trend of active paid subscriptions. Each subscriber (ID) should be counted only in the months in which they have an active paid subscription. So S1 counts as active in Jan 2020 and March 2020, but not in Feb 2020.
During the interview, I wrote a function and said we could call it in a loop, once for each month of 2020:
def month_active_sub($yyyymm):
select
$yyyymm as month,
count(distinct ID)
from table where end_date >= $yyyymm and start_date <= $yyyymm and state='paid';
or
select
'202001' as month,
count(distinct ID)
from table where end_date >= '202001' and start_date <= '202001' and state='paid'
union all
select
'202002' as month,
count(distinct ID)
from table where end_date >= '202002' and start_date <= '202002' and state='paid'
-- ... followed by a union all branch for each of the remaining 10 months
Is there a better way to write this SQL query? Thanks!
Answer
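One method uses a table of numbers: generate one row per month of 2020, then left join the paid rows whose date range overlaps that month. A row overlaps a month when it starts before the next month begins and ends on or after the first day of the month. Counting distinct IDs keeps a subscriber with several paid rows in the same month from being counted twice:
select '2020-01-01' + interval n.n month start_of_month, count(distinct t.id) no_active_subscribers
from (
    select 0 n
    union all select 1 union all select 2 union all select 3
    union all select 4 union all select 5 union all select 6
    union all select 7 union all select 8 union all select 9
    union all select 10 union all select 11
) n
left join test t
    on t.start_date < '2020-01-01' + interval (n.n + 1) month
    and t.end_date >= '2020-01-01' + interval n.n month
    and t.state = 'Paid'
group by n.n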
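To check the month-by-month counts end to end, here is a minimal sketch that runs an overlap join against the sample data in an in-memory SQLite database through Python's sqlite3 module. SQLite standing in for MySQL is an assumption made only so the example is self-contained: it uses date() modifiers where MySQL uses interval, and a recursive CTE in place of the hand-written list of numbers. The names ROWS, QUERY and monthly_active_paid are hypothetical and not part of the original question or answer.

```python
import sqlite3

# Sample rows from the question; None stands for the null end_date.
ROWS = [
    ("S1", "Trial", "2020-01-01", "2020-01-07", "Hulu"),
    ("S1", "Paid", "2020-01-08", "2020-01-31", "Hulu"),
    ("S1", "Expired", "2020-02-01", None, "Hulu"),
    ("S1", "Paid", "2020-03-01", "2020-03-30", "Hulu"),
    ("S2", "Paid", "2020-01-08", "2020-01-31", "Hulu"),
    ("S3", "Paid", "2020-01-09", "2020-02-01", "Hulu"),
]

# months(n) yields 0..11; each n is turned into the first day of a 2020 month.
# A paid row is joined to a month when it starts before the next month begins
# and ends on or after the first day of the month (interval overlap).
QUERY = """
WITH RECURSIVE months(n) AS (
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM months WHERE n < 11
)
SELECT date('2020-01-01', '+' || m.n || ' months') AS start_of_month,
       COUNT(DISTINCT t.ID) AS no_active_subscribers
FROM months m
LEFT JOIN test t
       ON t.start_date < date('2020-01-01', '+' || (m.n + 1) || ' months')
      AND t.end_date >= date('2020-01-01', '+' || m.n || ' months')
      AND t.State = 'Paid'
GROUP BY m.n
ORDER BY m.n
"""


def monthly_active_paid(rows=ROWS):
    """Return a list of (start_of_month, active_paid_count) for each month of 2020."""
    conn = sqlite3.connect(":memory:")
    # Dates are stored as ISO-8601 text, so string comparison matches date order.
    conn.execute(
        "CREATE TABLE test (ID TEXT, State TEXT, start_date TEXT, "
        "end_date TEXT, Product TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for month, active in monthly_active_paid():
        print(month, active)
```

On the sample data this reports 3 active paid subscribers for January and 1 each for February and March, with 0 for the remaining months. S1 is not counted in February, as the question requires; S3 is counted there only because its paid period ends on 2020-02-01. If a period ending on the first day of a month should not count, the comparison on end_date would need to be adjusted.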