Imagine we have the following data:
ID,State,start_date,end_date,Product S1,Trial,2020/01/01,2020/01/07,Hulu S1,Paid,2020/01/08,2020/01/31,Hulu S1,Expired,2020/02/01,null,Hulu S1,Paid,2020/03/01,2020/03/30,Hulu S2,Paid,2020/01/08,2020/01/31,Hulu S3,Paid,2020/01/09,2020/02/01,Hulu create table test ( ID varchar(10), State varchar(10), start_date datetime, end_date datetime, Product varchar(10) ); insert into test VALUES ('S1','Trial','2020-01-01','2020-01-07','Hulu'), ('S1','Paid','2020-01-08','2020-01-31','Hulu'), ('S1','Expired','2020-02-01',null,'Hulu'), ('S1','Paid','2020-03-01','2020-03-30','Hulu'), ('S2','Paid','2020-01-08','2020-01-31','Hulu'), ('S3','Paid','2020-01-09','2020-02-01','Hulu') ;
The question here is to get the 2020 monthly trend of active paid subscriptions. For each subscriber (ID), we can only count the months that they are active. So for S1, we can only count S1 active in Jan 2020 and March 2020, not Feb 2020.
During the interview, I wrote a function and said we can loop over to call this function for each month in 2020
def month_active_sub($yyyymm): select $yyyymm as month, count(distinct ID) from table where end_date >= $yyyymm and start_date <= $yyyymm and state='paid';
or
select '202001' as month, count(distinct ID) from table where end_date >= '202001' and start_date <= '202001' and state='paid' union all select '202002' as month, count(distinct ID) from table where end_date >= '202002' and start_date <= '202002' and state='paid' union all for another 10 months
I wonder if there is any better way to write this SQL query? Thanks!
Advertisement
Answer
One method uses a table of numbers:
select '2020-01-01' + interval n.n month start_of_month, count(t.id) no_active_subscribers from ( select 0 n union all select 1 union all select 2 ... union all select 11 ) n left join mytable t on t.start_date >= '2020-01-01' + interval n.n month and t.end_date <= '2020-01-01' + interval (n.n + 1) month and t.state = 'paid' group by n.n