SQL how can we get monthly trend from 2 separated columns start_date and end_date?

Imagine we have the following data:

ID,State,start_date,end_date,Product

S1,Trial,2020/01/01,2020/01/07,Hulu
S1,Paid,2020/01/08,2020/01/31,Hulu
S1,Expired,2020/02/01,null,Hulu
S1,Paid,2020/03/01,2020/03/30,Hulu
S2,Paid,2020/01/08,2020/01/31,Hulu
S3,Paid,2020/01/09,2020/02/01,Hulu

create table test
  (
  ID varchar(10),
  State varchar(10),
  start_date datetime,
  end_date datetime,
  Product varchar(10)
  );

 insert into test 
 VALUES 
('S1','Trial','2020-01-01','2020-01-07','Hulu'),
('S1','Paid','2020-01-08','2020-01-31','Hulu'),
('S1','Expired','2020-02-01',null,'Hulu'),
('S1','Paid','2020-03-01','2020-03-30','Hulu'),
('S2','Paid','2020-01-08','2020-01-31','Hulu'),
('S3','Paid','2020-01-09','2020-02-01','Hulu')
 ;

​x
 
ID,State,start_date,end_date,Product​S1,Trial,2020/01/01,2020/01/07,HuluS1,Paid,2020/01/08,2020/01/31,HuluS1,Expired,2020/02/01,null,HuluS1,Paid,2020/03/01,2020/03/30,HuluS2,Paid,2020/01/08,2020/01/31,HuluS3,Paid,2020/01/09,2020/02/01,Hulu​create table test  (  ID varchar(10),  State varchar(10),  start_date datetime,  end_date datetime,  Product varchar(10)  );​ insert into test  VALUES ('S1','Trial','2020-01-01','2020-01-07','Hulu'),('S1','Paid','2020-01-08','2020-01-31','Hulu'),('S1','Expired','2020-02-01',null,'Hulu'),('S1','Paid','2020-03-01','2020-03-30','Hulu'),('S2','Paid','2020-01-08','2020-01-31','Hulu'),('S3','Paid','2020-01-09','2020-02-01','Hulu') ;​

The question here is to get the 2020 monthly trend of active paid subscriptions. For each subscriber (ID), we can only count the months that they are active. So for S1, we can only count S1 active in Jan 2020 and March 2020, not Feb 2020.

During the interview, I wrote a function and said we can loop over to call this function for each month in 2020

def month_active_sub($yyyymm):
   select 
   $yyyymm as month,
   count(distinct ID)
   from table where end_date >= $yyyymm and start_date <= $yyyymm and state='paid';

 
def month_active_sub($yyyymm):   select    $yyyymm as month,   count(distinct ID)   from table where end_date >= $yyyymm and start_date <= $yyyymm and state='paid';​

   select 
   '202001' as month,
   count(distinct ID)
   from table where end_date >= '202001' and start_date <= '202001' and state='paid'

   union all 

   select 
   '202002' as month,
   count(distinct ID)
   from table where end_date >= '202002' and start_date <= '202002' and state='paid'

   union all for another 10 months

 
   select    '202001' as month,   count(distinct ID)   from table where end_date >= '202001' and start_date <= '202001' and state='paid'​   union all ​   select    '202002' as month,   count(distinct ID)   from table where end_date >= '202002' and start_date <= '202002' and state='paid'​   union all for another 10 months​

I wonder if there is any better way to write this SQL query? Thanks!

Answer

One method uses a table of numbers:

select '2020-01-01' + interval n.n month start_of_month, count(t.id) no_active_subscribers
from (
    select 0 n 
    union all select 1
    union all select 2 
    ... 
    union all select 11
) n
left join mytable t 
    on  t.start_date >= '2020-01-01' + interval n.n month
    and t.end_date   <= '2020-01-01' + interval (n.n + 1) month
    and t.state = 'paid'
group by n.n

 
select '2020-01-01' + interval n.n month start_of_month, count(t.id) no_active_subscribersfrom (    select 0 n     union all select 1    union all select 2     ...     union all select 11) nleft join mytable t     on  t.start_date >= '2020-01-01' + interval n.n month    and t.end_date   <= '2020-01-01' + interval (n.n + 1) month    and t.state = 'paid'group by n.n​

Advertisement

Answer