How to count number of non-consecutive values in a column using SQL?

Following-up on my question here. Say I have a table in an Oracle database like the one below (table_1) which tracks service involvement for a particular individual:

name  day  srvc_ inv
bill  1  1
bill  2  1
bill  3  0
bill  4  0
bill  5  1
bill  6  0
susy  1  1
susy  2  0
susy  3  1
susy  4  0
susy  5  1

​x
 
name  day  srvc_ invbill  1  1bill  2  1bill  3  0bill  4  0bill  5  1bill  6  0susy  1  1susy  2  0susy  3  1susy  4  0susy  5  1​

My goal is to get a summary table which lists, for all unique individuals, whether there was service involvement and the number of distinct service episodes (in this case 2 for bill and 3 for susy), where a distinct service episode is identified by a break in activity over days.

To get any service involvement, I would use the following query

SELECT table_1."Name", MAX(table_1."Name") AS "any_invl"
FROM table_1
GROUP BY table_1."Name"

 
SELECT table_1."Name", MAX(table_1."Name") AS "any_invl"FROM table_1GROUP BY table_1."Name"​

However, I’m stuck as to how I would get the number of service involvements (2). Using a static dataframe in R, you would use run length encoding (see my original question), but I don’t know how I could accomplish this in SQL. This operation would be run over a large number of records so it would be impractical to store the entire data frame as an object and then run it in R.

Edit: My expect output would be as follows:

name  any_invl  n_srvc_inv
bill  1  2
susy  1  3

 
name  any_invl  n_srvc_invbill  1  2susy  1  3​

Thanks for any help!

Answer

I would suggest using lag(). The idea is to count a “1”, but only when the preceding value is zero or null:

select name, count(*)
from (select t.*,
             lag(srvc_inv) over (partition by name order by day) as prev_srvc_inv
      from t
     ) t
where (prev_srvc_inv is null or prev_srvc_inv = 0) and
      srvc_inv = 1
group by name;

 
select name, count(*)from (select t.*,             lag(srvc_inv) over (partition by name order by day) as prev_srvc_inv      from t     ) twhere (prev_srvc_inv is null or prev_srvc_inv = 0) and      srvc_inv = 1group by name;​

You can simplify this a little by using a default value for lag():

select name, count(*)
from (select t.*,
             lag(srvc_inv, 1, 0) over (partition by name order by day) as prev_srvc_inv
      from t
     ) t
where prev_srvc_inv = 0 and srvc_inv = 1
group by name;

 
select name, count(*)from (select t.*,             lag(srvc_inv, 1, 0) over (partition by name order by day) as prev_srvc_inv      from t     ) twhere prev_srvc_inv = 0 and srvc_inv = 1group by name;​

Advertisement

Answer