Following-up on my question here. Say I have a table in an Oracle database like the one below (table_1) which tracks service involvement for a particular individual:
name day srvc_ inv bill 1 1 bill 2 1 bill 3 0 bill 4 0 bill 5 1 bill 6 0 susy 1 1 susy 2 0 susy 3 1 susy 4 0 susy 5 1
My goal is to get a summary table which lists, for all unique individuals, whether there was service involvement and the number of distinct service episodes (in this case 2 for bill and 3 for susy), where a distinct service episode is identified by a break in activity over days.
To get any service involvement, I would use the following query
SELECT table_1."Name", MAX(table_1."Name") AS "any_invl" FROM table_1 GROUP BY table_1."Name"
However, I’m stuck as to how I would get the number of service involvements (2). Using a static dataframe in R, you would use run length encoding (see my original question), but I don’t know how I could accomplish this in SQL. This operation would be run over a large number of records so it would be impractical to store the entire data frame as an object and then run it in R.
Edit: My expect output would be as follows:
name any_invl n_srvc_inv bill 1 2 susy 1 3
Thanks for any help!
Advertisement
Answer
I would suggest using lag()
. The idea is to count a “1”, but only when the preceding value is zero or null
:
select name, count(*) from (select t.*, lag(srvc_inv) over (partition by name order by day) as prev_srvc_inv from t ) t where (prev_srvc_inv is null or prev_srvc_inv = 0) and srvc_inv = 1 group by name;
You can simplify this a little by using a default value for lag()
:
select name, count(*) from (select t.*, lag(srvc_inv, 1, 0) over (partition by name order by day) as prev_srvc_inv from t ) t where prev_srvc_inv = 0 and srvc_inv = 1 group by name;