How to Select ID’s in SQL (Databricks) in wh…

I’m working with patient-level data in Azure Databricks and I’m trying to build out a cohort of patients that have at least 2 diagnoses from a list of specific diagnosis codes. This is essentially what the table looks like:

CLAIM_ID | PTNT_ID | ICD_CD | DATE
---------+---------+--------+------------
1           101      2500     01_25_2020
2           101      3850     03_13_2018  
3           222      2500     10_26_2018
4           222      8888     11_30_2018
5           222      9155     04_01_2019
6           871      2500     02_17_2020
7           871      3200     09_09_2019

​x
 
CLAIM_ID | PTNT_ID | ICD_CD | DATE---------+---------+--------+------------1           101      2500     01_25_20202           101      3850     03_13_2018  3           222      2500     10_26_20184           222      8888     11_30_20185           222      9155     04_01_20196           871      2500     02_17_20207           871      3200     09_09_2019​

The list of ICD_CD codes of interest is something like [2500, 3850, 8888]. In this case, I would want to return TOTAL UNIQUE PTNT_ID = 2. These would be PTNT_ID = (101, 222) as these are the only two patients that have at least 2 ICD_CD codes of interest.

When I use something like this, I’m able to return all of the relevant PTNT_ID values, but I’m not able to get the total count of these PTNT_ID:

select mc.PTNT_ID
from MEDICAL_CLAIMS mc
where mc.PTNT_ID in ( # list of ICD_CD of interest
)
group by mc.PTNT_ID
having count(distinct mc.PTNT) >= 2

 
select mc.PTNT_IDfrom MEDICAL_CLAIMS mcwhere mc.PTNT_ID in ( # list of ICD_CD of interest)group by mc.PTNT_IDhaving count(distinct mc.PTNT) >= 2​

When I try to add a COUNT statement in, it returns an error

Answer

Just select from the query:

select count(*)
from
(
  select mc.PTNT_ID
  from MEDICAL_CLAIMS mc
  where mc.PTNT_ID in ( # list of ICD_CD of interest )
  group by mc.PTNT_ID
  having count(distinct mc.PTNT) >= 2
) ptnts;

 
select count(*)from(  select mc.PTNT_ID  from MEDICAL_CLAIMS mc  where mc.PTNT_ID in ( # list of ICD_CD of interest )  group by mc.PTNT_ID  having count(distinct mc.PTNT) >= 2) ptnts;​

How to Select ID’s in SQL (Databricks) in which at least 2 items from a list are present

Advertisement

Answer