I have a database that allows for more than one ethnicity per person. Unfortunately, our answers are essentially Yes Hispanic, Not Hispanic, and Unknown, and there are some who do indeed have multiple selections. I need to run a large query that pulls lots of info, one of which is ethnicity, and I want to “convert” those that have multiple selections as Unknown.
person_ethnicity_xref table:
Person_ID | Ethnicity_ID |
---|---|
1234567 | SLWOWQ |
1234567 | ZLKJDU |
mstr_lists table:
Ethnicity_ID | Ethnicity |
---|---|
SLWOWQ | Hispanic |
ZLKJDU | Not Hispanic |
I’ve been struggling with this as I can’t use a For XML Path with two tables, so I’m now trying to use the logic of Case When count(ethnicity_ID)>1 then ‘Unknown’ Else Ethnicity End
Here’s what I have
select p.person_nbr, case when count(eth1.ethnicity_item_id)>1 then 'Unknown' else ml1.mstr_list_item_desc end 'final eth' from person_table p left join person_ethnicity_xref eth1 on p.person_id=eth1.person_id left join mstr_lists ml1 on eth1.ethnicity_item_id=ml1.mstr_list_item_id group by p.person_nbr, ml1.mstr_list_item_desc
This gave me results but when I check them, those with >1 don’t have a value of Unknown and people are listed twice once with each ethnicity.
Another part of this larger query has a subquery in the FROM that counts race and a separate table join for only those with a count=1. Then the case says if the subquery that counts race came up with >1 then X otherwise use that other table for count=1. Because the race table also uses that mstr_list there’s then 5 tables involved (there’s a second person_id join now that I look at it more closely, and there’s a mstr_list to the count and the regular tables…I have no idea why, my brain is tired and that count table isn’t a simple count and also is doing something else). Is this really the only option? This query already takes over 10 min to run (it is not run on production!) and I’d hate to make it worse by duplicating what the previous writer did.
Advertisement
Answer
Use aggregation:
select p.person_nbr, (case when min(ml1.mstr_list_item_desc) = max(ml1.mstr_list_item_desc) then min(ml1.mstr_list_item_desc) else 'Unknown' end) as final_ethnicity from person_table p left join person_ethnicity_xref eth1 on p.person_id = eth1.person_id left join mstr_lists ml1 on eth1.ethnicity_item_id = ml1.mstr_list_item_id group by p.person_nbr;
Note: This slightly tweaks your logic. If there a multiple ethnicities and they are all the same, then that value is used.