I need to get a single row per user ID and am experiencing a problem in which there was double entry of counties when there should have been one. For instance, “Palm Beach” and “PALM BEACH COUNTY” recorded for a single user. This is causing my query to return two identical rows for a single user with the difference being in the county column.
The output currently looks like the following (this is a super simplified version only showing two columns):
+---------+-------------------+ | User ID | County | +---------+-------------------+ | 1 | Bay | | 2 | Palm Beach | | 2 | PALM BEACH COUNTY | | 3 | Collin | | 3 | COLLIN COUNTY | | 4 | Colusa County | | 5 | Fresno | +---------+-------------------+
The output should not include the word “County” and should all be capitalized. The output should look like the following:
+---------+------------+ | User ID | County | +---------+------------+ | 1 | BAY | | 2 | PALM BEACH | | 3 | COLLIN | | 4 | COLUSA | | 5 | FRESNO | +---------+------------+
So far I’ve been using
UPPER(REPLACE(addr.COUNTY,'COUNTY','')) AS COUNTY
and then manually deleting the extra fields in the excel sheet. Problem is, that there are so many, this has become really inefficient.
There are single policies that do have the word county so I have tried to exclude values that have “County” in it, but then this takes away those users that only have a single record.
Any ideas as to how to get around this more efficiently?
Advertisement
Answer
You could use REPLACE and GROUP BY to get your desired result:
with sample_data as ( select 1 user_id, 'Bay' county from dual union all select 2 user_id, 'Palm Beach' from dual union all select 2 user_id, 'PALM BEACH COUNTY' county from dual union all select 3 user_id, 'Collin' county from dual union all select 3 user_id, 'COLLIN COUNTY' county from dual union all select 4 user_id, 'Colusa County' county from dual union all select 5 user_id, 'Fresno' county from dual ) select user_id, upper(replace(lower(county), ' county', '')) county from sample_data group by user_id, upper(replace(lower(county), ' county', '')) order by user_id; USER_ID COUNTY ---------- ----------------- 1 BAY 2 PALM BEACH 3 COLLIN 4 COLUSA 5 FRESNO
Now you have the SQL which gives you the desired result. So, instead of deleting from the source table, simply create a new table which would be much faster than deleting them:
CREATE new_table AS select user_id, upper(replace(lower(county), ' county', '')) county from sample_data group by user_id, upper(replace(lower(county), ' county', ''));