Skip to content
Advertisement

Deleting Duplicating Rows That Have Two Different Case Values Per User ID

I need to get a single row per user ID and am experiencing a problem in which there was double entry of counties when there should have been one. For instance, “Palm Beach” and “PALM BEACH COUNTY” recorded for a single user. This is causing my query to return two identical rows for a single user with the difference being in the county column.

The output currently looks like the following (this is a super simplified version only showing two columns):

+---------+-------------------+
| User ID |      County       |
+---------+-------------------+
|       1 | Bay               |
|       2 | Palm Beach        |
|       2 | PALM BEACH COUNTY |
|       3 | Collin            |
|       3 | COLLIN COUNTY     |
|       4 | Colusa County     |
|       5 | Fresno            |
+---------+-------------------+

The output should not include the word “County” and should all be capitalized. The output should look like the following:

+---------+------------+
| User ID |   County   |
+---------+------------+
|       1 | BAY        |
|       2 | PALM BEACH |
|       3 | COLLIN     |
|       4 | COLUSA     |
|       5 | FRESNO     |
+---------+------------+

So far I’ve been using

UPPER(REPLACE(addr.COUNTY,'COUNTY','')) AS COUNTY

and then manually deleting the extra fields in the excel sheet. Problem is, that there are so many, this has become really inefficient.

There are single policies that do have the word county so I have tried to exclude values that have “County” in it, but then this takes away those users that only have a single record.

Any ideas as to how to get around this more efficiently?

Advertisement

Answer

You could use REPLACE and GROUP BY to get your desired result:

with sample_data as
(
    select 1 user_id, 'Bay' county from dual
    union all
    select 2 user_id, 'Palm Beach'  from dual
    union all
    select 2 user_id, 'PALM BEACH COUNTY' county from dual
    union all
    select 3 user_id, 'Collin' county from dual
    union all
    select 3 user_id, 'COLLIN COUNTY' county from dual
    union all
    select 4 user_id, 'Colusa County' county from dual
    union all
    select 5 user_id, 'Fresno' county from dual
)
select user_id, 
       upper(replace(lower(county), ' county', '')) county 
from
    sample_data
group by
    user_id,
    upper(replace(lower(county), ' county', ''))
order by
    user_id;

   USER_ID COUNTY           
---------- -----------------
         1 BAY              
         2 PALM BEACH       
         3 COLLIN           
         4 COLUSA           
         5 FRESNO  

Now you have the SQL which gives you the desired result. So, instead of deleting from the source table, simply create a new table which would be much faster than deleting them:

CREATE new_table AS
select user_id, 
   upper(replace(lower(county), ' county', '')) county 
from
    sample_data
group by
    user_id,
    upper(replace(lower(county), ' county', ''));
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement