Skip to content
Advertisement

Standardizing addresses using regexp_replace in oracle sql

I really appreciate all of you for your help.

We are in the process of standardizing the addresses of our customers.

I have a main customer table which contains the customers data, the address field to be standardized.

I have a mapping table that contains mapping for from_str to to_str values.

I need to get the to_str that is correspond to the from_str into the address data.

If the length of the address is more than 35 characters after standardizing, then from previous space(‘ ‘) from right side to end of the string, is created as a seperate field address2

Could you please help with sql or pl/sql? Using Oracle 12c database.

The code I wrote below is not working for all from_str values…only working for first 2 rows.

Any help is appreciated.

code so far:

with addresses as
(
    select cust_id,address addr from
    (
    select 10 cust_id,'9 Help Street, Level 4' address from dual union all
    select 11 cust_id,'22 Victoria Street' address from dual union all
    select 12 cust_id,'1495 Franklin Str' address from dual union all
    select 13 cust_id,'30 Hasivim St.,Petah-Tikva' address from dual union all
    select 14 cust_id,'2 Jakaranda St' address from dual union all
    select 15 cust_id,'61, Science Park Rd' address from dual union all
    select 16 cust_id,'61, Social park road' address from dual union all
    select 17 cust_id,'Av. Hermanos Escobar 5756' address from dual union all
    select 18 cust_id,'Ave. Hermanos Escobar 5756' address from dual union all
    select 19 cust_id,'8000 W FLORISSANT AVE' address from dual union all
    select 20 cust_id,'8600 MEMORIAL PKWY SW' address from dual union all
    select 21 cust_id,'8200 FLORISSANTMEMORIALWAYABOVE SW' address from dual union all
    select 22 cust_id,'8600 MEMORIALFLORISSANT PKWY SW' address from dual
    ) t1
),
  replacements as
    (
    select id,to_str,from_string from_str from
    (
    select 1 id,'St' to_str,'Street' from_string from dual union all
  select 2 id,'St' to_str,'St.' from_string from dual union all
  select 3 id,'St' to_str,'Str' from_string from dual union all
  select 4 id,'St' to_str,'St' from_string from dual union all
  select 5 id,'Rd' to_str,'Rd.' from_string from dual union all
  select 6 id,'Rd' to_str,'road' from_string from dual union all
  select 7 id,'Av' to_str,'Av.' from_string from dual union all
  select 8 id,'Av' to_str,'Ave.' from_string from dual union all
  select 9 id,'Av' to_str,'Avenue' from_string from dual union all
  select 10 id,'Av' to_str,'Aven.' from_string from dual union all
  select 11 id,'West' to_str,'W' from_string from dual union all
  select 12 id,'South West' to_str,'SW.' from_string from dual
  ) t2
),
  r(addr,test_addr,l) as 
  (
      select  addr,regexp_replace(addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr,
          id - 1
         from  
         addresses,
         replacements
         where id = (select count(*) from replacements)
      union all
         select addr,regexp_replace(addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr,
          l - 1
         from  r,
         replacements
         where id = l
   )
  select addr,test_addr,l
  from  r
  where l=0
  ;

Expected Output:

cust_id address

10      9 Help St, Level 4
11      22 Victoria St
12      1495 Franklin St
13      30 Hasivim St ,Petah-Tikva
14      2 Jakaranda St
15      61, Science Park Rd
16      61, Social park Rd
17      Av Hermanos Escobar 5756
18      Av Hermanos Escobar 5756
19      8000 West FLORISSANT Ave
20      8600 MEMORIAL PKWY South West

if length of the address is more than 35 characters then the expected output is:

cust_id address                              address2
21      8200 FLORISSANTMEMORIALWAYABOVE      South West
22      8600 MEMORIALFLORISSANT PKWY         South West

Advertisement

Answer

That’s an… interesting way to apply multiple replace operations. So – you have 2 issues, as you mentioned. For the first one, the recursive part of your CTE is doing regexp_replace() on addr instead of test_addr (the modified output of the previous recursive step). So only the last rule in the list is ever applied.

r(addr,test_addr, l) as 
(
    select  addr,regexp_replace(addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr, 
        id - 1
       from  
       addresses,
       replacements
       where id = (select count(*) from replacements)
    union all
       -- if you do regexp_replace on addr, it throws out the previous replace (which is in r.test_addr)
       select addr,regexp_replace(test_addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr, 
        l - 1
       from  r,
       replacements
       where id = l
 )

For the “over-35-characters” issue, I would suggest using substr/instr – while awkward to read, they’re usually fast.

select addr,test_addr,l, 
  case when length(test_addr) > 35 then
      substr(test_addr, 1, instr(substr(test_addr,1,35), ' ', -1))
    else test_addr
    end as addr1,
  case when length(test_addr) > 35 then
      substr(test_addr, instr(substr(test_addr,1,35), ' ', -1))
    else null
    end as addr2
from  r
where l=0
;

There’s probably a more graceful way to do that part, it’s just the first thing that came to mind.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement