I really appreciate all of you for your help.
We are in the process of standardizing the addresses of our customers.
I have a main customer table which contains the customers data, the address field to be standardized.
I have a mapping table that contains mapping for from_str to to_str values.
I need to get the to_str that is correspond to the from_str into the address data.
If the length of the address is more than 35 characters after standardizing, then from previous space(‘ ‘) from right side to end of the string, is created as a seperate field address2
Could you please help with sql or pl/sql? Using Oracle 12c database.
The code I wrote below is not working for all from_str values…only working for first 2 rows.
Any help is appreciated.
code so far:
with addresses as ( select cust_id,address addr from ( select 10 cust_id,'9 Help Street, Level 4' address from dual union all select 11 cust_id,'22 Victoria Street' address from dual union all select 12 cust_id,'1495 Franklin Str' address from dual union all select 13 cust_id,'30 Hasivim St.,Petah-Tikva' address from dual union all select 14 cust_id,'2 Jakaranda St' address from dual union all select 15 cust_id,'61, Science Park Rd' address from dual union all select 16 cust_id,'61, Social park road' address from dual union all select 17 cust_id,'Av. Hermanos Escobar 5756' address from dual union all select 18 cust_id,'Ave. Hermanos Escobar 5756' address from dual union all select 19 cust_id,'8000 W FLORISSANT AVE' address from dual union all select 20 cust_id,'8600 MEMORIAL PKWY SW' address from dual union all select 21 cust_id,'8200 FLORISSANTMEMORIALWAYABOVE SW' address from dual union all select 22 cust_id,'8600 MEMORIALFLORISSANT PKWY SW' address from dual ) t1 ), replacements as ( select id,to_str,from_string from_str from ( select 1 id,'St' to_str,'Street' from_string from dual union all select 2 id,'St' to_str,'St.' from_string from dual union all select 3 id,'St' to_str,'Str' from_string from dual union all select 4 id,'St' to_str,'St' from_string from dual union all select 5 id,'Rd' to_str,'Rd.' from_string from dual union all select 6 id,'Rd' to_str,'road' from_string from dual union all select 7 id,'Av' to_str,'Av.' from_string from dual union all select 8 id,'Av' to_str,'Ave.' from_string from dual union all select 9 id,'Av' to_str,'Avenue' from_string from dual union all select 10 id,'Av' to_str,'Aven.' from_string from dual union all select 11 id,'West' to_str,'W' from_string from dual union all select 12 id,'South West' to_str,'SW.' from_string from dual ) t2 ), r(addr,test_addr,l) as ( select addr,regexp_replace(addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr, id - 1 from addresses, replacements where id = (select count(*) from replacements) union all select addr,regexp_replace(addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr, l - 1 from r, replacements where id = l ) select addr,test_addr,l from r where l=0 ;
Expected Output:
cust_id address 10 9 Help St, Level 4 11 22 Victoria St 12 1495 Franklin St 13 30 Hasivim St ,Petah-Tikva 14 2 Jakaranda St 15 61, Science Park Rd 16 61, Social park Rd 17 Av Hermanos Escobar 5756 18 Av Hermanos Escobar 5756 19 8000 West FLORISSANT Ave 20 8600 MEMORIAL PKWY South West
if length of the address is more than 35 characters then the expected output is:
cust_id address address2 21 8200 FLORISSANTMEMORIALWAYABOVE South West 22 8600 MEMORIALFLORISSANT PKWY South West
Advertisement
Answer
That’s an… interesting way to apply multiple replace operations. So – you have 2 issues, as you mentioned. For the first one, the recursive part of your CTE is doing regexp_replace()
on addr
instead of test_addr
(the modified output of the previous recursive step). So only the last rule in the list is ever applied.
r(addr,test_addr, l) as ( select addr,regexp_replace(addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr, id - 1 from addresses, replacements where id = (select count(*) from replacements) union all -- if you do regexp_replace on addr, it throws out the previous replace (which is in r.test_addr) select addr,regexp_replace(test_addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr, l - 1 from r, replacements where id = l )
For the “over-35-characters” issue, I would suggest using substr/instr – while awkward to read, they’re usually fast.
select addr,test_addr,l, case when length(test_addr) > 35 then substr(test_addr, 1, instr(substr(test_addr,1,35), ' ', -1)) else test_addr end as addr1, case when length(test_addr) > 35 then substr(test_addr, instr(substr(test_addr,1,35), ' ', -1)) else null end as addr2 from r where l=0 ;
There’s probably a more graceful way to do that part, it’s just the first thing that came to mind.