I am trying to extract:
Abbey Grove
Abbey Grove
Abbey Road View
Abbey Road
Abbey Terrace
Abbey Wood Road
Abbey Grove
from
23a, Abbey Grove 43a Abbey Grove Block 509a Abbey Road View 511 Abbey Road Flat 8a, Abbey Terrace 14 Abbey Wood Road 100 Abbey Grove
in Google Bigquery. The issue is that:
regexp_replace(text, '[^a-zA-Z]', '')
gives me “aabbeywood” with two a’s. Essentially I just want to keep all the text after a “numeric” or “numeric plus one letter” string.
Advertisement
Answer
SELECT regexp_replace(t, '.*[0-9]+[a-zA-Z]?[^a-zA-Z]*', '') FROM UNNEST(['23a, Abbey Grove','43a Abbey Grove','Block 509a Abbey Road View','511 Abbey Road','Flat 8a, Abbey Terrace','14 Abbey Wood Road','100 Abbey Grove']) t
I tried to reproduce the problem with your data. For this specific data it worked in BigQuery
.
This regex can be translated as:
- Search for any characters zero or more times
- Search for numbers one or more times
- Search for zero or one letters between a and z (lower or upper)
- Search for any character that is not a letter zero or more times
You you have some different cases where this regex doesnt apply, please let me know. I hope it helps