Skip to content
Advertisement

Regular Expression to extract string after seeing “number + one letter + [comma or whitespace]” in Bigquery

I am trying to extract:

Abbey Grove
Abbey Grove
Abbey Road View
Abbey Road
Abbey Terrace
Abbey Wood Road
Abbey Grove

from

23a, Abbey Grove
43a Abbey Grove
Block 509a Abbey Road View
511 Abbey Road
Flat 8a, Abbey Terrace
14 Abbey Wood Road
100 Abbey Grove

in Google Bigquery. The issue is that:

regexp_replace(text, '[^a-zA-Z]', '')

gives me “aabbeywood” with two a’s. Essentially I just want to keep all the text after a “numeric” or “numeric plus one letter” string.

Advertisement

Answer

SELECT regexp_replace(t, '.*[0-9]+[a-zA-Z]?[^a-zA-Z]*', '') FROM UNNEST(['23a, Abbey Grove','43a Abbey Grove','Block 509a Abbey Road View','511 Abbey Road','Flat 8a, Abbey Terrace','14 Abbey Wood Road','100 Abbey Grove']) t

I tried to reproduce the problem with your data. For this specific data it worked in BigQuery.

This regex can be translated as:

  1. Search for any characters zero or more times
  2. Search for numbers one or more times
  3. Search for zero or one letters between a and z (lower or upper)
  4. Search for any character that is not a letter zero or more times

You you have some different cases where this regex doesnt apply, please let me know. I hope it helps

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement