How can I extract all text between single full stops (i.e. ignore ...
) with a re2 regex?
I am using the REGEXP_EXTRACT_ALL
function in BigQuery which uses https://github.com/google/re2/wiki/Syntax.
From the following example:
This is... a.. sentence. It is just an example.
I would like the query to extract
This is... a.. sentence.
and It is just an example.
I am specifically interested in whether this is possible using SQL functions in BigQuery rather than introducing another tool
Advertisement
Answer
Consider below workaround
select text, regexp_replace(sentence, r'(#)(.+)(#)', r'2') sentence from `project.dataset.table`, unnest(split(trim(regexp_replace(regexp_replace(text, r'(.+)', r'#1#'), r'(#.#)', r'####'), '####'), '####')) sentence
if applied to sample data in your question – output is