REGEX Extract Amount Without Currency

Question

    SELECT ocr_text, bucket, REGEXP_EXTRACT('-?[0-9]+(.[0-9]+)?', ocr_text) FROM temp

I am trying to extract amounts from a string that will not have currency present. Any number that does ...

Accepted Answer

Here is a general regex pattern for a positive or negative number with exactly two decimal places and optional comma thousands separators:

    (?<!\S)(?:-?[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})|-?[0-9]+(\.[0-9]{2}))(?!\S)

Your updated query, with the string as the first argument and the pattern as the second:

    SELECT
        ocr_text,
        bucket,
        REGEXP_EXTRACT(ocr_text, '(?<!\S)(?:-?[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})|-?[0-9]+(\.[0-9]{2}))(?!\S)')
    FROM temp;

According to the Presto docs I read, it supports Java's regex syntax, so the lookarounds should work. If they do not, you may try this version:

    SELECT
        ocr_text,
        bucket,
        REGEXP_EXTRACT(ocr_text, '(\s|^)(?:-?[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})|-?[0-9]+(\.[0-9]{2}))(\s|$)')
    FROM temp;
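To check the two patterns outside Presto, here is a minimal sketch using Python's re module. Python is only a stand-in here: Presto uses Java's regex syntax, but the constructs in these patterns (lookarounds on \S, non-capturing groups, \s) should behave the same way in both. The function names and sample strings are hypothetical, not taken from the question.

```python
import re

# Lookaround version: the amount must be bounded by whitespace or the string edges.
AMOUNT = re.compile(
    r"(?<!\S)(?:-?[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})|-?[0-9]+(\.[0-9]{2}))(?!\S)"
)

# Fallback version without lookarounds: the boundaries are consumed by the match.
FALLBACK = re.compile(
    r"(\s|^)(?:-?[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})|-?[0-9]+(\.[0-9]{2}))(\s|$)"
)


def extract_amount(text):
    """Return the first amount found by the lookaround pattern, or None."""
    m = AMOUNT.search(text)
    return m.group(0) if m else None


def extract_amount_fallback(text):
    """Return the first amount found by the fallback pattern, or None.

    The full match includes the surrounding whitespace, so it is stripped.
    """
    m = FALLBACK.search(text)
    return m.group(0).strip() if m else None


if __name__ == "__main__":
    # Hypothetical OCR-like samples for illustration only.
    samples = [
        "total 1,234.56 due",
        "refund -45.00",
        "paid 1234.50 today",
        "order 12345",
        "weight 12.505",
    ]
    for s in samples:
        print(repr(s), extract_amount(s), extract_amount_fallback(s))
```

One difference worth noting: with the fallback pattern the whole match includes the adjacent whitespace, which is why the sketch strips it. In the SQL query the equivalent would be to trim the result of REGEXP_EXTRACT.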
