I have a string column in my table as below:
accountNumber:123456 {"accountNumber":"123456"}
i need a dynamic way of only extracting 123456 from these string. Can you plz suggest a solution.
Advertisement
Answer
Use the REGEXP_SUBSTR(…)
built-in function to extract substrings using a regular expression pattern.
If there will only be one number in each column value, a number pattern or a numeric characters range syntax will suffice:
SELECT 'accountNumber:123456' i1, regexp_substr(i1, '[0-9]+') r1, '{"accountNumber":"123456"}' i2, regexp_substr(i2, '[0-9]+') r2; +----------------------+--------+----------------------------+--------+ | I1 | R1 | I2 | R2 | |----------------------+--------+----------------------------+--------| | accountNumber:123456 | 123456 | {"accountNumber":"123456"} | 123456 | +----------------------+--------+----------------------------+--------+
If the number will be exactly 6 digits wide, use the {n}
repetition syntax:
select 'accountNumber:123456,anotherNumber:123' i1, regexp_substr(i1, '[0-9]{6}') r1, '{"accountNumber":"123456", "anotherNumber": 123}' i2, regexp_substr(i2,'[0-9]{6}') r2; +----------------------------------------+--------+--------------------------------------------------+--------+ | I1 | R1 | I2 | R2 | |----------------------------------------+--------+--------------------------------------------------+--------| | accountNumber:123456,anotherNumber:123 | 123456 | {"accountNumber":"123456", "anotherNumber": 123} | 123456 | +----------------------------------------+--------+--------------------------------------------------+--------+
If the number must only follow the text accountNumber
, you can introduce (capture groups):
select 'accountNumber:123456,anotherNumber:123,somethingElse:456789' i1, regexp_substr(i1, 'accountNumber[:" ]+([0-9]{6})', 1, 1, 'e', 1) r1, '{"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789}' i2, regexp_substr(i2, 'accountNumber[:" ]+([0-9]{6})', 1, 1, 'e', 1) r2; +-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------+ | I1 | R1 | I2 | R2 | |-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------| | accountNumber:123456,anotherNumber:123,somethingElse:456789 | 123456 | {"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789} | 123456 | +-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------+
Building a fully correct regular expression will require more knowledge about all of the variances possible in the data. Try building out your patterns interactively with a good test-set on sites such as Regex101, RegExr, etc. that make it easier to develop them.
Note: If your data is actually in a JSON format throughout, Snowflake permits parsing them into a VARIANT
data type to query them more naturally:
select parse_json('{"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789}'):accountNumber::integer account_number; +----------------+ | ACCOUNT_NUMBER | |----------------| | 123456 | +----------------+