Skip to content
Advertisement

snowflake substring by pattern

I have a string column in my table as below:

  accountNumber:123456
 {"accountNumber":"123456"}

i need a dynamic way of only extracting 123456 from these string. Can you plz suggest a solution.

Advertisement

Answer

Use the REGEXP_SUBSTR(…) built-in function to extract substrings using a regular expression pattern.

If there will only be one number in each column value, a number pattern or a numeric characters range syntax will suffice:

SELECT
'accountNumber:123456' i1,
regexp_substr(i1, '[0-9]+') r1,
'{"accountNumber":"123456"}' i2,
regexp_substr(i2, '[0-9]+') r2;

+----------------------+--------+----------------------------+--------+         
| I1                   | R1     | I2                         | R2     |
|----------------------+--------+----------------------------+--------|
| accountNumber:123456 | 123456 | {"accountNumber":"123456"} | 123456 |
+----------------------+--------+----------------------------+--------+

If the number will be exactly 6 digits wide, use the {n} repetition syntax:

select
'accountNumber:123456,anotherNumber:123' i1,
regexp_substr(i1, '[0-9]{6}') r1,
'{"accountNumber":"123456", "anotherNumber": 123}' i2,
regexp_substr(i2,'[0-9]{6}') r2;

+----------------------------------------+--------+--------------------------------------------------+--------+
| I1                                     | R1     | I2                                               | R2     |
|----------------------------------------+--------+--------------------------------------------------+--------|
| accountNumber:123456,anotherNumber:123 | 123456 | {"accountNumber":"123456", "anotherNumber": 123} | 123456 |
+----------------------------------------+--------+--------------------------------------------------+--------+

If the number must only follow the text accountNumber, you can introduce (capture groups):

select
'accountNumber:123456,anotherNumber:123,somethingElse:456789' i1,
regexp_substr(i1, 'accountNumber[:" ]+([0-9]{6})', 1, 1, 'e', 1) r1,
'{"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789}' i2,
regexp_substr(i2, 'accountNumber[:" ]+([0-9]{6})', 1, 1, 'e', 1) r2;

+-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------+
| I1                                                          | R1     | I2                                                                        | R2     |
|-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------|
| accountNumber:123456,anotherNumber:123,somethingElse:456789 | 123456 | {"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789} | 123456 |
+-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------+

Building a fully correct regular expression will require more knowledge about all of the variances possible in the data. Try building out your patterns interactively with a good test-set on sites such as Regex101, RegExr, etc. that make it easier to develop them.

Note: If your data is actually in a JSON format throughout, Snowflake permits parsing them into a VARIANT data type to query them more naturally:

select
parse_json('{"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789}'):accountNumber::integer account_number;

+----------------+                                                              
| ACCOUNT_NUMBER |
|----------------|
|         123456 |
+----------------+
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement