Skip to content
Advertisement

Find numeric record number in string

I’ve been using this site for longer than I can remember, but I’ve never had to ask a question before – I’ve always been able to find what I need – but this one could use some SQL expertise for the best approach since it’s kind of a worst case.

I have a text comment field that gets a lot of stuff put in it in no consistent format – it’ll include a patient name, hopefully a record ID (which could come from 2 different tables differentiated by the range), spaces, slashes, phone numbers, the initials of whoever entered it, etc. I’ll scrub some examples… I need to find the numeric RecordID in here (it references another disconnected system) and ignore the rest. There’s a remote possibility it could have 2 record numbers (could be referenced in two different tables) – if so, I need to pick the more important one.

Example Data (scrubbed):
Sanchez, Raul – POS – OS – 489-849-7894 – AB – Conf >> Nothing of value here/return NULL
Smith, Nancy RL 1458968-rl >> Return 1458968
Sudhu, Mandeep intake # 78596/rl >> Return 78596
1887569 Smith, Jonathan-ESK >> Return 1021015
Montana, Joe/1451252rm >> Return 1451252
Johnson, Fredy/1784569/78563/RG >> Return 1784569

My first thought was to do a bunch of nested Replace() functions to turn anything that’s not a character into a standard delimiter, parse these all into separate records with split_string(), then only pick the largest number – and that would work in every instance except #5 above… but is that the best way, or is there something better? SQL Version is 2017 for reference to available functions.

This will be wrapped into a scalar valued function.

Thanks!

Advertisement

Answer

Here is a way to extract all numbers.

t1 identify all non-digit characters in each text.
t2 replaces all non-digit characters with spaces.
t3 assumes that numbers might be separated by an unknown number of spaces, and replaces every sequence of spaces with a single space.

Enjoy 🙂

with
t1 
as 
(
    select  record_id
           ,txt
           ,trim(translate(txt, '1234567890', space(10))) as non_digits 
    from    t
),
t2 
as 
(  
    select  record_id
           ,trim(translate(txt, non_digits, space(len(non_digits)))) as spaced_numberes 
    from    t1
),
t3 
as 
(
    select  record_id
           ,replace(replace(replace(spaced_numberes, ' ', ' *'), '* ', ''), '*', '') as space_delimited_numbers 
           
    from    t2
)
select  t3.record_id
       ,ss.value  
       
from    t3 
        cross apply string_split(space_delimited_numbers, ' ') as ss
record_id value
1 489
1 849
1 7894
2 1458968
3 78596
4 1887569
5 1451252
6 1784569
6 78563

Fiddle

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement