Skip to content
Advertisement

Extract a desired character from an alphanumeric column using concise REGEX in Oracle SQL

I am going through a “clean-up” process of a character column that holds employee rankings that should have a single character: 0-5 or M or U but instead it has up to 3 characters that are either numeric, alpha or alphanumeric.

I spent the better part of two days researching online (Regex, Stack overflow and Oracle resources) and testing and trying various options and the below “RESULT” (refer to code) is what I came up with. It does the job but I can’t help but think that there is a more concise way of doing this. For example, at one point I thought I was close to accomplishing the task with a single instance of REGEXP_SUBSTR which used “|” (refer to PREV_TRY in code below). But then I couldn’t figure out how to take this result and extract from it the first character for cases [1-4]{1}[ABCUL]{1} or the last character for all other cases.

Here is a reproducible example along with the solution I have so far:

WITH T AS (
SELECT 'M' EX FROM DUAL UNION ALL
SELECT 'U' FROM DUAL UNION ALL
SELECT '1A'  FROM DUAL UNION ALL --some two character values are [1-4]{1}[ABCUL]{1}
SELECT TO_CHAR(ROWNUM) FROM DUAL CONNECT BY LEVEL  <= 7 UNION ALL
SELECT '0' FROM DUAL UNION ALL
SELECT '113' FROM DUAL UNION ALL--if its numeric it can be 0-999
SELECT '03' FROM DUAL UNION ALL--some two character values are 0[1-4]{1}
SELECT '99' FROM DUAL UNION ALL
SELECT 'RG1' FROM DUAL UNION ALL--some three character values are RG[1-4]{1}
SELECT 'NA' FROM DUAL UNION ALL--some values are 'NA' or 'N/A'
SELECT null FROM DUAL --there are null values
)
SELECT EX
,NVL(SUBSTR(
        NVL(
            SUBSTR(REGEXP_SUBSTR(EX,'(^(0|RG)?[0-5MU]?$)'),-1,1)
            ,REGEXP_SUBSTR(EX,'^[1-4]{1}[ABCUL]{1}'))
    ,1,1),0) RESULT --what I came up with so far
 ,NVL(REGEXP_SUBSTR(EX,'(^(0|RG)?[0-5MU]?$)|(^[1-4]{1}[ABCUL]{1})'),0) PREV_TRY
 FROM T

I summarize what I need to accomplish with these rules:

  • if its a single character then return any character that matches [0-5MU].
  • if its a single digit followed by a single alpha character then return the digit [1-4]{1}[ABCUL]{1}. E.g., ‘2A’ returns ‘2’
  • if its RG[1-4] then return the digit. E.g., ‘RG3’ returns ‘3’
  • if its 0[1-4] then return the second digit. E.g., ’03’ returns ‘3’
  • all else return 0

Advertisement

Answer

You can use:

WITH T (ex)AS (
SELECT 'M' FROM DUAL UNION ALL
SELECT 'U' FROM DUAL UNION ALL
SELECT '1A'  FROM DUAL UNION ALL --some two character values are [1-4]{1}[ABCUL]{1}
SELECT TO_CHAR(ROWNUM) FROM DUAL CONNECT BY LEVEL  <= 7 UNION ALL
SELECT '0' FROM DUAL UNION ALL
SELECT '113' FROM DUAL UNION ALL--if its numeric it can be 0-999
SELECT '03' FROM DUAL UNION ALL--some two character values are 0[1-4]{1}
SELECT '99' FROM DUAL UNION ALL
SELECT 'RG1' FROM DUAL UNION ALL--some three character values are RG[1-4]{1}
SELECT 'NA' FROM DUAL UNION ALL--some values are 'NA' or 'N/A'
SELECT null FROM DUAL --there are null values
)
SELECT ex,
       COALESCE(
         REGEXP_REPLACE(
           ex,
           '^([0-5MU])$|^(d)[A-Z]$|^(RG|0)([1-4])$|^.*$',
           '124'
         ),
         '0'
       ) AS replacement
FROM   t

Which outputs:

EX REPLACEMENT
M M
U U
1A 1
1 1
2 2
3 3
4 4
5 5
6 0
7 0
0 0
113 0
3 3
99 0
RG1 1
NA 0
<null> 0

db<>fiddle here

Advertisement