I am going through a “clean-up” process of a character column that holds employee rankings that should have a single character: 0-5 or M or U but instead it has up to 3 characters that are either numeric, alpha or alphanumeric.
I spent the better part of two days researching online (Regex, Stack overflow and Oracle resources) and testing and trying various options and the below “RESULT” (refer to code) is what I came up with. It does the job but I can’t help but think that there is a more concise way of doing this. For example, at one point I thought I was close to accomplishing the task with a single instance of REGEXP_SUBSTR which used “|” (refer to PREV_TRY in code below). But then I couldn’t figure out how to take this result and extract from it the first character for cases [1-4]{1}[ABCUL]{1} or the last character for all other cases.
Here is a reproducible example along with the solution I have so far:
WITH T AS ( SELECT 'M' EX FROM DUAL UNION ALL SELECT 'U' FROM DUAL UNION ALL SELECT '1A' FROM DUAL UNION ALL --some two character values are [1-4]{1}[ABCUL]{1} SELECT TO_CHAR(ROWNUM) FROM DUAL CONNECT BY LEVEL <= 7 UNION ALL SELECT '0' FROM DUAL UNION ALL SELECT '113' FROM DUAL UNION ALL--if its numeric it can be 0-999 SELECT '03' FROM DUAL UNION ALL--some two character values are 0[1-4]{1} SELECT '99' FROM DUAL UNION ALL SELECT 'RG1' FROM DUAL UNION ALL--some three character values are RG[1-4]{1} SELECT 'NA' FROM DUAL UNION ALL--some values are 'NA' or 'N/A' SELECT null FROM DUAL --there are null values ) SELECT EX ,NVL(SUBSTR( NVL( SUBSTR(REGEXP_SUBSTR(EX,'(^(0|RG)?[0-5MU]?$)'),-1,1) ,REGEXP_SUBSTR(EX,'^[1-4]{1}[ABCUL]{1}')) ,1,1),0) RESULT --what I came up with so far ,NVL(REGEXP_SUBSTR(EX,'(^(0|RG)?[0-5MU]?$)|(^[1-4]{1}[ABCUL]{1})'),0) PREV_TRY FROM T
I summarize what I need to accomplish with these rules:
- if its a single character then return any character that matches [0-5MU].
- if its a single digit followed by a single alpha character then return the digit [1-4]{1}[ABCUL]{1}. E.g., ‘2A’ returns ‘2’
- if its RG[1-4] then return the digit. E.g., ‘RG3’ returns ‘3’
- if its 0[1-4] then return the second digit. E.g., ’03’ returns ‘3’
- all else return 0
Advertisement
Answer
You can use:
WITH T (ex)AS ( SELECT 'M' FROM DUAL UNION ALL SELECT 'U' FROM DUAL UNION ALL SELECT '1A' FROM DUAL UNION ALL --some two character values are [1-4]{1}[ABCUL]{1} SELECT TO_CHAR(ROWNUM) FROM DUAL CONNECT BY LEVEL <= 7 UNION ALL SELECT '0' FROM DUAL UNION ALL SELECT '113' FROM DUAL UNION ALL--if its numeric it can be 0-999 SELECT '03' FROM DUAL UNION ALL--some two character values are 0[1-4]{1} SELECT '99' FROM DUAL UNION ALL SELECT 'RG1' FROM DUAL UNION ALL--some three character values are RG[1-4]{1} SELECT 'NA' FROM DUAL UNION ALL--some values are 'NA' or 'N/A' SELECT null FROM DUAL --there are null values ) SELECT ex, COALESCE( REGEXP_REPLACE( ex, '^([0-5MU])$|^(d)[A-Z]$|^(RG|0)([1-4])$|^.*$', '124' ), '0' ) AS replacement FROM t
Which outputs:
EX REPLACEMENT M M U U 1A 1 1 1 2 2 3 3 4 4 5 5 6 0 7 0 0 0 113 0 3 3 99 0 RG1 1 NA 0 <null> 0
db<>fiddle here