I need to capture the starting characters from database objects with various pattern search.
I have tried using the REGEX expression ,below query i am trying and the expected data set to be.
Query 1
SELECT owner AS schema_name, object_name, object_type, REGEXP_REPLACE(OBJECT_NAME, '^([A-Z0-9$]{1,})_.*', '1') as BEGINNING, count(*), round(100*ratio_to_report(count(*)) over (), 4) percentage FROM dba_objects GROUP BY owner, object_name, object_type, REGEXP_REPLACE(OBJECT_NAME, '^([A-Z0-9$]{1,})_.*', '1') ORDER BY percentage desc;
Expected Data Set
OBJECT_NAME BEGINNING COUNT(*) PERCENT ABC_CUST_INFO_D ABC 20 .00010 BBC_CUST_ENTRY_F BBC 100 .030 FHS_PRDCT_STST_T A$f 194 .031 GHS_INVTR_CD_DRY A1B 493 .051 #Tableau_6_sid:15251a_4i_2a #Tableau 4000 1.5 /15722c29d_PhotoImageCameraPro /15722c29d 5000 1.6 JAVA/15722c29d_PhotoImageqeeee JAVA 5000 1.6
Unfortunately i was not able to separate the below
#Tableau_6_sid:15251a_4i_2a –> #Tableau
/15722c29d_PhotoImageCameraPro –> /15722c29d
JAVA/15722c29d_PhotoImageqeeee –> JAVA
how to resolve ?
so how to capture the last character after _ , below is an example
Query 2 – capture the last characters
select owner, object_name, object_type, regexp_substr(object_name, '[^_]*$') ENDING, count(*) COUNT, round(100*ratio_to_report(count(*)) over (), 4) percentage from dba_objects where object_name like '%/%' group by owner, object_name, object_type, regexp_substr(object_name, '[^_]*$') ORDER BY percentage desc
Expected Data Set
OBJECT_NAME BEGINNING COUNT(*) PERCENT ABC_CUST_INFO_D D 20 .00010 BBC_CUST_ENTRY_F F 100 .030 FHS_PRDCT_STST_T T 194 .031 GHS_INVTR_CD_DRY DRY 493 .051 Teradata/jdbc/EXAcore/MPPReader MPPReader 5000 1.6 sqlj/modegen/engine/FuncInit FuncInit 8000 6.0 moon/aio/af/ext/ISCII11$Decoder ISCII11$Decoder 8700 6.1
Unfortunately i was not able to separate the below
Teradata/jdbc/EXAcore/MPPReader –> MPPReader
sqlj/modegen/engine/FuncInit –> FuncInit
moon/aio/af/ext/ISCII11$Decoder –> ISCII11$Decoder
Advertisement
Answer
For the regex, you could add matching /
and #
in the character class followed by matching as least as possible chars until the next /
or _
using another character class.
If the pattern is not case insensitive, you could also add a-z
^([A-Za-z0-9$/#]+?)[/_].*
If the /
and #
can only be at the start, you might also optionally match either one of them:
^([/#]?[A-Za-z0-9$]+)[/_].*
See another regex demo