Skip to content
Advertisement

capture the starting characters with many patterns

I need to capture the starting characters from database objects with various pattern search.

I have tried using the REGEX expression ,below query i am trying and the expected data set to be.

Query 1

SELECT
   owner AS schema_name, 
   object_name,        
   object_type,
   REGEXP_REPLACE(OBJECT_NAME, '^([A-Z0-9$]{1,})_.*', '1') as BEGINNING,
   count(*),
   round(100*ratio_to_report(count(*)) over (), 4) percentage 
FROM
   dba_objects 
GROUP BY
   owner,
   object_name,
   object_type,
   REGEXP_REPLACE(OBJECT_NAME, '^([A-Z0-9$]{1,})_.*', '1') 
ORDER BY
   percentage desc; 

Expected Data Set

OBJECT_NAME                     BEGINNING  COUNT(*) PERCENT
ABC_CUST_INFO_D                 ABC        20      .00010
BBC_CUST_ENTRY_F                BBC        100     .030
FHS_PRDCT_STST_T                A$f        194     .031
GHS_INVTR_CD_DRY                A1B        493     .051
#Tableau_6_sid:15251a_4i_2a     #Tableau   4000    1.5
/15722c29d_PhotoImageCameraPro  /15722c29d 5000    1.6
JAVA/15722c29d_PhotoImageqeeee  JAVA 5000  1.6

Unfortunately i was not able to separate the below

#Tableau_6_sid:15251a_4i_2a –> #Tableau

/15722c29d_PhotoImageCameraPro –> /15722c29d

JAVA/15722c29d_PhotoImageqeeee –> JAVA

how to resolve ?

so how to capture the last character after _ , below is an example

Query 2 – capture the last characters

 select 
   owner, 
   object_name, 
   object_type, 
   regexp_substr(object_name, '[^_]*$') ENDING, 
   count(*) COUNT, 
   round(100*ratio_to_report(count(*)) over (), 4) percentage  
from 
   dba_objects  
   where object_name like '%/%'
group by 
   owner, 
   object_name, 
   object_type, 
   regexp_substr(object_name, '[^_]*$') 
   ORDER BY 
   percentage desc

Expected Data Set

OBJECT_NAME                     BEGINNING       COUNT(*) PERCENT
ABC_CUST_INFO_D                 D               20      .00010
BBC_CUST_ENTRY_F                F               100     .030
FHS_PRDCT_STST_T                T               194     .031
GHS_INVTR_CD_DRY                DRY             493     .051
Teradata/jdbc/EXAcore/MPPReader MPPReader       5000    1.6
sqlj/modegen/engine/FuncInit    FuncInit        8000    6.0
moon/aio/af/ext/ISCII11$Decoder ISCII11$Decoder 8700    6.1

Unfortunately i was not able to separate the below

Teradata/jdbc/EXAcore/MPPReader –> MPPReader

sqlj/modegen/engine/FuncInit –> FuncInit

moon/aio/af/ext/ISCII11$Decoder –> ISCII11$Decoder

Advertisement

Answer

For the regex, you could add matching / and # in the character class followed by matching as least as possible chars until the next / or _ using another character class.

If the pattern is not case insensitive, you could also add a-z

^([A-Za-z0-9$/#]+?)[/_].*

Regex demo

If the / and # can only be at the start, you might also optionally match either one of them:

^([/#]?[A-Za-z0-9$]+)[/_].*

See another regex demo

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement