Skip to content
Advertisement

How to select values between a XML tag in SQL Query

I have a table with CLOB column storing a XML. The structure of XML is unreadable. I want to get values between few tags like <DOMAINID>; sample is shown below.

XML:

<ID>
  <DOMAIN>IND<DOMAIN>
  <DOMAINID>112AC<DOMAINID>
<ID>
  
<GROUP>
  <GP>ASIA<GP>
  <RSN>GOOD<RSN>
<GROUP>

I am using this:

SELECT REGEXP_REPLACE(COL,'^.*<DOMAINID>(.*)</DOMAINID>.*$','1',1,0,'mn') col1 FROM tab;

Expected result:

112AC

Actual XML:

<?xml version="1.0" encoding="US-ASCII"?>
<GML:GMMessage 
   xmlns:GML="GML" 
   xmlns:GMLType="GML.Type" 
   
   xsi:schemaLocation="GML ../schema/gml..xsd" SchemaVersion="9.8">
    <BusinessHdr>
       <busHdr:BusObjectType>ABC</busHdr:BusObjectType>
       <busHdr:BusObjectOwner>HDHDH</busHdr:BusObjectOwner>
       <busHdr:BusObjectId>DJHDAHDAJHDA</busHdr:BusObjectId>
       <busHdr:BusObjectVersion>1</busHdr:BusObjectVersion>
      </BusinessHdr>
    <Transaction>
     <GenericEvent>NEW</GenericEvent>
     <Group>
        <GroupId>3424234</GroupId>
        <Reason>MANUAL</Reason>
    </Group>
    < xsi:type="mm:MMIam">
       <Id>
           <Domain>ssdsgdsg</Domain>
           <DomainId>123456ACC</DomainId>
           <Version>1</Version>
       </Id>
       <Date>2021-02-01</Date>
     </Transaction>
    </GML:GMMessage>

Advertisement

Answer

Do not use regular expressions to parse XML; use a proper XML parser.

However, what you have is not properly formed XML as it is missing a root element and you are missing the / in all of the closing tags; so you first need to fix your XML and give it a root element and then you can parse it using an XML parser.

SELECT x.*
FROM   table_name t
       CROSS APPLY XMLTABLE(
         '//root'
         PASSING XMLTYPE( '<root>' || t.data || '</root>' )
         COLUMNS
           domain   VARCHAR2(10) PATH './ID/DOMAIN',
           domainid VARCHAR2(10) PATH './ID/DOMAINID',
           gp       VARCHAR2(50) PATH './GROUP/GP',
           rsn      VARCHAR2(50) PATH './GROUP/RSN'
       ) x

Which, for the sample data:

CREATE TABLE table_name ( data ) AS
SELECT '<ID>
  <DOMAIN>IND</DOMAIN>
  <DOMAINID>112AC</DOMAINID>
</ID>
<GROUP>
  <GP>ASIA</GP>
  <RSN>GOOD</RSN>
</GROUP>' FROM DUAL

Outputs:

DOMAIN | DOMAINID | GP   | RSN 
:----- | :------- | :--- | :---
IND    | 112AC    | ASIA | GOOD

If you just want a single value then you can use XMLQUERY:

SELECT  XMLQUERY(
          '/root/ID/DOMAINID/text()'
          PASSING XMLTYPE( '<root>'||data||'</root>' )
          RETURNING CONTENT
        ) AS domainid
FROM    table_name

Which outputs:

| DOMAINID |
| :------- |
| 112AC    |

db<>fiddle here


Update

I am going to assume that your XML also defines the xsi and busHdr namespaces (if it doesn’t then Oracle will fail to parse the XML as it does not know what those namespaces are); that would give you this sample data:

CREATE TABLE table_name ( data ) AS
SELECT '<?xml version="1.0" encoding="US-ASCII"?>
<GML:GMMessage 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:GML="GML" 
   xmlns:busHdr="busHdr"
   xmlns:GMLType="GML.Type"
   
   xsi:schemaLocation="GML ../schema/gml.xsd busHdr ../schema/bushdr.xsd"
   SchemaVersion="9.8">
    <BusinessHdr>
       <busHdr:BusObjectType>ABC</busHdr:BusObjectType>
       <busHdr:BusObjectOwner>HDHDH</busHdr:BusObjectOwner>
       <busHdr:BusObjectId>DJHDAHDAJHDA</busHdr:BusObjectId>
       <busHdr:BusObjectVersion>1</busHdr:BusObjectVersion>
      </BusinessHdr>
    <Transaction>
     <GenericEvent>NEW</GenericEvent>
     <Group>
        <GroupId>3424234</GroupId>
        <Reason>MANUAL</Reason>
    </Group>
       <Id>
           <Domain>ssdsgdsg</Domain>
           <DomainId>123456ACC</DomainId>
           <Version>1</Version>
       </Id>
       <Date>2021-02-01</Date>
     </Transaction>
    </GML:GMMessage>' FROM DUAL

Then, you just need to add the namespace that you are using and update the paths to the new (case-sensitive) locations:

SELECT x.*
FROM   table_name t
       CROSS APPLY XMLTABLE(
         XMLNAMESPACES( 'GML' AS "GML" ),
         '//GML:GMMessage/Transaction'
         PASSING XMLTYPE( t.data )
         COLUMNS
           domain   VARCHAR2(10) PATH './Id/Domain',
           domainid VARCHAR2(10) PATH './Id/DomainId',
           version  NUMBER(3,0)  PATH './Id/Version',
           groupid  VARCHAR2(50) PATH './Group/GroupId',
           reason   VARCHAR2(50) PATH './Group/Reason',
           dt       DATE         PATH './Date'
       ) x

Outputs:

DOMAIN   | DOMAINID  | VERSION | GROUPID | REASON | DT       
:------- | :-------- | ------: | :------ | :----- | :--------
ssdsgdsg | 123456ACC |       1 | 3424234 | MANUAL | 01-FEB-21

db<>fiddle here

Advertisement