I have a table with CLOB column storing a XML. The structure of XML is unreadable. I want to get values between few tags like <DOMAINID>
; sample is shown below.
XML:
<ID> <DOMAIN>IND<DOMAIN> <DOMAINID>112AC<DOMAINID> <ID> <GROUP> <GP>ASIA<GP> <RSN>GOOD<RSN> <GROUP>
I am using this:
SELECT REGEXP_REPLACE(COL,'^.*<DOMAINID>(.*)</DOMAINID>.*$','1',1,0,'mn') col1 FROM tab;
Expected result:
112AC
Actual XML:
<?xml version="1.0" encoding="US-ASCII"?> <GML:GMMessage xmlns:GML="GML" xmlns:GMLType="GML.Type" xsi:schemaLocation="GML ../schema/gml..xsd" SchemaVersion="9.8"> <BusinessHdr> <busHdr:BusObjectType>ABC</busHdr:BusObjectType> <busHdr:BusObjectOwner>HDHDH</busHdr:BusObjectOwner> <busHdr:BusObjectId>DJHDAHDAJHDA</busHdr:BusObjectId> <busHdr:BusObjectVersion>1</busHdr:BusObjectVersion> </BusinessHdr> <Transaction> <GenericEvent>NEW</GenericEvent> <Group> <GroupId>3424234</GroupId> <Reason>MANUAL</Reason> </Group> < xsi:type="mm:MMIam"> <Id> <Domain>ssdsgdsg</Domain> <DomainId>123456ACC</DomainId> <Version>1</Version> </Id> <Date>2021-02-01</Date> </Transaction> </GML:GMMessage>
Advertisement
Answer
Do not use regular expressions to parse XML; use a proper XML parser.
However, what you have is not properly formed XML as it is missing a root element and you are missing the /
in all of the closing tags; so you first need to fix your XML and give it a root element and then you can parse it using an XML parser.
SELECT x.* FROM table_name t CROSS APPLY XMLTABLE( '//root' PASSING XMLTYPE( '<root>' || t.data || '</root>' ) COLUMNS domain VARCHAR2(10) PATH './ID/DOMAIN', domainid VARCHAR2(10) PATH './ID/DOMAINID', gp VARCHAR2(50) PATH './GROUP/GP', rsn VARCHAR2(50) PATH './GROUP/RSN' ) x
Which, for the sample data:
CREATE TABLE table_name ( data ) AS SELECT '<ID> <DOMAIN>IND</DOMAIN> <DOMAINID>112AC</DOMAINID> </ID> <GROUP> <GP>ASIA</GP> <RSN>GOOD</RSN> </GROUP>' FROM DUAL
Outputs:
DOMAIN | DOMAINID | GP | RSN :----- | :------- | :--- | :--- IND | 112AC | ASIA | GOOD
If you just want a single value then you can use XMLQUERY
:
SELECT XMLQUERY( '/root/ID/DOMAINID/text()' PASSING XMLTYPE( '<root>'||data||'</root>' ) RETURNING CONTENT ) AS domainid FROM table_name
Which outputs:
| DOMAINID | | :------- | | 112AC |
db<>fiddle here
Update
I am going to assume that your XML also defines the xsi
and busHdr
namespaces (if it doesn’t then Oracle will fail to parse the XML as it does not know what those namespaces are); that would give you this sample data:
CREATE TABLE table_name ( data ) AS SELECT '<?xml version="1.0" encoding="US-ASCII"?> <GML:GMMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:GML="GML" xmlns:busHdr="busHdr" xmlns:GMLType="GML.Type" xsi:schemaLocation="GML ../schema/gml.xsd busHdr ../schema/bushdr.xsd" SchemaVersion="9.8"> <BusinessHdr> <busHdr:BusObjectType>ABC</busHdr:BusObjectType> <busHdr:BusObjectOwner>HDHDH</busHdr:BusObjectOwner> <busHdr:BusObjectId>DJHDAHDAJHDA</busHdr:BusObjectId> <busHdr:BusObjectVersion>1</busHdr:BusObjectVersion> </BusinessHdr> <Transaction> <GenericEvent>NEW</GenericEvent> <Group> <GroupId>3424234</GroupId> <Reason>MANUAL</Reason> </Group> <Id> <Domain>ssdsgdsg</Domain> <DomainId>123456ACC</DomainId> <Version>1</Version> </Id> <Date>2021-02-01</Date> </Transaction> </GML:GMMessage>' FROM DUAL
Then, you just need to add the namespace that you are using and update the paths to the new (case-sensitive) locations:
SELECT x.* FROM table_name t CROSS APPLY XMLTABLE( XMLNAMESPACES( 'GML' AS "GML" ), '//GML:GMMessage/Transaction' PASSING XMLTYPE( t.data ) COLUMNS domain VARCHAR2(10) PATH './Id/Domain', domainid VARCHAR2(10) PATH './Id/DomainId', version NUMBER(3,0) PATH './Id/Version', groupid VARCHAR2(50) PATH './Group/GroupId', reason VARCHAR2(50) PATH './Group/Reason', dt DATE PATH './Date' ) x
Outputs:
DOMAIN | DOMAINID | VERSION | GROUPID | REASON | DT :------- | :-------- | ------: | :------ | :----- | :-------- ssdsgdsg | 123456ACC | 1 | 3424234 | MANUAL | 01-FEB-21
db<>fiddle here