I have a oracle DB table called CRS.CRS_FILES, with a column called FILE_DATA -inside that CLOB column is a large XML string.
FILE_DATA FILE_CREATION_DATE <?xml version="1.0" encoding="utf-8"?><REPORT 1/1/2020 <?xml version="1.0" encoding="utf-8"?><REPORT 1/5/2020 <?xml version="1.0" encoding="utf-8"?><REPORT 1/6/2019 <?xml version="1.0" encoding="utf-8"?><REPORT 1/1/2020 <?xml version="1.0" encoding="utf-8"?><REPORT 1/5/2020
Here is the first few lines of it:
<?xml version="1.0" encoding="utf-8" ?> <REPORT xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201"> <CRSREPORTTIMESTAMP>2020-10-08T06:49:31.813812</CRSREPORTTIMESTAMP>- <AGENCYIDENTIFIER>MILWAUKEE</AGENCYIDENTIFIER>- <AGENCYNAME>Milwaukee Police Department</AGENCYNAME>
it is set up with the following Xpath I want to query:
//REPORT/AGENCYIDENTIFIER query_string2 <- "SELECT XMLTYPE(t.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/text()').getClobVal() FROM CRS.CRS_FILES t" idtable <- sqlQuery(ch,query_string2, max=10) query_string2 <- "SELECT XMLTYPE(t.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/text()').getStringVal() FROM CRS.CRS_FILES t" idtable <- sqlQuery(ch,query_string2, max=10)
I am not sure what I am doing – I know that sqlQuery when passing SQL queries there are some minor format issues, but no matter what I try, my result looks like the following:
XMLTYPE(T.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/TEXT()').GETCLOBVAL() 1 NA 2 NA 3 NA 4 NA 5 NA 6 NA 7 NA 8 NA 9 NA 10 NA
What am I doing wrong? I just want to extract the value Milwaukee Police Department (see below) (of course I would rename col to something like AGENCYNAME)
XMLTYPE(T.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/TEXT()').GETCLOBVAL() 1 Milwaukee Police Department 2 Milwaukee Police Department 3 Milwaukee Police Department 4 Milwaukee Police Department 5 Milwaukee Police Department 6 Milwaukee Police Department 7 Milwaukee Police Department 8 Milwaukee Police Department 9 Milwaukee Police Department 10 Milwaukee Police Department
Advertisement
Answer
The EXTRACT(xml)
function is deprecated. Instead, use XMLTABLE
:
SELECT x.agencyname FROM CRS.CRS_FILES c CROSS JOIN XMLTABLE( XMLNAMESPACES( 'http://www.w3.org/2001/XMLSchema-instance' AS "i", DEFAULT 'http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201' ), '/REPORT' PASSING XMLTYPE( c.file_data ) COLUMNS crsreporttimestamp TIMESTAMP PATH 'CRSREPORTTIMESTAMP', agencyidentifier VARCHAR2(50) PATH 'AGENCYIDENTIFIER', agencyname VARCHAR2(100) PATH 'AGENCYNAME' ) x
or, in R it should be the same with the double quotes escaped:
query_string2 <- "SELECT x.agencyname FROM CRS.CRS_FILES c CROSS JOIN XMLTABLE( XMLNAMESPACES( 'http://www.w3.org/2001/XMLSchema-instance' AS "i", DEFAULT 'http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201' ), '/REPORT' PASSING XMLTYPE( c.file_data ) COLUMNS crsreporttimestamp TIMESTAMP PATH 'CRSREPORTTIMESTAMP', agencyidentifier VARCHAR2(50) PATH 'AGENCYIDENTIFIER', agencyname VARCHAR2(100) PATH 'AGENCYNAME' ) x" idtable <- sqlQuery(ch,query_string2, max=10)
Which, for your test data:
CREATE TABLE CRS.CRS_FILES ( FILE_DATA CLOB ); INSERT INTO CRS.crs_files VALUES ( '<?xml version="1.0" encoding="utf-8" ?> <REPORT xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201"> <CRSREPORTTIMESTAMP>2020-10-08T06:49:31.813812</CRSREPORTTIMESTAMP>- <AGENCYIDENTIFIER>MILWAUKEE</AGENCYIDENTIFIER>- <AGENCYNAME>Milwaukee Police Department</AGENCYNAME> </REPORT>' )
Outputs:
| AGENCYNAME | | :-------------------------- | | Milwaukee Police Department |
If you do want to use EXTRACT
then you need to specify the XML namespace:
SELECT XMLTYPE(t.FILE_DATA).EXTRACT( '//REPORT/AGENCYNAME/text()', 'xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201"' ).getStringVal() AS agencyname FROM CRS.CRS_FILES t
Outputs:
| AGENCYNAME | | :-------------------------- | | Milwaukee Police Department |
db<>fiddle here