I have a oracle DB table called CRS.CRS_FILES, with a column called FILE_DATA -inside that CLOB column is a large XML string.
FILE_DATA FILE_CREATION_DATE
<?xml version="1.0" encoding="utf-8"?><REPORT 1/1/2020
<?xml version="1.0" encoding="utf-8"?><REPORT 1/5/2020
<?xml version="1.0" encoding="utf-8"?><REPORT 1/6/2019
<?xml version="1.0" encoding="utf-8"?><REPORT 1/1/2020
<?xml version="1.0" encoding="utf-8"?><REPORT 1/5/2020
Here is the first few lines of it:
<?xml version="1.0" encoding="utf-8" ?>
<REPORT xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201">
<CRSREPORTTIMESTAMP>2020-10-08T06:49:31.813812</CRSREPORTTIMESTAMP>-
<AGENCYIDENTIFIER>MILWAUKEE</AGENCYIDENTIFIER>-
<AGENCYNAME>Milwaukee Police Department</AGENCYNAME>
it is set up with the following Xpath I want to query:
//REPORT/AGENCYIDENTIFIER
query_string2 <- "SELECT
XMLTYPE(t.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/text()').getClobVal()
FROM CRS.CRS_FILES t"
idtable <- sqlQuery(ch,query_string2, max=10)
query_string2 <- "SELECT
XMLTYPE(t.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/text()').getStringVal()
FROM CRS.CRS_FILES t"
idtable <- sqlQuery(ch,query_string2, max=10)
I am not sure what I am doing – I know that sqlQuery when passing SQL queries there are some minor format issues, but no matter what I try, my result looks like the following:
XMLTYPE(T.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/TEXT()').GETCLOBVAL()
1 NA
2 NA
3 NA
4 NA
5 NA
6 NA
7 NA
8 NA
9 NA
10 NA
What am I doing wrong? I just want to extract the value Milwaukee Police Department (see below) (of course I would rename col to something like AGENCYNAME)
XMLTYPE(T.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/TEXT()').GETCLOBVAL()
1 Milwaukee Police Department
2 Milwaukee Police Department
3 Milwaukee Police Department
4 Milwaukee Police Department
5 Milwaukee Police Department
6 Milwaukee Police Department
7 Milwaukee Police Department
8 Milwaukee Police Department
9 Milwaukee Police Department
10 Milwaukee Police Department
Advertisement
Answer
The EXTRACT(xml)
function is deprecated. Instead, use XMLTABLE
:
SELECT x.agencyname
FROM CRS.CRS_FILES c
CROSS JOIN XMLTABLE(
XMLNAMESPACES(
'http://www.w3.org/2001/XMLSchema-instance' AS "i",
DEFAULT 'http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201'
),
'/REPORT'
PASSING XMLTYPE( c.file_data )
COLUMNS
crsreporttimestamp TIMESTAMP PATH 'CRSREPORTTIMESTAMP',
agencyidentifier VARCHAR2(50) PATH 'AGENCYIDENTIFIER',
agencyname VARCHAR2(100) PATH 'AGENCYNAME'
) x
or, in R it should be the same with the double quotes escaped:
query_string2 <- "SELECT x.agencyname
FROM CRS.CRS_FILES c
CROSS JOIN XMLTABLE(
XMLNAMESPACES(
'http://www.w3.org/2001/XMLSchema-instance' AS "i",
DEFAULT 'http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201'
),
'/REPORT'
PASSING XMLTYPE( c.file_data )
COLUMNS
crsreporttimestamp TIMESTAMP PATH 'CRSREPORTTIMESTAMP',
agencyidentifier VARCHAR2(50) PATH 'AGENCYIDENTIFIER',
agencyname VARCHAR2(100) PATH 'AGENCYNAME'
) x"
idtable <- sqlQuery(ch,query_string2, max=10)
Which, for your test data:
CREATE TABLE CRS.CRS_FILES ( FILE_DATA CLOB );
INSERT INTO CRS.crs_files VALUES (
'<?xml version="1.0" encoding="utf-8" ?>
<REPORT xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201">
<CRSREPORTTIMESTAMP>2020-10-08T06:49:31.813812</CRSREPORTTIMESTAMP>-
<AGENCYIDENTIFIER>MILWAUKEE</AGENCYIDENTIFIER>-
<AGENCYNAME>Milwaukee Police Department</AGENCYNAME>
</REPORT>'
)
Outputs:
| AGENCYNAME | | :-------------------------- | | Milwaukee Police Department |
If you do want to use EXTRACT
then you need to specify the XML namespace:
SELECT XMLTYPE(t.FILE_DATA).EXTRACT(
'//REPORT/AGENCYNAME/text()',
'xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201"'
).getStringVal() AS agencyname
FROM CRS.CRS_FILES t
Outputs:
| AGENCYNAME | | :-------------------------- | | Milwaukee Police Department |
db<>fiddle here