Skip to content
Advertisement

Extract XML sqlQuery Issues in R – Querying Clob Column

I have a oracle DB table called CRS.CRS_FILES, with a column called FILE_DATA -inside that CLOB column is a large XML string.

FILE_DATA   FILE_CREATION_DATE
<?xml version="1.0" encoding="utf-8"?><REPORT   1/1/2020
<?xml version="1.0" encoding="utf-8"?><REPORT   1/5/2020
<?xml version="1.0" encoding="utf-8"?><REPORT   1/6/2019
<?xml version="1.0" encoding="utf-8"?><REPORT   1/1/2020
<?xml version="1.0" encoding="utf-8"?><REPORT   1/5/2020

Here is the first few lines of it:

<?xml version="1.0" encoding="utf-8" ?>
<REPORT xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201">
<CRSREPORTTIMESTAMP>2020-10-08T06:49:31.813812</CRSREPORTTIMESTAMP>-
<AGENCYIDENTIFIER>MILWAUKEE</AGENCYIDENTIFIER>-
<AGENCYNAME>Milwaukee Police Department</AGENCYNAME>

it is set up with the following Xpath I want to query:

//REPORT/AGENCYIDENTIFIER

query_string2 <- "SELECT
XMLTYPE(t.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/text()').getClobVal()
FROM CRS.CRS_FILES t"
idtable <- sqlQuery(ch,query_string2, max=10)

query_string2 <- "SELECT
XMLTYPE(t.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/text()').getStringVal()
FROM CRS.CRS_FILES t"
idtable <- sqlQuery(ch,query_string2, max=10)

I am not sure what I am doing – I know that sqlQuery when passing SQL queries there are some minor format issues, but no matter what I try, my result looks like the following:

XMLTYPE(T.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/TEXT()').GETCLOBVAL()
1   NA
2   NA
3   NA
4   NA
5   NA
6   NA
7   NA
8   NA
9   NA
10  NA

What am I doing wrong? I just want to extract the value Milwaukee Police Department (see below) (of course I would rename col to something like AGENCYNAME)

XMLTYPE(T.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/TEXT()').GETCLOBVAL()
1   Milwaukee Police Department
2   Milwaukee Police Department
3   Milwaukee Police Department
4   Milwaukee Police Department
5   Milwaukee Police Department
6   Milwaukee Police Department
7   Milwaukee Police Department
8   Milwaukee Police Department
9   Milwaukee Police Department
10  Milwaukee Police Department

Advertisement

Answer

The EXTRACT(xml) function is deprecated. Instead, use XMLTABLE:

SELECT x.agencyname
FROM   CRS.CRS_FILES c
       CROSS JOIN XMLTABLE(
         XMLNAMESPACES(
           'http://www.w3.org/2001/XMLSchema-instance' AS "i",
           DEFAULT 'http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201'
         ),
         '/REPORT'
         PASSING XMLTYPE( c.file_data )
         COLUMNS
           crsreporttimestamp TIMESTAMP     PATH 'CRSREPORTTIMESTAMP',
           agencyidentifier   VARCHAR2(50)  PATH 'AGENCYIDENTIFIER',
           agencyname         VARCHAR2(100) PATH 'AGENCYNAME'
       ) x

or, in R it should be the same with the double quotes escaped:

query_string2 <- "SELECT x.agencyname
FROM   CRS.CRS_FILES c
       CROSS JOIN XMLTABLE(
         XMLNAMESPACES(
           'http://www.w3.org/2001/XMLSchema-instance' AS "i",
           DEFAULT 'http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201'
         ),
         '/REPORT'
         PASSING XMLTYPE( c.file_data )
         COLUMNS
           crsreporttimestamp TIMESTAMP     PATH 'CRSREPORTTIMESTAMP',
           agencyidentifier   VARCHAR2(50)  PATH 'AGENCYIDENTIFIER',
           agencyname         VARCHAR2(100) PATH 'AGENCYNAME'
       ) x"

idtable <- sqlQuery(ch,query_string2, max=10)

Which, for your test data:

CREATE TABLE CRS.CRS_FILES ( FILE_DATA CLOB );

INSERT INTO CRS.crs_files VALUES (
'<?xml version="1.0" encoding="utf-8" ?>
<REPORT xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201">
  <CRSREPORTTIMESTAMP>2020-10-08T06:49:31.813812</CRSREPORTTIMESTAMP>-
  <AGENCYIDENTIFIER>MILWAUKEE</AGENCYIDENTIFIER>-
  <AGENCYNAME>Milwaukee Police Department</AGENCYNAME>
</REPORT>'
)

Outputs:

| AGENCYNAME                  |
| :-------------------------- |
| Milwaukee Police Department |

If you do want to use EXTRACT then you need to specify the XML namespace:

SELECT XMLTYPE(t.FILE_DATA).EXTRACT(
         '//REPORT/AGENCYNAME/text()',
         'xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201"'
       ).getStringVal() AS agencyname
FROM   CRS.CRS_FILES t

Outputs:

| AGENCYNAME                  |
| :-------------------------- |
| Milwaukee Police Department |

db<>fiddle here

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement