Skip to content
Advertisement

In Hive, how to read through NULL / empty tags present within an XML using explode(XPATH(..)) function?

In below Hive-query, I need to read the null / empty “string” tags as well, from the XML content. Only the non-null “string” tags are getting considered within the XPATH() list now.

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
            <string>222</string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>EFGH</Name>
        <Value>
            <string/>
            <string>444</string>
            <string></string>
            <string>555</string>

        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)

select Name, Value 
  from your_data d
       lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as Value;

Expected output from query:

Name    Value
ABCD    111
ABCD    
ABCD    222
EFGH    
EFGH    444
EFGH    
EFGH    555

Advertisement

Answer

The problem here is that XPATH returns NodeList and if it contains empty node, it is not included in the list.

Concatenation with some string (in XPATH): concat(/Value/string/text()," ") does not work here:

Caused by: javax.xml.xpath.XPathExpressionException: com.sun.org.apache.xpath.internal.XPathException: Can not convert #STRING to a NodeList!

at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:195)

Easy solution is to replace <string></string> and <string/> with <string>NULL</string> and then you can convert ‘NULL’ string to null.

Demo:

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
            <string>222</string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>EFGH</Name>
        <Value>
            <string/>
            <string>444</string>
            <string></string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)

select name, case when value='NULL' then null else value end value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo 
          from your_data d
       ) d
       lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value

Result:

name    value
ABCD    111
ABCD    
ABCD    222
EFGH    
EFGH    444
EFGH    
EFGH    555
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement