In below Hive-query, I need to read the null / empty “string” tags as well, from the XML content. Only the non-null “string” tags are getting considered within the XPATH()
list now.
with your_data as (
select '<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
<string></string>
<string>222</string>
</Value>
</ParentFieldArray>
<ParentFieldArray>
<Name>EFGH</Name>
<Value>
<string/>
<string>444</string>
<string></string>
<string>555</string>
</Value>
</ParentFieldArray>
</ParentArray>' as xmlinfo
)
select Name, Value
from your_data d
lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as Name
lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as Value;
Expected output from query:
Name Value
ABCD 111
ABCD
ABCD 222
EFGH
EFGH 444
EFGH
EFGH 555
Advertisement
Answer
The problem here is that XPATH
returns NodeList and if it contains empty node, it is not included in the list.
Concatenation with some string (in XPATH): concat(/Value/string/text()," ")
does not work here:
Caused by: javax.xml.xpath.XPathExpressionException: com.sun.org.apache.xpath.internal.XPathException: Can not convert #STRING to a NodeList!
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:195)
Easy solution is to replace <string></string>
and <string/>
with <string>NULL</string>
and then you can convert ‘NULL’ string to null.
Demo:
with your_data as (
select '<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
<string></string>
<string>222</string>
</Value>
</ParentFieldArray>
<ParentFieldArray>
<Name>EFGH</Name>
<Value>
<string/>
<string>444</string>
<string></string>
<string>555</string>
</Value>
</ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, case when value='NULL' then null else value end value
from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo
from your_data d
) d
lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as Name
lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value
Result:
name value
ABCD 111
ABCD
ABCD 222
EFGH
EFGH 444
EFGH
EFGH 555