In below Hive-query, I need to read the null / empty “string” tags as well, from the XML content. Only the non-null “string” tags are getting considered within the XPATH()
list now.
with your_data as ( select '<ParentArray> <ParentFieldArray> <Name>ABCD</Name> <Value> <string>111</string> <string></string> <string>222</string> </Value> </ParentFieldArray> <ParentFieldArray> <Name>EFGH</Name> <Value> <string/> <string>444</string> <string></string> <string>555</string> </Value> </ParentFieldArray> </ParentArray>' as xmlinfo ) select Name, Value from your_data d lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as Name lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as Value;
Expected output from query:
Name Value ABCD 111 ABCD ABCD 222 EFGH EFGH 444 EFGH EFGH 555
Advertisement
Answer
The problem here is that XPATH
returns NodeList and if it contains empty node, it is not included in the list.
Concatenation with some string (in XPATH): concat(/Value/string/text()," ")
does not work here:
Caused by: javax.xml.xpath.XPathExpressionException: com.sun.org.apache.xpath.internal.XPathException: Can not convert #STRING to a NodeList!
at com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:195)
Easy solution is to replace <string></string>
and <string/>
with <string>NULL</string>
and then you can convert ‘NULL’ string to null.
Demo:
with your_data as ( select '<ParentArray> <ParentFieldArray> <Name>ABCD</Name> <Value> <string>111</string> <string></string> <string>222</string> </Value> </ParentFieldArray> <ParentFieldArray> <Name>EFGH</Name> <Value> <string/> <string>444</string> <string></string> <string>555</string> </Value> </ParentFieldArray> </ParentArray>' as xmlinfo ) select name, case when value='NULL' then null else value end value from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo from your_data d ) d lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as Name lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value
Result:
name value ABCD 111 ABCD ABCD 222 EFGH EFGH 444 EFGH EFGH 555