Is there a way of using SQL or Python (or another language) to return the node-tree for an XML document?
In the XML document example below, the employee node appears twice but the second time it has more nodes within it… same with the address node. Is there a way to return the structure (without values contained within the tags) so that you can see all nodes that exist within another node? Sort of like creating a pivot table of all the field names in an excel document?
<dept> <dept_id>1</dept_id> <dept_name>Marketing</dept_name> <employee> <emp_id>1</emp_id> <emp_fname>Chandler</emp_fname> <address> <street_1>12345 Central Park Ave.</street_1> <street_2></street_2> <city>Hazel Crest</city> <state>IL</state> <zipcode>60429</zipcode> </address> </employee> <employee> <emp_id>2</emp_id> <emp_fname>Joey</emp_fname> <emp_lname>Tribbiani</emp_lname> <emp_title>Manager</emp_title> <address> <street_1>2312 28th Rd. S.</street_1> <street_2></street_2> <city>Arlington</city> <state>VA</state> <zipcode>22206</zipcode> <start_date>2000-03-01</start_date> <end_date>2012-12-31</end_date> </address> </employee> </dept>
A useful output would be something like this, but could come in whatever form is easiest to generate:
<dept> <dept_id> <dept_name> <employee> <emp_id> <emp_fname> <emp_lname> <emp_title> <address> <street_1> <street_2> <city> <state> <zipcode> <start_date> <end_date>
Thank you!
Advertisement
Answer
select replace(trim(regexp_replace(f.path, '['\[\]@]*' ,''), '$'), '$', ',') as _order ,'<'|| f.value::text || '>' as name from data, table(flatten(input=>xml, recursive=>true)) f where f.key = '@' order by _order;
gives:
_ORDER | NAME |
---|---|
null | <dept> |
0 | <dept_id> |
1 | <dept_name> |
2 | <employee> |
2,0 | <emp_id> |
2,1 | <emp_fname> |
2,2 | <address> |
2,2,0 | <street_1> |
2,2,1 | <street_2> |
2,2,2 | <city> |
2,2,3 | <state> |
2,2,4 | <zipcode> |
3 | <employee> |
3,0 | <emp_id> |
3,1 | <emp_fname> |
3,2 | <emp_lname> |
3,3 | <emp_title> |
3,4 | <address> |
3,4,0 | <street_1> |
3,4,1 | <street_2> |
3,4,2 | <city> |
3,4,3 | <state> |
3,4,4 | <zipcode> |
3,4,5 | <start_date> |
3,4,6 | <end_date> |
and
select REPEAT(' ', regexp_count(_order, '($[0-9]+)*')-1) || name as tree from ( select regexp_replace(f.path, '['\[\]@]*' ,'') as _order ,'<'|| f.value::text || '>' as name from data, table(flatten(input=>xml, recursive=>true)) f where f.key = '@' ) order by _order ;
gives:
TREE |
---|
<dept> |
<dept_id> |
<dept_name> |
<employee> |
<emp_id> |
<emp_fname> |
<address> |
<street_1> |
<street_2> |
<city> |
<state> |
<zipcode> |
<employee> |
<emp_id> |
<emp_fname> |
<emp_lname> |
<emp_title> |
<address> |
<street_1> |
<street_2> |
<city> |
<state> |
<zipcode> |
<start_date> |
<end_date> |