Skip to content
Advertisement

XML load is slow. Using text() when reading XML file with attribute-centric mapping

I am reading in an XML file into the database. It works fine however it is very slow. I was wondering if I can use text() in order to speed the process up similarly to the solution in this post: Xml select query xpath is slow.
My problem however is that the solution is for XML with element-centric mapping while I have attribute-centric mapping so I’m not sure how to use text() in my case.

The XML looks like this

<Level1>
    <Level2 name="l2_name">
        <Level3 name="l3_name">
            <Level4 name="l4_name">
                <Level5 name="l5_name">
                    <Level6_1 name="l6_1_name" value="l6_1_value"/>
                    <Level6_2 name="l6_2_name" value="l6_2_value"/>

I am using the following code

declare @x xml

select @x = c1 
from openrowset(bulk 'd:file.xml', single_blob) as table1(c1);

select
    c.value('@name', 'varchar(20)') as [col1],
    c2.value('@name', 'varchar(20)') as [col2],
    c3.value('@name', 'varchar(20)') as [col3],
    c4.value('@name', 'varchar(20)') as [col4],
    c5_1.value('@name', 'varchar(20)') as [col5],
    c5_1.value('@value', 'varchar(20)') as [col6],
    c5_2.value('@name', 'varchar(20)') as [col7],
    c5_2.value('@value', 'varchar(20)') as [col8],
from @x.nodes('Level1/Level2') as t(c)
cross apply c.nodes('Level3') as t2(c2)
cross apply c2.nodes('Level4') as t3(c3)
cross apply c3.nodes('Level5') as t4(c4)
cross apply c4.nodes('Level6_1') as t5_1(c5_1)
cross apply c4.nodes('Level6_2') as t5_2(c5_2)

The file I am reading is 13MB and it takes 3.5 hours to read it in.

Advertisement

Answer

Without being able to test on so many rows as you have, I’ll give you this script to try out (fiddle<>):

DECLARE @x XML=N'
<Level1><Level2 name="l2_name"><Level3 name="l3_name">
<Level4 name="l4_name"><Level5 name="l5_name">
<Level6 name="l6_name" value="l6_value"/>
<Level6 name="l6_name" value="l6_value"/>
</Level5></Level4></Level3></Level2></Level1>';

SELECT
    x.n.value('../../../../@name', 'varchar(20)') as [col1],
    x.n.value('../../../@name', 'varchar(20)') as [col2],
    x.n.value('../../@name', 'varchar(20)') as [col3],
    x.n.value('../@name', 'varchar(20)') as [col4],
    x.n.value('./@name', 'varchar(20)') as [col5]
FROM
    @x.nodes('/Level1/Level2/Level3/Level4/Level5/Level6') AS x(n);

This would explicitly query for Level6 nodes, then backtrack for the parents’ attribute values. This is quite likely faster than cross applying queries for each individual LevelN element.


Updated, for elements in level 6 with different names and assuming only one such element of each name appears as a child of level 5:

DECLARE @x_2 XML=N'<Level1>
    <Level2 name="l2_name">
        <Level3 name="l3_name">
            <Level4 name="l4_name">
                <Level5 name="l5_name">
                    <Level6_1 name="l6_1_name" value="l6_1_value"/>
                    <Level6_2 name="l6_2_name" value="l6_2_value"/>
                    </Level5></Level4></Level3></Level2></Level1>';

SELECT
    x.n.value('../../../@name', 'varchar(20)') as [col1],
    x.n.value('../../@name', 'varchar(20)') as [col2],
    x.n.value('../@name', 'varchar(20)') as [col3],
    x.n.value('./@name', 'varchar(20)') as [col4],
    x.n.value('(./Level6_1/@name)[1]', 'varchar(20)') as [col5],
    x.n.value('(./Level6_2/@name)[1]', 'varchar(20)') as [col6]
FROM
    @x_2.nodes('/Level1/Level2/Level3/Level4/Level5') AS x(n);

This selects nodes on level 5, backtracks to parents for parent attributes, then selects child elements based on name. Selects the first such element using [1] selector.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement