Skip to content
Advertisement

Parse text and special characters from SQL Server

I have an issue with parsing text with special characters from XML in SQL Server.

Let’s say I have a XML file Sample.xml which has the following data:

<People>
    <Person FirstName="Adam"
            LastName="Smith"
            Age="44"
            Weight="178">
        <Texts>
            <Text Country="US"
                  Language="EN"
                  TextType="1">&lt;div&gt;First sentence to retrieve.&lt;/div&gt;</Text>
            <Text Country="GB"
                  Language="EN"
                  TextType="2">&lt;div&gt;Second sentence to retrieve.&lt;/div&gt;</Text>
        </Texts>
    </Person>
</People>

I prepared the following SQL script which can parse everything except two sentences in the <TextType> attribute:

  • First sentence to retrieve
  • Second sentence to retrieve
DECLARE @x XML
SELECT @x = f FROM OPENROWSET(BULK 'C:Sample.xml', single_blob) AS C(f)
DECLARE @hdoc int

EXEC sp_xml_preparedocument @hdoc OUTPUT, @x
SELECT * FROM OPENXML (@hdoc, '/People/Person/Texts/Text')
WITH (
        FirstName varchar(max) '../../@FirstName'
        , LastName varchar(max) '../../@LastName'
        , Age varchar(max) '../../@Age'
        , [Weight] varchar(max) '../../@Weight'
        , Country varchar(max) '@Country'
        , [Language] varchar(max) '@Language'
        , TextType varchar(max) '@TextType'
        )
EXEC sp_xml_removedocument @hdoc

Could you please help me to add the column with the sentences mentioned above?

Advertisement

Answer

OPENXML is old and basically deprecated, it has numerous issues.

You should use the newer XQuery functions .nodes and .value to retrieve your data.

Your primary issue is that you have XML stored as string inside another XML. You need to retrieve it as nvarchar(max), then cast it using TRY_CONVERT.

SELECT 
    FirstName  = x1.Person.value('@FirstName', 'varchar(100)'),
    LastName   = x1.Person.value('@LastName' , 'varchar(100)'),
    Age        = x1.Person.value('@Age'      , 'int'),
    Weight     = x1.Person.value('@Weight'   , 'decimal(9,5)'),
    Country    = x2.Text.value('@Country' , 'char(2)'),
    [Language] = x2.Text.value('@Language', 'char(2)'),
    TextType   = x2.Text.value('@TextType', 'int'),
    value      = v.InnerXml.value('(div/text())[1]','nvarchar(max)')
FROM @x.nodes('People/Person') x1(Person)
CROSS APPLY x1.Person.nodes('Texts/Text') x2(Text)
CROSS APPLY (VALUES( TRY_CONVERT(xml, x2.Text.value('text()[1]','nvarchar(max)')) )) v(InnerXml);

db<>fiddle

Note the way there are two calls to .nodes, and one feeds into the next.

You can even feed this in straight from OPENROWSET

SELECT 
    FirstName  = x1.Person.value('@FirstName', 'varchar(100)'),
    LastName   = x1.Person.value('@LastName' , 'varchar(100)'),
    Age        = x1.Person.value('@Age'      , 'int'),
    Weight     = x1.Person.value('@Weight'   , 'decimal(9,5)'),
    Country    = x2.Text.value('@Country' , 'char(2)'),
    [Language] = x2.Text.value('@Language', 'char(2)'),
    TextType   = x2.Text.value('@TextType', 'int'),
    value      = v.InnerXml.value('(div/text())[1]','nvarchar(max)')
FROM OPENROWSET(BULK 'C:Sample.xml', single_blob) AS C(f)
CROSS APPLY (VALUES( TRY_CONVERT(xml, C.f) )) C2(AsXml)
CROSS APPLY C2.AsXml.nodes('People/Person') x1(Person)
CROSS APPLY x1.Person.nodes('Texts/Text') x2(Text)
CROSS APPLY (VALUES( TRY_CONVERT(xml, x2.Text.value('text()[1]','nvarchar(max)')) )) v(InnerXml);

I recommend you fix whatever is generating this XML, ideally you would pass through the inner XML without stringifying it.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement