Skip to content
Advertisement

How to query all occurences of a header of a XML column in SQL Server

I have a table with a column named [xml] of type varchar(max) column with HTML Tables in each line, such as:

<table>
<tr>
<th type="System.DateTime">Date</th>
<th type="System.String">Name</th>
<th type="System.String">Address</th>
<th type="System.Double">Age</th>
</tr>
<tr>
<td>26/04/2018 00:00:00</td>
<td></td><td>Alice</td>
<td>NYC</td>
<td>10</td>
</tr>
<tr>
<td>26/04/2018 00:00:00</td>
<td></td><td>Bob</td>
<td>LA</td>
<td>21</td>
</tr>
</table>

in a Microsoft SQL Server 2008 R2.

I want to know how many occurrences of a table header there is. I have inconsistencies because all headers were supposed to be the same, but they are not. I have to discover where to look. An expected result would be something like:

HEADER|COUNTER
Name|52
Age|52
Date|50
Address|50
DT|2
City|2

I am trying to use the question (SQL) Identify positions of multiple occurrences of a string format within a field , with this REGEX: <th[^>]+>[^<]+</th>. But even when I try select * where [XML] like '%<th[^>]+>[^<]+</th>%' it doesn’t work. I used Regex101 to create this pattern.

Advertisement

Answer

If interested in a helper function.

Example

Declare @YourTable table (ID int,[XML] varchar(max))
Insert Into @YourTable values
 (1,'<table><tr><th type="System.DateTime">Date</th><th type="System.String">Name</th><th type="System.String">Address</th><th type="System.String">Additional Info</th><th type="System.Double">Age</th></tr><tr>...</tr></table>')
,(2,'<table><tr><th type="System.DateTime">Date</th><th type="System.String">Name</th></tr><tr>...</tr></table>')

Select Header = RetVal
      ,Counter = sum(1)
 From  @YourTable A
 Cross Apply [dbo].[tvf-Str-Extract]([XML],'>','</th') 
 Group By RetVal

Returns

Header           Counter
Additional Info  1
Address          1
Age              1
Date             2
Name             2

The Function If Interested

CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delim1 varchar(100),@Delim2 varchar(100))
Returns Table 
As
Return (  

    Select RetSeq = row_number() over (order by RetSeq)
          ,RetVal = left(RetVal,charindex(@Delim2,RetVal)-1)
    From  (
            Select RetSeq = row_number() over (order by 1/0)
                  ,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
            From  ( values (convert(xml,'<x>' + replace((Select replace(@String,@Delim1,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>').query('.'))) as A(XMLData)
            Cross Apply XMLData.nodes('x') AS B(i)
          ) C1
    Where charindex(@Delim2,RetVal)>1

)
/*
Max Length of String 1MM characters

Declare @String varchar(max) = 'Dear [[FirstName]] [[LastName]], ...'
Select * From [dbo].[tvf-Str-Extract] (@String,'[[',']]')
*/
Advertisement