I have a table with a column named [xml] of type varchar(max) column with HTML Tables in each line, such as:
x
<table>
<tr>
<th type="System.DateTime">Date</th>
<th type="System.String">Name</th>
<th type="System.String">Address</th>
<th type="System.Double">Age</th>
</tr>
<tr>
<td>26/04/2018 00:00:00</td>
<td></td><td>Alice</td>
<td>NYC</td>
<td>10</td>
</tr>
<tr>
<td>26/04/2018 00:00:00</td>
<td></td><td>Bob</td>
<td>LA</td>
<td>21</td>
</tr>
</table>
in a Microsoft SQL Server 2008 R2.
I want to know how many occurrences of a table header there is. I have inconsistencies because all headers were supposed to be the same, but they are not. I have to discover where to look. An expected result would be something like:
HEADER|COUNTER
Name|52
Age|52
Date|50
Address|50
DT|2
City|2
I am trying to use the question (SQL) Identify positions of multiple occurrences of a string format within a field
, with this REGEX: <th[^>]+>[^<]+</th>
. But even when I try select * where [XML] like '%<th[^>]+>[^<]+</th>%'
it doesn’t work. I used Regex101 to create this pattern.
Advertisement
Answer
If interested in a helper function.
Example
Declare @YourTable table (ID int,[XML] varchar(max))
Insert Into @YourTable values
(1,'<table><tr><th type="System.DateTime">Date</th><th type="System.String">Name</th><th type="System.String">Address</th><th type="System.String">Additional Info</th><th type="System.Double">Age</th></tr><tr>...</tr></table>')
,(2,'<table><tr><th type="System.DateTime">Date</th><th type="System.String">Name</th></tr><tr>...</tr></table>')
Select Header = RetVal
,Counter = sum(1)
From @YourTable A
Cross Apply [dbo].[tvf-Str-Extract]([XML],'>','</th')
Group By RetVal
Returns
Header Counter
Additional Info 1
Address 1
Age 1
Date 2
Name 2
The Function If Interested
CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delim1 varchar(100),@Delim2 varchar(100))
Returns Table
As
Return (
Select RetSeq = row_number() over (order by RetSeq)
,RetVal = left(RetVal,charindex(@Delim2,RetVal)-1)
From (
Select RetSeq = row_number() over (order by 1/0)
,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
From ( values (convert(xml,'<x>' + replace((Select replace(@String,@Delim1,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>').query('.'))) as A(XMLData)
Cross Apply XMLData.nodes('x') AS B(i)
) C1
Where charindex(@Delim2,RetVal)>1
)
/*
Max Length of String 1MM characters
Declare @String varchar(max) = 'Dear [[FirstName]] [[LastName]], ...'
Select * From [dbo].[tvf-Str-Extract] (@String,'[[',']]')
*/