
SQL – Remove all HTML tags in a string

In my dataset, I have a field which stores text marked up with HTML. The general format is as follows:

<html><head></head><body><p>My text.</p></body></html>

I could attempt to solve the problem by doing the following:
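A sketch of that nested REPLACE idea might look like this. It assumes SQL Server (T-SQL) syntax, and the variable `@html` is a hypothetical stand-in for one value of the field:

```sql
-- Assumption: SQL Server (T-SQL). @html is a hypothetical sample value.
DECLARE @html NVARCHAR(MAX) = N'<html><head></head><body><p>My text.</p></body></html>';

-- One REPLACE per opening tag and one per closing tag.
SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
           @html,
           N'<html>',  N''), N'</html>', N''),
           N'<head>',  N''), N'</head>', N''),
           N'<body>',  N''), N'</body>', N''),
           N'<p>',     N''), N'</p>',    N'') AS PlainText;
-- Any tag not listed here (or a tag with attributes) is left in the output.
```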

However, this is not a strict rule, as some entries break W3C standards and do not include <head> tags, for example. Even worse, closing tags may be missing. So I would need to include a REPLACE call for each opening and closing tag that could exist.

I was wondering if there was a better way to accomplish this than using multiple nested REPLACE functions. Unfortunately, the only languages I have available in this environment are SQL and Visual Basic (not .NET).


Answer
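One option, sketched here under the assumption that the database is SQL Server (T-SQL) and that the markup is well-formed, is to cast the string to the XML type and read back only its text content. The variable `@html` is a hypothetical stand-in for the field value:

```sql
-- Assumption: SQL Server (T-SQL). @html is a hypothetical sample value.
DECLARE @html NVARCHAR(MAX) = N'<html><head></head><body><p>My text.</p></body></html>';

-- CAST parses the string as XML; value('.', ...) returns the concatenated text nodes.
-- The cast raises an error if the markup is not well-formed XML
-- (for example, when a closing tag is missing).
SELECT CAST(@html AS XML).value('.', 'NVARCHAR(MAX)') AS PlainText;
```

This avoids listing every possible tag, but it only works when each opening tag has a matching closing tag.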

Update – For strings with unclosed tags:
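One way to handle unclosed tags, sketched here under the assumption of SQL Server (T-SQL), is to repeatedly delete everything from a `<` up to the next `>`, without caring whether tags are paired. The variable `@html` and its sample value are hypothetical:

```sql
-- Assumption: SQL Server (T-SQL). @html is a hypothetical sample value
-- with an unclosed <br>, an unclosed <p>, and no </html>.
DECLARE @html  NVARCHAR(MAX) = N'<html><body><p>My text.<br>More text.</body>';
DECLARE @start INT = CHARINDEX(N'<', @html);
DECLARE @end   INT;

WHILE @start > 0
BEGIN
    -- Find the '>' that ends the tag starting at @start.
    SET @end = CHARINDEX(N'>', @html, @start);
    IF @end = 0 BREAK;  -- a '<' with no '>' after it: stop instead of looping forever

    -- Delete the tag, including both angle brackets.
    SET @html = STUFF(@html, @start, @end - @start + 1, N'');

    -- Search for the next tag, starting where the last one was removed.
    SET @start = CHARINDEX(N'<', @html, @start);
END;

SELECT @html AS PlainText;  -- My text.More text.
```

This sketch treats every `<` as the start of a tag, so literal `<` or `>` characters in the text and entities such as `&amp;` would need separate handling.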

User contributions licensed under: CC BY-SA