I have a table where each row consist of an attribute which consist of html data with like this.
<div className="single_line"><p>New note example</p></div>
I need to omit the html tags and extract only the data inside the tags using sql query. Any idea on how to achieve this?. I tried out different regex but they didnt work.
Advertisement
Answer
There are 2 solutions based on mysql version.
If you are using MySQL 8.0 then you can use REGEXP_REPLACE() directly inside the select statement.
SELECT REGEXP_REPLACE('<div><p>New note example</p></div>', '(<[^>]*>)|( )', '');
If you are using MySQL 5.7 then you have to create a user define function in database to strip html tags.
DROP FUNCTION IF EXISTS fn_strip_html_tags; CREATE FUNCTION fn_strip_html_tags( html_text TEXT ) RETURNS TEXT BEGIN DECLARE start,end INT DEFAULT 1; DECLARE text_without_nbsp TEXT; LOOP SET start = LOCATE("<", html_text, start); IF (!start) THEN RETURN html_text; END IF; SET end = LOCATE(">", html_text, start); IF (!end) THEN SET end = start; END IF; SET text_without_nbsp = REPLACE(html_text, " ", " "); SET html_text = INSERT(text_without_nbsp, start, end - start + 1, ""); END LOOP; END
For example
SELECT fn_strip_html_tags('<div><p>New note example</p></div>');