I have a table with HTML content in which I need to select all rows containing a span element with the CSS class “glyphicon glyphicon-film” with some text content, but exclude all rows without content or just containing a white space.
These rows should NOT match:
<span class="glyphicon glyphicon-film"> </span> <span class="glyphicon glyphicon-film"></span> <span class="glyphicon glyphicon-film"> </span> <span class="hypo-link-target" data-link-target-id="1379">Hammartå</span>
These rows should match:
<span class="glyphicon glyphicon-film">nån text</span> <span class="glyphicon glyphicon-film">Nån Text</span> <span class="glyphicon glyphicon-film">NÅN TEXT</span> <span class="glyphicon glyphicon-film"> nån text</span> <span class="glyphicon glyphicon-film"> Nån Text</span> <span class="glyphicon glyphicon-film"> nån text</span> <span class="glyphicon glyphicon-film"> Nån Text</span> <span class="glyphicon glyphicon-film"> nån text</span> <span class="glyphicon glyphicon-film"> Nån Text</span>
The query I have at the moment:
select * from Section where (Text regexp '(span class="glyphicon glyphicon-film"> .+</span>)' or Text regexp 'span class="glyphicon glyphicon-film">[a-zA-Z][|å|ä|ö|Å|Ä|Ö]+</span>');
Which regexp should I use to get the desired result?
UPDATE: Based on suggestions in the answer I ended up with this query which I think works for all cases. This query will also match on rows with two or more span elements where some elements are correct and some are not, like this one:
<span class="glyphicon glyphicon-film"> Nån Text</span>more content here... and a correct span <span class="glyphicon glyphicon-film"> </span> select * from Section where Text regexp 'span class="glyphicon glyphicon-film"> .+</span>' or (Text regexp 'span class="glyphicon glyphicon-film">[a-zA-ZåäöÅÄÖ ]+</span>' and Text not regexp 'span class="glyphicon glyphicon-film"> </span>');
Advertisement
Answer
The pattern [a-zA-Z][|å|ä|ö|Å|Ä|Ö]
matches a string beginning with an ASCII letter, followed by any mixture of |åäöÅÄÖ
. You want to match a string of any combination of these characters, as well as spaces, which you left out. That would be [a-zA-ZåäöÅÄÖ ]+
.
|
is not used inside []
to specify alternatives, since []
is already specifying alternative characters.
select * from Section where (Text regexp 'span class="glyphicon glyphicon-film"> .+</span>' or Text regexp 'span class="glyphicon glyphicon-film">[a-zA-ZåäöÅÄÖ ]+</span>') and Text not regexp 'span class="glyphicon glyphicon-film"> +</span>';
The last condition is needed to prevent <span class="glyphicon glyphicon-film"> </span>
from matching.