I received a few irrelevant results to my count that I would like to remove from my output.
When using only the first “where lower (person) not like ‘%c%'” worked but when adding person b, c etc. as shown it gave me the original irrelevant results.
What should I change?
SELECT person, COUNT(1) AS count_mentions, COUNT(DISTINCT url) AS count_distinct_urls FROM project.dataset.keyword WHERE LOWER(person) NOT LIKE '%b%' OR LOWER(person) NOT LIKE '%c%' OR LOWER(person) NOT LIKE '%e%' OR LOWER(person) NOT LIKE '%f%' GROUP BY person ORDER BY count_mentions DESC LIMIT 5;
Expected results:
Row person count_mentions count_distinct_urls ----------------------------------------------- 1 a --- ---- 2 d 3 g 4 h 5 i
Advertisement
Answer
You want AND
, not OR
:
where lower(person) not like '%b%' and lower(person) not like '%c%' and lower(person) not like '%e%' and lower(person) not like '%f%'
This is logically equivalent to:
where not (lower(person) like '%b%' or lower(person) not like '%c%' or lower(person) not like '%e%' or lower(person) not like '%f%' )
Or more concisely:
where not regexp_contains(person, '[BbCcEeFf]')
Or if you want to be a bit inscrutable:
where not regexp_contains(person, '(?i)[bcef]')
The (?i)
makes the pattern matching case-insensitive.