I received a few irrelevant results to my count that I would like to remove from my output.
When using only the first “where lower (person) not like ‘%c%'” worked but when adding person b, c etc. as shown it gave me the original irrelevant results.
What should I change?
x
SELECT
person,
COUNT(1) AS count_mentions,
COUNT(DISTINCT url) AS count_distinct_urls
FROM
project.dataset.keyword
WHERE
LOWER(person) NOT LIKE '%b%'
OR LOWER(person) NOT LIKE '%c%'
OR LOWER(person) NOT LIKE '%e%'
OR LOWER(person) NOT LIKE '%f%'
GROUP BY
person
ORDER BY
count_mentions DESC
LIMIT
5;
Expected results:
Row person count_mentions count_distinct_urls
-----------------------------------------------
1 a --- ----
2 d
3 g
4 h
5 i
Advertisement
Answer
You want AND
, not OR
:
where lower(person) not like '%b%' and
lower(person) not like '%c%' and
lower(person) not like '%e%' and
lower(person) not like '%f%'
This is logically equivalent to:
where not (lower(person) like '%b%' or
lower(person) not like '%c%' or
lower(person) not like '%e%' or
lower(person) not like '%f%'
)
Or more concisely:
where not regexp_contains(person, '[BbCcEeFf]')
Or if you want to be a bit inscrutable:
where not regexp_contains(person, '(?i)[bcef]')
The (?i)
makes the pattern matching case-insensitive.