Skip to content
Advertisement

SQL query – What does taking the MIN of this boolean expression mean?

Excuse my ignorance about this… I’m taking a data analysis course and I stumbled upon this query in an exercise:

SELECT
  CASE
    WHEN MIN(REGEXP_CONTAINS(STRING(ActivityDate), DATE_REGEX)) = TRUE THEN
  "Valid"
  ELSE
    "Not Valid"
  END
    AS valid_test
  FROM
   `tracker_data_clean.daily-activity-clean`;

ActivityDate is a field that contains date type data and DATE_REGEX is a regular expression variable for a date format string.

What I don’t know, is what does taking the MIN() of this boolean expression REGEX_CONTAINS do or mean.

I would appreciate if any of you could help me understand the concept of doing this.

Thanks !

Advertisement

Answer

The query selects rows from the table and applies the REGEXP_CONTAINS() function to every (string-converted) value in the ActivityDate column. REGEXP_CONTAINS() will either return true or false based on whether the value matches the regex pattern in DATE_REGEX.

How MIN() behaves here can vary by implementation:

  • Booleans might be coerced as integers, so MIN() is evaluating 0‘s and 1‘s. If all the values are 1 (true), MIN() will be 1 (true), otherwise it will be 0 (false).
  • Other implementations might evaluate booleans directly, so MIN() returns true if all of the values are true, because the minimum value is true (true being “greater” than false), otherwise it returns false.

The result, based on the implementation, is that MIN() returns 0/1, or false/true. Either way, that result is compared to true in the CASE statement. If all values matched the regex, the comparison will be true.

Basically, the query is “does every row have a valid date in the ActivityDate column?” The result will be a table with a single column valid_test and one row, containing “Valid” if they all match, “Not Valid” otherwise.

Another way to look at it that would be relatable to some programming languages is that MIN(bool_function()) is analogous to all(), meaning return true if all values are true. Similarly, MAX(bool_function()) would be analogous to any(), meaning return true if any value is true.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement