I’m trying to figure out the right SQL query to group data but also include a boolean (YES/NO) if at least one of the rows grouped has a certain value.
Here’s a concrete example to help visualize:
Let’s say I have some data like this, maybe in a table called “experiments” with fields like this:
id, experiment_name, user_id, created_at, completed_at
And let’s say I have this data:
1, "test experiment", 23, 2019-09-10, NULL 2, "test experiment", 17, 2019-09-15, NULL 3, "test experiment", 23, 2019-09-18, 2019-09-19 4, "test experiment", 23, 2019-09-19, NULL
I want to write a SQL query that does a group_by the user_id so I can have a count, but also includes some boolean value of whether at least one of their rows included a completed_at value.
Thus, I can’t just do:
SELECT user_id, count(*), completed_at from db group by user_id, completed_at;
That ends up like:
user_id count(*) completed_at 23 2 NULL 23 1 2019-09-19 17 1 NULL
What I’d like is some query that lets me end up with something like:
user_id count(*) did_complete? 23 3 TRUE 17 1 FALSE
But I’m stumped on that part. I tried searching SO here but I can’t figure out how to even word this as a short, searchable question.
Advertisement
Answer
You use an aggregate function with a conditional expression to check if at least one record in the group has a non-null completed_at
value:
SELECT user_id, COUNT(*) cnt, MAX(CASE WHEN completed_at IS NOT NULL THEN 1 ELSE 0 END) did_complete FROM experiments GROUP BY user_id
I usually prefer using 0
/1
rather than TRUE/FALSE
since numbers are much more portable across different RDBMS than booleans.