I have two tables as follows:
TABLE A | id | col_a | col_b | user_id | -------------------------------- | 1 | false | true | 1 | | 2 | false | true | 2 | | 3 | true | true | 2 | | 4 | true | true | 3 | | 5 | true | false | 1 | TABLE B | id | name | -------------- | 1 | Bob | | 2 | Jim | | 3 | Helen | | 4 | Michael| | 5 | Jen |
I want to get the sum of two counts, which are the number of true values in col_a and number of true values in col_b. I want to group that data by user_id. I also want to join Table B and get the name of each user. The result would look like this:
|user_id|total (col_a + col_b)|name ------------------------------------ | 1 | 2 | Bob | 2 | 3 | Jim | 3 | 2 | Helen
So far I got the total sum with the following query:
SELECT (SELECT COUNT(*) FROM "TABLE_A" WHERE "col_a" is true)+ (SELECT COUNT(*) FROM "TABLE_A" WHERE "col_b" is true) as total
However, I’m not sure how to proceed with grouping these counts by user_id.
Advertisement
Answer
Something like this is typically fastest:
SELECT *
FROM "TABLE_B" b
JOIN (
SELECT user_id AS id
, count(*) FILTER (WHERE col_a)
+ count(*) FILTER (WHERE col_b) AS total
FROM "TABLE_A"
GROUP BY 1
) a USING (id);
While fetching all rows, aggregate first, join later. That’s cheaper. See:
The aggregate FILTER clause is typically fastest. See:
- For absolute performance, is SUM faster or COUNT?
- Aggregate columns with additional (distinct) filters
Often, you want to keep total counts of 0 in the result. You did say:
get the name of each user.
SELECT b.id AS user_id, b.name, COALESCE(a.total, 0) AS total
FROM "TABLE_B" b
LEFT JOIN (
SELECT user_id AS id
, count(col_a OR NULL)
+ count(col_b OR NULL) AS total
FROM "TABLE_A"
GROUP BY 1
) a USING (id);
...
count(col_a OR NULL) is an equivalent alternative, shortest, and still fast. (Use the FILTER clause from above for best performance.)
The LEFT JOIN keeps all rows from "TABLE_B" in the result.
COALESCE() return 0 instead of NULL for the total count.
If col_a and col_b have only few true values, this is typically (much) faster – basically what you had already:
SELECT b.*, COALESCE(aa.ct, 0) + COALESCE(ab.ct, 0) AS total FROM "TABLE_B" b LEFT JOIN ( SELECT user_id AS id, count(*) AS ct FROM "TABLE_A" WHERE col_a GROUP BY 1 ) aa USING (id) LEFT JOIN ( SELECT user_id AS id, count(*) AS ct FROM "TABLE_A" WHERE col_b GROUP BY 1 ) ab USING (id);
Especially with (small in this case!) partial indexes like:
CREATE INDEX a_true_idx on "TABLE_A" (user_id) WHERE col_a; CREATE INDEX b_true_idx on "TABLE_A" (user_id) WHERE col_b;
Aside: use legal, lower-case unquoted names in Postgres to make your like simpler.