Skip to content
Advertisement

Sum of two counts from one table with additional data from another table

I have two tables as follows:

TABLE A

| id | col_a | col_b | user_id |
--------------------------------
| 1  | false | true  | 1       |
| 2  | false | true  | 2       |
| 3  | true  | true  | 2       |
| 4  | true  | true  | 3       |
| 5  | true  | false | 1       |

TABLE B

| id | name  |
--------------
| 1  | Bob   |
| 2  | Jim   | 
| 3  | Helen |
| 4  | Michael|
| 5  | Jen   |

I want to get the sum of two counts, which are the number of true values in col_a and number of true values in col_b. I want to group that data by user_id. I also want to join Table B and get the name of each user. The result would look like this:

|user_id|total (col_a + col_b)|name
------------------------------------
| 1     | 2                   | Bob
| 2     | 3                   | Jim
| 3     | 2                   | Helen

So far I got the total sum with the following query:

 SELECT
(SELECT COUNT(*) FROM "TABLE_A" WHERE "col_a" is true)+
(SELECT COUNT(*) FROM "TABLE_A" WHERE "col_b" is true)
as total

However, I’m not sure how to proceed with grouping these counts by user_id.

Advertisement

Answer

Something like this is typically fastest:

SELECT *
FROM   "TABLE_B" b
JOIN  (
   SELECT user_id AS id
        , count(*) FILTER (WHERE col_a)
        + count(*) FILTER (WHERE col_b) AS total
   FROM   "TABLE_A"
   GROUP  BY 1
   ) a USING (id);

While fetching all rows, aggregate first, join later. That’s cheaper. See:

The aggregate FILTER clause is typically fastest. See:


Often, you want to keep total counts of 0 in the result. You did say:

get the name of each user.

SELECT b.id AS user_id, b.name, COALESCE(a.total, 0) AS total
FROM   "TABLE_B" b
LEFT   JOIN (
   SELECT user_id AS id
        , count(col_a OR NULL)
        + count(col_b OR NULL) AS total
   FROM   "TABLE_A"
   GROUP  BY 1
   ) a USING (id);
...

count(col_a OR NULL) is an equivalent alternative, shortest, and still fast. (Use the FILTER clause from above for best performance.)
The LEFT JOIN keeps all rows from "TABLE_B" in the result.
COALESCE() return 0 instead of NULL for the total count.


If col_a and col_b have only few true values, this is typically (much) faster – basically what you had already:

SELECT b.*, COALESCE(aa.ct, 0) + COALESCE(ab.ct, 0) AS total
FROM   "TABLE_B" b
LEFT   JOIN (
   SELECT user_id AS id, count(*) AS ct
   FROM   "TABLE_A"
   WHERE  col_a
   GROUP  BY 1
   ) aa USING (id)
LEFT   JOIN (
   SELECT user_id AS id, count(*) AS ct
   FROM   "TABLE_A"
   WHERE  col_b
   GROUP  BY 1
   ) ab USING (id);

Especially with (small in this case!) partial indexes like:

CREATE INDEX a_true_idx on "TABLE_A" (user_id) WHERE col_a;
CREATE INDEX b_true_idx on "TABLE_A" (user_id) WHERE col_b;

Aside: use legal, lower-case unquoted names in Postgres to make your like simpler.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement