Skip to content
Advertisement

PostgreSQL subquery COUNT fails when the subquery is joined more than once

I have 2 tables:

Table class:

id serial4 PRIMARY KEY
name varchar(64)
code varchar(64)

Table class_event, where I store events related to classes, such as “started” and “ended”.

id serial4
class_id int4 NOT NULL  // ->  FK to the class table
event_type varchar(1) NOT NULL  // -> 's' for started, 'e' for ended.

I need a query the amount of times each class has been started and ended. This works:

select
    c.code,
    c.name,
    count(started.id) "started"
from "class" c
left join (select id, class_id, event_type from "class_event" where event_type = 's') started 
    on started.klass_id = c.id
group by c.code, c.name
order by started desc;

But when I do exactly the same to get the amount of ended classes it shows incorrect amounts:

select
    c.code,
    c.name,
    count(started.id) "started",
    count(ended.id) "ended"
from "class" c
left join (select id, class_id, event_type from "class_event" where event_type = 's') started 
    on started.klass_id = c.id
left join (select id, class_id, event_type from "class_event" where event_type = 'e') ended 
    on ended.klass_id = c.id
group by c.code, c.name
order by started desc;

Also, the query takes significantly more time to execute. Is there anything I’m missing?

Advertisement

Answer

Is there anything I’m missing?

Yes, multiple joins multiply rows. It’s exactly the same problem as discussed here:

While you query the whole table it’s typically cleaner and faster to aggregate first and join later. See:

This also avoids the original problem on principle, even for multiple joins – which we don’t need.

SELECT * 
FROM   class c
LEFT   JOIN (
   SELECT class_id AS id
        , count(*) FILTER (WHERE event_type = 's') AS started
        , count(*) FILTER (WHERE event_type = 'e') AS ended
   FROM   class_event
   GROUP  BY 1
   ) e  USING (id)
ORDER  BY e.started DESC NULLS LAST;

NULLS LAST because it’s conceivable that some of the classes have no related rows in table class_event (yet), and the resulting NULL values surely shouldn’t sort on top. See:

About the aggregate FILTER clause:

Aside:

For just a hand full of allowed values, I would consider the data type "char" instead of varchar(1) for event_type. See:

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement