I have 2 tables:
Table class
:
id serial4 PRIMARY KEY name varchar(64) code varchar(64)
Table class_event
, where I store events related to classes, such as “started” and “ended”.
id serial4 class_id int4 NOT NULL // -> FK to the class table event_type varchar(1) NOT NULL // -> 's' for started, 'e' for ended.
I need a query the amount of times each class has been started and ended. This works:
select c.code, c.name, count(started.id) "started" from "class" c left join (select id, class_id, event_type from "class_event" where event_type = 's') started on started.klass_id = c.id group by c.code, c.name order by started desc;
But when I do exactly the same to get the amount of ended classes it shows incorrect amounts:
select c.code, c.name, count(started.id) "started", count(ended.id) "ended" from "class" c left join (select id, class_id, event_type from "class_event" where event_type = 's') started on started.klass_id = c.id left join (select id, class_id, event_type from "class_event" where event_type = 'e') ended on ended.klass_id = c.id group by c.code, c.name order by started desc;
Also, the query takes significantly more time to execute. Is there anything I’m missing?
Advertisement
Answer
Is there anything I’m missing?
Yes, multiple joins multiply rows. It’s exactly the same problem as discussed here:
While you query the whole table it’s typically cleaner and faster to aggregate first and join later. See:
This also avoids the original problem on principle, even for multiple joins – which we don’t need.
SELECT * FROM class c LEFT JOIN ( SELECT class_id AS id , count(*) FILTER (WHERE event_type = 's') AS started , count(*) FILTER (WHERE event_type = 'e') AS ended FROM class_event GROUP BY 1 ) e USING (id) ORDER BY e.started DESC NULLS LAST;
NULLS LAST
because it’s conceivable that some of the classes have no related rows in table class_event
(yet), and the resulting NULL
values surely shouldn’t sort on top. See:
About the aggregate FILTER
clause:
- Aggregate columns with additional (distinct) filters
- For absolute performance, is SUM faster or COUNT?
Aside:
For just a hand full of allowed values, I would consider the data type "char"
instead of varchar(1)
for event_type
. See: