When I run multiple joins on the same table, the first join seems to be the only one to go through.
For example, I’ll get results like this:
ID, NAME, 200, 200
ID, NAME, 150, 150
ID, NAME, 100, 100
Obviously the real ticket count is not the same as the time entry count, yet both columns come back identical.
select
    contact.aid aid,
    (contact.data ->> 'FirstName') || ' ' || (contact.data ->> 'LastName') username,
    count(ticket) tickets,
    count(time) entries
from caches contact
inner join caches ticket
    on ticket.name = 'Ticket' and (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
inner join caches time
    on time.name = 'TimeEntry' and (time.data ->> 'TicketID')::numeric = ticket.aid
where
    contact.name = 'Contact'
group by
    contact.aid,
    username
order by
    tickets desc
;
I should be getting results like:
ID, NAME, 200, 421
ID, NAME, 150, 312
ID, NAME, 100, 152
Answer
The principal problem is the classic one of multiplied rows: joining the time entries onto the tickets repeats each ticket row once per matching entry, so count(ticket) and count(time) both end up counting the same inflated set of join rows. It's a little more obscured in your case by the values being nested in a jsonb column, but it's the same issue.
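To make that visible, here is a minimal, self-contained sketch with throwaway inline data (table and column names are made up for illustration): two tickets, where ticket 1 has three time entries and ticket 2 has one.
-- Each ticket row is repeated once per matching entry, so both counts
-- come out as 4 (the number of join rows) instead of 2 and 4:
WITH ticket(id) AS (VALUES (1), (2))
   , entry(ticket_id) AS (VALUES (1), (1), (1), (2))
SELECT count(ticket.id)       AS tickets  -- 4, not 2
     , count(entry.ticket_id) AS entries  -- 4
FROM   ticket
JOIN   entry ON entry.ticket_id = ticket.id;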
Aggregate first, join later:
SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , sum(ticket.tickets) AS tickets
     , sum(ticket.entries) AS entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry'
      AND    (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND    (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
GROUP  BY contact.aid, username
ORDER  BY tickets DESC;  -- the output column (the sum); ticket.tickets is not in GROUP BY
Assuming that aid, or at least (aid, username), is unique in the base table, we don't need the outer aggregate at all:
SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , ticket.tickets
     , ticket.entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry'
      AND    (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND    (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
ORDER  BY ticket.tickets DESC;
Not only does it avoid the primary error of multiplied counts, it also typically makes the query faster.
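For comparison only: the original join shape could also be patched with count(DISTINCT ...), assuming aid is unique per row in caches. That returns correct numbers, but the multiplied join rows are still produced and then de-duplicated, which is typically slower:
select
    contact.aid aid,
    (contact.data ->> 'FirstName') || ' ' || (contact.data ->> 'LastName') username,
    count(distinct ticket.aid) tickets,  -- de-duplicate the multiplied rows
    count(distinct time.aid) entries
from caches contact
inner join caches ticket
    on ticket.name = 'Ticket' and (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
inner join caches time
    on time.name = 'TimeEntry' and (time.data ->> 'TicketID')::numeric = ticket.aid
where contact.name = 'Contact'
group by contact.aid, username
order by tickets desc;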
You have INNER JOIN in your original query, which should probably be LEFT JOIN ... ON true to avoid eliminating users with no tickets or time entries. It's safe to convert it to a CROSS JOIN in my solution because each subquery level is guaranteed to return exactly one row (aggregate functions without GROUP BY); a sketch of the LEFT JOIN form follows the list below. See:
- JOIN (SELECT … ) ue ON 1=1?
- Should I duplicate columns between tables to speed-up aggregations like SUM?
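For illustration, a minimal sketch of the LEFT JOIN LATERAL ... ON true form, reduced to the ticket count only. With the aggregate subquery it returns the same result as CROSS JOIN, but the LEFT JOIN would preserve the contact row even if the subquery could come back empty:
SELECT contact.aid
     , ticket.tickets
FROM   caches AS contact
LEFT   JOIN LATERAL (
   SELECT count(*)::int AS tickets
   FROM   caches AS ticket
   WHERE  ticket.name = 'Ticket'
   AND    (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
   ) AS ticket ON true
WHERE  contact.name = 'Contact';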
Casting to integer (::int) in the subqueries is optional, assuming the counts never exceed integer range. count(*) returns bigint, and summing bigint escalates to numeric, which is more expensive; summing integer stays at bigint.
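You can verify the type escalation with a throwaway query (generate_series just supplies a few rows):
SELECT pg_typeof(sum(c))
FROM  (SELECT count(*) AS c FROM generate_series(1, 3)) sub;       -- numeric

SELECT pg_typeof(sum(c))
FROM  (SELECT count(*)::int AS c FROM generate_series(1, 3)) sub;  -- bigint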
Why concat_ws()? Unlike concatenation with ||, it ignores NULL arguments, so a missing FirstName or LastName doesn't turn the whole username into NULL (and no stray separator is added).
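A minimal demonstration with literal values:
SELECT 'John' || ' ' || NULL        AS with_pipes       -- NULL
     , concat_ws(' ', 'John', NULL) AS with_concat_ws;  -- 'John'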
And do data ->> 'TicketID' and data ->> 'CreatorResourceID' really have to be numeric? It would seem they should be integer.
Aside: Normalizing your data model (at least to some extent) would probably help your cause. Joining tables on data values nested in a jsonb column is comparatively expensive, and can typically be made much more efficient.
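Short of normalizing, partial expression indexes could take some of the sting out of the lateral lookups. A sketch, assuming the IDs fit in integer; the index names and predicates are illustrative, and the join conditions would then need to cast to ::int as well (rather than ::numeric) for the indexes to be usable:
-- hypothetical partial expression indexes for the two lookups:
CREATE INDEX caches_ticket_creator_idx ON caches (((data ->> 'CreatorResourceID')::int))
WHERE  name = 'Ticket';

CREATE INDEX caches_entry_ticket_idx ON caches (((data ->> 'TicketID')::int))
WHERE  name = 'TimeEntry';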