When I run multiple joins on the same table, only the first join seems to take effect.
For example, I’ll get results like this:
ID, NAME, 200, 200
ID, NAME, 150, 150
ID, NAME, 100, 100
Obviously, the ticket count should not be the same as the time entry count, yet both columns report the same value.
select contact.aid aid,
       (contact.data ->> 'FirstName') || ' ' || (contact.data ->> 'LastName') username,
       count(ticket) tickets,
       count(time) entries
from   caches contact
inner  join caches ticket on ticket.name = 'Ticket'
                         and (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
inner  join caches time   on time.name = 'TimeEntry'
                         and (time.data ->> 'TicketID')::numeric = ticket.aid
where  contact.name = 'Contact'
group  by contact.aid, username
order  by tickets desc;
I should be getting results like:
ID, NAME, 200, 421
ID, NAME, 150, 312
ID, NAME, 100, 152
Answer
The principal problem is the classic one with multiple 1:n joins: each join multiplies the result rows, and the aggregates then count those multiplied rows instead of the underlying tickets and time entries. It is a little more obscured in your case by the values being nested in a jsonb column, but it's all the same.
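A minimal illustration of the effect with made-up inline values (not your data):

-- Two tickets; ticket 1 has two time entries, ticket 2 has one.
-- The join produces one row per (ticket, entry) pair, so both counts
-- report the number of joined rows: 3 and 3 instead of 2 and 3.
SELECT count(t.id)        AS tickets
     , count(e.ticket_id) AS entries
FROM  (VALUES (1), (2))      AS t(id)         -- tickets
JOIN  (VALUES (1), (1), (2)) AS e(ticket_id)  -- time entries
       ON e.ticket_id = t.id;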
Aggregate first, join later:
SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , sum(ticket.tickets) AS tickets
     , sum(ticket.entries) AS entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry'
      AND   (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND   (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
GROUP  BY contact.aid, username
ORDER  BY tickets DESC;  -- output column; ticket.tickets is not in GROUP BY
Assuming that aid, or at least (aid, username), is unique in the base table, we don't need the outer aggregate at all:
SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , ticket.tickets
     , ticket.entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry'
      AND   (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND   (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
ORDER  BY ticket.tickets DESC;
Not only does it avoid the primary error of multiplied counts, it also typically makes the query faster.
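For comparison, the multiplied counts could also be patched in place with count(DISTINCT ...) and LEFT JOINs; a sketch against your original query, typically slower than aggregating first on larger tables:

-- In-place patch of the original query (sketch): de-duplicate the
-- ticket count, and use LEFT JOINs so tickets and users without time
-- entries are not eliminated.
select contact.aid aid,
       (contact.data ->> 'FirstName') || ' ' || (contact.data ->> 'LastName') username,
       count(distinct ticket.aid) tickets,
       count(entry.aid) entries   -- alias renamed from "time"
from   caches contact
left   join caches ticket on ticket.name = 'Ticket'
                         and (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
left   join caches entry  on entry.name = 'TimeEntry'
                         and (entry.data ->> 'TicketID')::numeric = ticket.aid
where  contact.name = 'Contact'
group  by contact.aid, username
order  by tickets desc;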
You have INNER JOIN in your original query, which should probably be LEFT JOIN ... ON true to avoid eliminating users with no valid entries. It's safe to convert it to a CROSS JOIN in my solution because each subquery level is guaranteed to return exactly one row (aggregate functions without GROUP BY; a small demo follows the links below). See:
- JOIN (SELECT … ) ue ON 1=1?
- Should I duplicate columns between tables to speed-up aggregations like SUM?
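The demo: an aggregate query without GROUP BY returns exactly one row even over empty input, which is why the CROSS JOIN cannot eliminate any contacts:

-- One row out, even with no input rows: count(*) over the empty set
-- still yields a single row with 0.
SELECT count(*) AS tickets
FROM   generate_series(1, 0) AS g;  -- empty set
-- tickets
-- -------
--       0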
About the LATERAL join: a LATERAL subquery can reference columns of preceding FROM items, which is what lets the nested subqueries count time entries per ticket and tickets per contact.
Casting to integer (::int) in the subqueries is optional (and assumes the counts never exceed integer range). It avoids escalating to numeric, which is more expensive to sum up: count(*) returns bigint, and sum(bigint) yields numeric, while sum(int) stays bigint.
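You can verify the result types with pg_typeof():

-- Aggregate result types: count(*) is bigint; summing bigint
-- escalates to numeric, summing int stays bigint.
SELECT pg_typeof(count(*))       AS count_type  -- bigint
     , pg_typeof(sum(g::bigint)) AS sum_bigint  -- numeric
     , pg_typeof(sum(g))         AS sum_int     -- bigint
FROM   generate_series(1, 3) AS g;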
Why concat_ws()? Unlike plain concatenation with ||, it ignores NULL values, so a missing first or last name does not null out the whole username.
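A quick demonstration:

-- concat_ws() skips NULL arguments; plain || propagates NULL:
SELECT concat_ws(' ', 'Ada', NULL) AS with_concat_ws  -- 'Ada'
     , 'Ada' || ' ' || NULL        AS with_pipes;     -- NULL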
And do data ->> 'TicketID' and data ->> 'CreatorResourceID' have to be numeric? It would seem they should be integer.
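If the stored values are in fact plain integers (an assumption, illustrated with a made-up literal), the cheaper cast works directly:

-- ::int avoids numeric arithmetic altogether (value is hypothetical):
SELECT ('{"TicketID": 42}'::jsonb ->> 'TicketID')::int     AS as_int
     , ('{"TicketID": 42}'::jsonb ->> 'TicketID')::numeric AS as_numeric;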
Aside: Normalizing your data model (at least to some extent) would probably help your cause. Joining tables on data values nested in a jsonb column is comparatively expensive, and can typically be made much more efficient.
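As a rough, hypothetical sketch (all table and column names invented) of what a normalized layout could look like:

-- Hypothetical normalized tables (sketch): real columns with foreign
-- keys make the joins indexable and the text-to-number casts unnecessary.
CREATE TABLE contact (
  aid        int PRIMARY KEY
, first_name text
, last_name  text
);

CREATE TABLE ticket (
  aid                 int PRIMARY KEY
, creator_resource_id int NOT NULL REFERENCES contact(aid)
);

CREATE TABLE time_entry (
  aid       int PRIMARY KEY
, ticket_id int NOT NULL REFERENCES ticket(aid)
);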