Multiple joins on the same table multiply counts

When I run multiple joins on the same table, the first join seems to be the only one to go through.

For example, I’ll get results like this:

ID, NAME, 200, 200
ID, NAME, 150, 150
ID, NAME, 100, 100

The real ticket count is obviously not the same as the time entry count, yet the query returns identical numbers.

select 
    contact.aid aid,
    (contact.data ->> 'FirstName') || ' ' || (contact.data ->> 'LastName') username,
    count(ticket) tickets,
    count(time) entries
from caches contact
inner join caches ticket
    on ticket.name = 'Ticket' and (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
inner join caches time
    on time.name = 'TimeEntry' and (time.data ->> 'TicketID')::numeric = ticket.aid
where 
    contact.name='Contact'
group by
    contact.aid,
    username
order by 
    tickets desc
;

I should be getting results like:

ID, NAME, 200, 421
ID, NAME, 150, 312
ID, NAME, 100, 152

Answer

The principal problem is the classic one: after both joins, every result row is one (ticket, time entry) pair, so count(ticket) and count(time) count the same joined rows and come out identical.

It is a little more obscured in your case by the values nested in a jsonb column, but it is all the same.
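
To see the effect in isolation, here is a minimal self-contained sketch, with ad-hoc VALUES rows standing in for your caches table:

SELECT count(t.id) AS tickets, count(e.id) AS entries
FROM  (VALUES (1), (2)) AS t(id)                              -- 2 tickets
JOIN  (VALUES (1, 10), (1, 11), (2, 12)) AS e(ticket_id, id)  -- 3 time entries
      ON e.ticket_id = t.id;

-- tickets | entries
-- --------+--------
--       3 |       3

Each joined row is one (ticket, entry) pair, so both count() calls see the same three rows.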

Aggregate first, join later:

SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , sum(ticket.tickets) AS tickets
     , sum(ticket.entries) AS entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry' 
      AND   (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND   (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
GROUP  BY contact.aid, username
ORDER  BY tickets DESC;  -- the output column sum(ticket.tickets); ticket.tickets itself is not in the GROUP BY

Assuming that aid, or at least (aid, username), is unique in the base table, we don’t need the outer aggregate at all:

SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , ticket.tickets
     , ticket.entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry' 
      AND   (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND   (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
ORDER  BY ticket.tickets DESC;

Not only does it avoid the primary error of multiplied counts, it also typically makes the query faster.
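
For comparison, the usual quick patch for multiplied counts is to de-duplicate inside the aggregates of the original flat join. A sketch for illustration only; it merely hides the row multiplication and, because of the inner joins, still misses tickets without time entries and contacts without tickets:

SELECT contact.aid
     , (contact.data ->> 'FirstName') || ' ' || (contact.data ->> 'LastName') AS username
     , count(DISTINCT ticket.aid) AS tickets  -- collapses the multiplied ticket rows
     , count(*)                   AS entries  -- each joined row is one time entry
FROM   caches AS contact
JOIN   caches AS ticket ON  ticket.name = 'Ticket'
                        AND (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
JOIN   caches AS entry  ON  entry.name = 'TimeEntry'
                        AND (entry.data ->> 'TicketID')::numeric = ticket.aid
WHERE  contact.name = 'Contact'
GROUP  BY contact.aid, username
ORDER  BY tickets DESC;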

You have INNER JOIN in your original query, which should probably be LEFT JOIN ... ON true to avoid eliminating users with no valid entries. It’s safe to convert it to a CROSS JOIN in my solution because each subquery level is guaranteed to return exactly one row (aggregate functions without GROUP BY).
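
That one-row guarantee is easy to verify; an aggregate query without GROUP BY returns exactly one row even over empty input:

SELECT count(*) AS tickets, sum(0) AS entries
FROM   caches
WHERE  false;  -- zero input rows, still exactly one output row: (0, NULL)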

About the LATERAL join: a LATERAL subquery can refer to columns of preceding FROM items, which is what allows the inner queries above to use contact.aid and ticket.aid.

Casting to integer (::int) in the subqueries is optional (assuming the counts never exceed the integer range). It avoids escalating to numeric, which is more expensive to sum up.
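
The escalation is easy to see: count() returns bigint, sum() over bigint returns numeric, while sum() over integer stays bigint:

SELECT pg_typeof(count(*))       AS count_type  -- bigint
     , pg_typeof(sum(1::bigint)) AS sum_bigint  -- numeric
     , pg_typeof(sum(1::int))    AS sum_int     -- bigint
FROM   generate_series(1, 3);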

Why concat_ws()? Unlike plain concatenation with ||, it ignores NULL values, so a missing FirstName or LastName does not null out the whole username.
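
A quick comparison:

SELECT 'Jane' || ' ' || NULL        AS with_pipes  -- NULL: || propagates NULL
     , concat_ws(' ', 'Jane', NULL) AS with_ws;    -- 'Jane': NULL arguments are skipped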

And do data ->> 'TicketID' and data ->> 'CreatorResourceID' really have to be numeric? It would seem they should be integer.

Aside: Normalizing your data model (at least to some extent) would probably help your cause. Joining tables on data values nested in a jsonb column is comparatively expensive, and can typically be made much more efficient.
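
Short of normalizing, expression indexes matching the exact join predicates should help considerably. A hypothetical sketch, with partial indexes mirroring the casts used above (adjust if you switch to integer):

CREATE INDEX ON caches (((data ->> 'CreatorResourceID')::numeric)) WHERE name = 'Ticket';
CREATE INDEX ON caches (((data ->> 'TicketID')::numeric))          WHERE name = 'TimeEntry';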
