When I run multiple joins on the same table, only the first join seems to take effect.
For example, I’ll get results like this:
ID, NAME, 200, 200
ID, NAME, 150, 150
ID, NAME, 100, 100
Obviously, the ticket count should not be the same as the time entry count, yet both columns report the same value.
select contact.aid aid,
       (contact.data ->> 'FirstName') || ' ' || (contact.data ->> 'LastName') username,
       count(ticket) tickets,
       count(time) entries
from   caches contact
inner  join caches ticket on ticket.name = 'Ticket'
                         and (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
inner  join caches time   on time.name = 'TimeEntry'
                         and (time.data ->> 'TicketID')::numeric = ticket.aid
where  contact.name = 'Contact'
group  by contact.aid, username
order  by tickets desc;
I should be getting results like:
ID, NAME, 200, 421
ID, NAME, 150, 312
ID, NAME, 100, 152
Answer
The principal problem is the classic one with multiple 1:n joins: each join multiplies the result rows, and the aggregates then count those multiplied rows instead of the underlying tickets and time entries. It is a little more obscured in your case by the values being nested in a jsonb column, but it's all the same.
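A minimal illustration of the effect with made-up inline values (not your data):

-- Two tickets; ticket 1 has two time entries, ticket 2 has one.
-- The join produces one row per (ticket, entry) pair, so both counts
-- report the number of joined rows: 3 and 3 instead of 2 and 3.
SELECT count(t.id)        AS tickets
     , count(e.ticket_id) AS entries
FROM  (VALUES (1), (2))      AS t(id)         -- tickets
JOIN  (VALUES (1), (1), (2)) AS e(ticket_id)  -- time entries
       ON e.ticket_id = t.id;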
Aggregate first, join later:
SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , sum(ticket.tickets) AS tickets
     , sum(ticket.entries) AS entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry'
      AND   (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND   (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
GROUP  BY contact.aid, username
ORDER  BY tickets DESC;  -- output column; ticket.tickets is not in GROUP BY
Assuming that aid, or at least (aid, username), is unique in the base table, we don't need the outer aggregate at all:
SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , ticket.tickets
     , ticket.entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry'
      AND   (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND   (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
ORDER  BY ticket.tickets DESC;
Not only does it avoid the primary error of multiplied counts, it also typically makes the query faster.
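For comparison, the multiplied counts could also be patched in place with count(DISTINCT ...) and LEFT JOINs; a sketch against your original query, typically slower than aggregating first on larger tables:

-- In-place patch of the original query (sketch): de-duplicate the
-- ticket count, and use LEFT JOINs so tickets and users without time
-- entries are not eliminated.
select contact.aid aid,
       (contact.data ->> 'FirstName') || ' ' || (contact.data ->> 'LastName') username,
       count(distinct ticket.aid) tickets,
       count(entry.aid) entries   -- alias renamed from "time"
from   caches contact
left   join caches ticket on ticket.name = 'Ticket'
                         and (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
left   join caches entry  on entry.name = 'TimeEntry'
                         and (entry.data ->> 'TicketID')::numeric = ticket.aid
where  contact.name = 'Contact'
group  by contact.aid, username
order  by tickets desc;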
You have INNER JOIN in your original query, which should probably be LEFT JOIN ... ON true to avoid eliminating users with no valid entries. It's safe to convert it to a CROSS JOIN in my solution because each subquery level is guaranteed to return exactly one row (aggregate functions without GROUP BY; a small demo follows the links below). See:
- JOIN (SELECT … ) ue ON 1=1?
- Should I duplicate columns between tables to speed-up aggregations like SUM?
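The demo: an aggregate query without GROUP BY returns exactly one row even over empty input, which is why the CROSS JOIN cannot eliminate any contacts:

-- One row out, even with no input rows: count(*) over the empty set
-- still yields a single row with 0.
SELECT count(*) AS tickets
FROM   generate_series(1, 0) AS g;  -- empty set
-- tickets
-- -------
--       0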
About the LATERAL join: a LATERAL subquery can reference columns of preceding FROM items, which is what lets the nested subqueries count time entries per ticket and tickets per contact.
Casting to integer (::int) in the subqueries is optional (and assumes the counts never exceed integer range). It avoids escalating to numeric, which is more expensive to sum up: count(*) returns bigint, and sum(bigint) yields numeric, while sum(int) stays bigint.
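You can verify the result types with pg_typeof():

-- Aggregate result types: count(*) is bigint; summing bigint
-- escalates to numeric, summing int stays bigint.
SELECT pg_typeof(count(*))       AS count_type  -- bigint
     , pg_typeof(sum(g::bigint)) AS sum_bigint  -- numeric
     , pg_typeof(sum(g))         AS sum_int     -- bigint
FROM   generate_series(1, 3) AS g;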
Why concat_ws()? Unlike plain concatenation with ||, it ignores NULL values, so a missing first or last name does not null out the whole username.
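A quick demonstration:

-- concat_ws() skips NULL arguments; plain || propagates NULL:
SELECT concat_ws(' ', 'Ada', NULL) AS with_concat_ws  -- 'Ada'
     , 'Ada' || ' ' || NULL        AS with_pipes;     -- NULL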
And do data ->> 'TicketID' and data ->> 'CreatorResourceID' have to be numeric? It would seem they should be integer.
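If the stored values are in fact plain integers (an assumption, illustrated with a made-up literal), the cheaper cast works directly:

-- ::int avoids numeric arithmetic altogether (value is hypothetical):
SELECT ('{"TicketID": 42}'::jsonb ->> 'TicketID')::int     AS as_int
     , ('{"TicketID": 42}'::jsonb ->> 'TicketID')::numeric AS as_numeric;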
Aside: Normalizing your data model (at least to some extent) would probably help your cause. Joining tables on data values nested in a jsonb column is comparatively expensive, and can typically be made much more efficient.
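As a rough, hypothetical sketch (all table and column names invented) of what a normalized layout could look like:

-- Hypothetical normalized tables (sketch): real columns with foreign
-- keys make the joins indexable and the text-to-number casts unnecessary.
CREATE TABLE contact (
  aid        int PRIMARY KEY
, first_name text
, last_name  text
);

CREATE TABLE ticket (
  aid                 int PRIMARY KEY
, creator_resource_id int NOT NULL REFERENCES contact(aid)
);

CREATE TABLE time_entry (
  aid       int PRIMARY KEY
, ticket_id int NOT NULL REFERENCES ticket(aid)
);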