Simpler way to do a SUM with a fanout on a join

Question

Note: The SQL backend does not matter; any mainstream relational DB is fine (Postgres, MySQL, Oracle, SQL Server).

There is an interesting article on Looker that describes the technique they use to provide correct totals when a JOIN results in a fanout, along the lines of:

A good way to simulate the fanout is just doing something like this:

Their example

Accepted Answer

A typical example of joins mutilating the aggregation is this:

    select
      posts.id,
      count(likes.id) as likes_total,
      count(dislikes.id) as dislikes_total
    from posts
    left join likes on likes.post_id = posts.id
    left join dislikes on dislikes.post_id = posts.id
    group by posts.id;

Here both counts result in the same number, because each gets multiplied by the other. With 2 likes and 3 dislikes, both counts are 6.

The simple solution is: aggregate before joining. If you want to know the likes and dislikes counts per post, join the likes and dislikes counts to the posts.

    select posts.id, l.likes_total, d.dislikes_total
    from posts
    left join
    (
      select post_id, count(*) as likes_total
      from likes
      group by post_id
    ) l on l.post_id = posts.id
    left join
    (
      select post_id, count(*) as dislikes_total
      from dislikes
      group by post_id
    ) d on d.post_id = posts.id;

Use COALESCE if you want to see zeros instead of nulls.

Don't try to muddle through with tricks. Just aggregate, then join. You can of course replace the joins with lateral joins (which are correlated subqueries), if the DBMS supports them. Or, for single aggregates as in the example, even move the correlated subqueries to the select clause. That's mainly personal preference, but depending on the DBMS's optimizer one solution may be faster than the other. (Ideally the optimizer would come up with the same execution plan for all those queries, of course.)
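To make the numbers concrete, here is a minimal sketch that runs the fanout query, the aggregate-then-join query (with COALESCE), and the correlated-subquery variant against an in-memory SQLite database. The choice of SQLite via Python's sqlite3 module, the table layout (a key column named id on posts), and the sample rows are assumptions made for illustration only; the SQL itself should carry over to other mainstream databases with little or no change.

```python
import sqlite3

# Hypothetical schema and sample data, chosen only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts    (id INTEGER PRIMARY KEY);
    CREATE TABLE likes    (id INTEGER PRIMARY KEY, post_id INTEGER);
    CREATE TABLE dislikes (id INTEGER PRIMARY KEY, post_id INTEGER);

    INSERT INTO posts (id) VALUES (1), (2);
    -- Post 1 has 2 likes and 3 dislikes; post 2 has neither.
    INSERT INTO likes (post_id) VALUES (1), (1);
    INSERT INTO dislikes (post_id) VALUES (1), (1), (1);
""")

# Joining both child tables first yields 2 x 3 = 6 rows for post 1,
# so each count is multiplied by the size of the other table's match set.
FANOUT = """
    SELECT posts.id, COUNT(likes.id), COUNT(dislikes.id)
    FROM posts
    LEFT JOIN likes ON likes.post_id = posts.id
    LEFT JOIN dislikes ON dislikes.post_id = posts.id
    GROUP BY posts.id
    ORDER BY posts.id
"""

# Aggregate each child table to one row per post, then join.
# COALESCE turns the NULLs for posts without matches into zeros.
AGGREGATE_THEN_JOIN = """
    SELECT posts.id,
           COALESCE(l.likes_total, 0),
           COALESCE(d.dislikes_total, 0)
    FROM posts
    LEFT JOIN (
        SELECT post_id, COUNT(*) AS likes_total
        FROM likes
        GROUP BY post_id
    ) l ON l.post_id = posts.id
    LEFT JOIN (
        SELECT post_id, COUNT(*) AS dislikes_total
        FROM dislikes
        GROUP BY post_id
    ) d ON d.post_id = posts.id
    ORDER BY posts.id
"""

# Same result with correlated subqueries in the select clause.
CORRELATED = """
    SELECT posts.id,
           (SELECT COUNT(*) FROM likes WHERE likes.post_id = posts.id),
           (SELECT COUNT(*) FROM dislikes WHERE dislikes.post_id = posts.id)
    FROM posts
    ORDER BY posts.id
"""

fanout_rows = conn.execute(FANOUT).fetchall()
joined_rows = conn.execute(AGGREGATE_THEN_JOIN).fetchall()
correlated_rows = conn.execute(CORRELATED).fetchall()

print(fanout_rows)      # [(1, 6, 6), (2, 0, 0)]
print(joined_rows)      # [(1, 2, 3), (2, 0, 0)]
print(correlated_rows)  # [(1, 2, 3), (2, 0, 0)]
```

The first result shows the inflated counts (6 and 6 for post 1); the other two both return the true counts (2 and 3). Whether the derived-table form or the correlated form performs better depends on the DBMS, so it is worth comparing execution plans on real data.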


Answer