Spark SQL to join two results from same table

Question

I have a table called "Sold_Items" like the one below, and I want to use Spark SQL to get the net sell volumes for each participant.

    Item  Buyer  Seller  Qty
    ------------------------
    A …

Accepted Answer

Unpivot and reaggregate. This is simplest with union all:

    select user, sum(buy_qty), sum(sell_qty)
    from ((select buyer as user, sum(qty) as buy_qty, 0 as sell_qty
           from sold_items
           group by buyer
          ) union all
          (select seller as user, 0, sum(qty)
           from sold_items
           group by seller
          )
         ) bs
    group by user;

Note that the aggregation in the subqueries is not really needed, so this will also work:

    select user, sum(buy_qty), sum(sell_qty)
    from ((select buyer as user, qty as buy_qty, 0 as sell_qty
           from sold_items
          ) union all
          (select seller as user, 0, qty
           from sold_items
          )
         ) bs
    group by user;

I would expect the multiple aggregation version to have better performance on large data sets, although the improvement might not be that big.
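To see the unpivot-and-reaggregate idea produce an actual net figure, here is a minimal sketch rather than a definitive implementation. It makes several assumptions that are not in the original post: "net sell volume" is taken to mean quantity sold minus quantity bought, the sample rows and participant names are invented, the column is aliased `participant` instead of `user`, and SQLite (via Python's standard `sqlite3` module) stands in for Spark so the example runs without a cluster. The SQL avoids engine-specific syntax, so the same statement should also be accepted by `spark.sql(...)`.

```python
import sqlite3

# Unpivot buyers and sellers into one column, then reaggregate.
# Assumption: net sell volume = total sold - total bought.
NET_SELL_SQL = """
select participant,
       sum(sell_qty) - sum(buy_qty) as net_sell_qty
from (select buyer as participant, qty as buy_qty, 0 as sell_qty
      from sold_items
      union all
      select seller as participant, 0 as buy_qty, qty as sell_qty
      from sold_items
     ) bs
group by participant
order by participant
"""

# Hypothetical sample data: (item, buyer, seller, qty).
SAMPLE_ROWS = [
    ("A", "Ann", "Bob", 10),
    ("B", "Bob", "Cid", 4),
    ("C", "Cid", "Ann", 3),
]


def net_sell_volumes(rows):
    """Load rows into an in-memory table and return (participant, net) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table sold_items "
            "(item text, buyer text, seller text, qty integer)"
        )
        conn.executemany("insert into sold_items values (?, ?, ?, ?)", rows)
        return conn.execute(NET_SELL_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for participant, net in net_sell_volumes(SAMPLE_ROWS):
        print(participant, net)
```

With the sample rows above, Ann bought 10 and sold 3, so her net is -7. Bob's net is 6 and Cid's is 1. The nets sum to zero because every unit sold is also a unit bought.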


Answer