
Spark SQL to join two results from same table

I have a table called “Sold_Items” like below, and I want to use Spark SQL to get the net sell volume for each participant.

Intermediate table

The final result should look something like this:
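The tables themselves are not reproduced here, so the following is only a sketch of the last step, going from the intermediate table to the final result. It assumes a hypothetical intermediate table with columns participant, buy and sell, and it assumes that "net sell volume" means units sold minus units bought. Python's built-in sqlite3 stands in for Spark SQL so the example runs anywhere. The SQL string is standard and should work the same way in Spark SQL.

```python
import sqlite3

# Hypothetical intermediate table: one row per participant with the total
# bought and total sold. Names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intermediate (participant TEXT, buy INTEGER, sell INTEGER)")
conn.executemany(
    "INSERT INTO intermediate VALUES (?, ?, ?)",
    [("A", 13, 0), ("B", 5, 10), ("C", 0, 8)],
)

# Net sell volume, assumed here to be units sold minus units bought.
NET_SQL = """
SELECT participant, sell - buy AS net_sell
FROM intermediate
ORDER BY participant
"""

net = conn.execute(NET_SQL).fetchall()
print(net)  # [('A', -13), ('B', 5), ('C', 8)]
```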

I have the two queries below for the buy and sell sides of the first table.

Buy:

Sells:
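Here is a hedged sketch of what the buy-side and sell-side aggregates might look like. It is not the exact queries from the question. The column names buyer, seller and quantity and the sample rows are hypothetical, and sqlite3 stands in for Spark SQL. Notice that participant A only buys and C only sells, so each aggregate is missing someone.

```python
import sqlite3

# Hypothetical stand-in for the Sold_Items table; the real column names
# are not shown, so buyer/seller/quantity are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sold_items (buyer TEXT, seller TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sold_items VALUES (?, ?, ?)",
    [("A", "B", 10), ("B", "C", 5), ("A", "C", 3)],
)

# Buy side: total quantity bought per participant.
BUY_SQL = """
SELECT buyer AS participant, SUM(quantity) AS buy
FROM sold_items
GROUP BY buyer
ORDER BY buyer
"""

# Sell side: total quantity sold per participant.
SELL_SQL = """
SELECT seller AS participant, SUM(quantity) AS sell
FROM sold_items
GROUP BY seller
ORDER BY seller
"""

buys = dict(conn.execute(BUY_SQL).fetchall())
sells = dict(conn.execute(SELL_SQL).fetchall())
print(buys)   # {'A': 13, 'B': 5}
print(sells)  # {'B': 10, 'C': 8}
```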

I am trying to build the intermediate table so that I can use it to get the final result, but I cannot seem to join the two queries. I would appreciate any suggestions on combining the two queries above to get the intermediate table.
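If you want to join the two aggregates directly, keep in mind that an inner join drops anyone who only buys or only sells. In Spark SQL, a FULL OUTER JOIN on the participant column, combined with COALESCE(..., 0), avoids that. The sketch below uses the same hypothetical schema, with sqlite3 as a stand-in. Older SQLite builds lack FULL OUTER JOIN, so the sketch gets the same effect another way: it builds the list of all participants with UNION and then LEFT JOINs both aggregates onto that list.

```python
import sqlite3

# Hypothetical Sold_Items data (column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sold_items (buyer TEXT, seller TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sold_items VALUES (?, ?, ?)",
    [("A", "B", 10), ("B", "C", 5), ("A", "C", 3)],
)

JOIN_SQL = """
WITH buy_totals AS (
    SELECT buyer AS participant, SUM(quantity) AS buy
    FROM sold_items GROUP BY buyer
),
sell_totals AS (
    SELECT seller AS participant, SUM(quantity) AS sell
    FROM sold_items GROUP BY seller
),
-- Every participant that appears on either side (UNION removes duplicates).
participants AS (
    SELECT participant FROM buy_totals
    UNION
    SELECT participant FROM sell_totals
)
SELECT p.participant,
       COALESCE(b.buy, 0) AS buy,    -- 0 for sell-only participants
       COALESCE(s.sell, 0) AS sell   -- 0 for buy-only participants
FROM participants p
LEFT JOIN buy_totals b ON b.participant = p.participant
LEFT JOIN sell_totals s ON s.participant = p.participant
ORDER BY p.participant
"""

rows = conn.execute(JOIN_SQL).fetchall()
print(rows)  # [('A', 13, 0), ('B', 5, 10), ('C', 0, 8)]
```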

Answer

Unpivot and reaggregate. This is simplest with union all:
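Here is a minimal sketch of the UNION ALL approach, using the hypothetical columns buyer, seller and quantity. It is not the answer's exact query, and sqlite3 again stands in for Spark SQL. Each branch aggregates one side and emits 0 for the other side. The outer GROUP BY then merges the two rows for participants who both buy and sell. SQLite does not accept parenthesized branches in a compound SELECT, so the branches are written without parentheses. Spark SQL accepts either form.

```python
import sqlite3

# Hypothetical Sold_Items data (column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sold_items (buyer TEXT, seller TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sold_items VALUES (?, ?, ?)",
    [("A", "B", 10), ("B", "C", 5), ("A", "C", 3)],
)

UNION_AGG_SQL = """
SELECT participant, SUM(buy) AS buy, SUM(sell) AS sell
FROM (
    -- Buy side, pre-aggregated: one row per buyer.
    SELECT buyer AS participant, SUM(quantity) AS buy, 0 AS sell
    FROM sold_items GROUP BY buyer
    UNION ALL
    -- Sell side, pre-aggregated: one row per seller.
    SELECT seller AS participant, 0 AS buy, SUM(quantity) AS sell
    FROM sold_items GROUP BY seller
) p
GROUP BY participant
ORDER BY participant
"""

rows = conn.execute(UNION_AGG_SQL).fetchall()
print(rows)  # [('A', 13, 0), ('B', 5, 10), ('C', 0, 8)]
```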

Note that the aggregation in the subqueries is not really needed, so this will also work:
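Here is the same sketch without the inner aggregation. It uses the same hypothetical schema and sqlite3 stand-in. Each sale now contributes one raw row per side, and only the outer GROUP BY sums them. The result is identical.

```python
import sqlite3

# Hypothetical Sold_Items data (column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sold_items (buyer TEXT, seller TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sold_items VALUES (?, ?, ?)",
    [("A", "B", 10), ("B", "C", 5), ("A", "C", 3)],
)

UNION_RAW_SQL = """
SELECT participant, SUM(buy) AS buy, SUM(sell) AS sell
FROM (
    -- One row per sale from the buyer's point of view.
    SELECT buyer AS participant, quantity AS buy, 0 AS sell FROM sold_items
    UNION ALL
    -- One row per sale from the seller's point of view.
    SELECT seller AS participant, 0 AS buy, quantity AS sell FROM sold_items
) p
GROUP BY participant
ORDER BY participant
"""

rows = conn.execute(UNION_RAW_SQL).fetchall()
print(rows)  # [('A', 13, 0), ('B', 5, 10), ('C', 0, 8)]
```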

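I would expect the version that aggregates inside each subquery to perform better on large data sets, since each branch of the union is collapsed to one row per participant before the outer aggregation. The improvement might not be that big, though.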

User contributions licensed under: CC BY-SA