How to aggregate on multiple columns using SQL or Spark SQL

Question

I have the following table:

Expected output is:

The aggregation computation involves 2 columns. Is this supported in SQL?

Accepted Answer

In Spark SQL you can do it like this:

    SELECT Id, aggregate(list, '', (acc, x) -> concat(acc, x)) col3
    FROM (SELECT Id, array_sort(collect_list(concat(col1, col2))) list
          FROM df
          GROUP BY Id)

or in one select:

    SELECT Id, aggregate(array_sort(collect_list(concat(col1, col2))), '', (acc, x) -> concat(acc, x)) col3
    FROM df
    GROUP BY Id

The higher-order aggregate function is used in this example.

aggregate(expr, start, merge, finish) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.
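To see what the query computes per group, here is a rough plain-Python model of it. This is only a sketch, not Spark itself: the helper names aggregate and concat_by_id and the sample rows are hypothetical (the question's tables are not available), and it assumes every col1 and col2 value is a non-null ASCII string, so Spark's NULL handling and exact sort order are not reproduced.

```python
from collections import defaultdict
from functools import reduce


def aggregate(arr, start, merge, finish=lambda acc: acc):
    """Model of aggregate(expr, start, merge, finish): a left fold over arr
    starting from `start`, followed by an optional `finish` transform."""
    return finish(reduce(merge, arr, start))


def concat_by_id(rows):
    """rows: iterable of (Id, col1, col2) tuples. Returns {Id: col3}."""
    groups = defaultdict(list)
    for id_, col1, col2 in rows:
        # concat(col1, col2), gathered per Id like collect_list ... GROUP BY Id
        groups[id_].append(col1 + col2)
    return {
        # sorted() stands in for array_sort; the fold joins the sorted pieces
        id_: aggregate(sorted(parts), "", lambda acc, x: acc + x)
        for id_, parts in groups.items()
    }


# Hypothetical sample data, not the table from the question
rows = [(1, "b", "2"), (1, "a", "1"), (2, "c", "3")]
print(concat_by_id(rows))  # {1: 'a1b2', 2: 'c3'}
```

The sort step is there because collect_list does not guarantee element order; sorting first makes the concatenated result deterministic.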


Answer