How can I compare rows of data in an array based on distinct attributes of a column?

I have a tricky student assignment in Spark. I need to write an SQL query for this kind of array:

There are more departments, each with loans for both males and females. How can I compute a new array that contains only the departments where the females' loans are greater than the males' loans, and print/show it?

Answer

You can group by department, pivot the gender column, and keep only the rows where female > male. Below is a PySpark solution; the Scala version should be identical except for the part that creates the DataFrame.

User contributions licensed under: CC BY-SA