
Converting query from SQL to pyspark

I am trying to convert the following SQL query into pyspark:

The code I have in PySpark right now is this:

However, this simply returns the number of rows in the “data” dataframe, which I know isn’t correct. I am very new to PySpark; can anyone help me solve this?


Answer

You need to collect the aggregated counts back into Python integers, and then do the division in Python:

User contributions licensed under: CC BY-SA