I am trying to convert the following SQL query into PySpark:
SELECT COUNT(CASE WHEN COALESCE(data.pred, 0) != 0
                   AND COALESCE(data.val, 0) != 0
                   AND (ABS(COALESCE(data.pred, 0) - COALESCE(data.val, 0)) / COALESCE(data.val, 0)) > 0.1
             THEN data.pred END) / COUNT(*) AS Result
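(For reference, the original query can be run unchanged through spark.sql once the DataFrame is registered as a temporary view; this sketch assumes an active SparkSession named spark and that the query selects FROM data. This is the result I am trying to reproduce with the DataFrame API.)

data.createOrReplaceTempView("data")
expected = spark.sql("""
    SELECT COUNT(CASE WHEN COALESCE(data.pred, 0) != 0
                       AND COALESCE(data.val, 0) != 0
                       AND (ABS(COALESCE(data.pred, 0) - COALESCE(data.val, 0)) / COALESCE(data.val, 0)) > 0.1
                 THEN data.pred END) / COUNT(*) AS Result
    FROM data
""").head()[0]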
The code I have in PySpark right now is this:
from pyspark.sql.functions import abs, coalesce, count, lit

aux_1 = data.select(
    count(
        (coalesce(data["pred"], lit(0)) != 0)
        & (coalesce(data["val"], lit(0)) != 0)
        & (abs(coalesce(data["pred"], lit(0)) - coalesce(data["val"], lit(0)))
           / coalesce(data["val"], lit(0)) > 0.1)
    )
)
aux_2 = aux_1.select(aux_1.column_name.cast("float"))
aux_3 = aux_2.head()[0]
Deviation = (aux_3 / data.count()) * 100
However, this simply returns the number of rows in the "data" DataFrame, and I know that isn't correct. I am very new to PySpark; can anyone help me solve this?
Answer
Your count() is counting the boolean expression itself: count() counts non-null values, and a comparison is false rather than null when the condition fails, so every row is counted. Filter the matching rows first, collect each count into a plain Python integer, and then do the division in Python:
from pyspark.sql.functions import abs, coalesce, lit

Result = data.filter(
    (coalesce(data["pred"], lit(0)) != 0)
    & (coalesce(data["val"], lit(0)) != 0)
    & (abs(coalesce(data["pred"], lit(0)) - coalesce(data["val"], lit(0)))
       / coalesce(data["val"], lit(0)) > 0.1)
).count() / data.count()
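If you would rather keep it as a single aggregation that mirrors the SQL more literally, count(when(cond, ...)) reproduces COUNT(CASE WHEN ... END): when() yields null where the condition fails, and count() skips nulls. A sketch under the same column names:

from pyspark.sql.functions import abs, coalesce, count, lit, when

cond = (
    (coalesce(data["pred"], lit(0)) != 0)
    & (coalesce(data["val"], lit(0)) != 0)
    & (abs(coalesce(data["pred"], lit(0)) - coalesce(data["val"], lit(0)))
       / coalesce(data["val"], lit(0)) > 0.1)
)

# when(cond, ...) is null where cond is false, and count() ignores nulls,
# so this counts only the matching rows, exactly like the SQL CASE WHEN.
Result = data.select(
    (count(when(cond, data["pred"])) / count(lit(1))).alias("Result")
).head()[0]

This version also scans the data once, instead of running two separate counting jobs.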