Translating PySpark into SQL

Question

I'm experiencing an issue with the following function. I'm trying to translate it into a SQL statement so I can see exactly what's happening and work on my actual issue more effectively. I know that this contains a join between valid_data and ri_data, a filter, and a s…

Accepted Answer

You have some substitutions to make, such as column_name for the join keys, but the general structure looks like this in SQL:

    SELECT DISTINCT
        A.*,
        A.etl_row_id AS row_id,
        A.column_name AS error_value
    FROM valid_data A
    LEFT OUTER JOIN ri_data B
        ON A.column_name = B.ri_column
    WHERE B.ri_column IS NULL
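To see what the query above actually returns, here is a minimal sketch that runs it against an in-memory SQLite database. The table and column names come from the answer; the sample rows are made up for illustration, and SQLite is only a stand-in for Spark SQL, so treat it as a way to check the logic rather than a reproduction of the original function. An ORDER BY is added purely to make the output deterministic.

```python
import sqlite3

# Hypothetical toy data; only the table and column names come from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE valid_data (etl_row_id INTEGER, column_name TEXT);
    CREATE TABLE ri_data (ri_column TEXT);
    INSERT INTO valid_data VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'c');
    INSERT INTO ri_data VALUES ('a'), ('b');
""")

# Left join, then keep only the rows that found no match in ri_data.
query = """
    SELECT DISTINCT
        A.*,
        A.etl_row_id AS row_id,
        A.column_name AS error_value
    FROM valid_data A
    LEFT OUTER JOIN ri_data B
        ON A.column_name = B.ri_column
    WHERE B.ri_column IS NULL
    ORDER BY A.etl_row_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(3, 'c', 3, 'c'), (4, 'c', 4, 'c')]
```

Only the rows whose column_name has no counterpart in ri_data survive, which is why this pattern is usually called an anti-join. Note that rows where column_name is NULL are also returned, because NULL never satisfies the equality in the ON clause.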
