I’m experiencing an issue with the following function. I’m trying to translate this to a SQL statement so I can have a better idea of exactly what’s happening, so I can more effectively work on my actual issue.
I know that this contains a join between valid_data to ri_data, a filter, and a select statement. I’m primarily having issue understanding how to write the join piece.
result = ( valid_data.join( ri_data, F.col(table_name + "." + column_name) == ri_data.ri_column, "left_outer", ) .filter(f"ri_column IS NULL") .selectExpr( "etl_row_id AS row_id", f"{table_name}.{column_name} AS error_value" ) .distinct()
Any help is appreciated.
Advertisement
Answer
You have some substitutions to do, like the column_name for the join keys, etc. But the general structure looks like this in SQL:
SELECT DISTINCT A.*, A.etl_row_id AS row_id, A.column_name AS error_value FROM valid_data A LEFT OUTER JOIN ri_data B ON A.column_name = B.ri_column WHERE B.ri_column IS NULL