Tag: apache-spark-sql

Change null to empty array in Databricks SQL?

I have a value in a JSON column that is sometimes entirely null in an Azure Databricks table. The full process to get to JSON_TABLE is: read Parquet, infer the schema of the JSON column, convert the column from a JSON string to a deeply nested structure, and explode any arrays within. I am working in SQL with Python-defined UDFs (json_exists() checks the schema to

Converting query from SQL to PySpark

I am trying to convert the following SQL query into PySpark: The code I have in PySpark right now is this: However, this simply returns the number of rows in the “data” dataframe, and I know this isn’t correct. I am very new to PySpark; can anyone help me solve this? Answer: You need to collect the result into