
Pyspark: cast array with nested struct to string

I have a PySpark DataFrame with a column named Filters of type "array<struct<…>>".

I want to save my DataFrame as a CSV file, and for that I need to cast the array to a string type.

I tried to cast it with DF.Filters.tostring() and DF.Filters.cast(StringType()), but both solutions produce the following output for every row in the Filters column:

org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@56234c19

The code is as follows
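The original code block was lost from the post; as a rough, hedged reconstruction of the attempt described above (DF and the output path are assumptions, not from the original):

```python
from pyspark.sql.functions import col
from pyspark.sql.types import StringType

# DF is assumed to already hold the Filters column of type array<struct<...>>.
# Cast the complex column to a plain string, then write the result to CSV.
DF2 = DF.withColumn("Filters", col("Filters").cast(StringType()))
DF2.write.csv("/tmp/filters_csv", header=True)
```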

Sample JSON data:
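The sample data itself did not survive extraction; a hypothetical newline-delimited JSON sample matching an array<struct<…>> column (the field names Op, Type, Val are assumed) might look like:

```json
{"Filters": [{"Op": "eq", "Type": "category", "Val": "books"}, {"Op": "gt", "Type": "price", "Val": "10"}]}
{"Filters": [{"Op": "eq", "Type": "author", "Val": "smith"}]}
```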

Thanks !!


Answer

I created a sample JSON dataset to match that schema:
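The answer's snippet was lost in extraction; a minimal sketch of what it likely did, using the hypothetical Op/Type/Val fields assumed above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample rows matching a Filters: array<struct<...>> schema.
sample = [
    '{"Filters": [{"Op": "eq", "Type": "category", "Val": "books"}, {"Op": "gt", "Type": "price", "Val": "10"}]}',
    '{"Filters": [{"Op": "eq", "Type": "author", "Val": "smith"}]}',
]
DF = spark.read.json(spark.sparkContext.parallelize(sample))
DF.printSchema()
# root
#  |-- Filters: array (containsNull = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- Op: string (nullable = true)
#  |    |    |-- Type: string (nullable = true)
#  |    |    |-- Val: string (nullable = true)
```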

Your problem is best solved using the explode() function which flattens an array, then the star expand notation:
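A sketch of the explode-then-star-expand step, still assuming the hypothetical schema above:

```python
from pyspark.sql.functions import explode, col

# explode() produces one row per array element; "f.*" then expands the
# struct fields (Op, Type, Val) into ordinary top-level columns.
flat = DF.select(explode(col("Filters")).alias("f")).select("f.*")
flat.show()
```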

To make it a single column string separated by commas:
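For example, with concat_ws() over the assumed field names, the flattened rows collapse into one string column that writes cleanly to CSV:

```python
from pyspark.sql.functions import concat_ws, col

# Join the struct fields into a single comma-separated string per row.
single = flat.select(
    concat_ws(",", col("Op"), col("Type"), col("Val")).alias("Filters")
)
single.write.csv("/tmp/filters_flat_csv", header=True)
```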

Explode Array reference: Flattening Rows in Spark

Star expand reference for “struct” type: How to flatten a struct in a spark dataframe?

User contributions licensed under: CC BY-SA