I have a PySpark dataframe with a column named Filters of type "array<struct<Op:string,Type:string,Val:string>>".
I want to save my dataframe to a CSV file, and for that I need to cast the array to string type.
I tried DF.Filters.tostring()
and DF.Filters.cast(StringType()),
but both solutions produce, for each row in the Filters column, an unreadable value like:
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@56234c19
The code is as follows
from pyspark.sql.types import StringType

DF.printSchema()
 |-- ClientNum: string (nullable = true)
 |-- Filters: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Op: string (nullable = true)
 |    |    |-- Type: string (nullable = true)
 |    |    |-- Val: string (nullable = true)

DF_cast = DF.select('ClientNum', DF.Filters.cast(StringType()))
DF_cast.printSchema()
 |-- ClientNum: string (nullable = true)
 |-- Filters: string (nullable = true)

DF_cast.show()
| ClientNum | Filters                                                            |
| 32103     | org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d9e517ce |
| 218056    | org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3c744494 |
Sample JSON data:
{"ClientNum":"abc123","Filters":[{"Op":"foo","Type":"bar","Val":"baz"}]}
Thanks !!
Answer
I created a sample JSON dataset to match that schema:
{"ClientNum":"abc123","Filters":[{"Op":"foo","Type":"bar","Val":"baz"}]}

s.select(s.col("ClientNum"), s.col("Filters").cast(StringType)).show(false)
+---------+------------------------------------------------------------------+
|ClientNum|Filters                                                           |
+---------+------------------------------------------------------------------+
|abc123   |org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@60fca57e|
+---------+------------------------------------------------------------------+
Your problem is best solved using the explode() function, which flattens an array into one row per element, followed by the star-expand notation on the resulting struct:
s.selectExpr("explode(Filters) AS structCol").selectExpr("structCol.*").show()
+---+----+---+
| Op|Type|Val|
+---+----+---+
|foo| bar|baz|
+---+----+---+
To make it a single column string separated by commas:
s.selectExpr("explode(Filters) AS structCol").select(F.expr("concat_ws(',', structCol.*)").alias("single_col")).show()
+-----------+
| single_col|
+-----------+
|foo,bar,baz|
+-----------+
Explode Array reference: Flattening Rows in Spark
Star expand reference for “struct” type: How to flatten a struct in a spark dataframe?