Pivot on Spark dataframe returns unexpected nulls on only one of several columns

Question

I've pivoted a Spark dataframe, which works correctly for all columns except one, even though they're all almost exactly the same. I have a dataframe which looks like this: (there are 29 distinct cf_id values, but in this example only two) when I run: I'd expect to see: All columns work corr…

Accepted Answer

It looks as though the problem was caused by the column containing true/false/null values. Somewhere in the pivot function, the three values of a seemingly boolean type weren't being handled, and everything was nulled.

So (given a table with only boolean cf_id values), casting the value as boolean makes it work:

    import org.apache.spark.sql.functions.first  // already in scope in spark-shell

    val castdf = spark.sql("""select id, cf_id, cast(value as boolean) as value from df""")
    castdf.groupBy($"id").pivot("cf_id").agg(first($"value")).show

    +-------+------------+
    |     id|360019829932|
    +-------+------------+
    |3663762|       false|
    |3619941|        null|
    |3667500|       false|
    |3631088|        null|
    |3668712|       false|
    |3661298|        true|

I'm fairly new to Spark and SQL, so I couldn't explain why.

But in conclusion: if you're pivoting to a Spark dataframe which will have a string-type column containing true/false/null values, the column the values come from should be cast as boolean.

Thank you @rbcvl for your help.
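For readers who prefer the DataFrame API to a SQL string, the same cast can be sketched as below. This is only an illustration under assumptions: the object name `PivotBooleanCast`, the helper `pivotWithBooleanCast`, the local SparkSession and the three sample rows are all hypothetical, and the input is assumed to have a string `value` column holding "true", "false" or null. It shows the workaround from the answer, not an explanation of why the uncast pivot returned nulls.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, first}

// Hypothetical wrapper object; the names here are not from the original post.
object PivotBooleanCast {

  // Cast the string `value` column to boolean, then pivot on `cf_id`.
  // Assumes the input has columns id, cf_id and value.
  def pivotWithBooleanCast(df: DataFrame): DataFrame = {
    val castDf = df.withColumn("value", col("value").cast("boolean"))
    castDf.groupBy("id").pivot("cf_id").agg(first("value"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")               // local session, for illustration only
      .appName("pivot-boolean-cast")
      .getOrCreate()
    import spark.implicits._

    // Illustrative rows only; None stands in for a null value.
    val rows: Seq[(Long, String, Option[String])] = Seq(
      (3663762L, "360019829932", Some("false")),
      (3619941L, "360019829932", None),
      (3661298L, "360019829932", Some("true"))
    )
    val df = rows.toDF("id", "cf_id", "value")

    pivotWithBooleanCast(df).show()

    spark.stop()
  }
}
```

Keeping the cast in a small function makes it easy to apply only to the cf_id values that are known to be boolean, which matches the caveat in the answer that the table should contain only boolean cf_id values.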


Answer