I have a table with a JSON string.
When running this Spark SQL query:
select from_json('[{"column_1":"hola", "some_number":1.0}]', 'array<struct<column_1:string,some_number:int>>')
I get a NULL, since the data types for some_number do not match (int vs. double).
If I run this instead, it works:
select from_json('[{"column_1":"hola", "some_number":1.0}]', 'array<struct<column_1:string,some_number:double>>')
Is there a way to CAST this on-the-fly?
Answer
You can run from_json first using array<struct<column_1:string,some_number:double>>, then cast the result to array<struct<column_1:string,some_number:int>>.
Example:
spark.sql("""select cast(from_json('[{"column_1":"hola", "some_number":1.0}]', 'array<struct<column_1:string,some_number:double>>') as array<struct<column_1:string,some_number:int>>)""").show()
//+-------------------------------------------------------+
//|jsontostructs([{"column_1":"hola", "some_number":1.0}])|
//+-------------------------------------------------------+
//| [[hola, 1]]|
//+-------------------------------------------------------+
//printSchema
spark.sql("""select cast(from_json('[{"column_1":"hola", "some_number":1.0}]', 'array<struct<column_1:string,some_number:double>>') as array<struct<column_1:string,some_number:int>>)""").printSchema()
//root
// |-- jsontostructs([{"column_1":"hola", "some_number":1.0}]): array (nullable = true)
// | |-- element: struct (containsNull = true)
// | | |-- column_1: string (nullable = true)
// | | |-- some_number: integer (nullable = true)