I have a table of employees similar to this:
Department   Data
A            [{"name":"John", "age":10, "job":"Manager"}, {"name":"Eli", "age":40, "job":"Worker"}, {"name":"Sam", "age":32, "job":"Manager"}]
B            [{"name":"Jack", "age":50, "job":"CEO"}, {"name":"Mike", "age":334, "job":"CTO"}, {"name":"Filip", "age":63, "job":"Worker"}]
I want to get the department, name, and age of all employees, something similar to this:
Department   Data
A            [{"name":"John", "age":10}, {"name":"Eli", "age":40}, {"name":"Sam", "age":32}]
B            [{"name":"Jack", "age":50}, {"name":"Mike", "age":334}, {"name":"Filip", "age":63}]
How can I achieve this with a SQL query?
Answer
I assume you are using Hive/Spark and that the data type of the column is an array of maps. You can use the explode, collect_list, and map functions:
select dept,
       collect_list(map("name", t.map_elem['name'], "age", t.map_elem['age'])) as res
from tbl
lateral view explode(data) t as map_elem
group by dept
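To reproduce this end to end, here is a minimal sketch that builds the sample rows as an array<map<string,string>> column and runs the query above. The table name tbl and the column names dept/data are assumptions taken from the query rather than your real schema, and the ages are stored as strings so that each map holds a single value type.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("keep-name-age").master("local[*]").getOrCreate()

// Rebuild the sample data: one dept column and one array<map<string,string>> column.
spark.sql("""
  select 'A' as dept, array(
    map('name', 'John', 'age', '10', 'job', 'Manager'),
    map('name', 'Eli',  'age', '40', 'job', 'Worker'),
    map('name', 'Sam',  'age', '32', 'job', 'Manager')) as data
  union all
  select 'B', array(
    map('name', 'Jack',  'age', '50',  'job', 'CEO'),
    map('name', 'Mike',  'age', '334', 'job', 'CTO'),
    map('name', 'Filip', 'age', '63',  'job', 'Worker'))
""").createOrReplaceTempView("tbl")

// explode flattens each array element into its own row, map rebuilds the element with
// only the wanted keys, and collect_list groups the rows back into one array per dept.
spark.sql("""
  select dept,
         collect_list(map('name', map_elem['name'], 'age', map_elem['age'])) as res
  from tbl
  lateral view explode(data) t as map_elem
  group by dept
""").show(false)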
Note that this would not be as performant as a Spark DataFrame solution or a UDF, with which you can access the required keys in an array of maps without a function like explode.
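To illustrate the UDF route mentioned above (this is a sketch of my own, not part of the original answer; keepNameAge is a made-up name, and it reuses the tbl view and dept/data columns from the earlier sketch), you can filter each map in the array in a single pass, with no explode or group by:

import org.apache.spark.sql.functions.{col, udf}

// Keep only the "name" and "age" entries of every map in the array.
val keepNameAge = udf { (data: Seq[Map[String, String]]) =>
  data.map(_.filter { case (k, _) => k == "name" || k == "age" })
}

spark.table("tbl")
  .select(col("dept"), keepNameAge(col("data")).as("res"))
  .show(false)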
One more way to do this is with the Spark SQL functions transform and map_filter (map_filter is only available starting with Spark 3.0.0).
spark.sql("select dept,transform(data, map_elem -> map_filter(map_elem, (k, v) -> k != "job")) as res from tbl")
Another option with Spark 2.4 and above is to use the element_at function together with transform and select only the required keys.
spark.sql("select dept," + "transform(data, map_elem -> map("name",element_at(map_elem,"name"),"age",element_at(map_elem,"age"))) as res " + "from tbl")