
SQL filter elements of array

I have a table of employees similar to this:

Department   Data
A    [{"name":"John", "age":10, "job":"Manager"},{"name":"Eli", "age":40, "job":"Worker"},{"name":"Sam", "age":32, "job":"Manager"}]
B    [{"name":"Jack", "age":50, "job":"CEO"},{"name":"Mike", "age":334, "job":"CTO"},{"name":"Filip", "age":63, "job":"Worker"}]

I want to get the department, name, and age of all employees, producing something like this:

Department   Data
A    [{"name":"John", "age":10},{"name":"Eli", "age":40},{"name":"Sam", "age":32}]
B    [{"name":"Jack", "age":50},{"name":"Mike", "age":334},{"name":"Filip", "age":63}]

How can I achieve this with a SQL query?


Answer

I assume you are using Hive/Spark and that the datatype of the column is an array of maps.
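
For reference, here is a minimal sketch of matching test data in Scala (the table/column names tbl, dept, and data, and the map<string,string> value type, are assumptions based on the above):

import spark.implicits._

// One department's worth of sample rows; "data" becomes array<map<string,string>>.
val tbl = Seq(
  ("A", Seq(
    Map("name" -> "John", "age" -> "10", "job" -> "Manager"),
    Map("name" -> "Eli", "age" -> "40", "job" -> "Worker"),
    Map("name" -> "Sam", "age" -> "32", "job" -> "Manager")))
).toDF("dept", "data")
tbl.createOrReplaceTempView("tbl")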

Use the explode, collect_list, and map functions:

select dept,
       collect_list(map('name', t.map_elem['name'], 'age', t.map_elem['age'])) as res
from tbl
lateral view explode(data) t as map_elem
group by dept

Note that this will not be as performant as a native Spark solution or a UDF that accesses the required keys in the array of maps directly, without a function like explode.
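
As an illustration, a minimal UDF sketch in Scala (the UDF name dropJob is hypothetical, and it assumes the column type is array<map<string,string>>):

import org.apache.spark.sql.functions.{col, udf}

// Remove the "job" key from each map in the array, without exploding rows.
val dropJob = udf((data: Seq[Map[String, String]]) => data.map(_ - "job"))

val res = spark.table("tbl").withColumn("res", dropJob(col("data")))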

One more way to do this is with the Spark SQL higher-order functions transform and map_filter (map_filter is only available starting with Spark 3.0.0).

spark.sql("select dept, transform(data, map_elem -> map_filter(map_elem, (k, v) -> k != 'job')) as res from tbl")

Another option with Spark 2.4 and later is to use the element_at function with transform, selecting only the required keys.

spark.sql("select dept, " +
          "transform(data, map_elem -> map('name', element_at(map_elem, 'name'), 'age', element_at(map_elem, 'age'))) as res " +
          "from tbl")