Skip to content
Advertisement

Splitting a nested dict-like varchar column into multiple columns using SQL presto

In my table I’ve a column, which is a varchar but has a nested dictionary-like format (three nested levels). Some entries have multiple key-value pairs (customer ID & name), while some just have a single entry (customer ID). For example:

column
{
 "customer_type1": {
                    "location1": {"customerid":"12345","name":"John"}, 
                    "location2": {"customerid":"12346","name":"Conor"}, 
                   }
 "customer_type2": {
                    "location3": {"customerid":"12347","name":"Brian"}, 
                    "location4": {"customerid":"12348"}, 
                   }
 }

I need a query that will break out the the column into a table like this:

customer_type   Location   Customer_ID  Name
customer_type1  location1  12345        John
customer_type1  location2  12346        Conor
customer_type2  location3  12347        Brian
customer_type2  location4  12348        

I know a solution to extract for a single nested key-value pair, but cant edit it to work for a nested dictionary like this. I’m using Prestosql.

--query for single nested key-value pair
select json_extract_scalar(json_parse(column), '$.customer_id') customerid,
    json_extract_scalar(json_parse(column), '$.name') name
from dataset

Advertisement

Answer

Common approach with this kind of dynamic json is to cast it to MAP, in this case nested map – MAP(VARCHAR, MAP(VARCHAR, JSON)) and use unnest to flatten the result:

-- sample data
WITH dataset (json_str) AS (
    VALUES (
            '{
 "customer_type1": {
                    "location1": {"customerid":"12345","name":"John"}, 
                    "location2": {"customerid":"12346","name":"Conor"}
                   },
 "customer_type2": {
                    "location3": {"customerid":"12347","name":"Brian"}, 
                    "location4": {"customerid":"12348"} 
                   }
 }'
        )
) 

-- query
select customer_type, 
    location, 
    json_extract_scalar(cust_json, '$.customerid') customer_id, 
    json_extract_scalar(cust_json, '$.name') name
from (
        select cast(json_parse(json_str) as map(varchar, map(varchar, json))) as maps
        from dataset
    )
cross join unnest(maps) as t(customer_type, location_map)
cross join unnest(location_map) as t(location, cust_json)

Output:

customer_type location customer_id name
customer_type1 location1 12345 John
customer_type1 location2 12346 Conor
customer_type2 location3 12347 Brian
customer_type2 location4 12348
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement