Given a dictionary like column in a dataset, I want to grab the value from a key given that the value from another key is satisfied.
Example: Say I have a column ‘statistics’ in a dataset, where each data row looks as:
array 0: {"hair": "black", "eye": "white", "metric": "feet"} 1: {"hair": "blue", "eye": "white", "metric": "m"} 2: {"hair": "red", "eye": "brown", "metric": "feet"} 3: {"hair": "yellow", "eye": "white", "metric": "cm"}
I want to get the value of ‘eye’ whenever hair is ‘black’
I tried:
select statistics.eye("*").filter(statistics.hair, x -> x == 'black') from arrayData
but it gives an error and I’m unable to grab the value for eye, please assist.
Advertisement
Answer
I eventually figured it out without having to first convert to a dataframe.
The aggregate command allows you to grab the value from a key given that the value from another key is satisfied. For this instance, the command below will suffice:
select aggregate(statistics,"",(agg,item)->concat(agg,CASE WHEN item.hair == 'black' THEN item.eye ELSE "" END)) as EyeColor from arrayData
For more details on how to use this function, see here