Filter dictionary in PySpark with key names

Given a dictionary-like column in a dataset, I want to grab the value of one key when the value of another key meets a condition.

Example: say I have a column ‘statistics’ in a dataset, where each row looks like this:

I want to get the value of ‘eye’ whenever ‘hair’ is ‘black’.

I tried:

but it gives an error, and I’m unable to grab the value of ‘eye’. Please assist.

Answer

I eventually figured it out without having to first convert to a dataframe.

The aggregate command lets you grab the value of one key when the value of another key meets a condition. In this case, the command below will suffice:

For more details on how to use this function, see here
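As a rough sketch of that idea, suppose the data is an RDD whose rows are plain dicts with a ‘statistics’ dictionary, and that "aggregate" refers to PySpark's `RDD.aggregate(zeroValue, seqOp, combOp)`. Neither assumption is confirmed by the post. The sample rows and the helper names `seq_op` and `comb_op` are hypothetical, made up for illustration. The two functions are ordinary Python, so the sketch exercises them with `functools.reduce` over two pretend partitions and shows the Spark call in a comment.

```python
from functools import reduce

# Hypothetical rows: the keys 'hair' and 'eye' come from the question,
# the values are invented for illustration.
rows = [
    {"statistics": {"hair": "black", "eye": "brown"}},
    {"statistics": {"hair": "blonde", "eye": "blue"}},
    {"statistics": {"hair": "black", "eye": "green"}},
]


def seq_op(acc, row):
    """Fold one row into a partition's accumulator.

    Keeps the 'eye' value only when 'hair' is 'black'.
    """
    stats = row["statistics"]
    if stats.get("hair") == "black":
        return acc + [stats.get("eye")]
    return acc


def comb_op(left, right):
    """Merge the accumulators produced by two partitions."""
    return left + right


# With Spark available, the same functions would be passed to RDD.aggregate
# (assumes an existing SparkSession named `spark`):
#   rdd = spark.sparkContext.parallelize(rows)
#   eyes = rdd.aggregate([], seq_op, comb_op)

# Plain-Python stand-in that mimics two partitions, so the sketch runs
# without a cluster.
partitions = [rows[:2], rows[2:]]
partials = [reduce(seq_op, part, []) for part in partitions]
eyes = reduce(comb_op, partials, [])
print(eyes)  # ['brown', 'green']
```

The empty list is the zero value, so a partition with no matching rows contributes nothing to the result. On a real cluster, the order in which partition results are combined should not be relied on.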

User contributions licensed under: CC BY-SA