
How to execute custom logic at a PySpark window partition

I have a dataframe in the format shown below, where there can be multiple entries per DEPNAME. My requirement is to set result = Y at the DEPNAME level if either flag_1 or flag_2 = Y; if both flags (flag_1 and flag_2) are N, the result should be set to N, as shown for DEPNAME = personnel.
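For illustration, a small dataframe in the described format could be built as below; the department names and flag values here are assumed, not the actual data from the question.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumed sample input: multiple rows per DEPNAME, each with two Y/N flags.
df = spark.createDataFrame(
    [
        ("sales", "Y", "N"),
        ("sales", "N", "N"),
        ("personnel", "N", "N"),
        ("personnel", "N", "N"),
    ],
    ["DEPNAME", "flag_1", "flag_2"],
)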

I am able to get the desired result using joins, but I am curious whether this can be done with window functions, since the dataset is quite large.


Answer

This answers the original version of the question.

This looks like a case expression:
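A minimal sketch of such a row-level case expression, assuming the flag_1 and flag_2 column names from the question:

from pyspark.sql import functions as F

# Row-level case expression: Y when either flag on the row is Y, otherwise N.
df = df.withColumn(
    "result",
    F.when((F.col("flag_1") == "Y") | (F.col("flag_2") == "Y"), "Y").otherwise("N"),
)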

For the updated version:
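One possible sketch for the DEPNAME-level result (again assuming the column names from the question) applies the same case expression inside a max over a window partitioned by DEPNAME; since "Y" sorts after "N", any Y in the partition lifts every row of that department to Y:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

dep_window = Window.partitionBy("DEPNAME")

# Any row with a Y flag makes the partition-wide max Y for the whole DEPNAME.
df = df.withColumn(
    "result",
    F.max(
        F.when((F.col("flag_1") == "Y") | (F.col("flag_2") == "Y"), "Y").otherwise("N")
    ).over(dep_window),
)

With the illustrative data above, both personnel rows keep result = N because neither flag is ever Y for that department, while both sales rows get Y.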
