I have a dataframe in the format shown below, where we will have multiple entries of DEPNAME as shown below, my requirement is to set the result = Y at the DEPNAME level if either flag_1 or flag_2= Y, if both the flag i.e. flag_1 and flag_2 = N the result will be set as N as shown for DEPNAME=personnel
I am able to get the desired result using joins but I am curious if we can do it using window functions as the dataset is quite huge in size.
+---------+------+------+-+------+ | depName|flag_1|flag_2| result | +---------+------+------+-+------+ | sales| N| Y | Y | | sales| N| N | Y | | sales| N| N | Y | |personnel| N| N | N | |personnel| N| N | N | | develop| Y| N | Y | | develop| N| N | Y | | develop| N| N | Y | | develop| N| N | Y | | develop| N| N | Y | +---------+-----+------+ +------+
Advertisement
Answer
This answers the original version of the question.
This looks like a case expression:
select t.*,
(case when flag_1 = 'Y' or flag_2 = 'Y'
then 'Y' else 'N'
end) as result
For the updated version:
select t.*,
max(case when flag_1 = 'Y' or flag_2 = 'Y'
then 'Y' else 'N'
end) over (partition by depname) as result