I have a dataframe in the format shown below, where we will have multiple entries of DEPNAME
as shown below, my requirement is to set the result
= Y at the DEPNAME
level if either flag_1
or flag_2
= Y, if both the flag i.e. flag_1
and flag_2
= N the result will be set as N as shown for DEPNAME
=personnel
I am able to get the desired result using joins but I am curious if we can do it using window functions as the dataset is quite huge in size.
+---------+------+------+-+------+ | depName|flag_1|flag_2| result | +---------+------+------+-+------+ | sales| N| Y | Y | | sales| N| N | Y | | sales| N| N | Y | |personnel| N| N | N | |personnel| N| N | N | | develop| Y| N | Y | | develop| N| N | Y | | develop| N| N | Y | | develop| N| N | Y | | develop| N| N | Y | +---------+-----+------+ +------+
Advertisement
Answer
This answers the original version of the question.
This looks like a case
expression:
select t.*, (case when flag_1 = 'Y' or flag_2 = 'Y' then 'Y' else 'N' end) as result
For the updated version:
select t.*, max(case when flag_1 = 'Y' or flag_2 = 'Y' then 'Y' else 'N' end) over (partition by depname) as result