I have the below dataframe:
PARAM1 | PARAM2 | VALUE |
---|---|---|
A | X | TUE, WED |
A | Y | NO |
B | X | MON, WED |
B | Y | YES |
I would like a pythonic way of obtaining the distinct values of param1 that satisfy EITHER of these conditions:
- Their corresponding param2 = ‘X’ contains the string ‘MON’
- Their corresponding param2 = ‘Y’ is equal to ‘YES’.
In the example above, the output would be just B, because.
PARAM1 | PARAM2 | VALUE | EXPLANATION |
---|---|---|---|
A | X | TUE, WED | X parameter does not contain ‘MON’, so does not count for A. |
A | Y | NO | Y parameter is not equal to ‘YES’, so does not count for A. |
B | X | MON, WED | X parameter contains ‘MON’, so it counts for B. |
B | Y | YES | Y parameter is equal to ‘YES’, so it counts for B. |
Since A has not met either of the criteria for param2 X and Y, it’s not in the output. B has fulfilled both (would have been enough with just one), so it’s in the output.
In Oracle I would do it this way, but not sure how to proceed in python:
SELECT DISTINCT param1 FROM ( -- Fetch the X entries having a 'MON' in value SELECT param1 FROM aux WHERE param2 = 'X' AND REGEXP_LIKE ( value, 'MON' ) UNION ALL -- Fetch the Y entries having value equal to 'YES' SELECT param1 FROM aux WHERE param2 = 'Y' AND param3 = 'YES' );
Advertisement
Answer
First, we form a boolean mask based on the condition, then select the corresponding rows from the dataframe:
cond = ((df['PARAM2'] == 'X') & df['VALUE'].str.contains('MON')) | ((df['PARAM2'] == 'Y') & df['VALUE'].str.contains('YES')) df = df[cond] print(df)
Prints:
PARAM1 PARAM2 VALUE 2 B X MON, WED 3 B Y YES