Translate Oracle query into pandas dataframe handling

I have the below dataframe:

PARAM1	PARAM2	VALUE
A	X	TUE, WED
A	Y	NO
B	X	MON, WED
B	Y	YES

I would like a pythonic way of obtaining the distinct values of param1 that satisfy EITHER of these conditions:

Their row with param2 = ‘X’ has a value containing the string ‘MON’
Their row with param2 = ‘Y’ has a value equal to ‘YES’

In the example above, the output would be just B, because:

PARAM1	PARAM2	VALUE	EXPLANATION
A	X	TUE, WED	X parameter does not contain ‘MON’, so does not count for A.
A	Y	NO	Y parameter is not equal to ‘YES’, so does not count for A.
B	X	MON, WED	X parameter contains ‘MON’, so it counts for B.
B	Y	YES	Y parameter is equal to ‘YES’, so it counts for B.

Since A has not met either of the criteria for param2 X and Y, it’s not in the output. B has fulfilled both (would have been enough with just one), so it’s in the output.

In Oracle I would do it this way, but I am not sure how to proceed in Python:

SELECT DISTINCT
    param1
FROM
    (
        -- Fetch the X entries having a 'MON' in value
        SELECT
            param1
        FROM
            aux
        WHERE
            param2 = 'X'
            AND REGEXP_LIKE ( value,
                              'MON' )
        UNION ALL
        -- Fetch the Y entries having value equal to 'YES'
        SELECT
            param1
        FROM
            aux
        WHERE
            param2 = 'Y'
            AND value = 'YES'
    );

Answer

First, we form a boolean mask based on the condition, then select the corresponding rows from the dataframe:

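# Outer parentheses let the expression span two lines.
# The Y condition uses == because the question asks for equality with 'YES'.
cond = (((df['PARAM2'] == 'X') & df['VALUE'].str.contains('MON')) |
        ((df['PARAM2'] == 'Y') & (df['VALUE'] == 'YES')))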
df = df[cond]
print(df)

Prints:

  PARAM1 PARAM2     VALUE
2      B      X  MON, WED
3      B      Y       YES
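
The question asks for the distinct PARAM1 values rather than the filtered rows, so one more step is needed. Below is a minimal sketch of that step, assuming the sample data from the question. The helper name matching_param1 is made up for illustration, and regex=False is used on the assumption that 'MON' should be matched as a plain substring:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "PARAM1": ["A", "A", "B", "B"],
    "PARAM2": ["X", "Y", "X", "Y"],
    "VALUE": ["TUE, WED", "NO", "MON, WED", "YES"],
})


def matching_param1(frame):
    """Return the distinct PARAM1 values meeting either condition (hypothetical helper)."""
    # X rows whose VALUE contains the literal substring 'MON'
    is_x_mon = (frame["PARAM2"] == "X") & frame["VALUE"].str.contains("MON", regex=False)
    # Y rows whose VALUE is exactly 'YES'
    is_y_yes = (frame["PARAM2"] == "Y") & (frame["VALUE"] == "YES")
    # Keep PARAM1 for rows passing either test, then de-duplicate (like SELECT DISTINCT)
    return frame.loc[is_x_mon | is_y_yes, "PARAM1"].unique().tolist()


print(matching_param1(df))  # ['B']
```

Because the two masks are combined with | before selecting, a single pass over the dataframe plays the role of the UNION ALL in the Oracle query, and unique() plays the role of SELECT DISTINCT.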

 
