Skip to content
Advertisement

Translate Oracle query into pandas dataframe handling

I have the below dataframe:

PARAM1 PARAM2 VALUE
A X TUE, WED
A Y NO
B X MON, WED
B Y YES

I would like a pythonic way of obtaining the distinct values of param1 that satisfy EITHER of these conditions:

  1. Their corresponding param2 = ‘X’ contains the string ‘MON’
  2. Their corresponding param2 = ‘Y’ is equal to ‘YES’.

In the example above, the output would be just B, because.

PARAM1 PARAM2 VALUE EXPLANATION
A X TUE, WED X parameter does not contain ‘MON’, so does not count for A.
A Y NO Y parameter is not equal to ‘YES’, so does not count for A.
B X MON, WED X parameter contains ‘MON’, so it counts for B.
B Y YES Y parameter is equal to ‘YES’, so it counts for B.

Since A has not met either of the criteria for param2 X and Y, it’s not in the output. B has fulfilled both (would have been enough with just one), so it’s in the output.

In Oracle I would do it this way, but not sure how to proceed in python:

SELECT DISTINCT
    param1
FROM
    (
        -- Fetch the X entries having a 'MON' in value
        SELECT
            param1
        FROM
            aux
        WHERE
            param2 = 'X'
            AND REGEXP_LIKE ( value,
                              'MON' )
        UNION ALL
        -- Fetch the Y entries having value equal to 'YES'
        SELECT
            param1
        FROM
            aux
        WHERE
            param2 = 'Y'
            AND param3 = 'YES'
    );

Advertisement

Answer

First, we form a boolean mask based on the condition, then select the corresponding rows from the dataframe:

cond = ((df['PARAM2'] == 'X') & df['VALUE'].str.contains('MON')) | 
       ((df['PARAM2'] == 'Y') & df['VALUE'].str.contains('YES'))
df = df[cond]
print(df)

Prints:

  PARAM1 PARAM2     VALUE
2      B      X  MON, WED
3      B      Y       YES
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement