I have a df where I am trying to create the Last Login Date column, as shown in the image.
I am not sure how to get the maximum login date that was on/prior the email notification date for that current row. I added explanations on how I expect the data to look. Any help is appreciated in either sql or pandas.
Advertisement
Answer
Use pandas.merge_asof
:
out pd.merge_asof(df.assign(date=pd.to_datetime(df['email_notification_date']).sort_values()), pd.to_datetime(df['login_date']).dropna().sort_values().rename('last_login_date'), left_on='date', right_on='last_login_date' )
output:
email_notification_date login_date date last_login_date 0 2020-01-01 2020-01-04 2020-01-01 NaT 1 2020-01-02 NaN 2020-01-02 NaT 2 2020-01-03 2020-01-06 2020-01-03 NaT 3 2020-01-04 NaN 2020-01-04 2020-01-04 4 2020-01-05 NaN 2020-01-05 2020-01-04 5 2020-01-06 2020-01-06 2020-01-06 2020-01-06 6 2020-01-07 2020-01-10 2020-01-07 2020-01-06 7 2020-01-18 NaN 2020-01-18 2020-01-10