I have a df where I am trying to create the Last Login Date column, as shown in the image.
I am not sure how to get the maximum login date that was on/prior the email notification date for that current row. I added explanations on how I expect the data to look. Any help is appreciated in either sql or pandas.
Advertisement
Answer
Use pandas.merge_asof
:
x
out pd.merge_asof(df.assign(date=pd.to_datetime(df['email_notification_date']).sort_values()),
pd.to_datetime(df['login_date']).dropna().sort_values().rename('last_login_date'),
left_on='date', right_on='last_login_date'
)
output:
email_notification_date login_date date last_login_date
0 2020-01-01 2020-01-04 2020-01-01 NaT
1 2020-01-02 NaN 2020-01-02 NaT
2 2020-01-03 2020-01-06 2020-01-03 NaT
3 2020-01-04 NaN 2020-01-04 2020-01-04
4 2020-01-05 NaN 2020-01-05 2020-01-04
5 2020-01-06 2020-01-06 2020-01-06 2020-01-06
6 2020-01-07 2020-01-10 2020-01-07 2020-01-06
7 2020-01-18 NaN 2020-01-18 2020-01-10