Skip to content
Advertisement

How to get the last/maximum date that is on/earlier than another baseline date by user?

I have a df where I am trying to create the Last Login Date column, as shown in the image.

I am not sure how to get the maximum login date that was on/prior the email notification date for that current row. I added explanations on how I expect the data to look. Any help is appreciated in either sql or pandas.

enter image description here

Advertisement

Answer

Use pandas.merge_asof:

out pd.merge_asof(df.assign(date=pd.to_datetime(df['email_notification_date']).sort_values()),
              pd.to_datetime(df['login_date']).dropna().sort_values().rename('last_login_date'),
              left_on='date', right_on='last_login_date'
             )

output:

  email_notification_date  login_date       date last_login_date
0              2020-01-01  2020-01-04 2020-01-01             NaT
1              2020-01-02         NaN 2020-01-02             NaT
2              2020-01-03  2020-01-06 2020-01-03             NaT
3              2020-01-04         NaN 2020-01-04      2020-01-04
4              2020-01-05         NaN 2020-01-05      2020-01-04
5              2020-01-06  2020-01-06 2020-01-06      2020-01-06
6              2020-01-07  2020-01-10 2020-01-07      2020-01-06
7              2020-01-18         NaN 2020-01-18      2020-01-10
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement