I´m using Pandas in Python and I’d like to join 2 dataframes. My first dataframe is:
id | var | date |
---|---|---|
1 | ABCD | 2019-01-01 |
1 | ABCD | 2017-06-01 |
1 | ABCD | 2016-06-01 |
2 | ABCD | 2016-01-01 |
The dataframe I want to join is:
var | date |
---|---|
ABCD | 2019-01-01 |
ABCD | 2017-06-01 |
FJKL | 2016-06-01 |
ABCD | 2016-01-01 |
I want to join the firt dataframe by var
and date
but the condition is that the date of the second dataframe has to be smaller than the first. If I were in sql I would do:
x
select
a.*,
b.*
from fisrt_data a
join second_data b on a.var = b.var and a.date > b.date
Any ideas to implement with pandas?
Advertisement
Answer
There is merge
and query
:
(df1.merge(df2, on=['var'], suffixes=['_a','_b'])
.query('date_a > date_b')
)
Output:
id var date_a date_b
1 1 ABCD 2019-01-01 2017-06-01
2 1 ABCD 2019-01-01 2016-01-01
5 1 ABCD 2017-06-01 2016-01-01
8 1 ABCD 2016-06-01 2016-01-01