I´m using Pandas in Python and I’d like to join 2 dataframes. My first dataframe is:
id | var | date |
---|---|---|
1 | ABCD | 2019-01-01 |
1 | ABCD | 2017-06-01 |
1 | ABCD | 2016-06-01 |
2 | ABCD | 2016-01-01 |
The dataframe I want to join is:
var | date |
---|---|
ABCD | 2019-01-01 |
ABCD | 2017-06-01 |
FJKL | 2016-06-01 |
ABCD | 2016-01-01 |
I want to join the firt dataframe by var
and date
but the condition is that the date of the second dataframe has to be smaller than the first. If I were in sql I would do:
select a.*, b.* from fisrt_data a join second_data b on a.var = b.var and a.date > b.date
Any ideas to implement with pandas?
Advertisement
Answer
There is merge
and query
:
(df1.merge(df2, on=['var'], suffixes=['_a','_b']) .query('date_a > date_b') )
Output:
id var date_a date_b 1 1 ABCD 2019-01-01 2017-06-01 2 1 ABCD 2019-01-01 2016-01-01 5 1 ABCD 2017-06-01 2016-01-01 8 1 ABCD 2016-06-01 2016-01-01