I´m using Pandas in Python and I’d like to join 2 dataframes. My first dataframe is:
| id | var | date |
|---|---|---|
| 1 | ABCD | 2019-01-01 |
| 1 | ABCD | 2017-06-01 |
| 1 | ABCD | 2016-06-01 |
| 2 | ABCD | 2016-01-01 |
The dataframe I want to join is:
| var | date |
|---|---|
| ABCD | 2019-01-01 |
| ABCD | 2017-06-01 |
| FJKL | 2016-06-01 |
| ABCD | 2016-01-01 |
I want to join the firt dataframe by var and date but the condition is that the date of the second dataframe has to be smaller than the first. If I were in sql I would do:
select a.*, b.* from fisrt_data a join second_data b on a.var = b.var and a.date > b.date
Any ideas to implement with pandas?
Advertisement
Answer
There is merge and query:
(df1.merge(df2, on=['var'], suffixes=['_a','_b'])
.query('date_a > date_b')
)
Output:
id var date_a date_b 1 1 ABCD 2019-01-01 2017-06-01 2 1 ABCD 2019-01-01 2016-01-01 5 1 ABCD 2017-06-01 2016-01-01 8 1 ABCD 2016-06-01 2016-01-01