Skip to content
Advertisement

Many to many join behaviour

Not really sure how to title this question, but here’s the situation. I have one data frame (dfOrders) that has an order_id and basic information like so:

Then there’s a second data frame (dfStandardized) that contains standardized address information:

And essentially the issue is that one customer placed one order but shipped it to two separate addresses. Because of this, there is only one order_id but two rows, one for each of the shipping addresses. What I would like is a data frame like so:

Where the extra shipping info is only added to the row with the corresponding correct address, but what I’m getting is:

Where both addresses are getting receiving the supplemental attribution of both addresses. This is obviously because it’s a many-to-many join due to only having one order_id. is there some way to get the result I want? Here’s the code I’m using:

Advertisement

Answer

Usually we do cumcount

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement