Skip to content
Advertisement

Pandas dataframe combine unique row values

I have a dataframe like the following with over 90000 rows.

origin      destination people
101011001   101011001   7378
101011001   101011002   120
101011001   101011002   8
101011001   101011002   285
101011001   101011003   7
101011001   101011004   0
101011001   101011004   1
101011001   101011004   2
101011001   101011004   9
101011002   101011001   5

As you can see, some origin and destination values repeat for example there are multiple rows where origin=101011001, destination=101011002. My goal is to group the repeating origin and destination values and sum the the people column, so the dataframe looks like this:

origin      destination people
101011001   101011001   7378
101011001   101011002   413
101011001   101011003   7
101011001   101011004   12
101011002   101011001   5

I’ve tried jsondf.groupby(['origin', 'destination']).sum() which gives me the correct sum and destination values but it’s not quite what I want as I want the origin values to also be shown in the row for each destination.

Note My end goal is to get this dataframe into a SQL database as a table, and with the .groupby() code above, the origin and destination values are actually interpreted as NULL which is not what I want.

Thanks!

Advertisement

Answer

A quick and easy way to get each of your origin values to display would be to simply reset your index after using the groupby. Here is an example that shows what the database looks like before and after resetting the index:

df.groupby(['origin', 'destination']).sum()

origin      destination  people
101011001   101011001    7378
            101011002    413
            101011003    7
            101011004    12
101011002   101011001    5

Once you add the reset_index(), then the dataframe will have each value of origin represented in every row.

    df.groupby(['origin', 'destination']).sum().reset_index()

    origin      destination people
0   101011001   101011001   7378
1   101011001   101011002   413
2   101011001   101011003   7
3   101011001   101011004   12
4   101011002   101011001   5

This should allow you to send to the sql database without interpreting the origin as null values.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement