Skip to content
Advertisement

Processing mulitple similar rows in Pandas

I have a dataframe pulled from a relational database. A one-to-many join has resulted in many similar rows with one column different. I would like to combine the similar rows but have the differing column data contained within a list, for each unique row. I am also able to change the SQL but I think this may be easier to do downstream within Pandas.

Here is a reproducable example:

This outputs:

name title
10001 Georgi Facello Senior Engineer
10002 Bezalel Simmel Staff
10003 Parto Bamford Senior Engineer
10004 Chirstian Koblick Senior Engineer
10004 Chirstian Koblick Engineer
10005 Kyoichi Maliniak Staff
10005 Kyoichi Maliniak Senior Staff
10006 Anneke Preusig Senior Engineer
10007 Tzvetan Zielinski Staff
10007 Tzvetan Zielinski Senior Staff

I would like to be able to almagamate the similar rows, creating a list with all the different values for a particular record:

name titles
10001 Georgi Facello [Senior Engineer]
10002 Bezalel Simmel [Staff]
10003 Parto Bamford [Senior Engineer]
10004 Chirstian Koblick [Engineer, Senior Engineer]
10005 Kyoichi Maliniak [Senior Staff, Staff]
10006 Anneke Preusig [Senior Engineer]
10007 Tzvetan Zielinski [Senior Staff, Staff]

What is the best (most readable, easiest to understand and maintain) way to achieve this, either in SQL or in Pandas? (I suspect this will be a lot easier in pandas but I can adjust the SQL to help)

Advertisement

Answer

Output

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement