Skip to content
Advertisement

Moving all rows with a certain index into a single row

I have a table with as structure like the following, with an unknown number of rows with each group index.

Group || PropertyA || PropertyB || PropertyC
============================================
  1   ||   x1      ||    x12    ||    x13
  2   ||   x21     ||    x23    ||    x23
  3   ||   x31     ||    x32    ||    x33
  3   ||   x41     ||    x42    ||    x43
 ...       ...           ...          ...

I want all rows with the same Group index to be in a single row, concatenated, as follows:

Group || PropertyA || PropertyB || PropertyC || PropertyA1 || PropertyB1 || PropertyC1 ...
==================================================================================
  1   ||   x1      ||    x12    ||    x13    ||   NULL    ||   NULL    ||  NULL
  2   ||   x21     ||    x23    ||    x23    ||   NULL    ||   NULL    ||  NULL
  3   ||   x31     ||    x32    ||    x33    ||   x41     ||    x42    ||   x43

 ...       ...           ...          ...

I have attempted this using dynamic SQL, and have also attempted using pandas pandas.pivot() and pandas.pivot_table() but my skills with both have failed me so far. The database I have access to is SQL Server.

This issue is that I’m trying to add an unknown number of columns, as a Group index might appear multiple times.

Anything that even approximates the result could be a great help – the column names are unimportant, and can simply repeat. I just need all the data from each Group in one long row.

Any help is greatly appreciated.

Advertisement

Answer

We can use DataFrame.pivot_table with GroupBy.cumcount and remove the MultiIndex from the columns at the end.

new_df = (df.pivot_table(index='Group', 
                         columns=df.groupby('Group').cumcount(),
                         aggfunc='first')
            .sort_index(axis=1, level=1))
new_df = new_df.set_axis([f'{x}{y}' if y != 0 else x 
                           for x, y in new_df.columns],
                           axis=1).reset_index()
print(new_df)

Output

   Group PropertyA PropertyB PropertyC PropertyA1 PropertyB1 PropertyC1
0      1        x1       x12       x13        NaN        NaN        NaN
1      2       x21       x23       x23        NaN        NaN        NaN
2      3       x31       x32       x33        x41        x42        x43

note that aggfunc = 'first' has simply been used so that it does not fail using np.mean (default aggfunc) with str values. This method will not lose information from your DataFrame when you have used cumcount.

If you want to learn more about pivot you can read How pivot a DataFrame. Your question is similar to Question 10.

I decided not to close this question as a duplicate for two reasons. One is that ìnsert is not necessary and the other is that here you have to change the name of the columns in a somewhat different way.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement