I have a table with as structure like the following, with an unknown number of rows with each group index.
Group || PropertyA || PropertyB || PropertyC ============================================ 1 || x1 || x12 || x13 2 || x21 || x23 || x23 3 || x31 || x32 || x33 3 || x41 || x42 || x43 ... ... ... ...
I want all rows with the same Group index to be in a single row, concatenated, as follows:
Group || PropertyA || PropertyB || PropertyC || PropertyA1 || PropertyB1 || PropertyC1 ... ================================================================================== 1 || x1 || x12 || x13 || NULL || NULL || NULL 2 || x21 || x23 || x23 || NULL || NULL || NULL 3 || x31 || x32 || x33 || x41 || x42 || x43 ... ... ... ...
I have attempted this using dynamic SQL, and have also attempted using pandas pandas.pivot() and pandas.pivot_table() but my skills with both have failed me so far. The database I have access to is SQL Server.
This issue is that I’m trying to add an unknown number of columns, as a Group index might appear multiple times.
Anything that even approximates the result could be a great help – the column names are unimportant, and can simply repeat. I just need all the data from each Group in one long row.
Any help is greatly appreciated.
Advertisement
Answer
We can use DataFrame.pivot_table
with GroupBy.cumcount
and remove the MultiIndex from the columns at the end.
new_df = (df.pivot_table(index='Group', columns=df.groupby('Group').cumcount(), aggfunc='first') .sort_index(axis=1, level=1)) new_df = new_df.set_axis([f'{x}{y}' if y != 0 else x for x, y in new_df.columns], axis=1).reset_index() print(new_df)
Output
Group PropertyA PropertyB PropertyC PropertyA1 PropertyB1 PropertyC1 0 1 x1 x12 x13 NaN NaN NaN 1 2 x21 x23 x23 NaN NaN NaN 2 3 x31 x32 x33 x41 x42 x43
note that aggfunc = 'first'
has simply been used so that it does not fail using np.mean (default aggfunc) with str values. This method will not lose information from your DataFrame when you have used cumcount.
If you want to learn more about pivot
you can read How pivot a DataFrame
. Your question is similar to Question 10.
I decided not to close this question as a duplicate for two reasons. One is that ìnsert
is not necessary and the other is that here you have to change the name of the columns in a somewhat different way.