Skip to content
Advertisement

create rows from columns in a apache spark dataset

I’m trying from a dataset to create a row from existing columns. Here is my case:

InputDataset

accountid payingaccountid billedaccountid startdate enddate
0011t00000MY1U3AAL 0011t00000MY1U3XXX 0011t00000ZZ1U3AAL 2020-06-10 00:00:00.000000 NULL

And I would like to have sometthing like this

accountid startdate enddate
0011t00000MY1U3AAL 2021-06-10 00:00:00.000000 NULL
0011t00000MY1U3XXX 2021-06-10 00:00:00.000000 NULL
0011t00000ZZ1U3AAL 2021-06-10 00:00:00.000000 NULL

In the input dataset the columns billedaccounid and payingaccounid are now also in accountid column.

Thank you in advance.

Advertisement

Answer

You can create an array column with 3 id columns then use explode function to get desired result.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement