Multiple Left Joins in Spark Dataframe with same table without Unique Column Error

I have a main DataFrame, seasonsDF, which is left-joined multiple times with a key-value DataFrame, listvaluesDF:

val seasonFinal1DF = seasonsDF.join(paletteDF, seasonsDF("Palette") === paletteDF("id"), "left_outer")
                              .join(flextypeDF, seasonsDF("flextype") === listvaluesDF("key"), "left_outer")
                              .join(listvaluesDF, seasonsDF("Year") === listvaluesDF("key"), "left_outer")
                              .join(listvaluesDF, seasonsDF("Set Week") === listvaluesDF("key"), "left_outer")

Now when I try to access the final DataFrame, seasonFinal1DF, it throws: Column names in each table must be unique. Column name ‘id’ in table ‘SeasonLeft’ is specified more than once.

Is there any way to alias the final columns when the same DF is joined multiple times?

Suggestions please

Answer

Do you mean “alias” as in adding a suffix to the columns before joining? If yes, you could try going through the columns of the DataFrame and renaming them before you join it:

// Append a suffix to every column name of listvaluesDF before joining
val renamedDF = listvaluesDF.columns.foldLeft(listvaluesDF) {
  case (df, col) => df.withColumnRenamed(col, col + "suffix")
}

If not, could you clarify what you’re expecting, or give an example of what you have in mind as a solution?
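To make the suffix idea concrete, here is a minimal sketch of how the renaming might be applied once per join, so each copy of the lookup table contributes distinct column names. The helper name withSuffix, the suffixes _year and _week, the toy data, and the "key"/"value" schema of listvaluesDF are all assumptions for illustration; they are not taken from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SuffixJoinSketch {
  // Hypothetical helper: append `suffix` to every column name of `df`.
  def withSuffix(df: DataFrame, suffix: String): DataFrame =
    df.columns.foldLeft(df) { case (acc, c) => acc.withColumnRenamed(c, c + suffix) }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("suffix-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-ins for the question's DataFrames (schemas are assumed).
    val seasonsDF = Seq((1, "y1", "w1"), (2, "y2", "w2"))
      .toDF("id", "Year", "Set Week")
    val listvaluesDF = Seq(("y1", "2019"), ("y2", "2020"), ("w1", "Week 1"), ("w2", "Week 2"))
      .toDF("key", "value")

    // One renamed copy of the lookup table per join.
    val yearLookup = withSuffix(listvaluesDF, "_year")
    val weekLookup = withSuffix(listvaluesDF, "_week")

    val joined = seasonsDF
      .join(yearLookup, seasonsDF("Year") === yearLookup("key_year"), "left_outer")
      .join(weekLookup, seasonsDF("Set Week") === weekLookup("key_week"), "left_outer")

    // Every column name in the result is now distinct.
    joined.printSchema()
    spark.stop()
  }
}
```

Because each copy of the lookup table is renamed before its join, the result carries key_year/value_year and key_week/value_week instead of repeated key and value columns, which should avoid the duplicate-name error when the DataFrame is written out.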

User contributions licensed under: CC BY-SA