I have two dataframes.
df1:
Team, Sport, CostTicket Stars, Fotball, 10 Circles, Fotball, 20 Stars, Basket, 12 Stars, Baseball, 14 Circles, Baseball, 25
and
df2:
Team, Sport, CostRange, LowerLimit, UpperLimit Stars, Football, 0<3, 0, 3 Stars, Football, 4<10, 4, 10 Stars, Football, 11<22, 11, 22 Stars, Football, 24<25, 24, 25 Circles, Football, 0<4, 0, 4 Circles, Football, 5<10, 5, 10 Circles, Football, 11<20, 11, 20 Circles, Football, 21<30, 21, 30 Stars, Basket, 0<2, 0, 2 Stars, Basket, 3<7, 3, 7 Stars, Basket, 8<19, 8, 19 Stars, Basket, 20<30, 20, 30 Circles, Basket, 0<1, 0, 1 Circles, Basket, 2<4, 2, 4 Circles, Basket, 5<15, 5, 15 Circles, Basket, 16<30, 16, 30 Stars, Baseball, 0<10, 0, 10 Stars, Baseball, 11<20, 11, 20 Stars, Baseball, 21<30, 21, 30 Circles, Baseball, 0<4, 0, 4 Circles, Baseball, 5<10, 5, 10 Circles, Baseball, 11<20, 11, 20 Circles, Baseball, 21<30, 21, 30
I want to add a fourth column to df1 with the CostRange from df2.
The final result should be:
Team, Sport, CostTicket, Range Stars, Fotball, 10, 5<10 Circles, Fotball, 20, 11<22 Stars, Basket, 12, 8<19 Stars, Baseball, 14, 11<20 Circles, Baseball, 25, 21<30
I have come this far, but it does not work. Is there someone that can help me with this?
val df1 = df2.withColumn("Range", df2("CostRange")) .where(df1("CostTicket") > df2("LowerLimit")) .where(df1("CostTicket") < df2("UpperLimit")) .where(df1("Team") === df2("Team")) .where(df1("Sport") === df2("Sport"))
Advertisement
Answer
You can’t select two columns from two different dataframes, You need to join two dataframes first
You can join with two columns first and use where
as below
df1.join(df2, Seq("Team", "Sport")) .where($"CostTicket" >= $"LowerLimit" && $"CostTicket" <= $"UpperLimit")
Or you could specify in join condition itself as below
df1.join(df2, df1("Team") === df2("Team") && df1("Sport") === df2("Sport") && df1("CostTicket") >= df2("LowerLimit") && df1("CostTicket") <= df2("UpperLimit") ).drop(df2("Team")) .drop(df2("Sport"))
Output:
+-------+--------+----------+---------+----------+----------+ |Team |Sport |CostTicket|CostRange|LowerLimit|UpperLimit| +-------+--------+----------+---------+----------+----------+ |Stars |Football|10 |4<10 |4 |10 | |Circles|Football|20 |11<20 |11 |20 | |Stars |Basket |12 |8<19 |8 |19 | |Stars |Baseball|14 |11<20 |11 |20 | |Circles|Baseball|25 |21<30 |21 |30 | +-------+--------+----------+---------+----------+----------+