
Spark Scala: compare a row from each of two DataFrames and get the differences

I have two DataFrames, Df1 and Df2, with the same schema.

I have row 1 from Df1 (Dfw1) and row 1 from Df2 (Dfw2).

I need to compare the two rows, find the differences between Dfw1 and Dfw2, and get those differences out as a collection (a Map or something similar).


Answer

A simple solution is to convert the Row objects to Maps and then compare the values of the two Maps.

Something like this in Scala:

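// Convert each Row to a Map of column name -> value.
// Row.schema is available for rows taken from a DataFrame (e.g. df.first() or df.collect()).
val m1: Map[String, Any] = Dfw1.getValuesMap[Any](Dfw1.schema.fieldNames)
val m2: Map[String, Any] = Dfw2.getValuesMap[Any](Dfw2.schema.fieldNames)

// Keep only the fields whose values differ (Scala's != is null-safe)
val differences = for {
  field <- m1.keySet
  if m1(field) != m2(field)
} yield (field, m1(field), m2(field))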

This returns a Set of tuples (field, value in Dfw1, value in Dfw2), one for each field whose values differ.
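Since the question asks for the differences as a Map, here is a minimal sketch of the same comparison that returns field -> (value in Dfw1, value in Dfw2) instead of a Set of tuples. The names RowDiff and diffValues are hypothetical, and the sketch assumes both Maps come from rows with the same schema, as in the question. It uses java.util.Objects.deepEquals rather than != so that BinaryType columns, which arrive as Array[Byte], are compared by content instead of by reference.

```scala
import java.util.Objects

object RowDiff {
  // Null-safe equality that also compares arrays (e.g. BinaryType's Array[Byte]) by content.
  private def sameValue(a: Any, b: Any): Boolean =
    Objects.deepEquals(a.asInstanceOf[AnyRef], b.asInstanceOf[AnyRef])

  // Hypothetical helper: given two column-name -> value maps (e.g. from Row.getValuesMap),
  // returns field -> (value in m1, value in m2) for every field whose values differ.
  // Assumes both maps have the same keys, as rows with the same schema would.
  def diffValues(m1: Map[String, Any], m2: Map[String, Any]): Map[String, (Any, Any)] =
    m1.collect {
      case (field, v1) if !sameValue(v1, m2(field)) => (field, (v1, m2(field)))
    }
}
```

With m1 and m2 built as above, RowDiff.diffValues(m1, m2) gives the Map directly. For example, comparing a map holding "desc" -> "old" with one holding "desc" -> "new" yields Map(desc -> (old,new)).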

You can also pattern match on the Row objects (via Row's extractor) to compare them:

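import org.apache.spark.sql.Row
// Assumes a schema of (id: String, desc: String, ....); extend both patterns with your remaining columns
val diffs: Seq[(String, Any, Any)] = (Dfw1, Dfw2) match {
  case (Row(id1: String, desc1: String), Row(id2: String, desc2: String)) =>
    Seq(("id", id1, id2), ("desc", desc1, desc2)).filter { case (_, a, b) => a != b }
  case _ => Seq.empty // rows don't fit the assumed schema (a null value also falls through here)
}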
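Listing every column in the pattern gets tedious for wide schemas. As an alternative (not part of the original answer), here is a sketch that zips the field names from Row.schema with Row.toSeq, so it works for any schema without naming columns. The names RowCompare and diffRows are hypothetical; the sketch assumes both Rows carry a schema (rows from df.first() or df.collect() do) and that the two schemas are identical. It needs spark-sql on the classpath.

```scala
import java.util.Objects
import org.apache.spark.sql.Row

object RowCompare {
  // Hypothetical helper: walks both Rows column by column, taking column names from the
  // first Row's schema, and returns (field, value in r1, value in r2) for each difference.
  // Assumes both Rows carry a schema and that the two schemas are identical.
  def diffRows(r1: Row, r2: Row): Seq[(String, Any, Any)] =
    r1.schema.fieldNames.toSeq
      .zip(r1.toSeq.zip(r2.toSeq))
      .collect {
        // deepEquals is null-safe and compares Array[Byte] (BinaryType) by content
        case (name, (v1, v2)) if !Objects.deepEquals(v1.asInstanceOf[AnyRef], v2.asInstanceOf[AnyRef]) =>
          (name, v1, v2)
      }
}
```

Because the comparison is positional, RowCompare.diffRows(Dfw1, Dfw2) relies on both rows having their columns in the same order, which holds when the two DataFrames share a schema.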
User contributions licensed under: CC BY-SA