I have two DataFrames, Df1 and Df2, with the same schema.
I have row 1 from Df1 (Dfw1) and row 1 from Df2 (Dfw2).
I need to compare the two rows and get the differences between Dfw1 and Dfw2 out as a collection (a Map or something similar).
Answer
A simple solution would be to transform the Row objects to Map and then compare the values of the 2 Maps.
Something like this in Scala:
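// Convert each Row to a Map of field name -> value (Any, because fields such as Strings are not AnyVal)
val m1 = Dfw1.getValuesMap[Any](Dfw1.schema.fieldNames)
val m2 = Dfw2.getValuesMap[Any](Dfw2.schema.fieldNames)
// Keep only the fields whose values differ between the two rows
val differences = for {
  field <- m1.keySet
  if m1.get(field) != m2.get(field)
} yield (field, m1(field), m2(field))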
This returns a Set of tuples (field, value in Dfw1, value in Dfw2), one for each field whose values differ.
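Since the question asks for a Map, the same idea can be wrapped in a small helper that returns the differences keyed by field name. This is only a sketch: `RowDiff` and `diffRows` are hypothetical names, and it assumes both rows carry the same schema. The sample rows are built with `GenericRowWithSchema` purely so the example runs without a SparkSession; in practice they would come from the DataFrames (for example via `head()`).

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object RowDiff {
  // Hypothetical helper: maps each differing field name to
  // (value in r1, value in r2). Assumes r1 and r2 share a schema.
  def diffRows(r1: Row, r2: Row): Map[String, (Any, Any)] = {
    val fields = r1.schema.fieldNames.toSeq
    val m1 = r1.getValuesMap[Any](fields)
    val m2 = r2.getValuesMap[Any](fields)
    // != on Any is null-safe, so null fields are compared correctly
    fields.collect {
      case f if m1(f) != m2(f) => (f, (m1(f), m2(f)))
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Illustrative schema and values, not taken from the question
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("desc", StringType),
      StructField("qty", IntegerType)))
    val dfw1: Row = new GenericRowWithSchema(Array[Any]("1", "apple", 3), schema)
    val dfw2: Row = new GenericRowWithSchema(Array[Any]("1", "pear", null), schema)

    // "id" matches, so only "desc" and "qty" appear in the result
    println(diffRows(dfw1, dfw2))
  }
}
```

Returning a Map makes it easy to look up a single field afterwards, e.g. `diffRows(Dfw1, Dfw2).get("desc")`.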
You may also use pattern matching on the Row object to compare:
Dfw1 match {
  case Row(id: String, desc: String, _*) => // assuming you know the schema; _* matches any remaining fields
    // compare each value with Dfw2 and return the differences
}
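To make the pattern-matching idea concrete, here is a minimal sketch for a made-up two-column schema (id: String, desc: String). The object name `RowMatchDiff` and the method `diff` are hypothetical. Note that a typed pattern such as `id1: String` does not match a null value, so rows containing nulls fall through to the default case here.

```scala
import org.apache.spark.sql.Row

object RowMatchDiff {
  // Assumes exactly two String columns, (id, desc); adjust to your schema.
  def diff(r1: Row, r2: Row): Map[String, (Any, Any)] = (r1, r2) match {
    case (Row(id1: String, desc1: String), Row(id2: String, desc2: String)) =>
      // Pair each field name with both values, then keep only the mismatches
      Seq(("id", (id1, id2)), ("desc", (desc1, desc2)))
        .filter { case (_, (a, b)) => a != b }
        .toMap
    case _ =>
      throw new IllegalArgumentException(
        "rows do not match the assumed (String, String) shape")
  }

  def main(args: Array[String]): Unit = {
    // Row.apply builds schema-less rows, which is enough for pattern matching
    println(diff(Row("1", "first"), Row("1", "second")))
  }
}
```

This variant gives compile-time names and types for each column, at the cost of hard-coding the schema; the Map-based approach is easier to maintain when the schema is wide or changes often.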