I have two DataFrames, Df1 and Df2, with the same schema.
I have row 1 from Df1 (Dfw1) and row 1 from Df2 (Dfw2).
I need to compare the two rows and get the differences between Dfw1 and Dfw2 out as a collection (a Map or something similar).
Answer
A simple solution would be to transform the Row objects to Map and then compare the values of the 2 Maps.
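Something like this in Scala:

    // Any rather than AnyVal, so String and other reference-typed columns are typed correctly
    val m1 = Dfw1.getValuesMap[Any](Dfw1.schema.fieldNames)
    val m2 = Dfw2.getValuesMap[Any](Dfw2.schema.fieldNames)
    // keep only the fields whose values differ between the two rows
    val differences = for {
      field <- m1.keySet
      if !m1.get(field).equals(m2.get(field))
    } yield (field, m1(field), m2(field))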
This returns a Set of tuples (field, value in Dfw1, value in Dfw2), one for each field whose values differ.
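Since the question asks for a Map, the same comparison could be wrapped in a small helper. This is only a sketch: rowDiff is a hypothetical name, and it assumes both rows carry the same schema (as rows taken from a DataFrame with head() or collect() do).

```scala
import org.apache.spark.sql.Row

// Hypothetical helper, not part of Spark: maps each differing field name
// to the pair (value in r1, value in r2).
// Assumes r1 and r2 share the same schema.
def rowDiff(r1: Row, r2: Row): Map[String, (Any, Any)] = {
  val fields = r1.schema.fieldNames.toSeq
  val m1 = r1.getValuesMap[Any](fields)
  val m2 = r2.getValuesMap[Any](fields)
  fields.collect {
    // != is null-safe in Scala, so null column values are compared correctly
    case f if m1(f) != m2(f) => (f, (m1(f), m2(f)))
  }.toMap
}
```

Calling rowDiff(Dfw1, Dfw2) would then give an empty Map when the rows agree on every field. Note that binary (Array[Byte]) columns would be compared by reference here, so they may need separate handling.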
You can also use pattern matching on the Row object to compare:

    Dfw1 match {
      case Row(id: String, desc: String, ....) => // assuming you know the schema
        // compare each value with Dfw2 and return the differences
    }
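As a rough illustration of the pattern-matching approach, here is a sketch for a made-up two-column schema (id: String, desc: String). The schema and the name diffByPattern are assumptions for the example, not the asker's actual columns.

```scala
import org.apache.spark.sql.Row

// Hypothetical two-column schema (id: String, desc: String), for illustration only.
def diffByPattern(r1: Row, r2: Row): Seq[(String, Any, Any)] =
  (r1, r2) match {
    case (Row(id1: String, desc1: String), Row(id2: String, desc2: String)) =>
      // pair each field name with both values, then keep only the mismatches
      Seq(("id", id1, id2), ("desc", desc1, desc2))
        .filter { case (_, a, b) => a != b }
    case _ =>
      // typed patterns do not match null, so rows with null values also end up here
      throw new IllegalArgumentException("rows do not match the assumed (String, String) shape")
  }
```

This variant is type-checked per column but has to be written by hand for each schema, so the Map-based comparison is probably the easier choice for wide tables.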