Row comparison in table via SQL

I have a table which is structured like the following:

ID        Day             Value1    Value2
10        20200601        ABC       100
10        20200602        ABC       100
10        20200603        CDE       200
10        20200604        CDE       100
20        20200601        ABC       50
20        20200602        ABC       100
20        20200603        ABC       100
20        20200604        ABC       100

Is there a way to build a SQL query which, for each ID, finds the Day on which Value1 or Value2 changed? The result I would like to achieve is this:

ID    Day         Value1      Value2
10    20200603    ABC, CDE    100, 200   
10    20200604    CDE         200, 100
20    20200602    ABC         50, 100

This way I can keep track of those changes per ID and per Day.

Edit: I’m accessing this data on a Hadoop cluster via PySpark-SQL
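
For reference, here is a minimal sketch that recreates the sample data as a DataFrame (it assumes an active SparkSession named spark; the variable name df is only illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# sample data from the question as (ID, Day, Value1, Value2) rows
df = spark.createDataFrame(
    [(10, '20200601', 'ABC', 100), (10, '20200602', 'ABC', 100),
     (10, '20200603', 'CDE', 200), (10, '20200604', 'CDE', 100),
     (20, '20200601', 'ABC', 50),  (20, '20200602', 'ABC', 100),
     (20, '20200603', 'ABC', 100), (20, '20200604', 'ABC', 100)],
    ['ID', 'Day', 'Value1', 'Value2'])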


Answer

IIUC, you can create a window partitioned by ID and ordered by Day that covers the previous and the current row, collect the distinct values over it, and keep only the rows where either column has more than one distinct value:

from pyspark.sql import Window
import pyspark.sql.functions as f

# window spanning the previous and the current row per ID, ordered by Day
window = Window.partitionBy('ID').orderBy('Day').rowsBetween(Window.currentRow - 1, Window.currentRow)

# collect the distinct values seen in that window; a set of size > 1 means the value changed
df = df.select('ID', 'Day', f.collect_set('Value1').over(window).alias('value1'), f.collect_set('Value2').over(window).alias('value2')).filter((f.size('value1') > 1) | (f.size('value2') > 1))

df.show()

+---+--------+----------+----------+
| ID|     Day|    value1|    value2|
+---+--------+----------+----------+
| 20|20200602|     [ABC]| [100, 50]|
| 10|20200603|[CDE, ABC]|[100, 200]|
| 10|20200604|     [CDE]|[100, 200]|
+---+--------+----------+----------+
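
Since the question asks for SQL specifically, roughly the same change detection can be expressed in Spark SQL with the LAG window function. This is only a sketch, assuming the original table is registered as a temporary view named my_table (an illustrative name):

spark.sql("""
    SELECT ID, Day, Value1, Value2
    FROM (
        SELECT ID, Day, Value1, Value2,
               LAG(Value1) OVER (PARTITION BY ID ORDER BY Day) AS prev1,
               LAG(Value2) OVER (PARTITION BY ID ORDER BY Day) AS prev2
        FROM my_table
    ) t
    -- keep only rows where either value differs from the previous day's value
    WHERE prev1 IS NOT NULL AND (Value1 <> prev1 OR Value2 <> prev2)
""").show()

Unlike the collect_set version, this keeps only the new values per row; select prev1 and prev2 as well if the previous values are also needed in the output.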