Skip to content
Advertisement

SQL Spark – Lag vs first row by Group

I’m SQL newbie and I’m trying to calculate difference between the averages. I want for each item and year calculate difference between months, but I want always substract current average – fist month of this year/item.
So for example, Item A, 2020, month =3 will have output = 0.6 because 3.7 – 3.1 (as this is the value for first month of this item) – is this correct approach? or maybe lag fuction is the wrong solution?

So far I was trying this one, but it calculates the difference vs previous row instead always using first month row.

TABLE:

Advertisement

Answer

If I follow you correctly, you want window function FIRST_VALUE() instead of LAG():

This compares the average of each row against the value of the first month of the same item and year.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement