Skip to content
Advertisement

SQL Spark – Lag vs first row by Group

I’m SQL newbie and I’m trying to calculate difference between the averages. I want for each item and year calculate difference between months, but I want always substract current average – fist month of this year/item.
So for example, Item A, 2020, month =3 will have output = 0.6 because 3.7 – 3.1 (as this is the value for first month of this item) – is this correct approach? or maybe lag fuction is the wrong solution?

So far I was trying this one, but it calculates the difference vs previous row instead always using first month row.

SELECT item, 
       year, 
       month,  
       average,
       average - COALESCE(LAG(average,1) OVER(
           PARTITION BY item
           ORDER BY year,`month` ),`average`)
AS difference FROM xxx

TABLE:

+------+------+-------+---------+--------+
| item | year | month | average | wanted OUTPUT |
+------+------+-------+---------+--------+
| a    | 2020 | 1     | 3.1     | 0      |
+------+------+-------+---------+--------+
| a    | 2020 | 2     | 3.5     | 0.4    |
+------+------+-------+---------+--------+
| a    | 2020 | 3     | 3.7     | 0.6    |
+------+------+-------+---------+--------+
| a    | 2020 | 4     | 4.1     | 1.0    |
+------+------+-------+---------+--------+
| b    | 2020 | 1     | 2.0     | 0      |
+------+------+-------+---------+--------+
| b    | 2020 | 2     | 2.1     | 0.1    |
+------+------+-------+---------+--------+
| b    | 2020 | 3     | 2.5     | 0.5    |
+------+------+-------+---------+--------+
| b    | 2020 | 4     | 4.0     | 2.0    |
+------+------+-------+---------+--------+

Advertisement

Answer

If I follow you correctly, you want window function FIRST_VALUE() instead of LAG():

SELECT item, 
       year, 
       month,  
       average,
       average - FIRST_VALUE(average) OVER(PARTITION BY item, year ORDER BY month) AS difference 
FROM xxx

This compares the average of each row against the value of the first month of the same item and year.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement