I am trying to use a window function to get a value from the previous partition/window frame (rather than from some other row in the same partition). I don’t fully understand how I can do this but have tried various combinations of using:
- partition clause but with LAG() function
- omitting partition clause and using an ORDER BY + a RANGE 1 PRECEDING specification
Neither give me the expected result though.
My example is made ever so slightly more complicated by the fact that the value belongs to the “month” rather than a row so I don’t want to do any aggregation on that value. Thankfully, BigQuery has ANY_VALUE()
for this.
My data looks like this:
Year | Month | Date | MonthValue |
---|---|---|---|
2022 | 10 | 2022-10-30 | 50 |
2022 | 10 | 2022-10-31 | 50 |
2022 | 11 | 2022-11-01 | 60 |
2022 | 11 | 2022-11-01 | 60 |
The expected result is:
Year | Month | Date | MonthValue | PreviousMonthValue |
---|---|---|---|---|
2022 | 10 | 2022-10-30 | 50 | NULL |
2022 | 10 | 2022-10-31 | 50 | NULL |
2022 | 11 | 2022-11-01 | 60 | 50 |
2022 | 11 | 2022-11-01 | 60 | 50 |
Assuming it’s not possible to access values from a different partition/window frame if you use a PARTITION BY clause, I thought the RANGE 1 PRECEDING approach had the best mileage but the challenge with this is you must ORDER BY a single column that is numeric – and my understanding is your choice of column determines what RANGE means. I need to order by “year-month” and expressing this as an integer is tricky. I tried using CAST(FORMAT_DATE("%Y%m%, Date) AS INT64)
but I think the problem with this is it bases range’s granularity off “Date” which is too granular. I then tried (Year * 100) + Month
(which would be logically equivelant to (CONCAT(CAST(Year AS STRING), RIGHT(CONCAT("0", CAST(Month AS STRING)), 2)) AS INT64)
but this didn’t work either.
ANY_VALUE(MonthlyValue) OVER (ORDER BY (`Year` * 100) + `Month` RANGE 1 PRECEDING) AS PreviousMonthlyValue
I can think of an alternative way of doing this with a self-join where I join to the previous month but I want to achieve it with a window function.
Thanks in advance!
Advertisement
Answer
Consider below
select *, last_value(MonthValue) over prev_month as PreviousMonthValue from your_table window prev_month as (order by 12 * year + month range between 1 preceding and 1 preceding)
if applied to sample data in your question – output is
Above assumes all monthValue are same within the month as it is in your sample data – but most likely it is not a case and you want the value at a last day of month – so use below in this case
with endMonthValues as ( select Year, Month, array_agg(MonthValue order by Date desc limit 1)[offset(0)] MonthValue from your_table group by Year, Month ) select a.*, b.MonthValue as PreviousMonthValue from your_table a left join endMonthValues b on 12 * a.year + a.month - 1 = 12 * b.year + b.month