Skip to content
Advertisement

Correct BigQuery SQL to get value from previous partition

I am trying to use a window function to get a value from the previous partition/window frame (rather than from some other row in the same partition). I don’t fully understand how I can do this but have tried various combinations of using:

  • partition clause but with LAG() function
  • omitting partition clause and using an ORDER BY + a RANGE 1 PRECEDING specification

Neither give me the expected result though.

My example is made ever so slightly more complicated by the fact that the value belongs to the “month” rather than a row so I don’t want to do any aggregation on that value. Thankfully, BigQuery has ANY_VALUE() for this.

My data looks like this:

Year Month Date MonthValue
2022 10 2022-10-30 50
2022 10 2022-10-31 50
2022 11 2022-11-01 60
2022 11 2022-11-01 60

The expected result is:

Year Month Date MonthValue PreviousMonthValue
2022 10 2022-10-30 50 NULL
2022 10 2022-10-31 50 NULL
2022 11 2022-11-01 60 50
2022 11 2022-11-01 60 50

Assuming it’s not possible to access values from a different partition/window frame if you use a PARTITION BY clause, I thought the RANGE 1 PRECEDING approach had the best mileage but the challenge with this is you must ORDER BY a single column that is numeric – and my understanding is your choice of column determines what RANGE means. I need to order by “year-month” and expressing this as an integer is tricky. I tried using CAST(FORMAT_DATE("%Y%m%, Date) AS INT64) but I think the problem with this is it bases range’s granularity off “Date” which is too granular. I then tried (Year * 100) + Month (which would be logically equivelant to (CONCAT(CAST(Year AS STRING), RIGHT(CONCAT("0", CAST(Month AS STRING)), 2)) AS INT64) but this didn’t work either.

ANY_VALUE(MonthlyValue) OVER (ORDER BY (`Year` * 100) + `Month` RANGE 1 PRECEDING) AS PreviousMonthlyValue

I can think of an alternative way of doing this with a self-join where I join to the previous month but I want to achieve it with a window function.

Thanks in advance!

Advertisement

Answer

Consider below

select *,
  last_value(MonthValue) over prev_month as PreviousMonthValue
from your_table
window prev_month as (order by 12 * year + month range between 1 preceding and 1 preceding)            

if applied to sample data in your question – output is

enter image description here

Above assumes all monthValue are same within the month as it is in your sample data – but most likely it is not a case and you want the value at a last day of month – so use below in this case

with endMonthValues as (
  select Year, Month, array_agg(MonthValue order by Date desc limit 1)[offset(0)] MonthValue
  from your_table
  group by Year, Month
)
select a.*, b.MonthValue as PreviousMonthValue
from your_table a
left join endMonthValues b
on 12 * a.year + a.month - 1 = 12 * b.year + b.month
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement