Skip to content
Advertisement

Correct BigQuery SQL to get value from previous partition

I am trying to use a window function to get a value from the previous partition/window frame (rather than from some other row in the same partition). I don’t fully understand how I can do this but have tried various combinations of using:

  • partition clause but with LAG() function
  • omitting partition clause and using an ORDER BY + a RANGE 1 PRECEDING specification

Neither give me the expected result though.

My example is made ever so slightly more complicated by the fact that the value belongs to the “month” rather than a row so I don’t want to do any aggregation on that value. Thankfully, BigQuery has ANY_VALUE() for this.

My data looks like this:

Year Month Date MonthValue
2022 10 2022-10-30 50
2022 10 2022-10-31 50
2022 11 2022-11-01 60
2022 11 2022-11-01 60

The expected result is:

Year Month Date MonthValue PreviousMonthValue
2022 10 2022-10-30 50 NULL
2022 10 2022-10-31 50 NULL
2022 11 2022-11-01 60 50
2022 11 2022-11-01 60 50

Assuming it’s not possible to access values from a different partition/window frame if you use a PARTITION BY clause, I thought the RANGE 1 PRECEDING approach had the best mileage but the challenge with this is you must ORDER BY a single column that is numeric – and my understanding is your choice of column determines what RANGE means. I need to order by “year-month” and expressing this as an integer is tricky. I tried using CAST(FORMAT_DATE("%Y%m%, Date) AS INT64) but I think the problem with this is it bases range’s granularity off “Date” which is too granular. I then tried (Year * 100) + Month (which would be logically equivelant to (CONCAT(CAST(Year AS STRING), RIGHT(CONCAT("0", CAST(Month AS STRING)), 2)) AS INT64) but this didn’t work either.

I can think of an alternative way of doing this with a self-join where I join to the previous month but I want to achieve it with a window function.

Thanks in advance!

Advertisement

Answer

Consider below

if applied to sample data in your question – output is

enter image description here

Above assumes all monthValue are same within the month as it is in your sample data – but most likely it is not a case and you want the value at a last day of month – so use below in this case

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement