Say that I want to find every time that a client updated their budget.
Here’s what my data looks like
datetime, client_id, new_budget __________,__________,___________ 2022-01-01,1, ,100 2022-01-01,2, ,300 2022-01-02,1, ,80 2022-01-02,2, ,80
And the code I’ve run.
SELECT datetime AS dt_1, LAG(datetime) OVER (ORDER BY client_id, datetime) AS dt_2, client_id, new_budget FROM budget_table
What I’m expecting returned will be
dt_1, dt_2, client_id, new_budget __________,__________,__________,___________ 2022-01-01,NULL, 1 , 100 2022-01-02,2022-01-01,1 , 80 2022-01-01,NULL, 2 , 300 2022-01-02,2022-01-01,2 , 80
Hence there are NULL values for dt_2 in the first entry of each client_id. I’m not sure what code will accomplish this effect; is a GROUP BY clause will be necessary (or a partition over clause.)
But here’s the output to the SQL that I ran
dt_1, dt_2, client_id, new_budget __________,__________,__________,___________ 2022-01-01,NULL, 1 , 100 2022-01-02,2022-01-01,1 , 80 2022-01-01,2022-01-02,2 , 300 2022-01-02,2022-01-01,2 , 80
So the huge issue here is that it’s not recognizing that the dt_2 should be NULL if the previous row is from a different client_id.
Which syntax is recommended to accomplish this effect?
Advertisement
Answer
You need to partition by client_id
:
The
PARTITION BY
clause separates the input rows into different partitions. This is analogous to how theGROUP BY
clause separates rows into different groups for aggregate functions. IfPARTITION BY
is not specified, the entire input is treated as a single partition.
SELECT datetime AS dt_1, LAG(datetime) OVER (PARTITION BY client_id ORDER BY datetime) AS dt_2, client_id, new_budget FROM budget_table