I have the following data:
| client_id | balance_month | balance |
|---|---|---|
| 100000000 | September 1,2021 12:00 AM | $7000 |
| 100000000 | August 1,2021 12:00 AM | $7000 |
| 100000000 | July 1,2021 12:00 AM | $6000 |
| 100000000 | June 1,2021 12:00 AM | $3000 |
| 100000000 | May 1,2021 12:00 AM | $20000 |
| 100000000 | April 1,2021 12:00 AM | $1000 |
| 100000000 | March 1,2021 12:00 AM | $0 |
| 200000000 | September 1,2021 12:00 AM | $7000 |
| 200000000 | August 1,2021 12:00 AM | $7000 |
| 200000000 | July 1,2021 12:00 AM | $8000 |
| 200000000 | June 1,2021 12:00 AM | $9000 |
| 200000000 | May 1,2021 12:00 AM | $1000 |
| 200000000 | April 1,2021 12:00 AM | $1000 |
| 200000000 | March 1,2021 12:00 AM | $0 |
I want to overwrite any month where a balance decreased with the most recent month that whose balance did not decrease. (Assume that the balances started March 2021, but want to automate for much longer)
| client_id | balance_month | balance |
|---|---|---|
| 100000000 | September 1,2021 12:00 AM | $7000 |
| 100000000 | August 1,2021 12:00 AM | $7000 |
| 100000000 | July 1,2021 12:00 AM | $6000 |
| 100000000 | June 1,2021 12:00 AM | $3000 |
| 100000000 | May 1,2021 12:00 AM | $1000 |
| 100000000 | April 1,2021 12:00 AM | $1000 |
| 100000000 | March 1,2021 12:00 AM | $0 |
| 200000000 | September 1,2021 12:00 AM | $7000 |
| 200000000 | August 1,2021 12:00 AM | $1000 |
| 200000000 | July 1,2021 12:00 AM | $1000 |
| 200000000 | June 1,2021 12:00 AM | $1000 |
| 200000000 | May 1,2021 12:00 AM | $1000 |
| 200000000 | April 1,2021 12:00 AM | $1000 |
| 200000000 | March 1,2021 12:00 AM | $0 |
Advertisement
Answer
Assuming all columns NOT NULL, or you have to do more.
SELECT client_id, balance_month
, COALESCE(balance_null, first_value(balance_null) OVER (PARTITION BY client_id, balance_grp ORDER BY balance_month)) AS new_balance
FROM (
SELECT *
, count(balance_null) OVER (PARTITION BY client_id ORDER BY balance_month) AS balance_grp
FROM (
SELECT client_id, balance_month
, CASE WHEN lead(balance) OVER (PARTITION BY client_id ORDER BY balance_month) < balance THEN NULL ELSE balance END AS balance_null
FROM tbl
ORDER BY client_id, balance_month
) sub1
) sub2;
db<>fiddle here
In subquery sub1, set balance to NULL if balance of the following row is smaller (your replace condition).
In subquery sub2, form groups for consecutive NULL values, including the leading nonnull value (balance_grp) using count(balance_null), because that omits NULL values from the count so that every NULL value falls into a group with the last valid (nonnull) value.
In the outer SELECT, replace NULL values with the leading nonnull value from each group. Voilá.
If balance can be assumed to be growing steadily (or at least stagnating) in all non-offending rows, we can use a simpler query:
SELECT client_id, balance_month
, COALESCE(balance_null, max(balance_null) OVER (PARTITION BY client_id ORDER BY balance_month))
FROM (
SELECT client_id, balance_month
, CASE WHEN lead(balance) OVER (PARTITION BY client_id ORDER BY balance_month) < balance THEN NULL ELSE balance END AS balance_null
FROM tbl
ORDER BY client_id, balance_month
) sub1;
Related: