I have the following data:
client_id | balance_month | balance |
---|---|---|
100000000 | September 1,2021 12:00 AM | $7000 |
100000000 | August 1,2021 12:00 AM | $7000 |
100000000 | July 1,2021 12:00 AM | $6000 |
100000000 | June 1,2021 12:00 AM | $3000 |
100000000 | May 1,2021 12:00 AM | $20000 |
100000000 | April 1,2021 12:00 AM | $1000 |
100000000 | March 1,2021 12:00 AM | $0 |
200000000 | September 1,2021 12:00 AM | $7000 |
200000000 | August 1,2021 12:00 AM | $7000 |
200000000 | July 1,2021 12:00 AM | $8000 |
200000000 | June 1,2021 12:00 AM | $9000 |
200000000 | May 1,2021 12:00 AM | $1000 |
200000000 | April 1,2021 12:00 AM | $1000 |
200000000 | March 1,2021 12:00 AM | $0 |
I want to overwrite any month where a balance decreased with the most recent month that whose balance did not decrease. (Assume that the balances started March 2021, but want to automate for much longer)
client_id | balance_month | balance |
---|---|---|
100000000 | September 1,2021 12:00 AM | $7000 |
100000000 | August 1,2021 12:00 AM | $7000 |
100000000 | July 1,2021 12:00 AM | $6000 |
100000000 | June 1,2021 12:00 AM | $3000 |
100000000 | May 1,2021 12:00 AM | $1000 |
100000000 | April 1,2021 12:00 AM | $1000 |
100000000 | March 1,2021 12:00 AM | $0 |
200000000 | September 1,2021 12:00 AM | $7000 |
200000000 | August 1,2021 12:00 AM | $1000 |
200000000 | July 1,2021 12:00 AM | $1000 |
200000000 | June 1,2021 12:00 AM | $1000 |
200000000 | May 1,2021 12:00 AM | $1000 |
200000000 | April 1,2021 12:00 AM | $1000 |
200000000 | March 1,2021 12:00 AM | $0 |
Advertisement
Answer
Assuming all columns NOT NULL
, or you have to do more.
SELECT client_id, balance_month , COALESCE(balance_null, first_value(balance_null) OVER (PARTITION BY client_id, balance_grp ORDER BY balance_month)) AS new_balance FROM ( SELECT * , count(balance_null) OVER (PARTITION BY client_id ORDER BY balance_month) AS balance_grp FROM ( SELECT client_id, balance_month , CASE WHEN lead(balance) OVER (PARTITION BY client_id ORDER BY balance_month) < balance THEN NULL ELSE balance END AS balance_null FROM tbl ORDER BY client_id, balance_month ) sub1 ) sub2;
db<>fiddle here
In subquery sub1
, set balance
to NULL if balance
of the following row is smaller (your replace condition).
In subquery sub2
, form groups for consecutive NULL values, including the leading nonnull value (balance_grp
) using count(balance_null)
, because that omits NULL values from the count so that every NULL value falls into a group with the last valid (nonnull) value.
In the outer SELECT
, replace NULL values with the leading nonnull value from each group. Voilá.
If balance
can be assumed to be growing steadily (or at least stagnating) in all non-offending rows, we can use a simpler query:
SELECT client_id, balance_month , COALESCE(balance_null, max(balance_null) OVER (PARTITION BY client_id ORDER BY balance_month)) FROM ( SELECT client_id, balance_month , CASE WHEN lead(balance) OVER (PARTITION BY client_id ORDER BY balance_month) < balance THEN NULL ELSE balance END AS balance_null FROM tbl ORDER BY client_id, balance_month ) sub1;
Related: