I am trying to add a ROW_NUMBER column to a data set that follows the order of [Withdrawal then Fee; Withdrawal then Fee] based upon whether the column NETWORKCD
is null. However, my row_number formula is still counting Nulls even though it’s not displaying the numbers, like so:
+--------+-------------+------------+-----------+-------------+ | Acctid | Transaction | PostDate | NetworkCd | PeriodCount | +--------+-------------+------------+-----------+-------------+ | 12345 | Withdrawal | 10/4/2018 | FRGN | 1 | | 12345 | Fee | 10/4/2018 | | | | 12345 | Withdrawal | 10/11/2018 | FRGN | 3 | | 12345 | Fee | 10/11/2018 | | | | 12345 | Withdrawal | 10/22/2018 | FRGN | 5 | | 12345 | Fee | 10/22/2018 | | | +--------+-------------+------------+-----------+-------------+
I am using the following formula for PeriodCount
(case when networkcd is not null then row_number() over (partition by acctid order by postdate) end) PeriodCount
I was expecting PeriodCount to count [1,2,3] instead of [1,3,5], like so:
+--------+-------------+------------+-----------+-------------+ | Acctid | Transaction | PostDate | NetworkCd | PeriodCount | +--------+-------------+------------+-----------+-------------+ | 12345 | Withdrawal | 10/4/2018 | FRGN | 1 | | 12345 | Fee | 10/4/2018 | | | | 12345 | Withdrawal | 10/11/2018 | FRGN | 2 | | 12345 | Fee | 10/11/2018 | | | | 12345 | Withdrawal | 10/22/2018 | FRGN | 3 | | 12345 | Fee | 10/22/2018 | | | +--------+-------------+------------+-----------+-------------+
What am I missing?
Advertisement
Answer
It is still showing a value the 1st, 3rd and 5th rows – so the row count as it encounters each of those rows is correct. It’s the row number within the entire result set, not within the not-null values. Your case expression is deciding whether to show the value, not affecting how it is found.
You could get the result you want using dense_rank()
instead;
with your_table (Acctid, Transaction, PostDate, NetworkCd) as ( select 12345, 'Withdrawal', to_date('10/4/2018', 'MM/DD/YYYY'), 'FRGN' from dual union all select 12345, 'Fee', to_date('10/4/2018', 'MM/DD/YYYY'), null from dual union all select 12345, 'Withdrawal', to_date('10/11/2018', 'MM/DD/YYYY'), 'FRGN' from dual union all select 12345, 'Fee', to_date('10/11/2018', 'MM/DD/YYYY'), null from dual union all select 12345, 'Withdrawal', to_date('10/22/2018', 'MM/DD/YYYY'), 'FRGN' from dual union all select 12345, 'Fee', to_date('10/22/2018', 'MM/DD/YYYY'), null from dual ) select Acctid, Transaction, PostDate, NetworkCd, (case when networkcd is not null then dense_rank() over (partition by acctid order by postdate) end) as periodcount from your_table; ACCTID TRANSACTIO POSTDATE NETW PERIODCOUNT ---------- ---------- ---------- ---- ----------- 12345 Withdrawal 2018-10-04 FRGN 1 12345 Fee 2018-10-04 12345 Withdrawal 2018-10-11 FRGN 2 12345 Fee 2018-10-11 12345 Withdrawal 2018-10-22 FRGN 3 12345 Fee 2018-10-22
… as that does suppress gaps in the generated values. From the docs:
DENSE_RANK
computes the rank of a row in an ordered group of rows and returns the rank as aNUMBER
. The ranks are consecutive integers beginning with 1. …