I am trying to add a ROW_NUMBER column to a data set that follows the order of [Withdrawal then Fee; Withdrawal then Fee] based upon whether the column NETWORKCD
is null. However, my row_number formula is still counting Nulls even though it’s not displaying the numbers, like so:
+--------+-------------+------------+-----------+-------------+
| Acctid | Transaction | PostDate | NetworkCd | PeriodCount |
+--------+-------------+------------+-----------+-------------+
| 12345 | Withdrawal | 10/4/2018 | FRGN | 1 |
| 12345 | Fee | 10/4/2018 | | |
| 12345 | Withdrawal | 10/11/2018 | FRGN | 3 |
| 12345 | Fee | 10/11/2018 | | |
| 12345 | Withdrawal | 10/22/2018 | FRGN | 5 |
| 12345 | Fee | 10/22/2018 | | |
+--------+-------------+------------+-----------+-------------+
I am using the following formula for PeriodCount
(case when networkcd is not null
then row_number() over (partition by acctid order by postdate)
end) PeriodCount
I was expecting PeriodCount to count [1,2,3] instead of [1,3,5], like so:
+--------+-------------+------------+-----------+-------------+
| Acctid | Transaction | PostDate | NetworkCd | PeriodCount |
+--------+-------------+------------+-----------+-------------+
| 12345 | Withdrawal | 10/4/2018 | FRGN | 1 |
| 12345 | Fee | 10/4/2018 | | |
| 12345 | Withdrawal | 10/11/2018 | FRGN | 2 |
| 12345 | Fee | 10/11/2018 | | |
| 12345 | Withdrawal | 10/22/2018 | FRGN | 3 |
| 12345 | Fee | 10/22/2018 | | |
+--------+-------------+------------+-----------+-------------+
What am I missing?
Advertisement
Answer
It is still showing a value the 1st, 3rd and 5th rows – so the row count as it encounters each of those rows is correct. It’s the row number within the entire result set, not within the not-null values. Your case expression is deciding whether to show the value, not affecting how it is found.
You could get the result you want using dense_rank()
instead;
with your_table (Acctid, Transaction, PostDate, NetworkCd) as (
select 12345, 'Withdrawal', to_date('10/4/2018', 'MM/DD/YYYY'), 'FRGN' from dual
union all select 12345, 'Fee', to_date('10/4/2018', 'MM/DD/YYYY'), null from dual
union all select 12345, 'Withdrawal', to_date('10/11/2018', 'MM/DD/YYYY'), 'FRGN' from dual
union all select 12345, 'Fee', to_date('10/11/2018', 'MM/DD/YYYY'), null from dual
union all select 12345, 'Withdrawal', to_date('10/22/2018', 'MM/DD/YYYY'), 'FRGN' from dual
union all select 12345, 'Fee', to_date('10/22/2018', 'MM/DD/YYYY'), null from dual
)
select Acctid, Transaction, PostDate, NetworkCd,
(case when networkcd is not null then dense_rank() over (partition by acctid order by postdate) end) as periodcount
from your_table;
ACCTID TRANSACTIO POSTDATE NETW PERIODCOUNT
---------- ---------- ---------- ---- -----------
12345 Withdrawal 2018-10-04 FRGN 1
12345 Fee 2018-10-04
12345 Withdrawal 2018-10-11 FRGN 2
12345 Fee 2018-10-11
12345 Withdrawal 2018-10-22 FRGN 3
12345 Fee 2018-10-22
… as that does suppress gaps in the generated values. From the docs:
DENSE_RANK
computes the rank of a row in an ordered group of rows and returns the rank as aNUMBER
. The ranks are consecutive integers beginning with 1. …