Skip to content
Advertisement

Conditional ROW_NUMBER() to skip nulls still counting them

I am trying to add a ROW_NUMBER column to a data set that follows the order of [Withdrawal then Fee; Withdrawal then Fee] based upon whether the column NETWORKCD is null. However, my row_number formula is still counting Nulls even though it’s not displaying the numbers, like so:

+--------+-------------+------------+-----------+-------------+
| Acctid | Transaction |  PostDate  | NetworkCd | PeriodCount |
+--------+-------------+------------+-----------+-------------+
|  12345 | Withdrawal  | 10/4/2018  | FRGN      |           1 |
|  12345 | Fee         | 10/4/2018  |           |             |
|  12345 | Withdrawal  | 10/11/2018 | FRGN      |           3 |
|  12345 | Fee         | 10/11/2018 |           |             |
|  12345 | Withdrawal  | 10/22/2018 | FRGN      |           5 |
|  12345 | Fee         | 10/22/2018 |           |             |
+--------+-------------+------------+-----------+-------------+

I am using the following formula for PeriodCount

(case when networkcd is not null 
      then row_number() over (partition by acctid order by postdate) 
 end) PeriodCount

I was expecting PeriodCount to count [1,2,3] instead of [1,3,5], like so:

+--------+-------------+------------+-----------+-------------+
| Acctid | Transaction |  PostDate  | NetworkCd | PeriodCount |
+--------+-------------+------------+-----------+-------------+
|  12345 | Withdrawal  | 10/4/2018  | FRGN      |           1 |
|  12345 | Fee         | 10/4/2018  |           |             |
|  12345 | Withdrawal  | 10/11/2018 | FRGN      |           2 |
|  12345 | Fee         | 10/11/2018 |           |             |
|  12345 | Withdrawal  | 10/22/2018 | FRGN      |           3 |
|  12345 | Fee         | 10/22/2018 |           |             |
+--------+-------------+------------+-----------+-------------+

What am I missing?

Advertisement

Answer

It is still showing a value the 1st, 3rd and 5th rows – so the row count as it encounters each of those rows is correct. It’s the row number within the entire result set, not within the not-null values. Your case expression is deciding whether to show the value, not affecting how it is found.

You could get the result you want using dense_rank() instead;

with your_table (Acctid, Transaction, PostDate, NetworkCd) as (
          select 12345, 'Withdrawal', to_date('10/4/2018', 'MM/DD/YYYY'), 'FRGN' from dual
union all select 12345, 'Fee', to_date('10/4/2018', 'MM/DD/YYYY'), null from dual
union all select 12345, 'Withdrawal', to_date('10/11/2018', 'MM/DD/YYYY'), 'FRGN' from dual
union all select 12345, 'Fee', to_date('10/11/2018', 'MM/DD/YYYY'), null from dual
union all select 12345, 'Withdrawal', to_date('10/22/2018', 'MM/DD/YYYY'), 'FRGN' from dual
union all select 12345, 'Fee', to_date('10/22/2018', 'MM/DD/YYYY'), null from dual
)
select Acctid, Transaction, PostDate, NetworkCd,
(case when networkcd is not null then dense_rank() over (partition by acctid order by postdate) end) as periodcount
from your_table;

    ACCTID TRANSACTIO POSTDATE   NETW PERIODCOUNT
---------- ---------- ---------- ---- -----------
     12345 Withdrawal 2018-10-04 FRGN           1
     12345 Fee        2018-10-04                 
     12345 Withdrawal 2018-10-11 FRGN           2
     12345 Fee        2018-10-11                 
     12345 Withdrawal 2018-10-22 FRGN           3
     12345 Fee        2018-10-22                 

… as that does suppress gaps in the generated values. From the docs:

DENSE_RANK computes the rank of a row in an ordered group of rows and returns the rank as a NUMBER. The ranks are consecutive integers beginning with 1. …

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement