I’m trying to generate a number based on a condition. When there is yes in column ‘Stop’ in the partition of a Client ordered by Start_Date, the Dense Rank has to start over. So I tried several things but it’s stil not what I want. My table with current number and expected number
+-----------+------------+------+------------+-------------+ | Client_No | Start_Date | Stop | Current_No | Expected_No | +-----------+------------+------+------------+-------------+ | 1 | 1-1-2018 | No | 1 | 1 | +-----------+------------+------+------------+-------------+ | 1 | 1-2-2018 | No | 2 | 2 | +-----------+------------+------+------------+-------------+ | 1 | 1-3-2018 | No | 3 | 3 | +-----------+------------+------+------------+-------------+ | 1 | 1-4-2018 | Yes | 1 | 1 | +-----------+------------+------+------------+-------------+ | 1 | 1-5-2018 | No | 4 | 2 | +-----------+------------+------+------------+-------------+ | 1 | 1-6-2018 | No | 5 | 3 | +-----------+------------+------+------------+-------------+ | 2 | 1-2-2018 | No | 1 | 1 | +-----------+------------+------+------------+-------------+ | 2 | 1-3-2018 | No | 2 | 2 | +-----------+------------+------+------------+-------------+ | 2 | 1-4-2018 | Yes | 1 | 1 | +-----------+------------+------+------------+-------------+ | 2 | 1-5-2018 | No | 3 | 2 | +-----------+------------+------+------------+-------------+ | 2 | 1-6-2018 | Yes | 2 | 1 | +-----------+------------+------+------------+-------------+
The query I used so far:
DENSE_RANK() OVER(PARTITION BY Client_No, Stop ORDER BY Start_Date ASC)
This seems not to be the solution because it counts onwart from the value ‘no’, but I don’t no how to handle this in another way.
Advertisement
Answer
One way to solve such a Gaps-And-Islands puzzle is to first calculate a rank that starts with the ‘Yes’ stops.
Then calculate the row_number or dense_rank also over that rank.
For example:
create table test ( Id int identity(1,1) primary key, Client_No int, Start_Date date, Stop varchar(3) )
insert into test (Client_No, Start_Date, Stop) values (1,'2018-01-01','No') ,(1,'2018-02-01','No') ,(1,'2018-03-01','No') ,(1,'2018-04-01','Yes') ,(1,'2018-05-01','No') ,(1,'2018-06-01','No') ,(2,'2018-02-01','No') ,(2,'2018-03-01','No') ,(2,'2018-04-01','Yes') ,(2,'2018-05-01','No') ,(2,'2018-06-01','Yes')
select * , row_number() over (partition by Client_no, Rnk order by start_date) as rn from ( select * , sum(case when Stop = 'Yes' then 1 else 0 end) over (partition by Client_No order by start_date) rnk from test ) q order by Client_No, start_date GOId | Client_No | Start_Date | Stop | rnk | rn -: | --------: | :------------------ | :--- | --: | :- 1 | 1 | 01/01/2018 00:00:00 | No | 0 | 1 2 | 1 | 01/02/2018 00:00:00 | No | 0 | 2 3 | 1 | 01/03/2018 00:00:00 | No | 0 | 3 4 | 1 | 01/04/2018 00:00:00 | Yes | 1 | 1 5 | 1 | 01/05/2018 00:00:00 | No | 1 | 2 6 | 1 | 01/06/2018 00:00:00 | No | 1 | 3 7 | 2 | 01/02/2018 00:00:00 | No | 0 | 1 8 | 2 | 01/03/2018 00:00:00 | No | 0 | 2 9 | 2 | 01/04/2018 00:00:00 | Yes | 1 | 1 10 | 2 | 01/05/2018 00:00:00 | No | 1 | 2 11 | 2 | 01/06/2018 00:00:00 | Yes | 2 | 1
db<>fiddle here
The difference between using this:
row_number() over (partition by Client_no, Rnk order by start_date)
versus this:
dense_rank() over (partition by Client_no, Rnk order by start_date)
is that the dense_rank would calculate the same number for the same start_date per Client_no & Rnk.