Skip to content
Advertisement

Retrieve a row of data with the most recent date when another row ‘X’ in T-SQL

I have a database of customers who have an effective date and end date of their membership, both separate columns. The data is a bit dirty, however, and a customer can have multiple rows of data, only one of which is their most recent membership record. A member is considered “active” if they have an end date that = NULL.

The data looks somewhat like this:

Name         ID          Membership_Effective_Date     Membership_End_Date
---------------------------------------------------------------------------
Bob           1                 1/1/2020                           NULL
Bob           1                 1/1/2017                           1/2/2017
Bob           1                 1/1/2017                           9/1/2018
Kim           2                 1/1/2019                           1/1/2020
Kim           2                 1/1/2019                           12/31/2019
Susan         3                 1/1/2018                           12/31/2018
Susan         3                 1/1/2019                           1/1/2019
Larry         4                 1/1/2020                           1/1/2020

I need to retrieve the most recent membership end date for a list of customers that are both inactive and active.

My desired results should look like this:

Name ID Membership_Effective_Date Membership_End_Date Bob 1 1/1/2020 NULL Kim 2 1/1/2019 1/1/2020 Susan 3 1/1/2018 12/31/2018 Larry 4 1/1/2020 1/1/2020

I have been able to do this without a problem for customers that have both a row with a Membership_End_Date date value and a Membership_End_Date row with a NULL value (Bob), and customers that have multiple rows with only date values (Kim).

The challenge I am having is with data like Susan and Larry. They both have rows that contain date values where Membership_Effective_Date = Membership_End_Date. In Larry’s case that is the only row of data he has. And in Susan’s case the dates in the row where Membership_Effective_Date = Membership_End_Date is greater than the other row so my current query will pick it up automatically.

The problem is that I need to basically write a query that says if a customer has multiple rows of data and one row where Membership_Effective_Date = Membership_End_Date then chose the second most recent line of data. However, if a customer only has one row of data and that row only contains values where Membership_Effective_Date = Membership_End_Date then choose that one.

I can’t figure out how to do this without removing Larry from the data pull completely and I need to include him and similar customers.

Any help is appreciated!

Advertisement

Answer

You could do this with row_number() and a conditional sort:

select name, id, membership_effective_date, membership_end_date
from (
    select 
        t.*,
        row_number() over(
            partition by id 
            order by
                case when membership_end_date is null then 0 else 1 end,
                case when membership_end_date <> membership_effective_date then 0 else 1 end,
                membership_end_date desc
        ) rn
    from mytable t
) t
where rn = 1

The trick lies in the order by clause of row_number(): it gives priority to rows whose end date is null, then to rows whose end date is not equal to the start date, then to the greatest end date. You can run the subquery separately to see how the row number is assigned.

With this information at hand, all that is left to do is filter on the top ranked record per group.

Demo on DB Fiddle:

name  | id | membership_effective_date | membership_end_date
:---- | -: | :------------------------ | :------------------
Bob   |  1 | 2020-01-01                | null               
Kim   |  2 | 2019-01-01                | 2020-01-01         
Susan |  3 | 2018-01-01                | 2018-12-31         
Larry |  4 | 2020-01-01                | 2020-01-01         
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement