I have a database of customers who have an effective date and end date of their membership, both separate columns. The data is a bit dirty, however, and a customer can have multiple rows of data, only one of which is their most recent membership record. A member is considered “active” if they have an end date that = NULL.
The data looks somewhat like this:
Name ID Membership_Effective_Date Membership_End_Date --------------------------------------------------------------------------- Bob 1 1/1/2020 NULL Bob 1 1/1/2017 1/2/2017 Bob 1 1/1/2017 9/1/2018 Kim 2 1/1/2019 1/1/2020 Kim 2 1/1/2019 12/31/2019 Susan 3 1/1/2018 12/31/2018 Susan 3 1/1/2019 1/1/2019 Larry 4 1/1/2020 1/1/2020
I need to retrieve the most recent membership end date for a list of customers that are both inactive and active.
My desired results should look like this:
Name ID Membership_Effective_Date Membership_End_Date
Bob 1 1/1/2020 NULL
Kim 2 1/1/2019 1/1/2020
Susan 3 1/1/2018 12/31/2018
Larry 4 1/1/2020 1/1/2020
I have been able to do this without a problem for customers that have both a row with a Membership_End_Date date value and a Membership_End_Date row with a NULL value (Bob), and customers that have multiple rows with only date values (Kim).
The challenge I am having is with data like Susan and Larry. They both have rows that contain date values where Membership_Effective_Date = Membership_End_Date. In Larry’s case that is the only row of data he has. And in Susan’s case the dates in the row where Membership_Effective_Date = Membership_End_Date is greater than the other row so my current query will pick it up automatically.
The problem is that I need to basically write a query that says if a customer has multiple rows of data and one row where Membership_Effective_Date = Membership_End_Date then chose the second most recent line of data. However, if a customer only has one row of data and that row only contains values where Membership_Effective_Date = Membership_End_Date then choose that one.
I can’t figure out how to do this without removing Larry from the data pull completely and I need to include him and similar customers.
Any help is appreciated!
Advertisement
Answer
You could do this with row_number()
and a conditional sort:
select name, id, membership_effective_date, membership_end_date from ( select t.*, row_number() over( partition by id order by case when membership_end_date is null then 0 else 1 end, case when membership_end_date <> membership_effective_date then 0 else 1 end, membership_end_date desc ) rn from mytable t ) t where rn = 1
The trick lies in the order by
clause of row_number()
: it gives priority to rows whose end date is null, then to rows whose end date is not equal to the start date, then to the greatest end date. You can run the subquery separately to see how the row number is assigned.
With this information at hand, all that is left to do is filter on the top ranked record per group.
name | id | membership_effective_date | membership_end_date :---- | -: | :------------------------ | :------------------ Bob | 1 | 2020-01-01 | null Kim | 2 | 2019-01-01 | 2020-01-01 Susan | 3 | 2018-01-01 | 2018-12-31 Larry | 4 | 2020-01-01 | 2020-01-01