Problem: I have a table of transactions (see below) with either (open/start) or (close/end) transaction in the format of date. The task is to merge these transactions with their corresponding dates, however, there can be also cases when the transaction is opened/started, but not closed/ended, in which case only Start Date must be shown, and the Flag ‘Y’ assigned. The rest of cases where there is an Enddate, the flag will be ‘N’. I have been able to set a flag depending on the values with first_value(Flag IGNORE NULLS) over(partition by Customer_ID order by EndDate desc), but Im having a bit harder time with matching the start with end dates.
Logic:
- There can be multiple start & end dates per one day with only time differences.
- There can be just one start & end date transaction per customer per day.
- There can be just start date (open) transaction per customer per day.
- Flag ‘N’ only for closed transactions, ‘Y’ for open i.e. no end date.
Current set example looks like this: Note: for this particular customer we have equal amount of transactions (inserts & deletes so I know the flag will be ‘N’, when we map start and enddates, but the solution also has to work when there are more start dates than enddates for other customers, where the final (newest) start date row will have a flag of ‘Y’.
Customer_Id | Start Date | End Date | Flag |
---|---|---|---|
111 | 02/07/2020 20:58:32.000000 | N | |
111 | 02/07/2020 19:18:04.000000 | N | |
111 | 01/06/2020 09:38:49.000000 | N | |
111 | 01/06/2020 09:36:34.000000 | N | |
111 | 29/05/2020 16:58:07.000000 | N | |
111 | 02/07/2020 20:57:52.000000 | N | |
111 | 02/07/2020 19:17:22.000000 | N | |
111 | 29/05/2020 16:58:06.000000 | N | |
111 | 01/06/2020 09:38:40.000000 | N | |
111 | 01/06/2020 09:36:34.000000 | N |
Expected result:
Customer_Id | Start Date | End Date | Flag |
---|---|---|---|
111 | 02/07/2020 20:57:52.000000 | 02/07/2020 20:58:32.000000 | N |
111 | 02/07/2020 19:17:22.000000 | 02/07/2020 19:18:04.000000 | N |
111 | 01/06/2020 09:38:40.000000 | 01/06/2020 09:38:49.000000 | N |
111 | 01/06/2020 09:36:34.00000 | 01/06/2020 09:36:34.000000 | N |
111 | 29/05/2020 16:58:06.000000 | 29/05/2020 16:58:07.000000 | N |
Partial solution that works for the above example:
select customer_id ,start_date ,lead(end_date) -- find the next row's end date over (partition by customer_id order by coalesce(start_date, end_date)) as new_end ,case when new_end is null then 'Y' else 'N' end as flag from tab qualify start_date is not null -- only return starting rows order by 1,2;
However, in case there are more start dates than end dates (e.g. several closed transactions, one open (can be not only the latest/newest one, but also somewhere in the middle..)) it will not work. E.g. in case a transaction is open: start date 12/01/2019 09:36:34.00000 but no end date then the lead() skips over this null enddate value and assigns the next available end date, leaving the solution looking like this:
Customer_Id | Start Date | End Date | Flag |
---|---|---|---|
111 | 02/07/2020 20:57:52.000000 | 02/07/2020 20:58:32.000000 | N |
111 | 02/07/2020 19:17:22.000000 | 02/07/2020 19:18:04.000000 | N |
111 | 01/06/2020 09:38:40.000000 | 01/06/2020 09:38:49.000000 | N |
111 | 12/01/2019 09:36:34.00000 | 18/11/2019 16:58:07.000000 | N |
111 | 18/11/2019 16:58:06.000000 | NULL | NULL |
When it should look like this:
Customer_Id | Start Date | End Date | Flag |
---|---|---|---|
111 | 02/07/2020 20:57:52.000000 | 02/07/2020 20:58:32.000000 | N |
111 | 02/07/2020 19:17:22.000000 | 02/07/2020 19:18:04.000000 | N |
111 | 01/06/2020 09:38:40.000000 | 01/06/2020 09:38:49.000000 | N |
111 | 12/01/2019 09:36:34.00000 | NULL | Y |
111 | 18/11/2019 16:58:06.000000 | 18/11/2019 16:58:07.000000 | N |
How to find a solution that fits both cases?
Advertisement
Answer
This will return the matching end date for each start date.
select customer_id ,start_date ,lead(end_date) -- find the next row's end date over (partition by customer_id order by coalesce(start_date, end_date)) as new_end ,case when new_end is null then 'Y' else 'N' end as flag from tab qualify start_date is not null -- only return starting rows order by 1,2;