I have a table which has all order information.
Id is the unique key for the table and it’s grouped by Order_Id (every order for the same product has the same Order_Id). The main thing is that if order is cancelled, it is recorded as a cancelled order (Cancelled? = True) and for following orders, it must have substitute orders. It can be an exact one-to-one match like Id 5 and 6 but it can also be one-to-many (Id 2,3, and 4), many-to-one, or many-to-many. As you can see, Id 1 and 7 are not part of cancelled/substitute match so should be excluded from the match.
My goal is to find cancelled/substitute match from the below table. It can be sql query or stored procedure. I was also thinking of having one another column – Parent_Id to record cancelled Id to substitue Id but it has to be updated from stored procedure as well.
Any ideas? Thanks.
Advertisement
Answer
Validating (all) combinations is potentially very hard as this is essentially a subset sum problem. If you can add some limitations, then it often gets easier.
The solution below has the following limitations:
- The order of the cancelled and substitute transactions within an
order_id
cannot be mixed. The solution adds up transactions within theorder_id
that come directly after the cancelled transaction in a running total (running sum). When the running total reaches the cancelled quantity, then the transactions are considered a match. Mixing the transaction order will mess up the running total and fail to find a matching quantity. - Within an
order_id
there cannot be more than one cancelled transaction. This would require a reset of the running total. This can be done, but will introduce more complexity to the solution.
Example of the mixed order I am referring to in limitation 1.
order id cancelled quantity ----- -- --------- -------- 1 1 yes 100 --> 1 is cancelled 1 2 no 50 --> 2 is unrelated 1 3 no 100 --> 3 is the substitute for 1, but it does not come directly after 1...
Sample data
The imposed limitations are valid for the sample data: all substitute transactions come directly after the cancelled transaction and there are no multiple cancellations within a single order_id
!
create table transactions ( order_id int, id int, quantity int, cancelled_date date, created_date date, cancelled bit ); insert into transactions (order_id, id, quantity, cancelled_date, created_date, cancelled) values (100000, 1, 100, null , '2020-10-10', 0), (100000, 2, 200, '2020-10-11', '2020-10-10', 1), (100000, 3, 50, null , '2020-10-12', 0), (100000, 4, 150, null , '2020-10-12', 0), (100001, 5, 300, '2020-10-12', '2020-10-11', 1), (100001, 6, 300, null , '2020-10-13', 0), (100001, 7, 50, null , '2020-10-14', 0);
Solution
To see the running total in the results and better understand the final solution you can run this query.
with cte_cancel as ( select t.order_id, t.id, -- using id to get transaction order (alternative would be created_date, but then what if an order is created and cancelled on the same day?) t.quantity from transactions t where t.cancelled = 1 ) select c.order_id, c.id as cancelled_id, c.quantity, t.id as substitute_id, t.quantity, sum(t.quantity) over(partition by t.order_id order by t.id rows between unbounded preceding and current row) as qty_sum, case when sum(t.quantity) over(partition by t.order_id order by t.id rows between unbounded preceding and current row) <= c.quantity then c.id end as parent_order_id from cte_cancel c join transactions t on t.order_id = c.order_id where t.cancelled = 0 and t.id > c.id order by c.id, t.id;
This produces:
order_id cancelled_id quantity substitute_id quantity qty_sum parent_order_id -------- ------------ -------- ------------- -------- ------- --------------- 100000 2 200 3 50 50 2 100000 2 200 4 150 200 2 100001 5 300 6 300 300 5 100001 5 300 7 50 350 null
A minimal version of the solution that gives you just the matches would be the following:
with cte_cancel as ( select t.order_id, t.id, t.quantity from transactions t where t.cancelled = 1 ), cte_match as ( select c.order_id, c.id as cancelled_id, t.id as substitute_id, case when sum(t.quantity) over(partition by t.order_id order by t.id rows between unbounded preceding and current row) <= c.quantity then c.id end as parent_order_id from cte_cancel c join transactions t on t.order_id = c.order_id where t.cancelled = 0 and t.id > c.id ) select m.order_id, m.cancelled_id, m.substitute_id from cte_match m where m.parent_order_id is not null order by m.order_id, m.cancelled_id;
Resulting in:
order_id cancelled_id substitute_id -------- ------------ ------------- 100000 2 3 100000 2 4 100001 5 6
A version with related quantities and dates could be this:
with cte_cancel as ( select t.order_id, t.id, t.cancelled_date, t.quantity from transactions t where t.cancelled = 1 ), cte_match as ( select c.order_id, c.id as cancelled_id, c.quantity as cancelled_qty, c.cancelled_date, t.id as substitute_id, t.quantity as substitute_qty, t.created_date as substitute_date, case when sum(t.quantity) over(partition by t.order_id order by t.id rows between unbounded preceding and current row) <= c.quantity then c.id end as parent_order_id from cte_cancel c join transactions t on t.order_id = c.order_id where t.cancelled = 0 and t.id > c.id ) select m.order_id, m.cancelled_id, m.cancelled_qty, m.cancelled_date, m.substitute_id, m.substitute_qty, m.substitute_date from cte_match m where m.parent_order_id is not null order by m.order_id, m.cancelled_id;
Resulting in:
order_id cancelled_id cancelled_qty cancelled_date substitute_id substitute_qty substitute_date -------- ------------ ------------- -------------- ------------- -------------- --------------- 100000 2 200 2020-10-11 3 50 2020-10-12 100000 2 200 2020-10-11 4 150 2020-10-12 100001 5 300 2020-10-12 6 300 2020-10-13
Fiddle to see everything in action.