Skip to content
Advertisement

Best way to find cancel and substitute match from the table

I have a table which has all order information.

Table

Id is the unique key for the table and it’s grouped by Order_Id (every order for the same product has the same Order_Id). The main thing is that if order is cancelled, it is recorded as a cancelled order (Cancelled? = True) and for following orders, it must have substitute orders. It can be an exact one-to-one match like Id 5 and 6 but it can also be one-to-many (Id 2,3, and 4), many-to-one, or many-to-many. As you can see, Id 1 and 7 are not part of cancelled/substitute match so should be excluded from the match.

My goal is to find cancelled/substitute match from the below table. It can be sql query or stored procedure. I was also thinking of having one another column – Parent_Id to record cancelled Id to substitue Id but it has to be updated from stored procedure as well.

Any ideas? Thanks.

Advertisement

Answer

Validating (all) combinations is potentially very hard as this is essentially a subset sum problem. If you can add some limitations, then it often gets easier.

The solution below has the following limitations:

  1. The order of the cancelled and substitute transactions within an order_id cannot be mixed. The solution adds up transactions within the order_id that come directly after the cancelled transaction in a running total (running sum). When the running total reaches the cancelled quantity, then the transactions are considered a match. Mixing the transaction order will mess up the running total and fail to find a matching quantity.
  2. Within an order_id there cannot be more than one cancelled transaction. This would require a reset of the running total. This can be done, but will introduce more complexity to the solution.

Example of the mixed order I am referring to in limitation 1.

order id cancelled quantity
----- -- --------- --------
1     1  yes       100     --> 1 is cancelled
1     2  no        50      --> 2 is unrelated
1     3  no        100     --> 3 is the substitute for 1, but it does not come directly after 1...

Sample data

The imposed limitations are valid for the sample data: all substitute transactions come directly after the cancelled transaction and there are no multiple cancellations within a single order_id!

create table transactions
(
  order_id int,
  id int,
  quantity int,
  cancelled_date date,
  created_date date,
  cancelled bit
);

insert into transactions (order_id, id, quantity, cancelled_date, created_date, cancelled) values
(100000, 1, 100, null        , '2020-10-10', 0),
(100000, 2, 200, '2020-10-11', '2020-10-10', 1),
(100000, 3,  50, null        , '2020-10-12', 0),
(100000, 4, 150, null        , '2020-10-12', 0),
(100001, 5, 300, '2020-10-12', '2020-10-11', 1),
(100001, 6, 300, null        , '2020-10-13', 0),
(100001, 7,  50, null        , '2020-10-14', 0);

Solution

To see the running total in the results and better understand the final solution you can run this query.

with cte_cancel as
(
  select t.order_id,
         t.id, -- using id to get transaction order (alternative would be created_date, but then what if an order is created and cancelled on the same day?)
         t.quantity
  from transactions t
  where t.cancelled = 1
)
select c.order_id,
       c.id as cancelled_id,
       c.quantity,
       t.id as substitute_id,
       t.quantity,
       sum(t.quantity) over(partition by t.order_id
                            order by t.id
                            rows between unbounded preceding and current row) as qty_sum,
       case when sum(t.quantity) over(partition by t.order_id
                                      order by t.id
                                      rows between unbounded preceding and current row) <= c.quantity
            then c.id end as parent_order_id
from cte_cancel c
join transactions t
  on t.order_id = c.order_id
where t.cancelled = 0
  and t.id > c.id
order by c.id, t.id;

This produces:

order_id cancelled_id quantity substitute_id quantity qty_sum parent_order_id
-------- ------------ -------- ------------- -------- ------- ---------------
100000   2            200      3             50       50      2
100000   2            200      4             150      200     2
100001   5            300      6             300      300     5
100001   5            300      7             50       350     null

A minimal version of the solution that gives you just the matches would be the following:

with cte_cancel as
(
  select t.order_id,
         t.id,
         t.quantity
  from transactions t
  where t.cancelled = 1
),
cte_match as
(
  select c.order_id,
         c.id as cancelled_id,
         t.id as substitute_id,
         case when sum(t.quantity) over(partition by t.order_id
                                        order by t.id
                                        rows between unbounded preceding and current row) <= c.quantity
              then c.id end as parent_order_id
  from cte_cancel c
  join transactions t
    on t.order_id = c.order_id
  where t.cancelled = 0
    and t.id > c.id
)
select m.order_id,
       m.cancelled_id,
       m.substitute_id
from cte_match m
where m.parent_order_id is not null
order by m.order_id,
         m.cancelled_id;

Resulting in:

order_id cancelled_id substitute_id
-------- ------------ -------------
100000   2            3
100000   2            4
100001   5            6

A version with related quantities and dates could be this:

with cte_cancel as
(
  select t.order_id,
         t.id,
         t.cancelled_date,
         t.quantity
  from transactions t
  where t.cancelled = 1
),
cte_match as
(
  select c.order_id,
         c.id as cancelled_id,
         c.quantity as cancelled_qty,
         c.cancelled_date,
         t.id as substitute_id,
         t.quantity as substitute_qty,
         t.created_date as substitute_date,
         case when sum(t.quantity) over(partition by t.order_id
                                        order by t.id
                                        rows between unbounded preceding and current row) <= c.quantity
              then c.id end as parent_order_id
  from cte_cancel c
  join transactions t
    on t.order_id = c.order_id
  where t.cancelled = 0
    and t.id > c.id
)
select m.order_id,
       m.cancelled_id,
       m.cancelled_qty,
       m.cancelled_date,
       m.substitute_id,
       m.substitute_qty,
       m.substitute_date
from cte_match m
where m.parent_order_id is not null
order by m.order_id,
         m.cancelled_id;

Resulting in:

order_id cancelled_id cancelled_qty cancelled_date substitute_id substitute_qty substitute_date
-------- ------------ ------------- -------------- ------------- -------------- ---------------
100000   2            200           2020-10-11     3             50             2020-10-12
100000   2            200           2020-10-11     4             150            2020-10-12
100001   5            300           2020-10-12     6             300            2020-10-13

Fiddle to see everything in action.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement