Skip to content
Advertisement

Full Join using a fuzzy join but with distinct matching

We have a table of expected payments and a table of payments made. We need to be able to match payments with a singular expected payment but we allow a +-3 day window for it to be made. On top of that, it should be a one-to-one match.

So imagine I have a table of expected payments

and payments

The result I want is

and if the payment of the 6th is removed the result would be

so the matching between the 5th and the payment on the 7th depends on whether the payment matched with the 4th. As does the matching between the 4th and 7th depend on if the 4th was matched with the 6th.

I’ve currently achieved this by doing a full join with the matching and then recursively iterating over it to clean out repeated records from both sides. unfortunately as the data in this case is in the 100s of millions of rows, it takes about 40 minutes to churn.

I’m wondering if there is a better way or a built in join that I’ve not come across to achieve this concept of distinct matching.

Advertisement

Answer

..the same logic, as your current approach, iteration through expected dates and find any matching payment date which is greater than the last used payment date. It is a contrived example (based on the simplified requirement) and it might not fit well to the complete/full specs (lots of intricacies are hidden underneath the uniqueness of the dates.. do not get tricked just because the example executes fast).

EDIT : Another approach (educational), would be to store the payment date which was assigned/selected for a previous row/expected date and use that value when selecting the payment date for each new expected date.

In short:

SELECT expecteddate –> SELECT paymentdate WHERE paymentdate > last_used_payment_date –> SET last_used_payment_date = SELECTed paymentdate (if any)

Although the logic is identical to the recursive approach, it can be implemented without an explicit tsql recursive cte, but just with a normal select and a scalar function.

If you are familiar with (or ever heard of) the “quirky update” then the following could potentially be a “quirky select”.

  1. select e.expd from expected ⇛ SELECT expecteddate
  2. outer apply(..pm.payd > isnull(dbo.lastusedpaymentdate(default), '19000101')) ⇛ SELECT paymentdate WHERE paymentdate > last_used_payment_date
  3. select ..dbo.lastusedpaymentdate(p.payd).. ⇛ SET last_used_payment_date = SELECTed paymentdate (if any)

The quirkiness stems from the fact that the base/main table (i.e. expected dates) must be traversed in order, according to the business logic (earliest date first and all other dates next in order). This goes against the SQL fundamentals (declare what, not how) and order of execution cannot “be assumed/taken for granted”.

For the simple example, the query selects with an apply(), the main table has a clustered index to presumably enforce order of retrieval..but still order of execution cannot be taken as ‘always guaranteed’. A more defensive approach would be to throw everything at the query…in an attempt to stop it from deviating from its “expected” behavior (as an exercise in futility) :

Forcing an execution plan might be the best option if one had to exploit a volatile, ever changing, session context key.

Nevertheless, the focal point of the approach is the simple scalar function, which stores&returns its parameter value in a session key when called with a non-null parameter or it (the function) returns the value of that session key when it’s called with a NULL parameter. For the retrieval of the payment date in the outer apply, the function is called with default/NULL (session key is not set), while in the select part, the function is called with p.payd as parameter. p.payd derives from an outer apply, if p.payd has a value (==there is a matching payment date) then the scalar function sets the session key, if there is no matching p.payd==NULL, then the scalar function simply returns the value of the session key –> synergy among simplistic tsql parts.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement