Full Join using a fuzzy join but with distinct matching

Question

We have a table of expected payments and a table of payments made. We need to be able to match payments with a singular expected payment but we allow a +-3 day window for it to be made. On top of that, it should be a one-to-one match. So imagine I have a table of expected payments and payments The

Accepted Answer

..the same logic, as your current approach, iteration through expected dates and find any matching payment date which is greater than the last used payment date. It is a contrived example (based on the simplified requirement) and it might not fit well to the complete/full specs (lots of intricacies are hidden underneath the uniqueness of the dates.. do not get tricked just because the example executes fast)./*drop table if exists expectedgodrop table if exists paymentgo*/--expectedcreate table expected(expd date primary key clustered)godeclare @d date = isnull((select max(expd) from expected), '20100101')begin try insert into expected(expd) select dateadd(day, abs(checksum(newid()))%5, @d)end trybegin catch --ignore errors, .... just a couple sample rowsend catchgo 5000------------------paymentcreate table payment(payd date primary key clustered)godeclare @d date = isnull((select max(payd) from payment), '20100101')begin try insert into payment(payd) select dateadd(day, abs(checksum(newid()))%10, @d)end trybegin catch --ignore errors, .... just a couple sample rowsend catchgo 5000--max recursion == number of rows in expected - 1 (1 is the anchor)declare @maxrecursion int = (select count(*) from expected)-1 /* -1==anchor*/;--maximum difference between consecutive days in expected ..--this is not needed if there is sequential column (with no breaks) in expected which could be used for the recursion--eg rowA has id=1, rowB has id=2, rowC has id=3....rowX has id=100...recursion = recursive id + 1declare @maxexpdaysdiff int = ( select max(daysdiff) from ( select datediff(day, expd, lead(expd, 1, expd) over(order by expd)) as daysdiff from expected ) as ed);--recursion, rows of expectedwith cteas( --mimimum expected date and any matching payment date select top (1) e.expd, lead(e.expd, 1) over(order by e.expd) as recexpd, pp.payd, pp.payd as lastusedpayd, 1 as rownum from expected as e outer apply(select top (1) p.payd from payment as p where p.payd >= e.expd and p.payd < dateadd(day, 3+1, e.expd) order by p.payd) as pp order by e.expd union all --recursive part (next row of expected and any matching payment which was not assigned as lastusedpayd select cte.recexpd, lexp.leadexpd, mp.payd, isnull(mp.payd, cte.lastusedpayd), cte.rownum+1 from cte --cte carries next expected date::recexpd, this is not needed if there is another/sequential column to be used --next expected row/date for the recursion outer apply ( --get the next expected date, after cte.recexpd to continue the recursion (like using lead() in recursion) select l.leadexpd from ( --rows between cte.recexpd and cte.recexpd+@maxexpdaysdiff (there is always a row..except for the last expected date) select e.expd as leadexpd, row_number() over (order by e.expd) as ern from expected as e where e.expd > cte.recexpd and e.expd <= dateadd(day, @maxexpdaysdiff, cte.recexpd) ) as l where l.ern = 1 ) as lexp --matching payment, if any outer apply ( select lmp.payd from ( --payments between cte.recexpd and cte.recexpd + 3 days --..but exclude all payments before the payment which was lastly used:: cte.lastusedpayd select p.payd, row_number() over(order by p.payd) as prn from payment as p where p.payd >= cte.recexpd and p.payd < dateadd(day, 3+1, cte.recexpd) and (p.payd > cte.lastusedpayd or cte.lastusedpayd is null) ) as lmp where lmp.prn=1 ) as mp where cte.rownum <= @maxrecursion--safeguard recursion. )select expd, payd --,*from cteoption(maxrecursion 0);EDIT :Another approach (educational), would be to store the payment date which was assigned/selected for a previous row/expected date and use that value when selecting the payment date for each new expected date.In short:SELECT expecteddate –> SELECT paymentdate WHERE paymentdate >last_used_payment_date –> SET last_used_payment_date = SELECTedpaymentdate (if any)Although the logic is identical to the recursive approach, it can be implemented without an explicit tsql recursive cte, but just with a normal select and a scalar function.If you are familiar with (or ever heard of) the “quirky update” then the following could potentially be a “quirky select”.--the scalar functioncreate or alter function dbo.lastusedpaymentdate(@date date = null)returns dateasbegin if @date is null begin return (cast(session_context(N'paymentdate') as date)); end exec sp_set_session_context @key=N'paymentdate', @value=@date; return(@date);endgo--the query:--reset session variable for each execution <-- this is not really needed for a single execution of the select in a sessionexec sp_set_session_context @key=N'paymentdate', @value=null;--..a selectselect e.expd, p.payd, dbo.lastusedpaymentdate(p.payd) as lastusedpaymentdatefrom expected as e with(index(1))outer apply( select top (1) pm.payd from payment as pm where pm.payd >= e.expd and pm.payd < dateadd(day, 3+1, e.expd) and pm.payd > isnull(dbo.lastusedpaymentdate(default), '19000101') order by pm.payd) as porder by e.expd;select e.expd from expected ⇛ SELECT expecteddateouter apply(..pm.payd > isnull(dbo.lastusedpaymentdate(default), '19000101')) ⇛ SELECT paymentdate WHERE paymentdate > last_used_payment_dateselect ..dbo.lastusedpaymentdate(p.payd).. ⇛ SET last_used_payment_date = SELECTed paymentdate (if any)The quirkiness stems from the fact that the base/main table (i.e. expected dates) must be traversed in order, according to the business logic (earliest date first and all other dates next in order). This goes against the SQL fundamentals (declare what, not how) and order of execution cannot “be assumed/taken for granted”.For the simple example, the query selects with an apply(), the main table has a clustered index to presumably enforce order of retrieval..but still order of execution cannot be taken as ‘always guaranteed’. A more defensive approach would be to throw everything at the query…in an attempt to stop it from deviating from its “expected” behavior (as an exercise in futility) :...............from expected as e with(index(1), forcescan)outer apply(..........) as porder by e.expdoption(force order, maxdop 1, use hint ('DISABLE_BATCH_MODE_ADAPTIVE_JOINS'));Forcing an execution plan might be the best option if one had to exploit a volatile, ever changing, session context key.Nevertheless, the focal point of the approach is the simple scalar function, which stores&returns its parameter value in a session key when called with a non-null parameter or it (the function) returns the value of that session key when it’s called with a NULL parameter.For the retrieval of the payment date in the outer apply, the function is called with default/NULL (session key is not set), while in the select part, the function is called with p.payd as parameter. p.payd derives from an outer apply, if p.payd has a value (==there is a matching payment date) then the scalar function sets the session key, if there is no matching p.payd==NULL, then the scalar function simply returns the value of the session key –> synergy among simplistic tsql parts.

Advertisement

Answer