There are 3 tables in my DB; Table 1 has ‘Collateral’, Table 2 has ‘Loans’, Table 3 is a multi-link table between 1 and 2; let’s call it ‘Loan_Collateral_link’.
A collateral can belong to 1 or more Loans, and a Loan can have multiple Collaterals.
What I want to achieve, is create a separate result set (or table) in which I group together all Loans and Collaterals which are in any way linked to eachother through different Loans and/or Collaterals.
So you see Loan1, Loan2 and Loan5 are sharing col1, so I want them to be grouped. Col3 should be in that same group, as it is linked through Loan2.
Loan4 and Loan3 are linked through Col2, so they should be a separate group.
The resultset would be something as per below:
I’m trying to write a script, probably using a loop? to get this done. I was thinking to loop over each record in the Loan_Collateral_Link table, write every link that I find for the record into a temp_table. Then loop over this temp_table to find all linked records. However, I can’t really seem to work it out conceptually, as the loop should somehow reference itself.
--ForEach Loan_Collateral_Link Loan; --If not exists - Insert LOAN_ID into TempTable --If not exist - Insert COL_ID into TempTable --ForEach TempTable COL_ID; --Select * FROM Loan_Collateral_Link where Loan_ID or Col_ID matches --> If not exists - Insert LOAN_ID / COL_ID into TempTable
But this seems a bit like an infite loop perhaps?
Perhaps I should order all Loans, than go over Loan per Loan, add all the Col’s, and in the loop per loan select all other loans with this loan. Than go over all the new Cols and select all matching Loans. However, than I have new Loans, so I need to go back and fetch their possible Cols. So again, how many times do I loop? Use a flag per collateral and keep looping until all flags are fulfilled?
A while ago I wrote an answer to the question How to find all connected subgraphs of an undirected graph. Your question here looks like the same problem.
Essentially, it treats the data as edges in a graph and traverses recursively all edges of the graph, stopping when the loop is detected. Then it puts all found loops in groups and gives each group a number.
For the detailed explanation of the query see that answer. I adapted that query to your schema here.
Also, this problem can be solved in linear time. Wikipedia has description of the algorithm. But, the T-SQL method shown here is far from optimal. If you still want to do it in T-SQL you can try the following. Instead of trying to find all groups at once, pick a single starting Identifier (add
WHERE Ident='Loan1' to
CTE_Pairs), then delete from the big table all identifiers that were appended to this group. Repeat until the big table is empty. At least you could see the progress. Proper indexes on
Ident columns would not hurt as well.
DECLARE @T TABLE (LoanID varchar(50), CollateralID varchar(50)); INSERT INTO @T (LoanID, CollateralID) VALUES ('Loan1', 'Col1'), ('Loan2', 'Col1'), ('Loan2', 'Col3'), ('Loan3', 'Col2'), ('Loan4', 'Col2'), ('Loan5', 'Col1'), ('Loan5', 'Col3'), -- plus an extra group with some more interesting links ('Loan80', 'Col80'), ('Loan81', 'Col80'), ('Loan80', 'Col81'), ('Loan81', 'Col81'), ('Loan82', 'Col81') ;
WITH CTE_Idents AS ( SELECT LoanID AS Ident FROM @T UNION SELECT CollateralID AS Ident FROM @T ) ,CTE_Pairs AS ( SELECT LoanID AS Ident1, CollateralID AS Ident2 FROM @T UNION SELECT CollateralID AS Ident1, LoanID AS Ident2 FROM @T ) ,CTE_Recursive AS ( SELECT CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent , Ident1 , Ident2 , CAST(',' + Ident1 + ',' + Ident2 + ',' AS varchar(8000)) AS IdentPath , 1 AS Lvl FROM CTE_Pairs INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1 UNION ALL SELECT CTE_Recursive.AnchorIdent , CTE_Pairs.Ident1 , CTE_Pairs.Ident2 , CAST(CTE_Recursive.IdentPath + CTE_Pairs.Ident2 + ',' AS varchar(8000)) AS IdentPath , CTE_Recursive.Lvl + 1 AS Lvl FROM CTE_Pairs INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1 WHERE CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000)) ) ,CTE_RecursionResult AS ( SELECT AnchorIdent, Ident1, Ident2 FROM CTE_Recursive ) ,CTE_CleanResult AS ( SELECT AnchorIdent, Ident1 AS Ident FROM CTE_RecursionResult UNION SELECT AnchorIdent, Ident2 AS Ident FROM CTE_RecursionResult ) SELECT CTE_Idents.Ident ,CASE WHEN CA_Data.XML_Value IS NULL THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers ,DENSE_RANK() OVER(ORDER BY CASE WHEN CA_Data.XML_Value IS NULL THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END ) AS GroupID FROM CTE_Idents CROSS APPLY ( SELECT CTE_CleanResult.Ident+',' FROM CTE_CleanResult WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident ORDER BY CTE_CleanResult.Ident FOR XML PATH(''), TYPE ) AS CA_XML(XML_Value) CROSS APPLY ( SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)') ) AS CA_Data(XML_Value) WHERE CTE_Idents.Ident IS NOT NULL ORDER BY GroupID, Ident;
+--------+-----------------------------------+---------+ | Ident | GroupMembers | GroupID | +--------+-----------------------------------+---------+ | Col1 | Col1,Col3,Loan1,Loan2,Loan5, | 1 | | Col3 | Col1,Col3,Loan1,Loan2,Loan5, | 1 | | Loan1 | Col1,Col3,Loan1,Loan2,Loan5, | 1 | | Loan2 | Col1,Col3,Loan1,Loan2,Loan5, | 1 | | Loan5 | Col1,Col3,Loan1,Loan2,Loan5, | 1 | | Col2 | Col2,Loan3,Loan4, | 2 | | Loan3 | Col2,Loan3,Loan4, | 2 | | Loan4 | Col2,Loan3,Loan4, | 2 | | Col80 | Col80,Col81,Loan80,Loan81,Loan82, | 3 | | Col81 | Col80,Col81,Loan80,Loan81,Loan82, | 3 | | Loan80 | Col80,Col81,Loan80,Loan81,Loan82, | 3 | | Loan81 | Col80,Col81,Loan80,Loan81,Loan82, | 3 | | Loan82 | Col80,Col81,Loan80,Loan81,Loan82, | 3 | +--------+-----------------------------------+---------+