I have a list of customers, and a bunch of them are duplicates (‘Acme Inc’, ‘Acme, Inc’, ‘Acme Inc.’, ‘Acme, Inc.’) They all have different IDs. BUT, each ID also has multiple addresses. Something like…
+-------+---------------+-------------------+-----------+---+-------+ |ID |Name |Address |City |St |Zip | +-------+---------------+-------------------+-----------+---+-------+ |001 |Acme Inc |123 Address St |Columbus |OH |43081 | |001 |Acme Inc |321 Street St |Columbus |OH |43081 | |001 |Acme Inc |456 Blanket Blvd |Columbus |OH |43081 | |002 |Acme, Inc |123 Babel St |Columbus |OH |43081 | |002 |Acme, Inc |321 Acorn Rd |Columbus |OH |43081 | |002 |Acme, Inc |456 Lancer Blvd |Columbus |OH |43081 | |003 |Baker |456 Blanket Blvd |Columbus |OH |43081 | |004 |Peterson |456 Blanket Blvd |Columbus |OH |43081 | |005 |Plumbers Inc |123 Address St |Columbus |OH |43081 | |006 |Plumbers, LLC |321 Street St |Columbus |OH |43081 | |007 |Acme, Inc. |123 Address St |Columbus |OH |43081 |
I have a function that normalizes the name so the first 6 would all be ‘Acme’ and the last two ‘Plumbers’.
What I want is a list of the ID and Name where there are duplicate. The goal is a report of records with unique IDs and duplicate names.
+-------+---------------+ |ID |Name | +-------+---------------+ |001 |Acme Inc | |002 |Acme, Inc | |007 |Acme, Inc. | |005 |Plumbers Inc | |006 |Plumbers, LLC |
I tried this:
SELECT DISTINCT [Name], ( SELECT strNew FROM [fn_strNorm](2, [Name]) ) AS [NewName] FROM [Processed_Vendors] WHERE [VendorID] <> '' AND [VendorID] IS NOT NULL AND [Name]<> '' AND [Name] IS NOT NULL GROUP BY [NewName] HAVING COUNT(*) > 1 ORDER BY [NewName]
I also tried to put them into [dump_names] table and join the two, but I keep getting multiple records from the same ID
SELECT pv.[VendorID], pv.[Name] FROM [dupe_names] n LEFT JOIN [Processed_Vendors] pv ON pv.[Name] = n.[Name] ORDER BY pv.[Name] SELECT 'Name Match' AS [Reason], pv.[VendorID], pv.[Name] FROM [dupe_names] n LEFT JOIN [Processed_Vendors] pv ON pv.[Name] = n.[Name] AND ( SELECT strNew FROM [dbo].[fn_strNorm](2, pv.[Name]) ) = n.[NewName] ORDER BY pv.[Name]
I think I’m overthinking this or the migraine I’m sporting is clouding my thinking. Either way, I appreciate the help.
Advertisement
Answer
One approach would use a CTE to find the distinct counts of name variants for each normalized name. Then, join to your current table and retain only records which had more than one name variant.
WITH cte AS ( SELECT [dbo].[fn_strNorm](2, Name) AS NmName, COUNT(DISTINCT Name) AS cnt FROM Processed_Vendors GROUP BY [dbo].[fn_strNorm](2, Name) ) SELECT DISTINCT pv.ID, pv.Name FROM Processed_Vendors pv INNER JOIN cte t ON t.NmName = [dbo].[fn_strNorm](2, pv.Name) WHERE t.cnt > 1;