I am working with SQL Server 2017, and I need to clean up duplicate rows and update all rows in other tables that contain my field.
I’ve got one table which contains my customers
USERID - Username C79784F1-7254-4195-AF7F-66E651F3C995 | Robert 3C51AD27-21F1-4751-9931-7C66263B4708 | Robert 0D67A3E3-E7CF-4D95-935D-E077F4A6D315 | Bob 70A9552A-028B-4EA0-A309-4E93EEAB92E8 | William 1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78 | William 411BCC56-A4C9-4D9B-9D49-FA9255ECA968 | William F0223C57-E3B2-4F94-9820-2D9A62A515D6 | Cathy CREATE TABLE [dbo].[Users] ( [UserID] [uniqueidentifier] NOT NULL, [UserName] [nvarchar](260) NULL ); INSERT INTO [dbo].[Users] (userid, username) VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995','Robert'); INSERT INTO [dbo].[Users] (userid, username) VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708','Robert'); INSERT INTO [dbo].[Users] (userid, username) VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315','Bob'); INSERT INTO [dbo].[Users] (userid, username) VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8','William'); INSERT INTO [dbo].[Users] (userid, username) VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78','William'); INSERT INTO [dbo].[Users] (userid, username) VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968','William'); INSERT INTO [dbo].[Users] (userid, username) VALUES ('F0223C57-E3B2-4F94-9820-2D9A62A515D6','Cathy');
Then I have 7 tables that contains the userid
column and 1 table with another name column
CreatedById - CreationDate - Folders C79784F1-7254-4195-AF7F-66E651F3C995 | 2018-02-24 | Folder1 3C51AD27-21F1-4751-9931-7C66263B4708 | 2019-10-12 | PAD 0D67A3E3-E7CF-4D95-935D-E077F4A6D315 | 2021-05-12 | IEF 70A9552A-028B-4EA0-A309-4E93EEAB92E8 | 2021-01-27 | WIP 1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78 | 2021-06-29 | OLD_ONE 411BCC56-A4C9-4D9B-9D49-FA9255ECA968 | 2021-01-21 | ToTest CREATE TABLE [dbo].[catalog] ( [CreatedById] [uniqueidentifier] NOT NULL, [CreationDate] DATE NOT NULL, [Folders] [nvarchar](425) ); INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders) VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995','2018-02-24','Folder1'); INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders) VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708','2019-10-12','PAD'); INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders) VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315','2021-05-12','IEF'); INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders) VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8','2021-01-27','WIP'); INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders) VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78','2021-06-29','OLD_ONE'); INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders) VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968','2021-01-21','ToTest');
My other tables:
CREATE TABLE table3 ([USERID] [uniqueidentifier] NOT NULL); CREATE TABLE table4 ([USERID] [uniqueidentifier] NOT NULL); CREATE TABLE table5 ([USERID] [uniqueidentifier] NOT NULL); CREATE TABLE table6 ([USERID] [uniqueidentifier] NOT NULL); INSERT INTO table3 (USERID) VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995'); INSERT INTO table3 (USERID) VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708'); INSERT INTO table3 (USERID) VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315'); INSERT INTO table3 (USERID) VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8'); INSERT INTO table3 (USERID) VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78'); INSERT INTO table3 (USERID) VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968'); INSERT INTO table4 (USERID) VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995'); INSERT INTO table4 (USERID) VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708'); INSERT INTO table4 (USERID) VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315'); INSERT INTO table4 (USERID) VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8'); INSERT INTO table4 (USERID) VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78'); INSERT INTO table4 (USERID) VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968'); INSERT INTO table5 (USERID) VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995'); INSERT INTO table5 (USERID) VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708'); INSERT INTO table5 (USERID) VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315'); INSERT INTO table5 (USERID) VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8'); INSERT INTO table5 (USERID) VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78'); INSERT INTO table5 (USERID) VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968'); INSERT INTO table6 (USERID) VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995'); INSERT INTO table6 (USERID) VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708'); INSERT INTO table6 (USERID) VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315'); INSERT INTO table6 (USERID) VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8'); INSERT INTO table6 (USERID) VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78'); INSERT INTO table6 (USERID) VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968');
I want to clean the duplicates and keep only one record in the database.
First, I created a query that gives me only the duplicate rows and keeps only one record.
With this record, I’ll update table3, table4, table5, table6,
WITH singleUser AS ( SELECT a.UserName, a.UserID FROM (SELECT userid, Username, ROW_NUMBER() OVER (PARTITION BY username ORDER BY username ASC) AS rowNo, COUNT(*) OVER (PARTITION BY username) AS c FROM dbo.users WHERE 1 = 1 GROUP BY userid, Username) a WHERE 1 = 1 AND rowNo > 1 AND c = rowNo )
Then I created a query that gives me all the tables that contain my ‘Userid’ column.
This query will return: table3, table4, table5, table6
WITH tableToUpdate AS ( SELECT TABLE_CATALOG AS 'Bdd', TABLE_SCHEMA AS 'Schema', TABLE_NAME AS 'TableName', COLUMN_NAME AS 'ColumnName' FROM INFORMATION_SCHEMA.COLUMNS WHERE 1 = 1 AND CASE WHEN COLUMN_NAME = 'CreatedByID' THEN 1 WHEN COLUMN_NAME = 'UserID' THEN 1 ELSE 0 END = 1 )
And finally I created my merge query
MERGE INTO dbo.catalog c USING (SELECT u.UserID AS UserIDUsers, su.UserID AS UserIDSingleUser FROM dbo.Users u JOIN singleUser su ON su.Username = u.username WHERE 1 = 1) S ON c.CreatedByID = s.UserIDUsers WHEN MATCHED THEN UPDATE SET c.CreatedByID =S.UserIDSingleUser
My merge result:
CreatedById - CreationDate - Folders C79784F1-7254-4195-AF7F-66E651F3C995 | 2018-02-24 | Folder1 C79784F1-7254-4195-AF7F-66E651F3C995 | 2019-10-12 | PAD 0D67A3E3-E7CF-4D95-935D-E077F4A6D315 | 2021-05-12 | IEF 70A9552A-028B-4EA0-A309-4E93EEAB92E8 | 2021-01-27 | WIP 70A9552A-028B-4EA0-A309-4E93EEAB92E8 | 2021-06-29 | OLD_ONE 70A9552A-028B-4EA0-A309-4E93EEAB92E8 | 2021-01-21 | ToTest
It works very well, but is there a way to automatize it ?
Actually I’ve created 8 queries, but only the merge section change.
Also, how can I remove duplicate rows in my dbo.users
table, after all fields have been updated?
Thank you for your help.
Advertisement
Answer
I came back to answer to my own question. After some days i finally did it.
beforehand I’ve created a table which comes from my CTE query (singleUser)
CREATE OR ALTER PROCEDURE dbo.mergeUserID AS DECLARE @tableName nvarchar(50) DECLARE @sql nvarchar(max) DECLARE @columnName nvarchar(50) BEGIN DECLARE cursor_db CURSOR FOR SELECT TABLE_NAME AS 'TableName' ,COLUMN_NAME AS 'ColumnName' FROM INFORMATION_SCHEMA.COLUMNS WHERE 1=1 AND CASE WHEN COLUMN_NAME = 'CreatedByID' then 1 WHEN COLUMN_NAME = 'ModifiedByID' then 1 WHEN COLUMN_NAME = 'OwnerID'then 1 WHEN COLUMN_NAME = 'UserID' then 1 ELSE 0 END = 1 OPEN cursor_db FETCH NEXT FROM cursor_db INTO @tableName, @columnName WHILE @@FETCH_STATUS = 0 BEGIN SET @sql ='MERGE INTO ' + @tablename+ ' t USING ( SELECT u.UserID as UserIDUsers ,su.UserID as UserIDSingleUser FROM dbo.Users u JOIN dbo.singleUser su on su.UserName = u.username WHERE 1=1 )S ON t.'+@columnName+' = s.UserIDUsers WHEN MATCHED THEN UPDATE SET t.'+@columnName+' = S.UserIDSingleUser;' exec sp_executesql @sql PRINT @sql FETCH NEXT FROM cursor_db INTO @tableName, @columnName END CLOSE cursor_db DEALLOCATE cursor_db END; GO ------------------------------------------------ DECLARE @RC nvarchar(max) -- TODO: Set parameter values here. EXECUTE @RC = [dbo].[mergeUserID] PRINT @RC GO
I don’t know if it’s well coded because it’s the first time I’ve done this.
For example I’ve seen on some forum they put the ;
after FETCH / CLOSE / DEALLOCATE ;
others not.
so Who is right or wrong, dunno ?