I am working with SQL Server 2017, and I need to clean up duplicate rows and update all rows in other tables that contain my field.
I’ve got one table which contains my customers
USERID - Username
C79784F1-7254-4195-AF7F-66E651F3C995 | Robert
3C51AD27-21F1-4751-9931-7C66263B4708 | Robert
0D67A3E3-E7CF-4D95-935D-E077F4A6D315 | Bob
70A9552A-028B-4EA0-A309-4E93EEAB92E8 | William
1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78 | William
411BCC56-A4C9-4D9B-9D49-FA9255ECA968 | William
F0223C57-E3B2-4F94-9820-2D9A62A515D6 | Cathy
CREATE TABLE [dbo].[Users]
(
[UserID] [uniqueidentifier] NOT NULL,
[UserName] [nvarchar](260) NULL
);
INSERT INTO [dbo].[Users] (userid, username)
VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995','Robert');
INSERT INTO [dbo].[Users] (userid, username)
VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708','Robert');
INSERT INTO [dbo].[Users] (userid, username)
VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315','Bob');
INSERT INTO [dbo].[Users] (userid, username)
VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8','William');
INSERT INTO [dbo].[Users] (userid, username)
VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78','William');
INSERT INTO [dbo].[Users] (userid, username)
VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968','William');
INSERT INTO [dbo].[Users] (userid, username)
VALUES ('F0223C57-E3B2-4F94-9820-2D9A62A515D6','Cathy');
Then I have 7 tables that contains the userid
column and 1 table with another name column
CreatedById - CreationDate - Folders
C79784F1-7254-4195-AF7F-66E651F3C995 | 2018-02-24 | Folder1
3C51AD27-21F1-4751-9931-7C66263B4708 | 2019-10-12 | PAD
0D67A3E3-E7CF-4D95-935D-E077F4A6D315 | 2021-05-12 | IEF
70A9552A-028B-4EA0-A309-4E93EEAB92E8 | 2021-01-27 | WIP
1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78 | 2021-06-29 | OLD_ONE
411BCC56-A4C9-4D9B-9D49-FA9255ECA968 | 2021-01-21 | ToTest
CREATE TABLE [dbo].[catalog]
(
[CreatedById] [uniqueidentifier] NOT NULL,
[CreationDate] DATE NOT NULL,
[Folders] [nvarchar](425)
);
INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders)
VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995','2018-02-24','Folder1');
INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders)
VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708','2019-10-12','PAD');
INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders)
VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315','2021-05-12','IEF');
INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders)
VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8','2021-01-27','WIP');
INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders)
VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78','2021-06-29','OLD_ONE');
INSERT INTO [dbo].[catalog] (CreatedById, CreationDate, Folders)
VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968','2021-01-21','ToTest');
My other tables:
CREATE TABLE table3 ([USERID] [uniqueidentifier] NOT NULL);
CREATE TABLE table4 ([USERID] [uniqueidentifier] NOT NULL);
CREATE TABLE table5 ([USERID] [uniqueidentifier] NOT NULL);
CREATE TABLE table6 ([USERID] [uniqueidentifier] NOT NULL);
INSERT INTO table3 (USERID) VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995');
INSERT INTO table3 (USERID) VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708');
INSERT INTO table3 (USERID) VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315');
INSERT INTO table3 (USERID) VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8');
INSERT INTO table3 (USERID) VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78');
INSERT INTO table3 (USERID) VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968');
INSERT INTO table4 (USERID) VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995');
INSERT INTO table4 (USERID) VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708');
INSERT INTO table4 (USERID) VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315');
INSERT INTO table4 (USERID) VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8');
INSERT INTO table4 (USERID) VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78');
INSERT INTO table4 (USERID) VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968');
INSERT INTO table5 (USERID) VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995');
INSERT INTO table5 (USERID) VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708');
INSERT INTO table5 (USERID) VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315');
INSERT INTO table5 (USERID) VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8');
INSERT INTO table5 (USERID) VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78');
INSERT INTO table5 (USERID) VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968');
INSERT INTO table6 (USERID) VALUES ('C79784F1-7254-4195-AF7F-66E651F3C995');
INSERT INTO table6 (USERID) VALUES ('3C51AD27-21F1-4751-9931-7C66263B4708');
INSERT INTO table6 (USERID) VALUES ('0D67A3E3-E7CF-4D95-935D-E077F4A6D315');
INSERT INTO table6 (USERID) VALUES ('70A9552A-028B-4EA0-A309-4E93EEAB92E8');
INSERT INTO table6 (USERID) VALUES ('1D8E9F5D-FEEB-43DA-9CDA-F22D610CDE78');
INSERT INTO table6 (USERID) VALUES ('411BCC56-A4C9-4D9B-9D49-FA9255ECA968');
I want to clean the duplicates and keep only one record in the database.
First, I created a query that gives me only the duplicate rows and keeps only one record.
With this record, I’ll update table3, table4, table5, table6,
WITH singleUser AS
(
SELECT
a.UserName,
a.UserID
FROM
(SELECT
userid,
Username,
ROW_NUMBER() OVER (PARTITION BY username ORDER BY username ASC) AS rowNo,
COUNT(*) OVER (PARTITION BY username) AS c
FROM
dbo.users
WHERE
1 = 1
GROUP BY
userid, Username) a
WHERE
1 = 1
AND rowNo > 1
AND c = rowNo
)
Then I created a query that gives me all the tables that contain my ‘Userid’ column.
This query will return: table3, table4, table5, table6
WITH tableToUpdate AS
(
SELECT
TABLE_CATALOG AS 'Bdd',
TABLE_SCHEMA AS 'Schema',
TABLE_NAME AS 'TableName',
COLUMN_NAME AS 'ColumnName'
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
1 = 1
AND CASE
WHEN COLUMN_NAME = 'CreatedByID' THEN 1
WHEN COLUMN_NAME = 'UserID' THEN 1
ELSE 0
END = 1
)
And finally I created my merge query
MERGE INTO dbo.catalog c
USING (SELECT
u.UserID AS UserIDUsers,
su.UserID AS UserIDSingleUser
FROM
dbo.Users u
JOIN
singleUser su ON su.Username = u.username
WHERE
1 = 1) S ON c.CreatedByID = s.UserIDUsers
WHEN MATCHED THEN
UPDATE
SET c.CreatedByID =S.UserIDSingleUser
My merge result:
CreatedById - CreationDate - Folders
C79784F1-7254-4195-AF7F-66E651F3C995 | 2018-02-24 | Folder1
C79784F1-7254-4195-AF7F-66E651F3C995 | 2019-10-12 | PAD
0D67A3E3-E7CF-4D95-935D-E077F4A6D315 | 2021-05-12 | IEF
70A9552A-028B-4EA0-A309-4E93EEAB92E8 | 2021-01-27 | WIP
70A9552A-028B-4EA0-A309-4E93EEAB92E8 | 2021-06-29 | OLD_ONE
70A9552A-028B-4EA0-A309-4E93EEAB92E8 | 2021-01-21 | ToTest
It works very well, but is there a way to automatize it ?
Actually I’ve created 8 queries, but only the merge section change.
Also, how can I remove duplicate rows in my dbo.users
table, after all fields have been updated?
Thank you for your help.
Advertisement
Answer
I came back to answer to my own question. After some days i finally did it.
beforehand I’ve created a table which comes from my CTE query (singleUser)
CREATE OR ALTER PROCEDURE dbo.mergeUserID
AS
DECLARE @tableName nvarchar(50)
DECLARE @sql nvarchar(max)
DECLARE @columnName nvarchar(50)
BEGIN
DECLARE cursor_db CURSOR FOR
SELECT
TABLE_NAME AS 'TableName'
,COLUMN_NAME AS 'ColumnName'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE 1=1
AND CASE
WHEN COLUMN_NAME = 'CreatedByID' then 1
WHEN COLUMN_NAME = 'ModifiedByID' then 1
WHEN COLUMN_NAME = 'OwnerID'then 1
WHEN COLUMN_NAME = 'UserID' then 1
ELSE 0
END = 1
OPEN cursor_db
FETCH NEXT FROM cursor_db INTO @tableName, @columnName
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql ='MERGE INTO '
+ @tablename+ ' t USING (
SELECT
u.UserID as UserIDUsers
,su.UserID as UserIDSingleUser
FROM dbo.Users u
JOIN dbo.singleUser su on su.UserName = u.username
WHERE 1=1
)S ON t.'+@columnName+' = s.UserIDUsers
WHEN MATCHED THEN
UPDATE
SET t.'+@columnName+' = S.UserIDSingleUser;'
exec sp_executesql @sql
PRINT @sql
FETCH NEXT FROM cursor_db INTO @tableName, @columnName
END
CLOSE cursor_db
DEALLOCATE cursor_db
END;
GO
------------------------------------------------
DECLARE @RC nvarchar(max)
-- TODO: Set parameter values here.
EXECUTE @RC = [dbo].[mergeUserID]
PRINT @RC
GO
I don’t know if it’s well coded because it’s the first time I’ve done this.
For example I’ve seen on some forum they put the ;
after FETCH / CLOSE / DEALLOCATE ;
others not.
so Who is right or wrong, dunno ?