I’ve got data from third party and imported to SQL server. The table has 255,072,636 records and 61,714,772 are unique among these records. The table has neither specific order nor any index. The table has 4 columns: Field1(float), Field2(varchar(255), Field3(varchar(255), Field4(varchar(255)
. I want to delete the duplicate record based upon Field1
for which I’ve run the following query:
WITH CTE(Field1, Field2, Field3, Field4, DuplicateCount) AS (SELECT *, ROW_NUMBER() OVER(PARTITION BY Field1 ORDER BY Field1) AS DuplicateCount FROM MyTable) DELETE FROM CTE WHERE DuplicateCount > 1;
but it took 13 hours and then I cancelled it and the above mentioned figures are after that. Is there any better way to remove duplicates?
Advertisement
Answer
Thanks to “Kazi Mohammad Ali Nur” and “eshirvana”. I’ve combine there solutions. At first I created index on Field1
.
CREATE CLUSTERED INDEX Index_Name ON MyTable(Field1);
and then I executed following query to insert unique records into a new table and deleted original table.
WITH CTE(Field1, Field2, Field3, Field4, DuplicateCount) AS (SELECT *, ROW_NUMBER() OVER(PARTITION BY Field1 ORDER BY Field1) AS DuplicateCount FROM MyTable) select * into TempTable FROM CTE WHERE DuplicateCount = 1;
and it worked.
Thanks to all.