I am working on building an application to assist with taking data from different sources dynamically Files and Emails (usually CSV and Excel), APIs, and other SQL Databases and processing them and moving them to a central SQL server. All the tables are being uploaded to the main SQL server and processed to insert new rows into the destination table and update rows with changed data if some is available. The main SQL server is a Microsoft SQL server.
When the data is uploaded to the main server for comparison it is being stored in a temporary table which is being dropped after the comparison is done. The statement that I am using created dynamically by the program in order to allow it to be dynamic to different datasets. What I have been using is a NOT EXISTS which when I run it on a table that is 380k+ rows of data it has been taking 2+ hours to process that data. I have also tried EXCEPT, however I am unable to use that as some of the tables contain text fields which can’t be used for the EXCEPT statement. The datasets that are being uploaded to the server are written to and read from at different intervals based on the schedules built into the program.
I was looking to find a more efficient way or improvements that I might be able to make use of in order to bring down the run times for this table. The program that manages the server is running on a remote server than the SQL instance which runs on part of the organizations SQL farm. I am not very experienced with SQL so I appreciate all the help I might be able to get. Below I added links to the code and an example statement produced by the system when going to run the comparison.
C# Code: https://pastebin.com/8PeUvekG
SQL Statement: https://pastebin.com/zc9kshJw
INSERT INTO vewCovid19_SP (Street_Number,Street_Dir,Street_Name,Street_Type,Apt,Municipality,County,x_cord,y_cord,address,Event_Number,latitude,longitude,Test_Type,Created_On_Date,msg) SELECT A.Street_Number,A.Street_Dir,A.Street_Name,A.Street_Type,A.Apt,A.Municipality,A.County,A.x_cord,A.y_cord,A.address,A.Event_Number,A.latitude,A.longitude,A.Test_Type,A.Created_On_Date,A.msg FROM #TEMP_UPLOAD A WHERE NOT EXISTS SELECT * FROM vewCovid19_SP B WHERE ISNULL(CONVERT(VARCHAR,A.Street_Number), 'NULL') = ISNULL(CONVERT(VARCHAR,B.Street_Number), 'NULL') AND ISNULL(CONVERT(VARCHAR,A.Street_Dir), 'NULL') = ISNULL(CONVERT(VARCHAR,B.Street_Dir), 'NULL') AND ISNULL(CONVERT(VARCHAR,A.Apt), 'NULL') = ISNULL(CONVERT(VARCHAR,B.Apt), 'NULL') AND ISNULL(CONVERT(VARCHAR,A.Street_Name), 'NULL') = ISNULL(CONVERT(VARCHAR,B.Street_Name), 'NULL') AND ISNULL(CONVERT(VARCHAR,A.Street_Type), 'NULL') = ISNULL(CONVERT(VARCHAR,B.Street_Type), 'NULL')); DROP TABLE #TEMP_UPLOAD"
Advertisement
Answer
One simple query form would be to load a new table from a UNION (which includes de-duplication).
eg
insert into viewCovid19_SP_new select * from vewCovid19_SP union select * from #temp_upload
then swap the tables with ALTER TABLE … SWITCH or drop and sp_rename.