I’m running a very simple function that reads lines from a text file in batches. Each line contains an SQL query, so the function grabs a specified number of queries, executes them against the SQL database, then grabs the next batch until the entire file is read. The problem is that with very large files the process slows considerably over time. I’m guessing there is a memory leak somewhere in the function but can’t determine where it may be. There is nothing else going on while this function is running. My programming skills are crude at best so please go easy on me. 🙂
for (int x = 0; x <= totalBatchesInt; x++)
{
    var lines = System.IO.File.ReadLines(file).Skip(skipCount).Take(batchSize).ToArray();
    string test = string.Join("\n", lines);
    SqlCommand cmd = new SqlCommand(test.ToString());
    try
    {
        var rowsEffected = qm.ExecuteNonQuery(CommandType.Text, cmd.CommandText, 6000, true);
        totalRowsEffected = totalRowsEffected + rowsEffected;
        globalRecordCounter = globalRecordCounter + rowsEffected;
        fileRecordCounter = fileRecordCounter + rowsEffected;
        skipCount = skipCount + batchSize;

        TraceSource.TraceEvent(TraceEventType.Information, (int)ProcessEvents.Starting,
            "Rows progress for " + folderName + "_" + fileName + " : " + fileRecordCounter.ToString() + " of " + linesCount + " records");
    }
    catch (Exception esql)
    {
        TraceSource.TraceEvent(TraceEventType.Information, (int)ProcessEvents.Cancelling,
            "Error processing file " + folderName + "_" + fileName + " : " + esql.Message.ToString() + ". Aborting file read");
    }
}
Answer
There are many things wrong with your code:
1. You never dispose your command. It holds on to native database-driver resources, and waiting for the garbage collector to release them is very bad practice.
2. You shouldn’t be sending those commands individually anyway. Either send them all at once in one command, or use transactions to group them together (see the sketch after this list).
3. This one is the reason it gets slower over time:
File.ReadLines(file).Skip(skipCount).Take(batchSize)
will re-read the file from the beginning on every iteration, skipping an ever-growing number of already-processed lines, so each batch takes longer than the one before it.
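To address #1 and #2, here is a minimal sketch of running one batch inside a transaction with plain ADO.NET rather than your qm wrapper. The class name, method name, and connection string parameter are placeholders I’ve made up; adapt it to however you open your connections:

using System.Data.SqlClient;

static class BatchRunner
{
    // Runs one batch of SQL statements inside a single transaction.
    public static void RunBatch(string connectionString, string[] batchLines)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            // One transaction per batch: the statements commit or roll back together.
            using (var transaction = connection.BeginTransaction())
            using (var command = connection.CreateCommand())   // disposed deterministically, not left to the GC
            {
                command.Transaction = transaction;
                command.CommandText = string.Join("\n", batchLines);
                command.CommandTimeout = 6000;                  // mirrors the timeout you pass to qm.ExecuteNonQuery
                command.ExecuteNonQuery();
                transaction.Commit();
            }
        }
    }
}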
To fix #3, simply create the enumerator once and iterate it in batches. In pure C#, you can do something like:
using var enumerator = File.ReadLines(file).GetEnumerator();
for (int x = 0; x <= totalBatchesInt; x++)
{
    var lines = new List<string>();
    // Check the batch size before advancing the enumerator, so no line is consumed and then dropped.
    while (lines.Count < batchSize && enumerator.MoveNext())
        lines.Add(enumerator.Current);
    string test = string.Join("\n", lines);
    // your code...
}
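With this approach the file is opened and walked exactly once: each pass of the loop just pulls the next batchSize lines from the live enumerator, so the cost per batch stays flat instead of growing with skipCount.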
Or, if you’re using MoreLINQ (which I recommend), something like this:
foreach (var lines in File.ReadLines(file).Batch(batchSize))
{
    // your code...
}
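Batch comes from the MoreLINQ NuGet package and hands you the lines in groups of up to batchSize (the last group is simply whatever is left), so the Skip/Take calls and the skipCount bookkeeping disappear entirely.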