I’m looking for the fastest way of inserting into Entity Framework.
I’m asking because of the scenario where you have an active TransactionScope
and the insert is huge (4000+ records). It can potentially take more than 10 minutes (the default transaction timeout), which would leave the transaction incomplete.
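For context, the timeout can be raised when the scope is constructed; a minimal sketch (the 30-minute value is illustrative), although a longer timeout only masks the problem rather than making the insert faster:

```csharp
using System;
using System.Transactions;

// Illustrative only: extend the transaction timeout beyond the default.
// Note that the machine-wide maxTimeout setting (10 minutes by default)
// still caps the effective value unless it is raised as well.
using (var scope = new TransactionScope(
    TransactionScopeOption.Required,
    TimeSpan.FromMinutes(30)))
{
    // ... perform the bulk insert here ...
    scope.Complete();
}
```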
Answer
To your remark in the comments to your question:
“…SavingChanges (for each record)…”
That’s the worst thing you can do! Calling `SaveChanges()` for each record slows bulk inserts down dramatically. I would try a few simple tests that are very likely to improve performance:
- Call `SaveChanges()` once after ALL records.
- Call `SaveChanges()` after, for example, 100 records.
- Call `SaveChanges()` after, for example, 100 records, then dispose the context and create a new one.
- Disable change detection.
For bulk inserts I am working and experimenting with a pattern like this:
```csharp
using (TransactionScope scope = new TransactionScope())
{
    MyDbContext context = null;
    try
    {
        context = new MyDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;

        int count = 0;
        foreach (var entityToInsert in someCollectionOfEntitiesToInsert)
        {
            ++count;
            context = AddToContext(context, entityToInsert, count, 100, true);
        }

        // Save whatever is left after the last full batch.
        context.SaveChanges();
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }

    scope.Complete();
}

private MyDbContext AddToContext(MyDbContext context,
    Entity entity, int count, int commitCount, bool recreateContext)
{
    context.Set<Entity>().Add(entity);

    // Commit a batch every commitCount records and optionally start
    // with a fresh context to keep the change tracker small.
    if (count % commitCount == 0)
    {
        context.SaveChanges();
        if (recreateContext)
        {
            context.Dispose();
            context = new MyDbContext();
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }

    return context;
}
```
I have a test program which inserts 560,000 entities (9 scalar properties, no navigation properties) into the DB. With this code it completes in less than 3 minutes.
For performance it is important to call `SaveChanges()` after “many” records (“many” meaning around 100 or 1,000). It also improves performance to dispose the context after `SaveChanges()` and create a new one. This clears the context of all entities; `SaveChanges()` doesn’t do that, the entities remain attached to the context in state `Unchanged`. It is the growing number of attached entities in the context that slows down the insertion step by step. So it is helpful to clear the context from time to time.
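A minimal sketch to make this visible, assuming the same illustrative `MyDbContext`/`Entity` types as above:

```csharp
using System;
using System.Linq;

using (var context = new MyDbContext())
{
    var entity = new Entity();
    context.Set<Entity>().Add(entity);
    Console.WriteLine(context.Entry(entity).State);  // Added

    context.SaveChanges();
    Console.WriteLine(context.Entry(entity).State);  // Unchanged, still tracked

    // The change tracker keeps every inserted entity, so its bookkeeping
    // cost grows with each batch unless the context is recreated.
    Console.WriteLine(context.ChangeTracker.Entries().Count());  // 1
}
```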
Here are a few measurements for my 560,000 entities:
- commitCount = 1, recreateContext = false: many hours (That’s your current procedure)
- commitCount = 100, recreateContext = false: more than 20 minutes
- commitCount = 1000, recreateContext = false: 242 sec
- commitCount = 10000, recreateContext = false: 202 sec
- commitCount = 100000, recreateContext = false: 199 sec
- commitCount = 1000000, recreateContext = false: out of memory exception
- commitCount = 1, recreateContext = true: more than 10 minutes
- commitCount = 10, recreateContext = true: 241 sec
- commitCount = 100, recreateContext = true: 164 sec
- commitCount = 1000, recreateContext = true: 191 sec
In the first test above, performance is very non-linear and degrades dramatically over time. (“Many hours” is an estimate; I never finished this test and stopped at 50,000 entities after 20 minutes.) This non-linear behaviour is not as significant in the other tests.