Issue with Entity Framework 4.2 Code First taking a long time to add rows to a database - entity-framework

I am using Entity Framework 4.2 with Code First. I have a Windows 2008 application server and a database server running on Amazon EC2. The application server has a Windows service installed that runs once per day. The service executes the following code:
// returns between 2000-4000 records
var users = userRepository.GetSomeUsers();

// do some work
foreach (var user in users)
{
    var userProcessed = new UserProcessed { User = user };
    userProcessedRepository.Add(userProcessed);
}

// calls SaveChanges() on the DbContext
unitOfWork.Commit();
This code takes a few minutes to run. It also maxes out the CPU on the application server. I have tried the following measures:
Removed the unitOfWork.Commit() call to see whether the problem was network-related when the application server talks to the database. This did not change the outcome.
Changed my application server from a medium instance to a high-CPU instance on Amazon to see whether the problem was resource-related. The server no longer maxed out the CPU and the execution time improved slightly, but it was still a few minutes.
As a test, I modified the above code to run three times to see how the execution time changed for the second and third loops when using the same DbContext. Each consecutive loop took longer to run than the previous one, but that could be related to reusing the same DbContext.
Am I missing something? Is it really possible that something this simple takes minutes to run, even though I don't commit to the database after each iteration? Is there a way to speed this up?

Entity Framework (as it stands) isn't really well suited to this kind of bulk operation. Are you able to use one of the bulk insert methods with EC2? Otherwise, you might find that hand-coding the T-SQL INSERT statements is significantly faster. If performance is important then that probably outweighs the benefits of using EF.
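For illustration, a minimal SqlBulkCopy sketch (assuming a UsersProcessed destination table with a UserId column, that User exposes an Id, and an ADO.NET connection string; adapt to your actual schema):

using System.Data;
using System.Data.SqlClient;

// Stage the rows in a DataTable that mirrors the destination table,
// then push them to SQL Server in a single bulk operation.
var table = new DataTable();
table.Columns.Add("UserId", typeof(int));
foreach (var user in users)
{
    table.Rows.Add(user.Id);
}

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "UsersProcessed";
    bulkCopy.WriteToServer(table);
}

This bypasses EF's change tracking entirely, which is where most of the per-entity overhead comes from.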

My guess is that your ObjectContext is accumulating a lot of entity instances. SaveChanges appears to have a phase whose running time is linear in the number of entities the context has loaded, which would explain why each pass takes longer and longer.
A way to resolve this is to use multiple, smaller ObjectContexts so that old entity instances are released, as in the sketch below.
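For example, a rough sketch of that pattern (the batch size and the MyDbContext/UsersProcessed names are placeholders, not the asker's actual types):

using System.Linq;

const int batchSize = 500; // tune to taste
// Give each batch its own context so the change tracker
// never accumulates more than batchSize entities.
foreach (var batch in users
    .Select((user, index) => new { user, index })
    .GroupBy(x => x.index / batchSize))
{
    using (var context = new MyDbContext())
    {
        foreach (var item in batch)
        {
            context.UsersProcessed.Add(new UserProcessed { User = item.user });
        }
        context.SaveChanges();
    }
}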

Related

Making tests faster by using only partial database in EntityFramework Effort

Use case: we have quite a large database (about 200 tables) that is used in a large (legacy) system. It's implemented with a database-first approach, with one edmx file defining the entire database. We are using xUnit and Effort for automated testing. The problem is that these tests are very slow: it takes something like 7-8 minutes to run our current test suite, even though test coverage isn't anywhere near what we want it to be.
I've noticed that if I create a smaller subset of the edmx file, by removing some tables that aren't needed, tests run faster.
I'm looking for a solution where for a particular test, or suite of tests, we can somehow make Effort only create the subset of tables that are needed (I think in many cases, we'll only need one table).
Currently we're setting up our connection like this:
connection = EntityConnectionFactory.CreateTransient("metadata=res://entities.csdl|res://entities.ssdl|res://entities.msl");
Is there some way we can (for instance, by running an XML transformation at runtime) make Effort create only the data structures it needs for a subset of tables that we define?
Disclaimer: I'm the owner of the project Entity Framework Effort
Our library has a feature that allows creating a restore point and rolling back to it.
So using this trick, you could call CreateRestorePoint() only once, after all the tables have been created, and then start every test with RollbackToRestorePoint(). (There are several other ways to make it work, but I guess you get the point.)
It will without a doubt make your tests run A LOT faster, since the tables will not have to be created every time.
Here is an example:
var conn = Effort.DbConnectionFactory.CreateTransient();

using (var context = new EntityContext(conn))
{
    context.EntitySimples.Add(new EntitySimple { ColumnInt = 1 });
    context.EntitySimples.Add(new EntitySimple { ColumnInt = 2 });
    context.EntitySimples.Add(new EntitySimple { ColumnInt = 3 });
    context.SaveChanges();
}

// Create a restore point that will save all current entities in the "database"
conn.CreateRestorePoint();

// Make any change
using (var context = new EntityContext(conn))
{
    context.EntitySimples.RemoveRange(context.EntitySimples);
    context.SaveChanges();
}

// Roll back to the restore point to run more tests
conn.RollbackToRestorePoint();
Separate your unit tests from your integration tests. For integration tests you can use a real database and run them in higher environments (to save time), but in local environments you can make use of Faker/Bogus and NBuilder to generate massive amounts of data for unit tests.
https://dzone.com/articles/using-faker-and-nbuilder-to-generate-massive-data
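As a rough illustration of the Bogus approach (the User entity and its Name/Email properties are assumed):

using Bogus;

// Define generation rules once, then produce as many fake users as needed.
var userFaker = new Faker<User>()
    .RuleFor(u => u.Name, f => f.Name.FullName())
    .RuleFor(u => u.Email, f => f.Internet.Email());

var users = userFaker.Generate(2000); // a large data set for a unit test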
Another option is to create a resource file corresponding to your unit test cases:
https://www.danylkoweb.com/Blog/the-fastest-way-to-mock-a-database-for-unit-testing-B6
I would also like you to take a look at in-memory provider vs. SQLite performance:
http://www.mukeshkumar.net/articles/efcore/unit-testing-with-inmemory-provider-and-sqlite-in-memory-database-in-ef-core
Although the above example is for EF Core, SQLite can also be used with EF6:
https://www.codeproject.com/Tips/1056400/Setting-up-SQLite-and-Entity-Framework-Code-First
So my recommendation is to go with SQLite for integration testing scenarios. For unit tests you can go either with SQLite or with Faker/Bogus and NBuilder.
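For example, a minimal sketch of the SQLite in-memory setup in EF Core (BlogContext stands in for your own DbContext):

using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;

// The in-memory database lives only as long as this connection stays open.
var connection = new SqliteConnection("DataSource=:memory:");
connection.Open();

var options = new DbContextOptionsBuilder<BlogContext>()
    .UseSqlite(connection)
    .Options;

using (var context = new BlogContext(options))
{
    context.Database.EnsureCreated(); // build the schema from the model
    // ... run the test against a real relational engine ...
}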
Hope it helps!

Entity Framework memory leak in Azure worker role

I investigate my worker role memory dumps using WinDbg.
I obtained the following results by grabbing dumps from WaWorkerHost.exe locally every half hour. The first column is the object count, the second the total size in bytes. Also, the most expensive objects in the dump are of type string.
35360 3394560 System.Data.Objects.EntitySqlQueryState
40256 3864576 System.Data.Objects.EntitySqlQueryState
45152 4334592 System.Data.Objects.EntitySqlQueryState
I found that class here http://entityframework.codeplex.com/SourceControl/latest#src/EntityFramework/Core/Objects/Internal/EntitySqlQueryState.cs
And you can see that it caches the query string.
Is it possible that Entity Framework caches those objects without ever releasing them?
I found an article describing how NHibernate can:
http://rasmuskl.dk/2008/12/19/a-windbg-debugging-journey-nhibernate-memory-leak/
The worker role automatically restarts every day on the production server when it runs out of RAM.
I am using Entity Framework 5 and Azure SDK 2.5.
Please help me with this issue; what would you advise?

Arquillian Persistence Extension - Long execution time, is it normal?

I'm writing some tests with Arquillian for the persistence layer in my app. I would like to use the Persistence Extension for database population etc. The problem is that one test takes about ~15-25 seconds. Is that normal, or am I doing something wrong? I've tried running these tests against a local Postgres database (~10 sec per test), a remote Postgres database (~15 sec per test), and HSQLDB in a local container (~15 sec per test).
Thanks in advance
P.S. When I'm not using the Persistence Extension, 12 tests take about ~11 sec (and that's acceptable), but then I have to persist and delete entities from code (hard to maintain and manage).
I am going to guess you are using APE (Arquillian Persistence Extension) v1.0.0a6. If so, what you are experiencing is the result of refactoring done between alpha5 and alpha6, against which I filed the following ticket: https://issues.jboss.org/browse/ARQ-1440
You could try using 1.0.0a5, which has some different issues that you might encounter and need to work around, but it has 300% better performance than alpha6.

Entity Framework Code First - Model change breaks Seed

We've been using Entity Framework Code First 5 for a little while now, without major issue.
I've recently discovered that ANY change I make to my model (such as adding or removing a field) means that the Seed method no longer runs, leaving my database in an invalid state.
If I reverse the change, the seed method runs fine.
I have tried making changes to varying parts of my model, so it's not the specific change which is relevant.
Does anyone know how I can debug what the specific issue is? Or has anyone come across this themselves and knows how to fix it?
UPDATE: After the model change, no matter how many times I query the database the Seed does not run. However, I have found that if I manually run IISRESET and then re-execute the web service that performs the query, the Seed does run! Does anyone know why this would be, and why I suddenly need to reset IIS between the database initialization and the Seed executing?
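As a further experiment, I could presumably force the initializer (and thus Seed) to run explicitly instead of waiting for the first query; a rough sketch, where MyContext and MyInitializer stand in for my actual types:

using System.Data.Entity;

// Force EF to run the configured initializer (including Seed) now,
// rather than lazily on the first query.
Database.SetInitializer(new MyInitializer());
using (var context = new MyContext())
{
    context.Database.Initialize(force: true);
}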
Many thanks, Steve

Cannot find a record just created in a different thread with JPA

I am using the Play! framework and am having difficulty with the following scenario.
I have a server process which runs in a 'read-only' transaction, to prevent any possible database locks, since it is a complicated procedure. There are one or two records to be stored, but I do that in a job, as I found that doing it in the main thread could result in a deadlock under higher load.
However, on one occasion I need to create an object and subsequently use it.
However, when I create the object using a job, wait for the resulting id (returned via a Promise), and then search the database for it, it cannot be found.
Is there an easy way to have JPA search 'afresh' in the DB at this point? I added a 5-second pause as a test, so I am sure it is not because the procedure hadn't finished yet.
Check whether there is a transaction wrapped around your INSERT, and if there is one, check that the transaction is actually committed.