Making tests faster by using only a partial database in Entity Framework Effort - entity-framework

Use case: We have quite a large database (about 200 tables) that is used in a large (legacy) system. It's implemented with a database-first approach, with one edmx file defining the entire database. We are using XUnit and Effort for automated testing. The problem is that these tests are very slow. It takes something like 7-8 minutes to run our current test suite, even though test coverage isn't anywhere near what we want it to be.
I've noticed that if I create a smaller subset of the edmx file, by removing some tables that aren't needed, tests run faster.
I'm looking for a solution where, for a particular test or suite of tests, we can somehow make Effort create only the subset of tables that are needed (I think in many cases we'll only need one table).
Currently we're setting up our connection like this:
connection = EntityConnectionFactory.CreateTransient("metadata=res://entities.csdl|res://entities.ssdl|res://entities.msl");
Is there some way we can (for instance, by running an XML transformation at runtime) make Effort only create the data structures it needs for a subset of tables that we define?

Disclaimer: I'm the owner of the project Entity Framework Effort.
Our library has a feature that allows creating a restore point and rolling back to it.
So by using this trick, you could call CreateRestorePoint() only once, when all tables have been created, and then start every test with RollbackToRestorePoint(). (There are several other ways to make it work, but I guess you get the point.)
It will without a doubt make your tests run A LOT faster, since the tables will not have to be created every time.
Here is an example:
var conn = Effort.DbConnectionFactory.CreateTransient();

using (var context = new EntityContext(conn))
{
    context.EntitySimples.Add(new EntitySimple { ColumnInt = 1 });
    context.EntitySimples.Add(new EntitySimple { ColumnInt = 2 });
    context.EntitySimples.Add(new EntitySimple { ColumnInt = 3 });
    context.SaveChanges();
}

// Create a RestorePoint that will save all current entities in the "database"
conn.CreateRestorePoint();

// Make any change
using (var context = new EntityContext(conn))
{
    context.EntitySimples.RemoveRange(context.EntitySimples);
    context.SaveChanges();
}

// Roll back to the restore point to run more tests
conn.RollbackToRestorePoint();
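For the original xUnit setup, one way to apply this might be a class fixture that creates the transient connection and the restore point once and rolls back before each test. This is only a sketch: EntityContext/EntitySimple come from the example above, while the EffortDatabaseFixture name and the assumption that CreateTransient returns an Effort.Provider.EffortConnection (the type exposing the restore-point methods used above) are mine.

using System;
using Xunit;

public class EffortDatabaseFixture : IDisposable
{
    // Assumption: CreateTransient returns an Effort.Provider.EffortConnection,
    // the type that exposes CreateRestorePoint/RollbackToRestorePoint above.
    public Effort.Provider.EffortConnection Connection { get; }

    public EffortDatabaseFixture()
    {
        Connection = Effort.DbConnectionFactory.CreateTransient();

        // Touch the model once so all tables get created (and add any shared
        // seed data here), then save that state as the restore point.
        using (var context = new EntityContext(Connection))
        {
            context.EntitySimples.Add(new EntitySimple { ColumnInt = 1 });
            context.SaveChanges();
        }

        Connection.CreateRestorePoint();
    }

    public void Dispose() => Connection.Dispose();
}

public class EntitySimpleTests : IClassFixture<EffortDatabaseFixture>
{
    private readonly EffortDatabaseFixture fixture;

    public EntitySimpleTests(EffortDatabaseFixture fixture)
    {
        this.fixture = fixture;

        // xUnit creates a new test class instance per test,
        // so every test starts from the restore point.
        fixture.Connection.RollbackToRestorePoint();
    }

    [Fact]
    public void Can_add_an_entity()
    {
        using (var context = new EntityContext(fixture.Connection))
        {
            context.EntitySimples.Add(new EntitySimple { ColumnInt = 2 });
            context.SaveChanges();
        }
    }
}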

Separate out unit tests and integration tests. For integration tests you can use the real database and run them in higher environments (to save time), but in local environments you can make use of Faker/Bogus and NBuilder to generate massive amounts of data for unit tests.
https://dzone.com/articles/using-faker-and-nbuilder-to-generate-massive-data
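As a rough illustration of the Bogus approach (the Customer type and its properties here are hypothetical, not from the question):

using System.Collections.Generic;
using Bogus;

// Hypothetical entity used only for illustration.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public static class TestData
{
    public static List<Customer> CreateCustomers(int count)
    {
        // Define one rule per property; Bogus fills in realistic fake values.
        var faker = new Faker<Customer>()
            .RuleFor(c => c.Id, f => f.IndexFaker)
            .RuleFor(c => c.Name, f => f.Name.FullName())
            .RuleFor(c => c.Email, f => f.Internet.Email());

        return faker.Generate(count);
    }
}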
Another option is to create a resource file corresponding to your unit test cases:
https://www.danylkoweb.com/Blog/the-fastest-way-to-mock-a-database-for-unit-testing-B6
I would also like you to take a look at InMemory vs SQLite performance:
http://www.mukeshkumar.net/articles/efcore/unit-testing-with-inmemory-provider-and-sqlite-in-memory-database-in-ef-core
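For reference, a minimal sketch of the SQLite in-memory approach with EF Core (TestContext is a hypothetical DbContext with a constructor accepting DbContextOptions<TestContext>):

using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;

// The in-memory database lives only as long as this connection stays open.
var connection = new SqliteConnection("DataSource=:memory:");
connection.Open();

var options = new DbContextOptionsBuilder<TestContext>()
    .UseSqlite(connection)
    .Options;

using (var context = new TestContext(options))
{
    // Create the schema from the model, then run the test against the context.
    context.Database.EnsureCreated();
    // ... arrange / act / assert ...
}

connection.Close();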
Although the above example is for EF Core, we can also use SQLite in EF6:
https://www.codeproject.com/Tips/1056400/Setting-up-SQLite-and-Entity-Framework-Code-First
So my recommendation is to go with SQLite for integration testing scenarios. For unit tests you can go either with SQLite or with Faker/Bogus and NBuilder.
Hope it helps!

Related

Arquillian Persistence Extension - Long execution time, is it normal?

I'm writing some tests with Arquillian for the persistence layer in my app. I would like to use the Persistence Extension for database populating etc. The problem is that one test takes about ~15-25 seconds. Is that normal? Or am I doing something wrong? I've tried running these tests on a local postgres database (~10 sec per test), a remote postgres database (~15 sec per test) and hsqldb in a local container (~15 sec per test).
Thanks in advance
P.S. When I'm not using the Persistence Extension, 12 tests take about ~11 sec (and that's acceptable), but I have to persist and delete entities from the code (hard to maintain and manage).
I am going to guess you are using APE (Arquillian Persistence Extension) v1.0.0a6. If this is the case, what you are experiencing is the result of refactoring done between alpha5 and alpha6, against which I filed the following ticket: https://issues.jboss.org/browse/ARQ-1440
You could try using 1.0.0a5, which has some different issues that you might encounter and need to work around, but it has 300% better performance than alpha6.

Issue with Entity Framework 4.2 Code First taking a long time to add rows to a database

I am using Entity Framework 4.2 with Code First. I currently have a Windows 2008 application server and a database server running on Amazon EC2. The application server has a Windows Service installed that runs once per day. The service executes the following code:
// returns between 2000-4000 records
var users = userRepository.GetSomeUsers();

// do some work
foreach (var user in users)
{
    var userProcessed = new UserProcessed { User = user };
    userProcessedRepository.Add(userProcessed);
}

// Calls SaveChanges() on DbContext
unitOfWork.Commit();
This code takes a few minutes to run. It also maxes out the CPU on the application server. I have tried the following measures:
Removed the unitOfWork.Commit() to see if it is network related when the application server talks to the database. This did not change the outcome.
Changed my application server from a medium instance to a high CPU instance on Amazon to see if it is resource related. This caused the server not to max out the CPU anymore and the execution time improved slightly. However, the execution time was still a few minutes.
As a test, I modified the above code to run three times, to see the execution time for the second and third loops when using the same DbContext. Every consecutive loop took longer to run than the previous one, but that could be related to using the same DbContext.
Am I missing something? Is it really possible that something as simple as this takes minutes to run? Even if I don't commit to the database after each loop? Is there a way to speed this up?
Entity Framework (as it stands) isn't really well suited to this kind of bulk operation. Are you able to use one of the bulk insert methods with EC2? Otherwise, you might find that hand-coding the T-SQL INSERT statements is significantly faster. If performance is important then that probably outweighs the benefits of using EF.
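If the target database is SQL Server, one such option is SqlBulkCopy. A rough sketch (the destination table, the column names and the connectionString variable below are assumptions, not taken from the question):

using System;
using System.Data;
using System.Data.SqlClient;

// Build a DataTable that matches the shape of the target table.
var table = new DataTable();
table.Columns.Add("UserId", typeof(int));
table.Columns.Add("ProcessedOn", typeof(DateTime));

foreach (var user in users)
{
    table.Rows.Add(user.Id, DateTime.UtcNow);
}

// Stream all rows to the server in a single bulk operation.
using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.UserProcessed";
    bulkCopy.WriteToServer(table);
}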
My guess is that your ObjectContext is accumulating a lot of entity instances. SaveChanges seems to have a phase whose time is linear in the number of entities loaded. This is likely why it is taking longer and longer.
A way to resolve this is to use multiple, smaller ObjectContexts to get rid of old entity instances.
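A rough sketch of that idea, batching the work across short-lived contexts (MyDbContext and the UserId foreign-key property are assumptions for illustration, not from the question):

using System.Linq;

const int batchSize = 500;
var allUsers = users.ToList(); // the users returned by userRepository.GetSomeUsers()

for (int i = 0; i < allUsers.Count; i += batchSize)
{
    // A fresh, short-lived context per batch keeps the change tracker small.
    using (var context = new MyDbContext())
    {
        context.Configuration.AutoDetectChangesEnabled = false;

        foreach (var user in allUsers.Skip(i).Take(batchSize))
        {
            // Setting a foreign-key property avoids attaching entities that
            // were loaded by a different context.
            context.Set<UserProcessed>().Add(new UserProcessed { UserId = user.Id });
        }

        context.SaveChanges();
    }
}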

Managing database changes

I'm starting to move more logic into the database, using triggers, views, functions, CTEs, etc. When plv8/json comes out for postgres, I can see myself putting lots of logic in there.
I'm having problems with the "standard" way of doing database migrations in sequel and activerecord. Both sequel and activerecord let you put arbitrary sql code into timestamped files. When each file is run, a schema_versions table is updated with the filename (or the timestamp in the filename), which keeps a record of which migrations have been applied to the current database.
If a lot of coding is being done at the database level, that means that modifications to existing views, functions, etc. follow the pattern below:
Migration 1 defines a function and a view that uses that function.
-- Migration 1
create function calculate(x int) returns int as $$
    select x + 1;
$$ language sql;

create view foos as (
    select something, calculate(something) from a_table
);
Requirements change, and I need to change the function's types. In Migration 2 I have to drop all objects that depend on calculate, and recreate them by copying their entire body -- even if there weren't any changes in most of the other code!
-- Migration 2
-- Have to drop all views and functions that depend on the
-- `calculate(int)` function.
drop view foos;

create or replace function calculate(x bigint) returns bigint as $$
    select x + 1;
$$ language sql;

-- I could do `drop function calculate(int) cascade`,
-- but I might accidentally drop some objects that wouldn't get recreated below.

-- Now I have to recreate foos.
create view foos as (
    select something, calculate(something) from a_table
);
If I'm building a system based on views and functions and triggers, my migrations would be filled with duplicated code, and it's difficult to find the latest version of the code. You might say "don't do that!", but for my purposes (e-commerce, shipping, transactions), I'm finding it's a lot easier and faster to have the database ensure the integrity of the data by doing the logic inside the database.
You can (of course) dump the current database schema (which includes all the code definitions), but I think you lose comments. And you wouldn't generally want to edit a giant file that contains the whole schema.
Any ideas on how to solve this problem?
My best idea is to have the sql code contained in its own canonical files (app/sql/orders/shipping.sql, app/sql/orders/creation.sql, etc.). Everyone develops directly against these. Whenever it's time for a release, you'd make a new migration file, look at all the code that changed since the previous release, figure out the dependency chain of the database objects that need to be dropped and recreated, and then copy the sql from the canonical sql files into a new sequel/activerecord migration file. But it's a pain. :/
Thoughts are very welcome. I hope I explained this well enough, I'm cutting back on my caffeine intake and I'm a little groggy atm.
Oh, I asked a similar question on Stack Overflow: Changing the type of a column used in other views. The answer was a function that let me pass in:
sql code to run
database views to drop and recreate
The function would retrieve the view definition, drop the views, run the sql code, then recreate the view definition (in reverse order of dropping). Perhaps a system of functions like this would help solve the problem of having to copy/paste sql code into the migration files.
I'd recommend liquibase.
You create files which track the changes to your database, and these will be applied to the database in the correct migration order.
You might find Dave Wheeler's blog posts interesting, starting from here:
http://justatheory.com/computers/databases/simple-sql-change-management.html
My rate of database change is fairly small but I tend to be careless and make small changes to the schema directly, so I've had to come up with a fair bit of infrastructure to catch when I've done so. The basic elements are:
1. A makefile that can rebuild a development database from scratch
2. A set of schema files separated into "modules" (lookups_schema.sql, lookup_data.sql)
3. A set of update files that transition from one revision to the next
4. I don't usually have the corresponding downgrade scripts, but some people do
5. A script to populate my database with a plausible amount of test data
6. Crucially, a test suite via pgTAP that checks my various functions, views and also the upgrade scripts. The upgrade tests can be run against a live database too.
If you have a separate instance of PostgreSQL set up with fsync turned off / on a ramdisk etc., then rebuilding the whole DB and populating it can take seconds (if you don't have too much test data).
Start with #1, #2, then add #6 (pgTAP is very cool), then the rest. The crucial thing is a test suite that checks your in-database code.
There are tools that try to automate schema changes for you, but they are really only good at adding a new column to a table and that sort of thing. Once you have code in your db then they're not much help.

Dynamic test cases

We are using NUnit to run our integration tests. One of the tests should always do the same thing, but take different input parameters. Unfortunately, we cannot use the [TestCase] attribute, because our test cases are stored in external storage. We have dynamic test cases which can be added, removed, or disabled (not removed) by our QA engineers. The QA people do not have the ability to add [TestCase] attributes to our C# code. All they can do is add them to the storage.
My goal is to read test cases from the storage into memory, run the test with all enabled test cases, and report if a test case fails. I cannot use a "foreach" statement, because if test case #1 fails, the rest of the test cases will not be run at all. We already have a build server (CruiseControl.net) where generated NUnit reports are shown, therefore I would like to continue using NUnit.
Could you point to a way how can I achieve my goal?
Thank you.
You can use [TestCaseSource("PropertyName")], which specifies a property (or method, etc.) to load data from.
For example, I have a test case in Noda Time which uses all the BCL time zones - and that could change over time, of course (and is different on Mono), without me changing the code at all.
Just make your property/member load the test data into a collection, and you're away.
(I happen to have always used properties, but it sounds like it should work fine with methods too.)
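A rough sketch of that pattern (the storage-reading helper and the StoredCase type below are placeholders for your own code):

using System.Collections.Generic;
using NUnit.Framework;

[TestFixture]
public class StoredCaseTests
{
    // Reads the enabled test cases from the external storage at run time.
    public static IEnumerable<TestCaseData> Cases
    {
        get
        {
            foreach (var stored in LoadEnabledCasesFromStorage())
            {
                // Each TestCaseData becomes its own test, reported separately,
                // so one failing case does not stop the others.
                yield return new TestCaseData(stored.Input).SetName("StoredCase_" + stored.Id);
            }
        }
    }

    [Test, TestCaseSource("Cases")]
    public void RunsStoredCase(string input)
    {
        // The test body stays the same for every case.
        Assert.That(input, Is.Not.Null);
    }

    // Placeholder for whatever reads your storage; illustration only.
    private static IEnumerable<StoredCase> LoadEnabledCasesFromStorage()
    {
        yield return new StoredCase { Id = 1, Input = "example" };
    }

    private class StoredCase
    {
        public int Id { get; set; }
        public string Input { get; set; }
    }
}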

How do I load fixtures correctly for a test suite using Test::DBIx::Class?

I have a bunch of tests for my DBIx::Class schema and I am using Test::DBIx::Class. This is great as it gives me useful test functions and loads fixtures. It also has a Test::mysqld trait so I can dynamically create a test mysqld instance, deploy the schema, load fixtures and test. But if I have a bunch of test scripts it seems silly to start the server, deploy and load fixtures at the start of each script when instantiating via the constructor.
What is the best way to create the test database and populate it for the duration of my tests?
At work, one of the first tests we run loads all the fixtures the rest of the tests require. That's one way of managing it, but your later comment also sounds sensible.
I have had further thoughts about this and have come to the conclusion that I should split my fixtures up and only load the ones that are used by each test script. That makes sense, so the test scripts can be run independently or with prove's --shuffle without things blowing up!