What are the steps and tools used to troubleshoot Entity Framework performance problems? - entity-framework

I'm looking at an existing system that has a significant performance problem when inserting data using LINQ to Entities. The relationships are complex.
I am familiar with common Entity Framework performance factors such as how often SaveChanges is called, etc. There are two problems with this, though. First, it's not the problem here (the performance problem occurs before SaveChanges even happens). Second, it's just guessing. It often works but it's not a process, it's a hope.
I'd like to know the best way to deal with these big problems the real way, where hope doesn't work. If this were ADO using SQL commands, the tool would be SQL Server Performance Profiler to see how many queries were happening, what they were, what their execution plans are, etc., and then adjusting the commands to mitigate those problems. In Entity Framework land, these tools aren't effective because there is not a 1-1 mapping between what you tell it and the commands it generates, and in this particular case it's not even SQL being slow.
Although I'd like the answer here to be a generic approach, here is the relevant code specific to my current problem:
foreach( var thing in objectsToImport ) {
var newEntityFrameworkManagedObject = new EntityFrameworkManagedObject { createdOn = DateTime.UtcNow };
newEntityFrameworkManagedObject.OtherExistingEntity = getExistingEntity( thing.OtherEntityMatchingInfo );
... /*Many more lines associating new object to a bunch of existing items selected before loop.*/
modelRepository.Insert( newEntityFrameworkManagedObject );
}
...
modelRepository.SaveChanges();
The performance hit is on modelRepository.Insert(). We're importing a CSV of several-hundred rows and each call to Insert takes a full minute. Notice that it never gets to SaveChanges. Even the first item to be inserted is slow.

Related

How to solve initial very slow EF Entity call which uses TPH and Complex Types?

I am using EF6
I have a generic table which holds data for different types of class objects using the "Table Per Hierarchy" Approach. In addition these class objects use complex types for defining types for their properties.
So using a made up example,
Table = Person
"Mike the Teacher" is a "Teacher" instance of Person with a personType of "Teacher"
The "Teacher" instance has 2 properties, complextypePersonalDetails and complextypeAddress.
complextypePersonalDetails contains
First Name, Surname and Age.
complextypeAddress contains
HouseName, Street, Town, City, County.
I admit that this design may be over the top, and the problem may be of my making, but that aside I wanted to check to see whether I could do anymore with EF6 before I rewrite it.
I am performance profiling the code with JetBrains DotTrace.
On first call, say on
personTeacher = db.person.OfType().First()
I get a massive delay of around 150,000ms
around:
SerializedGeneratedViewOfType (150,000ms)
TryGenerateQueryViewOfType
GenerateTypeSpecificQueryView
GenerateQueryViewForSingleExtent
GenerateQueryViewForExtentAndType
GenerateViewsForExtentAndType
GenerateViewComponents
EnsureExtentIsFullyMapped (90,000ms)
GenerateCaseStatements (60,000ms)
I have created a pregenerated View using the "InteractivePreGeneratedViews" nuget package which creates the SQL. However even with this I still need to incur my first hit. Also this hit seems to happen every time the Webserver/Website/AppPool is restarted.
I am not totally sure of the EF process, but I guess there is some further form of runtime compilation or caching which happens when web app starts. Where could this be happening and is there a proactive method that I could use to pregenerate/precompile/precache this problem away.
In the medium term, we will rewrite this code in Dapper or EF.Core. So for now, any thoughts on what can be done?
Thanks.
I had commented on this before, but retracted it, but just agreeing with "this design may be over the top, and the problem may be of my making", but I thought I'd see if anyone else jumped in.
The initial spin-up cost is due to EF needing to resolve the mapping for your schema. This happens once, the first time a DBSet on the context is accessed. You can mitigate this by executing a query on your application start, I.e.
void Application_Start(object sender, EventArgs e)
{
// Initialization stuff...
using (var context = new MyContext())
{
var result = context.MyTable.Any(); // Spin up will happen here, not when the first user attempts to access a query.
}
}
You actually need to run a query for the the DbContext to resolve the mapping, just new-ing one up won't do it.
For larger, or more complex schemas you can also look to utilize bounded contexts where each context maps a particular set of relationships for a specific area of the application. The less complex/comprehensive a context is, the faster it initializes.
As far as the design goes, TPH is for representing inheritance, which is where you need to establish an "is-a" relation between like entities. Relational models, and ORMs by definition can support this, but they're geared more towards "has-a" relationships. Rather than having a model where you go "is-a person with an address", the relation is best mapped out that a person may "have-an" address. I've worked on a system that was designed by a team of engineers where an entire reporting system with dynamic rules was represented by 6 tables. Honestly, those designs are a nightmare to maintain.
I don't know why OfType() is so slow, but I found a fast and easy workaround by replacing it with a cast; EntityFramework seems to support that just fine and without the performance penalty.
var stopwatch = Stopwatch.StartNew();
using (var db = new MyDbContext())
{
// warn up
Console.WriteLine(db.People.Count());
Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
// fast
Console.WriteLine(db.People.Select(p => p as Teacher).Where(p => p != null).Count());
Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
// slow
Console.WriteLine(db.People.OfType<Teacher>().Count());
Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
}
20
3308 ms
2
3796 ms
2
10026 ms

Entity Framework code first - development strategies

Working on a brand new project from the ground up. That means the data model is in a constant flux, doubly so because things are, inevitably, not as well planned as they should be. Model classes are being created and changed fairly regularly.
The plan was to use the latest version of EF with all the neat code-first stuff in it. But we're constantly tripping over the limitations the framework has in terms of adding or updating tables. The initialization options seem to allow only the complete deletion and re-creation of the database, which isn't really ideal.
I've had a look at the migrations. But this seems a sledgehammer to crack a nut: we don't need to detail every single small change and update with a new migration scaffold.
Are there some better strategies to deal with this? For instance, I started writing some unit tests to pre-populate one of the contexts with some test data, but because this causes the whole Db to drop and re-create, it causes problems with all the other contexts. Or perhaps making use of a custom initialiser to seed the data for us? How can we easily exclude these in production code?
We're also wondering about perhaps abandoning code-first and going back to EDMX diagrams. At least that way changes result in updated SQL commands which can be run directly against the database.
Any suggestions gratefully received.
I think, imho, that:
as the database schema must at least match your model you should/must detail every single change, and code first migration allows that and trace the changes over time
code first migration also allows to migrate the database schema for you
code first migration also allows you to produce sql that allows you to migrate the schema
For these reasons code first is as good (if not better) as the edmx approach
Please take few minutes to implement http://msdn.microsoft.com/en-us/data/jj591621.aspx
One other point, always imho and in a perfect world, if you unit test the business of you model you should not need the DAL, use generic collection. Be aware of different comportement of linq to object vs linq to entities, for example concerning the case sensitivity.

What are the benefits of ORM lazy loading?

I'm researching data layer underpinnings for a new web-based reporting system and have spent a lot of time evaluating ORM's over the last few days. That said, I've never dealt with "lazy loading" before and am confused at why its the default setting for LINQ queries in the Entity Framework. It seems like it creates a lot of network traffic and unnecessarily tasks the database with additional queries that could otherwise be resolved with joins.
Can someone describe a scenario in which lazy loading would be beneficial?
Some meta:
The new system will be working against a database with hundreds of tables and many terabytes of data in a production environment with over 3,000 concurrent users on the system 24 hours a day. They will be retrieving large datasets continuously. Is it possible that an ORM just isn't the right solution for our needs, especially since the app will be web-based?
When we talk about lazy loading we are talking about Navigation Properties (how we follow foreign keys). What lazy loading will do for us is to populate the entity from a remote table as we attempt to access that entity. For example if we have a model like this
public class TestEntity
{
public int Id{get;set;}
public AnotherEntity RemoteEntity{get;set;}
}
And call the following
var something = WhateverContext.TestEntities.First().RemoteEntity;
We will get 2 database calls, one for WhateverContext.TestEntities.First() and one for loading the remote entity.
I'm a web guy, (and more specifically an MVC guy) and for web stuff I don't think there is ever a good reason for wanting to do this, One database call is always going to be quicker than two if we require the same set of data.
The situation where I think that lazy loading is actually worth considering is when you don't know when you do your first query if you will need the second entity at all. In my opinion this is much more relevant for windows applications where we have a user who is performing actions in real time (rather than stateless MVC where users are requesting whole pages at once). For example I think lazy loading shines when we have a list of data with a details link, then we don't load the details until the user decides they want to see them.
I don't feel this extends to paging, sorting and filtering, IMO there should be one specifically crafted database query per page of data you are displaying, which returns exactly the data set required to display that page.
In terms of your performance question, I feel that EF (or another ORM) can probably meet your needs here but you want to be careful with how you are retrieving large datasets due to the way EF tracks entities. Check out my EF performance tuning cheat sheet, and read up on DetectChanges and AsNoTracking if you do decide to use EF with large queries.
Most ORMs will give you the option, when you're building up your object selections, to say "don't be lazy, go ahead and join", so if you're worried about it from an efficiency perspective, don't be. You can make it work (usually).
There are 2 particular cases I know of where lazy loading helps:
Chaining commands
What if you want to create a basic select, but then you want to run it through a sort and a filter function that's based on user input. You can simply pass the ORM object in, and attach the sort and filtering functionality to it. Instead of evaluating it each time, it only evaluates when it's actually used.
Avoiding huge, deep, highly-relational queries
What if you just need the IDs of some related fields? If it loads lazily, you don't have to worry about it joining a whole bunch of data and tables that you don't need, potentially slowing down the query and overusing bandwidth. Of course, if you DID want everything else, then you'll need to be explicit, or you may run into a problem where it lazily runs a query for each detail record. Like I mentioned at the outset, that's easily overcome in any ORM worth using.
A simple case is a result set of N records which you do not want to bring to the client at once. The benefit is that you are able to lazily load only what is needed for the clients demands, such as sorting, filtering, etc... An example would be a paging view where one could page through records and sort them accordingly, thus the client only needs N amount at a given time.
When you perform the LINQ query it translates that to SQL commands on the server side to provide only what is needed in the given context. It boils down to offloading work to the database and minimizing what you need to send back to the client.
Some will argue that ORM based lazy loading is wrong however that starts to move to semantics fairly quick and should be more about approach to design versus what is right and wrong.

Are there drawbacks to going around Entity Framework?

I have some design requirements that are not supported by Entity Framework, but are easily met by a simple SQL Query.
Essentially I need to do an insert that sets an Identity value.
Are there drawbacks to making a sproc that does my insert and then having EF call that sproc?
Are there caching concerns I need to be worried about? (Because I will be updating data "behind EF's back".)
Are there concurrency issues?
Anything else I need to be worried about?
If you don't keep around your db context, i.e. you dispose it after every unit of work this should work just fine (this covers most web scenarios) - unless you concurrently operate on the same table - if that is the case you might want to use a lock to synchronize the SQL and the EF queries or catch OptimisticConcurrencyException thrown by EF.
If you do keep a context around on the other hand make sure that you refresh it with RefreshMode.StoreWins.
Also see "Saving Changes and Managing Concurrency"

Rules of thumbs for writing "queries" using ADO.NET Entity Framework

I’m currently working on a prototype of a medium size web application, and I thought that it would be good to also experiment with Entity Framework. The problem is that the major part of the application is not the data layer and logic, and so that I don't have much time to play with Entity Framework. On the other hand, the database schema is quite simple.
One of the problems I’m facing is that I cannot find a consistent way to "write queries". As far as I can tell, there are four "interfaces" for the job:
LINQ to Entities
LINQ to Entities using LINQ extension methods
Entity SQL
Query builder
OK, the first two are essentially the same, but it’s good to use just one for maintenance and consistency.
I’m mostly puzzled by the fact that none of them seems to be complete and the most general. I often find myself cornered and using some ugly looking combination of several of them. My guess is that Entity SQL is the most general one, but writing queries using strings feels like a step back. The main reason I’m experimenting with something like Entity Framework is that I like the compile time checking.
Some other random thought / issues:
I often also use the ObjectQuery.Include() method, but again it takes a string. Is this the only way?
When to use ObjectQuery.Execute() (vs. ToList())? Does it actually execute the query?
Should execute queries as soon as possible (e.g. using ToList()) or should I not care just let leave the execution for the first enumeration which gets in the way?
Are ObjectQuery.Skip() and ObjectQuery.Take() available only as extension methods? Is there a better way to do paging? It’s 2009 and almost every web application deals with paging.
Overall, I understand there are many difficulties when implementing an ORM, and often one has to compromise. On the other hand, the direct database access (e.g. ADO.NET) is plain and simple and has well defined interface (tabular results, data readers), so all code - no matter who and when writes it - is consistent. I don’t want to faced with too many choices whenever writing a database query. It’s too tedious and more than likely different developers will come up with different ways.
What are your rules of thumbs?
I use LINQ-to-Entities as much as possible. I also try and formalise to the lambda-form, as opposed to the extended SQL-style syntax. I have to admit to have had problems enforcing relationships and making compromises on efficiency just to expedite my coding of our application (eg. Master->Child tables may need to be manually loaded) but all in all, EF is a good product.
I do use EF's .Include() method for lazy-loading, which as you say, does require a string input. I find no problem with this, other than that of identifying the string to use which is relatively simple. I guess if you're keen on compile-time checking of such relations, a model similar to: Parent.GetChildren() might be more appropriate.
My application does require some "dynamic" queries to be performed, though. I have two ways of meeting this:
a) I create a mediator object, eg. ClientSearchMediator, which "knows" how to search for clients by name, etc. I can then put this through a SearchHandler.Search(ISearchMediator[] mediators) call (for example). This can be used to target specific data structures and sort results accordingly using LINQ-to-Entities.
b) For a looser experience, possibly as a result of a user designing their own query (using high level tools our application provides), eSQL is ideal for this purpose. It can be made to be injection-safe.
I don't have enough knowledge to address all of this, but I'll at least take a few stabs.
I don't know why you think ADO.NET is more consistent than Entity Framework. There are many different ways to use ADO.NET and I've definitely seen inconsistency within a single code base.
Entity Framework is currently a 1.0 release and it suffers from many 1.0 type problems (incomplete & inconsistent API, missing features, etc.).
In regards to Include, I assume you are referring to eager loading. Multiple people (outside of Microsoft) have developed solutions for getting "type safe" includes (try googling something like: Entity Framework ObjectQueryExtension Include). That said, Include is more of a hint than anything. You can't force eager loading and you have to always remember to call the IsLoaded() method to see if your request was fulfilled. As far as I know, the way "Include" works is not changing at all in the next version of Entity Framework (4.0 - to ship with VS 2010).
As far as executing the Linq query as soon as it's built vs. the last possible moment, that decision is situational. Personally, I would probably execute it as soon as it's built for the most part unless there was a compelling reason not to, but I can see other people going the opposite direction.
There are more mature ORMs on the market and Entity Framework isn't necessarily your best option. For the most part, you can bend Entity Framework to your will, but you may end up rolling your own implementation of features that come out of the box with other ORMs.