How can I optimize my EF Code First query?

How can I optimize my EF Code First query? - entity-framework

[Updated - see update at bottom]
I am using EF code-first, and am generally happy with it. However, one simple (and common) operation is causing EF to generate ridiculously complex SQL, which is slowing my application down.
I am simply fetching a set of entities using a list of (integer) IDs, but because I need details of lots of sub-entities, I'm using .Include() to get those sub-entities loaded at the same time, as follows:
db.MyEntities
.Where(x => x.ClientId == clientId)
.Where(x => ids.Contains(x.Id))
.Where(x => x.SubEntity1 != null)
.Include(x => x.SubEntity1)
.Include(x => x.SubEntity1.SubSubEntity1)
.Include(x => x.SubEntity1.SubSubEntity2)
.Include(x => x.SubEntity1.SubSubEntity3)
.Include(x => x.SubEntity1.SubSubEntity4)
.Include(x => x.SubEntity2)
.Include(x => x.SubEntity2.SubSubEntity1)
.Include(x => x.SubEntity2.SubSubEntity2)
.Include(x => x.SubEntity2.SubSubEntity3)
.Include(x => x.SubEntity2.SubSubEntity4)
.Include(x => x.SubEntity3)
As you can see, it's not a particularly complex query, with the exception of all those Includes.
The SQL that EF generates for this is huge - around 74Kb of SQL. It doesn't take very long to execute (since normally the number of items in the list of IDs is small), but it takes EF more than a second just to construct the query - i.e. before the query is even sent to the database.
If I remove the Includes, then the query is much smaller, and the whole thing takes much less time - but the various related entities are then loaded one-at-a-time, which doesn't scale well.
EF seems to give me two options for loading the data:
Load all the sub-entities at once during the initial query (using Include as above), or
Load the sub-entities one-at-a-time (using lazy loading, or explicitly using Load/LoadProperty).
Option 1 would be my preferred option if it worked, but since that doesn't work in this case, my only remaining option is 2 - and I don't think that's acceptable: there would be too many database queries where the input list of IDs (i.e. the number of entities) is large.
It seems to me that there is another option that EF doesn't seem to address: having fetched the main entities, fetch all the relevant SubEntity1 entities, then all the relevant SubEntity2 entities, etc. That way, the number of queries is related to the number of types of entity to be fetched, rather the number of entities. This would scale much better.
I can't see a way to do that in EF: in other words, to say "load this property for all these entities (in a single query)".
Will I just have to give up on EF and write my own SQL?
UPDATE
I have noticed that even if I remove the Includes, the SQL generated is more complex than I think it should be, and I think this all stems from the fact that EF does not 'like' my table structure. I struggled for days to get EF to create the database structure I was looking for via Code First (and the Fluent API), and even when I had got to (nearly) where I wanted to be, I had to accept some compromises.
I think I'm now paying a further penalty for daring to do something that EF didn't want me to do. It looks like a simple query is more complex than it should be, and a slightly-more-complex query is massively more complex.
This is incredibly depressing - I thought I'd left all those EF hassles behind, and the system is now in production with dozens of users - which would make it very difficult for me to start over.
It seems I'm going to have to spend the eternity fighting EF tooth and nail at every turn. How I wish I'd never used it in the first place!
Anyway, back to my original question: if I have a bunch of entities of type A for which I want to load the related sub-entities of type B in one query, is there a way to do that?

How about loading the data using stored procedures? Yes, it is a bit dirty, but this is what I do when I hit performance issues with EF. I hope I'm not missing something in your question.
http://msdn.microsoft.com/en-US/data/jj691402

Related

How does EntityFramework Core mange data internally?

I'm trying to understand how EntityFramework Core manages data internally because it influences how I call DbSets. Particularly, does it refer to in-memory data or re-query the database every time?
Example 1)
If I call _context.ToDo.Where(x => x.id == 123).First() and then in a different procedure call the same command again, will EF give me the in-memory value or re-query the DB?
Example 2)
If I call _context.ToDo.Where(x => x.id == 123).First() and then a few lines later call _context.ToDo.Find(123).Where(x => x.id == 123).Incude(x => x.Children).First(), will it use the in-memeory and then only query the DB for "Children" or does it recall the entire dataset?
I guess I'm wondering if it matters if I duplicate a call or not?
Is this affected by the AsNoTracking() switch?

What you really ask is how caching works in EF Core, not how DbContext manages data.
EF always offered 1st level caching - it kept the entities it loaded in memory, as long as the context remains alive. That's how it can track changes and save all of them when SaveChanges is called.
It doesn't cache the query itself, so it doesn't know that Where(....).First() is meant to return those specific entities. You'd have to use Find() instead. If tracking is disabled, no entities are kept around.
This is explained in Querying and Finding Entities, especially Finding entities using primary keys:
The Find method on DbSet uses the primary key value to attempt to find an entity tracked by the context. If the entity is not found in the context then a query will be sent to the database to find the entity there. Null is returned if the entity is not found in the context or in the database.
Find is different from using a query in two significant ways:
A round-trip to the database will only be made if the entity with the given key is not found in the context.
Find will return entities that are in the Added state. That is, Find will return entities that have been added to the context but have not yet been saved to the database.
In Example #2 the queries are different though. Include forces eager loading, so the results and entities returned are different. There's no need to call that a second time though, if the first entity and context are still around. You could just iterate over the Children property and EF would load the related entities one by one, using lazy loading.
EF will execute 1 query for each child item it loads. If you need to load all of them, this is slow. Slow enough to be have its own name, the N+1 selects problem. To avoid this you can load a related collection explicitly using explicit loading, eg. :
_context.Entry(todo).Collection(t=>t.Children).Load();
When you know you're going to use all children though, it's better to eagerly load all entities with Include().

EF Core 1.0 - Include() generates more than one queries

I am using EF 7.0.0-rc1-final.
The following statement generates multiple queries on the server.
Is this normal or I am missing something ?
Group myGroup = dbContext_
.Set<Group>()
.Include(x => x.GroupRoles)
.ThenInclude(x => x.Role)
.FirstOrDefault(x => x.Name == "Approver");
I see two separate queries executed on the server:
And
It's a standard many-to-many scenario. Why is the first query ?
Thanks

Yes, it’s normal even in one to many scenarios.
EF7 generates multiple queries to avoid returning the same data multiple times.
Here is a great post about EF6 Include to understand why this change was required for EF7: Entity Framework pitfalls, include

Why does EF 4.1 not support complex queries as well as linq-to-sql?

I am in the process of converting our internal web application from Linq-To-Sql to EF CodeFirst from an existing database. I have been getting annoyed with Linq-To-Sql's limitations more and more lately, and having to update the edmx after updating a very intertwined database table finally frustrated me enough to switch to EF.
However, I am encountering several situations where using linq with Linq-To-Sql is more powerful than the latest Entity Framework, and I am wondering if anyone knows the reasoning for it? Most of this seems to deal with transformations. For example, the following query works in L2S but not in EF:
var client = (from c in _context.Clients
where c.id == id
select ClientViewModel.ConvertFromEntity(c)).First();
In L2S, this correctly retrieves a client from the database and converts it into a ClientViewModel type, but in EF this exceptions saying that Linq to Entities does not recognize the method (which makes sense as I wrote it.
In order to get this working in EF I have to move the select to after the First() call.
Another example is my query to retrieve a list of clients. In my query I transform it into an anonymous structure to be converted into JSON:
var clients = (from c in _context.Clients
orderby c.name ascending
select new
{
id = c.id,
name = c.name,
versionString = Utils.GetVersionString(c.ProdVersion),
versionName = c.ProdVersion.name,
date = c.prod_deploy_date.ToString()
})
.ToList();
Not only does my Utils.GetVersionString() method cause an unsupported method exception in EF, the c.prod_deploy_date.ToString() causes one too and it's a simple DateTime. Like previously, in order to fix it I had to do my select transformation after ToList().
Edit: Another case I just came across is that EF cannot handle where clauses that compare entities where as L2S has no issues for with it. For example the query
context.TfsWorkItemTags.Where(x => x.TfsWorkItem == TfsWorkItemEntity).ToList()
throws an exception and instead I have to do
context.TfsWorkItemTags.Where(x => x.TfsWorkItem.id == tfsWorkItemEntity.id).ToList()
Edit 2: I wanted to add another issue that I found. Apparently you can't use arrays in EF Linq queries, and this probably annoys me more than anything. So for example, right now I convert an entity that denotes a version into an int[4] and try to query on it. In Linq-to-Sql I used the following query:
return context.ReleaseVersions.Where(x => x.major_version == ver[0] && x.minor_version == ver[1]
&& x.build_version == ver[2] && x.revision_version == ver[3])
.Count() > 0;
This fails with the following exception:
The LINQ expression node type 'ArrayIndex' is not supported in LINQ to Entities.
Edit 3: I found another instance of EF's bad Linq implementation. The following is a query that works in L2S but doesn't in EF 4.1:
DateTime curDate = DateTime.Now.Date;
var reqs = _context.TestRequests.Where(x => DateTime.Now > (curDate + x.scheduled_time.Value)).ToList();
This throws an ArgumentException with the message DbArithmeticExpression arguments must have a numeric common type.
Why does it seem like they downgraded the ability for Linq queries in EF than in L2S?

Edit (9/2/2012): Updated to reflect .NET 4.5 and added few more missing features
This is not the answer - it cannot be because the only qualified person who can answer your question is probably a product manager from ADO.NET team.
If you check feature set of old datasets then linq-to-sql and then EF you will find that critical features are removed in newer APIs because newer APIs are developed in much shorter times with big effort to deliver new fancy features.
Just list of some critical features available in DataSets but not available in later APIs:
Batch processing
Unique keys
Features available in Linq-to-Sql but not supported in EF (perhaps the list is not fully correct, I haven't used L2S for a long time):
Logging database activity
Lazy loaded properties
Left outer join (DefaultIfEmpty) since the first version (EF has it since EFv4)
Global eager loading definitions
AssociateWith - for example conditions for eager loaded data
Code first since the first version
IMultipleResults supporting stored procedures returning multiple result sets (EF has it in .NET 4.5 but there is no designer support for this feature)
Support for table valued functions (EF has this in .NET 4.5)
And some others
Now we can list features available in EF ObjectContext API (EFv4) and missing in DbContext API (EFv4.1):
Mapping stored procedures
Conditional mapping
Mapping database functions
Defining queries, QueryViews, Model defined functions
ESQL is not available unless you convert DbContext back to ObjectContext
Manipulating state of independent relationships is not possible unless you convert DbContext back to ObjectContext
Using MergeOption.OverwriteChanges and MergeOption.PreserveChanges is not possible unless you convert DbContext back to ObjectContext
And some others
My personal feeling about this is only big sadness. Core features are missing and features existing in previous APIs are removed because ADO.NET team obviously doesn't have enough resources to reimplement them - this makes migration path in many cases almost impossible. The whole situation is even worse because missing features or migration obstacles are not directly listed (I'm afraid even ADO.NET team doesn't know about them until somebody reports them).
Because of that I think that whole idea of DbContext API was management failure. At the moment ADO.NET team must maintain two APIs - DbContext is not mature to replace ObjectContext and it actually can't because it is just a wrapper and because of that ObjectContext cannot die. Resources available for EF development was most probably halved.
There are more problems related. Once we leave ADO.NET team and look on the problem from the perspective of MS product suite we will see so many discrepancies that I sometimes even wonder if there is any global strategy.
Simply live with the fact that EF's provider works in different way and queries which worked in Linq-to-sql don't have to work with EF.

A little late to the game, but I found this post while searching for something else, and figured that I'd post an answer to the fundamental questions in the original post, which mostly boil down to "LINQ to SQL allows Expression [x], but EF doesn't".
The answer is that the query provider (the code that translates your LINQ expression tree into something that actually executes and returns an enumerable set of stuff) is fundamentally different between L2S and EF. To understand why, you have to realize that another fundamental difference between L2S and EF is that L2S is table-based and EF is entity-model-based. In other words, EF works with conceptual entity models, and knows that the underlying physical model (the DB tables) don't necessarily reflect conceptual entities. This is because tables are normalized/denormalized, and have weird ways of dealing with entity type generalization (inheritance). So in EF, you have a picture of the conceptual model (which is the objects you code against in VB/C#, etc.) and a mapping to the physical underlying table(s) that comprise your conceptual entities. L2S does not do this. Every "model entity" in L2S is strictly a single table, with exactly the table fields as-is.
So far, that in and of itself doesn't really explain the problems in the original post, but hopefully, you can begin to appreciate that fundamentally, EF is not L2S+ or L2S v4.0. It's a very different kind of product (a real ORM) even though there is some coincidental overlap in the fact that both use LINQ to get at database data.
One other interesting difference is that EF was built from the ground up to be DB-agnostic, whereas L2S only works against MS SQL Server (although anyone who's sniffed around the L2S code deep enough will see that there are some underpinnings to allow different DBs, but in the end, it was tied to MS SQL Server only). This difference also plays a big role in why some expressions work in L2S LINQ, but not in EF LINQ. EF's query provider deals with canonical DB expressions, which in plain english means LINQ expressions that have SQL query equivalents in nearly all relational databases out there. The underlying EF engine (query provider) translates the LINQ expressions to these canonical DB expressions, then hands the canonical expression tree off to a specific DB provider (say Oracle's or MySQL's EF provider) where it is translated to product-specific SQL. You can see here how these canonical expressions are supposed to be translated by the individual product-specific providers: http://msdn.microsoft.com/en-us/library/ee789836.aspx
Additionally, EF allows some product-specific DB functions (store functions) as expressions through extensions. The underlying product-specific providers are responsible for both providing and translating these.
That being the case, EF only allows expressions that are DB canonical expressions, or store-specific functions, becuase all expressions in the tree are converted to SQL for execution against the DB.
The difference with L2S is that L2S passes off any expressions that it can to the DB from its limited SQL generator, and then executes any expressions it can't translate to SQL on the materialized object set that is returned. While this makes it look simpler to use L2S, what you don't see is that half your expressions don't actually make it to the DB as SQL and this can cause some really inefficient queries bringing back larger sets of data which are then iterated again in CLR memory with regular object LINQ acting against them for the other expressions which L2S can't turn into SQL.
You get the exact same effects in EF by using EF to return the materialized data to object sets in memory, and then using additional LINQ statements on that set in memory - just as L2S does, but in this case, you just have to do it explicitly, just like when you say you have to call .First() before using a non-DB-canonical expression. Similarly, you can call .ToArray() or .ToList() before using additional expressions that can't be turned into SQL.
One other big difference is that in EF, entities must be used whole. Real model entities represent conceptual objects that are transacted on in whole. You never have half a User, for example. The User is an object whose state depends on all fields. If you want to return partial entities, or a flattened join of multiple entities, you have to define a projection (what EF calls a Complex Type), or you can use some of the new 4.1/4.2/4.3 POCO features.

Now that Entity Framework is open source, it's easy enough to see, especially from the comments on Issues from the team, that one of the goals is to provide EF as an open layer on top of multiple databases. To be fair, Microsoft has unsurprisingly only implemented that layer on top of their SQL Server, but there are other implementations, like DevArt's MySql EF Connector.
As part of that goal, it's wise to keep the public interface somewhat limited, and attempting to add an additional layer that asks - well, some of this might be done in memory, some of this might be done in SQL, who knows - definitely complicates the job for other implementers trying to tie EF into this or that database.
So, I agree with the other answer here - you'd have to ask the team - but you can also get a lot of info about that team's direction on the public bug tracker and their other publications, and this seems like a clear motivation.
That said the main difference between LINQ to SQL and EF is the way EF throws an exception on code that has to be run in memory, and if you're an Expressions ninja there's nothing stopping you from going the next step to wrap the DbContext class and make that work just like LINQ to SQL. On the other hand, what you gain there is a mixed bag - you make it implicit rather than explicit when the SQL is generated, and when it fires, and that can be viewed as a loss of performance and control in exchange for flexibility/ease of authoring.

Finding Entity Framework contexts

Through various questions I have asked here and other forums, I have come to the conclusion that I have no idea what I'm doing when it comes to the generated entity context objects in Entity Framework.
As background, I have a ton of experience using LLBLGen Pro, and Entity Framework is about three weeks old to me.
Lets say I have a context called "myContext". There is a table/entity called Employee in my model, so I now have a myContext.Employees. I assume this to mean that this property represents the set of Employee entities in my context. However, I assume wrong, as I can add a new entity to the context with:
myContext.Employees.AddObject(new Employee());
and this new Employee entity appears nowhere in myContext.Employees. From what I gather, the only way to find this newly added entity is to track it down hiding in the myContext.ObjectStateManager. This sounds to me like the myContext.Employees set is in fact not the set of Employee entities in the context, but rather some kind of representation of the Employee entities that exist in the database.
To add further to this confusion, Lets say I am looking at a single Employee entity. There is a Project entity that has a M:1 relationship with Employee (an employee can have multiple projects). If I want to add a new project to a particular employee, I just do:
myEmployee.Projects.Add(new Project());
Great, this actually adds the Project to the collection as I would expect. But this flies right in the face of how the ObjectSet properties off of the context work. If I add a new Project to the context with:
myContext.Projects.AddObject(new Project());
this does not alter the Projects set.
I would appreciate it very much if someone were to explain this to me. Also, I really want a collection of all the Employees (or Projects) in the context, and I want it available as a property of the context. Is this possible with EF?

An ObjectSet is a query. Like everything in LINQ, it's lazy. It does nothing until you either enumerate it or call a method like .Count(), at which point a database query is run, and any returned entities are merged with those already in the context.
So you can do something like:
var activeEmployees = Context.Employees.Where(e => e.IsActive)
...without running a query.
You can further compose this:
var orderedEmployees = activeEmployees.OrderBy(e => e.Name);
...again, without running a query.
But if you look into the set:
var first = orderedEmployees.First();
...then a DB query is run. This is common to all LINQ.
If you want to enumerate entities already in the context, you need to look towards the ObjectStateManager, instead. So for Employees, you can do:
var states = EntityState.Added || EntityState.Deleted || // whatever you need
var emps = Context.ObjectStateManager.GetObjectStateEntries(states)
.Select(e => e.Entity)
.OfType<Employee>();
Note that although this works, it is not a way that I would recommend working. Typically, you do not want your ObjectContexts to be long-lived. For this, and other reasons, they are not really suitable to be a general-purpose container of objects. Use the usual List types for that. It is more accurate to think of an ObjectContext as a unit of work. Typically, in a unit of work you already know which instances you are working with.

Linq-To-Entities Include

I'm currently learning a bit more about Linq-To-Entities - particularly at the moment about eager and lazy loading.
proxy.User.Include("Role").First(u => u.UserId == userId)
This is supposed to load the User, along with any roles that user has. I have a problem, but I also have a question. It's just a simple model created to learn about L2E
I was under the impression that this was designed to make things strongly type - so why do I have to write "Role"? It seems that if I changed the name of the table, then this wouldn't create a compilation error...
My error is this:
The specified type member 'Roles' is not supported in LINQ to Entities. Only initializers, entity members, and entity navigation properties are supported.
The solution below allows me to now write the code:
proxy.User.Include(u => u.Role).First(u => u.UserId == userId)
Which is MUCH nicer!

Include is a hint to eager load, it does not force eager loading.
Always check the IsLoaded property before referencing something that you hope was eager loaded by Include.
There are ways to put a strongly typed object in the include statement, but there is no solution available to this issue out of the box with Entity Framework. Google something like: Entity Framework ObjectQueryExtension Include

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse