I have two related classes, Voter and Party. Party is in a one-to-one relationship with Voter, represented by properties Voter.PartyID and Voter.Party.
I set up an IQueryable like this:
IQueryable<Voter> voters = context.Voters.Include(v=>v.Party).Where(...)
that creates a subset of Voters. Later on I want to summarize the subset by Party. I do this like this:
var retVal = voters.Select( v => v.Party ).Distinct().GroupBy( p =>
p,
( p, b ) => new ...
Unfortunately, 'p' -- which should be the related property Party of each individual Voter in voters -- is always null.
I don't understand why the earlier Include statement isn't "forcing" the Voter.Party property to be retrieved/defined.
I've discovered that materializing the subset of Voters before doing the GroupBy seems to force the retrieval/definition of Voter.Party. Unfortunately, doing the materialization early enough to solve the problem (I've simplified things here, the actual query logic is more involved) creates some enormous datasets.
Thoughts on how to get Voter.Party to be retrieved/defined?
Related
I have a query that joins about 10 tables, some of which are self-referencing. I use an "IN" statement for the conditional on the ID column (indexed) of the topmost table.
var aryOrderId = DetermineOrdersToGet(); //Logic to determine what orderids to get
var result = dbContext.Orders.Where(o => aryOrderId.Contains(o.id))
.Include(o=>o.Customer)
.Include(o=>o.Items.Select(oi=>oi.ItemAttributes))
.Include(o=>o.Items.Select(oi=>oi.Dimensions))
.Include(o=>o.CustomOptions.Select(oc => oc.CustomOptions1))
.....A Bunch more.....
.ToList();
I would like to figure out a way to speed this up without redesigning my tables and flattening out the structure. Currently 50-200 records take 10-20 seconds.
This data can be read only. I don't need to update these records.
Can I convert this to a stored procedure?
How hard is this to do?
Will I be able to get noticeable performance gains?
One of the slower parts of the database query is the transport of the selected data from the DBMS to your local process. Hence it is wise to select only the properties you actually plan to use.
For example, it seems that an Order has zero or more ItemAttributes. Every ItemAttribute belongs to exactly one Order, using a foreign key OrderId.
If you fetch all Orders with Id in aryOrderId, each order with its thousand ItemAttributes, you know that every ItemAttribute will have a foreign key OrderId with the same value as the Id of the Order that it belongs to. It is a waste to send the same value 1000 times.
When querying data using Entity Framework, always use Select, and select only the properties you actually plan to use. Only use Include if you intend to change the fetched objects.
var result = dbContext.Orders
.Where(order => aryOrderId.Contains(order.id))
.Select(order => new
{ // select only the properties you plan to use:
Id = order.Id,
...
Customer = new // Customer is a single reference, so no Select is needed
{ // again: only the properties you plan to use
Id = order.Customer.Id,
Name = order.Customer.Name,
...
},
ItemAttributes = order.ItemAttributes.Select(itemAttribute => new
{
...
})
.ToList(),
Dimensions = order.Dimensions.Select(dimension => new
{
...
})
.ToList(),
....A Bunch more.....
})
.ToList();
If, after selecting only the properties that you actually plan to use, the query still takes too long, think again: do I really need all these properties?
Another way to limit the execution time is to fetch the data 'per page', using Skip / Take. The danger is of course that when you are viewing page 10, the data of page 1 might have changed in a way that means page 10 should be interpreted differently.
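The paging idea above can be sketched as follows. This is only an illustration: it uses an in-memory IQueryable of integers as a stand-in for dbContext.Orders, and the names PagingSketch, GetPage, pageIndex and pageSize are hypothetical, not part of the original question. The ordering step matters because LINQ to Entities only supports Skip on sorted input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Returns one page of a sequence. A stable OrderBy comes first so that
    // page boundaries are deterministic between requests.
    public static List<int> GetPage(IQueryable<int> orderIds, int pageIndex, int pageSize)
    {
        return orderIds
            .OrderBy(id => id)
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
    }

    public static void Main()
    {
        // Assumption: an in-memory stand-in for a table of order ids.
        IQueryable<int> orderIds = Enumerable.Range(1, 25).AsQueryable();

        // Third page (zero-based index 2) with ten rows per page.
        List<int> thirdPage = GetPage(orderIds, pageIndex: 2, pageSize: 10);
        Console.WriteLine(string.Join(", ", thirdPage));
    }
}
```

Against a real context the same Skip / Take chain would be appended to the projected query before ToList(), so that only one page of rows is transported.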
As jtate mentions, if you don't need everything from the joined tables, don't include them. Instead, use .Select() to retrieve just the data you want from the entity and its associated relationships.
For example:
var query = dbContext.Orders
.Where(x => aryOrderId.Contains(x.OrderId))
.Select(x => new
{
x.OrderId,
x.OrderNumber,
OrderItems = x.Items.Select(i => new
{
i.ItemId,
Attributes = i.Attributes.Select(a => a.AttributeName).ToList(),
Dimensions = i.Dimensions.Select(d => new {d.DimensionId, d.Name}).ToList(),
}).ToList(),
// ...
}).ToList();
You can structure the query, or queries however you like to find an optimal result.
Alternatively you can consider utilizing a view on the database and binding an entity to the view. This option works well for read-only views of data. Provided you include the relevant IDs you can always retrieve the applicable "real" entities at any time to load a details page or perform an action/update against the entity.
Answering your 3 questions. Yes, you can use a stored procedure and that's what I would do in this situation. It is not hard at all; EF makes it quite simple. You can either have it return a new complex type or you can map it to an entity. Since you're saying the data can be readonly, you probably are okay with a basic function import returning a complex type (EF's default behavior). Either way, you will have noticeable performance gains.
For db-first, see http://www.entityframeworktutorial.net/stored-procedure-in-entity-framework.aspx
Basically, you'll follow these steps.
1. Create the stored procedure on your database.
2. Update the model from the database. When it asks which objects to include, you should be able to select your stored procedure.
3. Click Finish. EF will generate a complex type that has all the properties returned by your stored procedure, and it will add a signature to your context for executing the stored procedure, so it can be called like this: var results = myContext.myProcedure(param1, param2); There are screenshots of this at the link above.
You can also go in and modify the model to customize the details, such as the name of the complex type and the name of the function (by default the function will match the name of the SP and will return an ObjectResult<T> where T is your complex type, which will be the name of the procedure with "_Result" as a suffix).
I have an app using EF 6 and MVC 5 that works fine for inputting data, but now when I try to display some of it I'm having troubles. The basic layout of my entities can be seen in the following diagram:
The first part where I'm having trouble is in querying and filtering the data. I would like to return a list of premises and related data where a survey and signoff exist, but an approval does not. In straight SQL, the query that works now is:
SELECT *
FROM Premises p LEFT OUTER JOIN Approvals a ON a.Id = p.Id
JOIN Surveys s ON s.PremiseId = p.Id
JOIN SignOffs so ON so.Id = s.Id
WHERE a.ApprovedBy IS NULL
The code that I started with is like this:
var premises = Premises.Include(p => p.Approval)
.Include(p => p.Surveys)
.Include(p => p.Surveys.Select(s => s.SignOff));
This appears* to return all records including the child data, but when I try to filter it so I get only records that have a signoff record but do not have an approval, it doesn't work.
var premises = Premises.Include(p => p.Approval).Where(p => p.Approval.ApprovedBy == null)
.Include(p => p.Surveys)
.Include(p => p.Surveys.Select(s => s.SignOff).Where(s => s.Signature != null));
If I use this code, I get this error:
The Include path expression must refer to a navigation property defined on the type. Use dotted paths for reference navigation properties and the Select operator for collection navigation properties.
Parameter name: path
I've changed this query around a lot to try different things, so I'm not sure what all I have done, but I think the first Where statement might work by itself, while the second one definitely causes the error.
How do I need to structure my query to get it to return the requested data properly filtered?
Also, I put an asterisk above on stating that the query appears to return all the data and child data because I can't actually test it. When I'm trying to write my Razor CSHTML page for this, it's not giving me intellisense for the child and grandchild data, and if I enter what I think it should be I get errors. How do I need to reference this data on the page?
You cannot use Include() like this. It only specifies which navigation properties to load; it cannot restrict which entities are loaded based on the state of a navigation property (not null, in your case).
To do the filtering, I suggest something like this:
var premises = Premises.Include(p => p.Approval).Include(p => p.Surveys).Include(p => p.Surveys.Select(s => s.SignOff))
.Where(p => p.Approval.ApprovedBy == null && p.Surveys.Any(s => s.SignOff.Signature != null));
So basically, the includes and the filtering have nothing to do with each other. With the includes, you only specify what to load, you can still use the filter on the original entity set.
You're confusing what the Include LINQ method does. It only tells EF to eagerly load that relationship, which is actually unnecessary if your query itself utilizes that relationship; EF will include the relationship by default in that case.
What it doesn't do is allow you to filter those relationships. For example, in this portion of your code:
.Include(p => p.Surveys.Select(s => s.SignOff).Where(s => s.Signature != null));
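The Where clause here sits inside the Include path expression, and Include accepts only a path of navigation properties (using Select to step through collections). It cannot carry a filter, which is why EF raises the error you saw. In other words, filtering has to be done on the main table being queried, not on the table you're including.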
There are two paths forward here. You can simply filter Premises by the important parts, i.e.:
var premises = Premises.Where(p => p.Approval.ApprovedBy == null && p.Surveys.Any(s => s.SignOff.Signature != null));
That will return only premises where these conditions are true, but the Surveys collection of each premise, once loaded, will contain all surveys related to that premise, not just the ones with a non-null signoff signature.
If you need to filter the related items as well, then you must explicitly load them:
foreach (var premise in premises)
{
context.Entry(premise)
.Collection(p => p.Surveys)
.Query()
.Where(s => s.SignOff.Signature != null)
.Load();
}
Two things of note:
Because of the nature of how this query must be applied, there's no way to do it once for all premises. You'll have to iterate over the premises and explicitly load the Surveys collection for each.
Since this will issue a new query, you want to avoid loading the Surveys collection either lazily or eagerly before this explicit load. Otherwise, you're querying the same information twice, which is very inefficient. The easiest way to ensure that is to remove the virtual keyword from the collection property. However, if you do that, then you will have to eagerly or explicitly load the collection or it will be null. For more information, see: https://msdn.microsoft.com/en-us/library/jj574232(v=vs.113).aspx
I am working with Entity Framework and pretty new with it.
I have a table named: Order and table named: Products.
Each order has a lot of products.
When generating the entities I get an Order object with an ICollection of products.
The problem is that I have a lot of products for each order (20K), and when I do
order.Products.Where(......)
EF runs a select statement with only orderId = 123 and applies the rest of the where in code.
Because I have a lot of results, the select takes a lot of time. How can I change the code so that the select in the DB includes the where conditions?
This statement:
var prods = order.Products.Where(...);
is equivalent to:
var temps = order.Products;
var prods = temps.Where(...);
Where(...) itself is deferred, but accessing order.Products is not: it triggers lazy loading, which executes immediately and materializes an ICollection. So it is the order.Products part that generates the select statement you see; it fetches all the products belonging to that order into memory. The Where(...) part is then executed in memory, hence the bad performance.
To avoid this, you should use order.Products only if you really want all the products on an order. If you want only a subset of them, do something like the following:
ctx.Products.Where(prod => prod.Order.Id == order.Id && ...)
Note that ctx is the database context, not the order object.
If you think that the prod.Order.Id == order.Id clause above looks a little dirty, here's a purer but longer alternative:
ctx.Entry(order).Collection(ord => ord.Products).Query().Where(...)
which produces exactly the same SQL query.
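The difference between the two call sites can be made visible without a database. The sketch below is an illustration under stated assumptions: a List of integers stands in for the already loaded order.Products collection, AsQueryable() stands in for a query root such as ctx.Products, and the names WhereBindingSketch and IsComposable are hypothetical. It shows that the same Where call binds to Enumerable.Where (a compiled delegate, run in memory) on a collection, and to Queryable.Where (an expression tree a provider can translate) on an IQueryable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereBindingSketch
{
    // True when the sequence still carries an expression tree that a LINQ
    // provider could translate; false when it is plain LINQ to Objects.
    public static bool IsComposable<T>(IEnumerable<T> sequence)
    {
        return sequence is IQueryable<T>;
    }

    public static void Main()
    {
        // Assumption: stand-in for a navigation collection that is already in memory.
        ICollection<int> loadedPrices = new List<int> { 5, 10, 15, 20 };
        // Assumption: stand-in for a query root such as ctx.Products.
        IQueryable<int> queryRoot = loadedPrices.AsQueryable();

        // Binds to Enumerable.Where: the predicate is a compiled delegate.
        IEnumerable<int> inMemory = loadedPrices.Where(p => p > 10);
        // Binds to Queryable.Where: the predicate is captured as an expression tree.
        IQueryable<int> composed = queryRoot.Where(p => p > 10);

        Console.WriteLine(IsComposable(inMemory));  // False
        Console.WriteLine(IsComposable(composed));  // True
    }
}
```

Only the second form gives Entity Framework a predicate it can turn into a SQL WHERE clause, which is why starting from the context (or from Query() on the collection entry) matters.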
With lazy loading used by default, I know that you should call .Include() on your Entity Framework entities to pull in associated entities you want in your queries to reduce the number of calls to the db if you're calling LINQ methods on your entities. If you don't, you run the risk of repeated database calls for each row (the N+1 problem)
Can someone confirm that if I write a canonical LINQ query, with the joins defined explicitly, that we guard against N+1?
from x in _context.tblOrder
join y in _context.tblCustomer on x.customerId equals y.id
select x
Is there any way N+1 could creep in when we're loading in all the required entities with joins?
EDIT
As background, someone asked how junior developers could guard against N+1. I mentioned the simplest way would be to write out your queries and define your joins. I want confirmation that what I indicated was 100% accurate.
If what you are really asking is
Will this query hit the database once?
Then the answer is yes. LINQ to EF translates your expression to SQL, and nothing is sent to the database until you evaluate the query (for example with ToList() or a foreach loop). Once that query has been sent, nothing else is, unless you explicitly tell it otherwise.
Your LINQ statement could be simplified using method syntax, e.g.
_context.tblOrder.Include("Customer").ToList();
This would give you all the order details, including all related customer details, in a single database trip.
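The deferred evaluation mentioned in this answer can be observed with plain LINQ to Objects. The following is a minimal sketch, not EF code: the array is an assumed stand-in for a table, and DeferredExecutionSketch, BuildQuery and PredicateCalls are hypothetical names. The counter shows that building the query does no work; the work happens when ToList() enumerates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Counts how often the filter actually runs.
    public static int PredicateCalls;

    // Builds a query but does not enumerate it.
    public static IEnumerable<int> BuildQuery(IEnumerable<int> source)
    {
        return source.Where(x =>
        {
            PredicateCalls++;
            return x % 2 == 0;
        });
    }

    public static void Main()
    {
        IEnumerable<int> query = BuildQuery(new[] { 1, 2, 3, 4 });
        Console.WriteLine(PredicateCalls); // 0: nothing has been evaluated yet

        List<int> evens = query.ToList();  // evaluation happens here
        Console.WriteLine(PredicateCalls); // 4: one call per source element
        Console.WriteLine(evens.Count);    // 2
    }
}
```

With Entity Framework the equivalent moment is when the SQL is sent: composing Where, Include or Select costs nothing until the query is enumerated.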
Just because you specify tables in a join doesn't mean that you can't run into an n+1 issue when you iterate over the values. Consider the following extension to your query:
var query = from o in Orders
join c in Customers on o.CustomerID equals c.CustomerID
select o;
foreach (var o in query)
{
Console.WriteLine(String.Format("{0}: {1}", o.OrderDate, o.Employee.FirstName));
}
In this case, each time you navigate through the order's Employee object, the employee is fetched from the database for that order. If you wanted to avoid the issue, you could project the values you want in the select clause:
var query = from o in Orders
join c in Customers on o.CustomerID equals c.CustomerID
select new {o.OrderDate, o.Employee.FirstName};
foreach (var o in query)
{
Console.WriteLine(String.Format("{0}: {1}", o.OrderDate, o.FirstName));
}
Note, in this case, you don't even need the join as you can just use the navigation properties instead. Of course, if you don't allow navigation properties in your entities and rely only on joins, you can avoid the n+1 situation, but that is not a very OOP way of solving the problem.
I think you would be safe guaranteeing against n+1 if you only return anonymous types from your queries, but that would be rather restrictive as well.
The best option is to make sure to profile your application's generated SQL and know precisely when and why you are hitting the database. I discuss some of the profilers available at http://www.thinqlinq.com/Post.aspx/Title/LINQ-to-Database-Performance-hints.
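To see why the first loop in this answer costs n+1 round trips, here is a small simulation. It is a sketch under loud assumptions: no database or EF is involved, Lazy<T> stands in for a lazily loaded navigation property, and QueryCount, LoadOrders and the simplified Order and Employee classes are hypothetical. Each increment of QueryCount represents one trip to the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public string FirstName = "";
}

public class Order
{
    private readonly Lazy<Employee> _employee;

    public Order(Func<Employee> loader)
    {
        _employee = new Lazy<Employee>(loader);
    }

    // Assumption: mimics a lazy-loaded navigation property; the loader runs
    // on first access only.
    public Employee Employee => _employee.Value;
}

public static class NPlusOneSketch
{
    // Each increment stands for one simulated database round trip.
    public static int QueryCount;

    public static List<Order> LoadOrders(int n)
    {
        QueryCount++; // the single query that returns the orders
        return Enumerable.Range(0, n)
            .Select(i => new Order(() =>
            {
                QueryCount++; // one extra query per order when Employee is touched
                return new Employee { FirstName = "Employee" + i };
            }))
            .ToList();
    }

    public static void Main()
    {
        List<Order> orders = LoadOrders(3);
        foreach (Order o in orders)
        {
            Console.WriteLine(o.Employee.FirstName);
        }
        Console.WriteLine(QueryCount); // 4: one for the orders plus one per order
    }
}
```

Projecting the needed values in the select clause, as shown in the answer, corresponds to never invoking the per-order loader at all.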
Greetings,
Considering the Northwind sample tables Customers, Orders, and OrderDetails, I would like to eager load the related entities corresponding to the tables mentioned above, and yet I need to order the child entities on the database before fetching entities.
Basic case:
var someQueryable = from customer in northwindContext.Customers.Include("Orders.OrderDetails")
select customer;
but I also need to sort Orders and OrderDetails on the database side (before fetching those entities into memory) with respect to some random column on those tables. Is it possible without some projection, like it is in T-SQL? It doesn't matter whether the solution uses e-SQL or LINQ to Entities. I searched the web but I wasn't satisfied with the answers I found since they mainly involve projecting data to some anonymous type and then re-query that anonymous type to get the child entities in the order you like. Also using CreateSourceQuery() doesn't seem to be an option for me since I need to get the data as it is on the database side, with eager loading but just by ordering child entities. That is I want to do the "ORDER BY" before executing any query and then fetch the entities in the order I'd like. Thanks in advance for any guidance. As a personal note, please excuse the direct language since I am kinda pissed at Microsoft for releasing the EF in such an immature shape even compared to Linq to SQL (which they seem to be getting away slowly). I hope this EF thingie will get much better and without significant bugs in the release version of .NET FX 4.0.
Actually, I have a Tip that addresses exactly this issue.
Sorting of related entities is not 'supported', but using the projection approach Craig shows AND relying on something called 'Relationship Fixup' you can get something very similar working:
If you do this:
var projection = from c in ctx.Customers
select new {
Customer = c,
Orders = c.Orders.OrderByDescending(
o => o.OrderDate
)
};
foreach(var anon in projection )
{
var sortedOrders = anon.Orders; // sorted (because of the projection)
var alsoSorted = anon.Customer.Orders; // sorted too, because of relationship fixup
}
Which means if you do this:
var customers = projection.AsEnumerable().Select(x => x.Customer);
you will have customers that have sorted orders!
See the tip for more info.
Hope this helps
Alex
You are confusing two different problems. The first is how to materialize entities from the database; the second is how to retrieve an ordered list. The EntityCollection type is not an ordered list. In your example, customer.Orders is an EntityCollection.
On the other hand, if you want to get a list in a particular order, you can certainly do that; it just can't be in a property of type EntityCollection. For example:
from c in northwindContext.Customers
orderby c.SomeField
select new {
Name = c.Name,
Orders = from o in c.Orders
orderby o.SomeField
select new {
SomeField = o.SomeField
}
}
Note that there is no call to Include. Because I am projecting, it is unnecessary.
The Entity Framework may not work in the way you expect, coming from a LINQ to SQL background, but it does work. Be careful about condemning it before you understand it; deciding that it doesn't work will prevent you from learning how it does work.
Thank you both. I understand that I can use projection to achieve what I wanted, but I thought there might be an easy way to do it, since in the T-SQL world it's perfectly possible with a few nested queries (or joins) and order bys. On the other hand, separation of concerns sounds reasonable and we are in the entity domain now, so I will use the approach you both recommended, though I have to admit this is easier and cleaner to achieve in LINQ to SQL by using AssociateWith.
Kind regards.