Long Running Query in Entity Framework with multiple table joins - entity-framework

I have a query that joins about 10 tables some that are self referencing tables. I use an "IN" statement for the conditional on the ID column (indexed) of the top most table.
var aryOrderId = DetermineOrdersToGet(); //Logic to determine what orderids to get
var result = dbContext.Orders.Where(o=>aryOrderId.Contains(o.id)
.Include(o=>o.Customer)
.Include(o=>o.Items.Select(oi=>oi.ItemAttributes))
.Include(o=>o.Items.Select(oi=>oi.Dimensions))
.Include(o=>o.CustomOptions.Select(oc => oc.CustomOptions1))
.....A Bunch more.....
.ToList();
I would like to figure out a way to speed this up without redesigning my tables and flattening out the structure. Currently 50-200 records take 10-20 seconds.
This data can be read only. I don't need to update these records.
Can I convert this to a stored procedure?
How hard is this to do?
Will I be able to get noticeable performance gains?

One of the slower parts of the database query is the transport of the selected data from the DBMS to your local process. Hence it is wise to select only the properties you actually plan to use.
For example, it seems that an Order has zero or more ItemAttributes. Evey ItemAttribute belongs to exactly one Order, using a foreign key OrderId.
If you fetch all Orders with Id in ArryOrderId, each order with its thousand ItemAttributes, you know that every ItemAttribute will have a foreign key OrderId with the same value as the Id of the Order that it belongs to. It is a waste to send 1000 times the same value.
When querying data using entity framework, always use Select. Select only the properties yo actually plan to use. Only use Include if you intend to change the fetched objects.
var result = dbContext.Orders
.Where(order=>aryOrderId.Contains(order.id)
.Select(order => new
{ // select only the properties you plan to use:
Id = order.Id,
...
Customer = order.Customer.Select(customer => new
{ // again: only the properties you plan to use
Id = order.Customer.Id,
Name = order.Customer.Name,
...
},
ItemAttributes = order.ItemAttributes.Select(itemAttribute => new
{
...
})
.ToList(),
Dimensions = order.Dimensions.Select(dimension => new
{
...
})
.ToList(),
....A Bunch more.....
})
.ToList();
If after selecting only the properties that you actually plan to use, the query still takes too long, think again: do I really need all these properties.
Another solution to limit the execution time is fetching the date 'per page', using Skip / Take. The danger is of course that when you are viewing page 10, the data of page 1 might be changed in a way that page 10 should be interpreted differently.

As jtate mentions, if you don't need everything from the joined tables, don't include them. Instead, utilize .Select() to retrieve just the data you want from the entity and it's associated relationships.
I.e.
var query = dbContext.Orders
.Where(x => aryOrderId.Contains(x => x.OrderId))
.Select(x => new
{
x.OrderId,
x.OrderNumber,
OrderItems = x.Items.Select(i => new
{
i.ItemId,
Attributes = i.Attributes.Select(a => a.AttributeName).ToList(),
Dimensions = i.Dimensions.Select(d => new {d.DimensionId, d.Name}).ToList(),
}).ToList(),
// ...
}).ToList();
You can structure the query, or queries however you like to find an optimal result.
Alternatively you can consider utilizing a view on the database and binding an entity to the view. This option works well for read-only views of data. Provided you include the relevant IDs you can always retrieve the applicable "real" entities at any time to load a details page or perform an action/update against the entity.

Answering your 3 questions. Yes, you can use a stored procedure and that's what I would do in this situation. It is not hard at all; EF makes it quite simple. You can either have it return a new complex type or you can map it to an entity. Since you're saying the data can be readonly, you probably are okay with a basic function import returning a complex type (EF's default behavior). Either way, you will have noticeable performance gains.
For db-first, see http://www.entityframeworktutorial.net/stored-procedure-in-entity-framework.aspx
Basically, you'll follow these steps.
Create the stored procedure on your database
Update the model from the database. When it asks which objects to include, you should be able to select your stored procedure.
Click Finish. EF will generate a complex type that has all the properties returned by your stored procedure, and it will add a signature to your context for executing the stored procedure, so it can be called like this: var results = myContext.myProcedure(param1, param2); There are screenshots of this at the link above.
You can also go in and modify the model to customize the details, such as the name of the complex type and the name of the function (by default the function will match the name of the SP and will return an ObjectResult<T> where T is your complex type, which will be the name of the procedure with "_Result" as a suffix).

Related

ExecuteStoreCommand "Delete" returning different record count to "DeleteObject" post delete, why?

Got a strange one here. I am using EF 6 over SQL Server 2012 and C#.
If I delete a record, using DeleteObject, I get:
//order.orderitem count = 11
db.OrderItem.DeleteObject(orderitem);
db.SaveChanges();
var order = db.order.First(r => r.Id == order.id);
//Order.OrderItem count = 10, CORRECT
If I delete an Order Item, using ExecuteStoreCmd inline DML, I get:
//order.orderitem count = 11
db.ExecuteStoreCommand("DELETE FROM ORDERITEM WHERE ID ={0}", orderitem.Id);
var order = db.Order.First(r => r.Id == order.id);
//order.orderitem count = 11, INCORRECT, should be 10
So the ExecuteStoreCommand version reports 11, however the OrderItem is definitely deleted from the DB, so it should report 10. Also I would have thought First() does an Eager search thus repopulating the "order.orderitem" collection.
Any ideas why this is happening? Thanks.
EDIT: I am using ObjectContext
EDIT2: This is the closest working solution I have using "detach". Interestingly the "detach" actually takes about 2 secs ! Not sure what it is doing, but it works.
db.ExecuteStoreCommand("DELETE FROM ORDERITEM WHERE ID ={0}", orderitem.Id);
db.detach(orderitem);
It would be quicker to requery and repopulate the dataset. How can I force a requery? I thought the following would do it:
var order = db.order.First(r => r.Id == order.id);
EDIT3: This seems to work to force a refresh post delete, but still take about 2 secs:
db.Refresh(RefreshMode.StoreWins,Order.OrderItem);
I am still not really understanding why one cannot just requery as a Order.First(r=>r.id==id) type query oftens take much less than 2 secs.
This would likely be because the Order and it's order items are already known to the context when you perform the ExecuteStoredCommand. EF doesn't know that the command relates to any cached copy of Order, so the command will be sent to the database, but not update any loaded entity state. WHere-as the first one would look for any loaded OrderItem, and when told to remove it from the DbSet, it would look for any loaded entities that reference that order item.
If you don't want to ensure the entity(ies) are loaded prior to deleting, then you will need to check if any are loaded and refresh or detach their associated references.
If orderitem represents an entity should just be able to use:
db.OrderItems.Remove(orderitem);
If the order is loaded, the order item should be removed automatically. If the order isn't loaded, no loss, it will be loaded from the database when requested later on and load the set of order items from the DB.
However, if you want to use the SQL execute approach, detaching any local instance should remove it from the local cache.
db.ExecuteStoreCommand("DELETE FROM ORDERITEM WHERE ID ={0}", orderitem.Id);
var existingOrderItem = db.OrderItems.Local.SingleOrDefault(x => x.Id == orderItem.Id);
if(existingOrderItem != null)
db.Entity(existingOrderItem).State = EntityState.Detached;
I don't believe you will need to check for the orderItem's Order to refresh anything beyond this, but I'm not 100% sure on that. Generally though when it comes to modifying data state I opt to load the applicable top-level entity and remove it's child.
So if I had a command to remove an order item from an order:
public void RemoveOrderItem(int orderId, int orderItemId)
{
using (var context = new MyDbContext())
{
// TODO: Validate that the current user session has access to this order ID
var order = context.Orders.Include(x => x.OrderItems).Single(x => x.OrderId == orderId);
var orderItem = order.OrderItems.SingleOrDefault(x => x.OrderItemId == orderItemId);
if (orderItem != null)
order.OrderItems.Remove(orderItem);
context.SaveChanges();
}
}
The key points to this approach.
While it does mean loading the data state again for the operation, this load is by ID so it's fast.
We can/should validate that the data requested is applicable for the user. Any command for an order they should not access should be logged and the session ended.
We know we will be dealing with the current data state, not basing decisions on values/data from the point in time that data was first read.

Entity Framework 6 : lazy loading doesn't work

The property is defined as virtual. But before I access the order property, the order entity's data has been loaded, why?
Full source code:
A couple of things:
When you state: doesn't work, you are getting an Order back, but the $ figure is 0.0 when you expect a different value? You appear have 2x order records, but based on what is coming back you're expecting a non-zero figure, are both records non-zero? In your debug view, expand the "Orders" in the pop-up context menu, this will reveal what order details EF has loaded.
Firstly you should be careful around the use of "OrDefault" renditions of the methods. Your code is assuming that a value is returned. In these cases you're better off using Single() or First() as applicable.
Additionally, when using First you should be specifying an OrderBy clause to ensure that there is a reliable ordering.
SaveChanges Should only be called if you modify data.
Lastly, lazy loading is an enabler for loading infrequently used data on demand. You should largely work to avoid relying on lazy-load calls. If you need the entire entities and you know you are going to use orders, then eager-load them.
I.e.
using (var context = new EfContext())
{
var customer = context.Customers
.Include(c => c.Orders)
.Single(c => c.CustomerId = customerId);
// Do stuff...
}
If you want just 1 applicable order for a given employee then consider using Select to retrieve it:
I.e.
using (var context = new EfContext())
{
var data = context.Customers.Where(c => c.CustomerId = customerId)
.Select(c => new { Customer = c, FirstOrder = c.Orders.OrderBy(o => o.OrderDate).FirstOrDefault()})
.Single();
// Do Stuff...
}
This will give you an anonymous type containing the Customer (without eager loading the orders) and the matching first order.
Better is just to use Select to retrieve the specific fields you need from customer and order(s). This reduces the amount of data (rows and columns) pulled back from the database.
sorry,It looks like the problem of visual studio,When the call statement is commented,the order table's sql querie no longer being executed,but when the call statement is included,although the breakpoint is set,the order table's sql querie still be executed.
visual studio debug execute the Expression automatically
comment the staement
sql profiler
include the statement
sql profiler

Retrieving some columns from database with Breeze.js, and still be able to update database

I am new to Breeze.js, but really enjoy it so far. I ran into an issue with updating a database with Breeze.js, when selecting only portion of columns of a model.
When I ran this statement:
$scope.emFac.entityQuery.from('Company');
the company entity matches my EF entity, retrieves all columns, creates entityAspect, and all is working fine when updating database:
However, when I retrieve only portion of corresponding Model's columns, Breeze.js returns anonymous object with specified properties (retrieving data works, but not updating does not), without the entityAspect, which is being used for tracking changes.
Here is the code with select statement:
$scope.emFac.entityQuery.from('Company').select('companyId, displayName');
Is there a way to retrieve only some columns of EF Model columns, and still track changes with Breeze.js, needed for database updates?
As you've discovered, Breeze treats the incoming data as plain objects instead of entities when you use select.
Your choices are:
On the server, Create a CustomerLite or similar object, and have a server endpoint that returns those without the need for select; OR
On the client, get the results from the query and create entities from each object, with status Unchanged
Example of #2:
var entities = [];
em.executeQuery(customerProjectionQuery).then(queryResult => {
queryResult.results.forEach(obj => {
// obj contains values to initialize entity
var entity = em.createEntity(Customer.prototype.entityType, obj, EntityState.Unchanged);
entities.push(entity);
});
})
Either way, you will need to ensure that your saveChanges endpoint on the server can handle saving the truncated Customer objects without wiping out the other fields.

FK Object in Entity Framework - performamce issue

I am working with Entity Framework and pretty new with it.
I have a table named: Order and table named: Products.
Each order have a lot of products.
When generating the entities I get Order object with ICollection to products.
The problem is I have a lot of products to each order (20K) and when I do
order.Products.where(......)
The EF runs a select statement only with orderId= 123 and does the rest of the where in the code.
Because I have a lot of results - the select takes a lot of time. How can I change the code - that the select in the DB will be with the where conditions?
This statement:
var prods = order.Products.Where(...);
is equivalent to:
var temps = order.Products;
var prods = temps.Where(...);
Unlike Where(...), which returns an IQueryable, order.Products triggers a lazy loading, which produces an ICollection and will be executed immediately, not delayed. So it's this order.Products part that generates the select statement you see. It fetches all the products belonging to that order into memory. Then the Where(...) part is executed in memory, hence the bad performance.
To avoid this, you should use order.Products only if you really want all the products on an order. If you want only a subset of them, do something like the following:
ctx.Products.Where(prod => prod.Order.Id == order.Id && ...)
Note that ctx is the database context, not the order object.
If you think that the prod.Order.Id == order.Id clause above looks a little dirty, here's a purer but longer alternative:
ctx.Entry(order).Collection(ord => ord.Products).Query().Where(...)
which produces exactly the same SQL query.

DDD and Entity Framework: Pulling children from aggregate root

This one is driving me nuts!
For simplicity, I'm not gonna put up the entire code that is used for our DDD but simply expose what I've tried and explain what isn't working.
I have a simple database structure:
Products (holds product data)
Orders (holds entered orders)
OrderProducts (ref table between Orders and Product)
I have an Order aggregate root and I want to pull out the product count of one simple Order.
I fetch my Order by id which results in an EF lambda:
var order = _orderRepository.Get(orderId);
Then, I try to pull the count of products in that order using:
var count = order.OrderProducts.Count();
This line chokes up when an order has A LOT a records because, it's fetching all of them. Fine.
So, I'm refining it a bit by adding some filters to the products I want to count from within my order.
A product has a couple of properties which include a type (so, there's a type ID).
So, now I'm trying this:
//This is trimming down my results to about a dozen products)
var count = order.OrderProduct
.Where(op=>op.Product.TypeId == 2)
.Count();
If I use Linqpad to see what kind of SQL is generated, to my surprise, it's still loading ALL the OrderProducts from this order!
How can I force it to apply the filter in the query directly?
It's loading all of them because once you touch the navigation property (i.e. order.OrderProducts) eager loading kicks in and loads all of them (i.e. even the ones that you don't want). Your only option to reduce that would be to query the database itself given the orderID. Maybe something like:
_orderProductRepository.Where(p => p.OrderID == orderId && p.Product.TypeID == 2);