Entity Framework, performance tuning - entity-framework

I have an entity "Ideas", which has a child entity collection "ChildIdeas". I need to load a list of ideas and the count of their "ChildIdeas" (only the count!).
I can do:
Eager loading
from i in _dataContext.Ideas.Include("ChildIdeas") ...
Advantages: all necessary data is retrieved in one request.
Disadvantages: loads unnecessary data. I only need the count of ChildIdeas, not the full ChildIdeas list.
Explicit loading
from i in _dataContext.Ideas ...
idea.ChildIdeas.Load()
Advantages: none.
Disadvantages: many requests (ideas.Count + 1) instead of one, and it still loads unnecessary data.
Independent requests
from i in _dataContext.Ideas ...
_repository.GetCountChildIdeas(idea.ID);
Advantages: loads only the necessary data.
Disadvantages: many requests (ideas.Count + 1) instead of one.
All three approaches have disadvantages. Is there any way to load only the necessary data? If yes, what is it? If no, which of these ways is best for this case?
[ADDED]
After load testing (for one user) I got these page load times (in seconds):
eager Child Ideas - 1.31 sec
explicit Child Ideas - 1.19 sec
external requests - 1.14 sec
So for my case the eager approach is the worst... Why is even the explicit approach faster?

You should use projection. The count of child ideas is not part of the persisted Idea entity, so create a new non-mapped type containing the properties you need from the Idea entity plus a Count property.
public class IdeaProjection
{
    public int Id { get; set; }
    // Other properties
    public int Count { get; set; }
}
Now you can use a simple projection query to get everything in a single request without loading any additional data:
var query = from x in context.Ideas
            where ...
            select new IdeaProjection
            {
                Id = x.Id,
                // Mapped other properties
                Count = x.ChildIdeas.Count()
            };
The disadvantage is that IdeaProjection is not an entity, so if you want to use it for updates as well you must transform it back to an Idea and tell EF about the changes. From a performance perspective, this is the best you can get from EF without dropping down to SQL or stored procedures.
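To illustrate that transform-back step, here is a minimal sketch, assuming a DbContext-based context (EF 4.1+) and that the projection carries the original Id; the method name and the copied properties are illustrative, not part of the original answer:
// Hypothetical update path: rebuild an Idea from the projection and mark it as modified.
public void UpdateFromProjection(IdeaProjection projection)
{
    var idea = new Idea
    {
        Id = projection.Id
        // Copy the other editable properties from the projection here.
    };

    _dataContext.Ideas.Attach(idea);                        // attach as Unchanged first
    _dataContext.Entry(idea).State = EntityState.Modified;  // then mark the whole entity as Modified
    _dataContext.SaveChanges();
}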

Related

Entity Framework 6 - Entity count correlation with memory use

I am doing some research into memory use on our Web Servers. We are hitting just shy of 1 gig of memory used when loading just our Sign In page. After a lot of debugging and removing as much as possible from the solution I can see an explosion of memory use on our very first query from the database.
We have around 180 entities in our DbContext. We are using SQL Server as our storage mechanism and something like a repository pattern for reading from the database (we call it a Reader). The baseline memory use just before querying the database for the first time is around 100 MB.
Some of the things I have tried include skipping the Reader and going at the DbContext directly without any additional extension methods, which only delayed the initial explosion of memory. I also wondered whether extra memory was being used to map the DbContext on the initial query, so I generated the mapping to a file and had it read from that instead of mapping at runtime, and didn't notice any real drop in memory use. I also ensured that we dispose after each read and have only one DbContext per unit of work (in this case, loading the Sign In page).
Finally, after quite a bit of effort, I stripped the solution down to only the entities needed to load the Sign In page (3-4 entities) and then saw a HUGE drop in memory use. Not quite as much as I wanted, but only a couple-hundred-MB jump in memory use.
So the question is: what is the correlation between entity count and the memory use of the DbContext? Perhaps we are doing something wrong with our entities.
By default, an EF DbContext will maintain a reference to every entity that it loads. This means that a DbContext that is left "alive" for a significant amount of time will continually consume memory as new entities are requested. This is one reason the general advice for EF Contexts is to keep them as short-lived as possible.
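For read-only paths there is also a quick mitigation worth mentioning: a minimal sketch, assuming EF6's DbContext API (the Users set and IsActive property here are illustrative), is to opt out of tracking entirely so the context never holds references to the loaded entities:
// Read-only query: AsNoTracking() keeps the change tracker from holding a reference
// to every loaded entity, so memory can be reclaimed once the results go out of scope.
var activeUsers = context.Users
    .AsNoTracking()
    .Where(u => u.IsActive)
    .ToList();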
The number of entity definitions and relationships a context is configured to work with can have a bearing on EF performance, including memory use, but the bulk of the memory usage will come from entities being loaded and tracked. Breaking up a DbContext into smaller contexts that each track a subset of entities can be useful in larger systems to improve initialization and query performance. It also accommodates defining multiple entity definitions per table, such as detailed entities for things like updates and stripped-down summary entities for searches and other basic operations.
The best tool to avoid excessive memory use in an application is to leverage projection wherever possible. This can be a bugbear when working with Repository patterns where the default behaviour is to return entities and specifically collections of entities.
For example, say I need to display a list of orders, showing the products in each order along with the quantity, as well as the customer name, given a structure similar to:
Order (OrderId, Number, CreatedDate)
-> OrderStatus (OrderStatusId, Name)
-> Customer (CustomerId, Name)
-> OrderLines (OrderLineId, Quantity)
-> Product (ProductId, Name)
I might have a repository that can return an IEnumerable<Order> and has some way to be told what to eager load, maybe even a complex Expression<Func<Order>> to pass in search criteria for a "New" order status, and it ends up executing the following against the DbContext:
var orders = context.Orders
    .Include(x => x.OrderStatus)
    .Include(x => x.Customer)
    .Include(x => x.OrderLines)
    .ThenInclude(x => x.Product)
    .Where(x => x.OrderStatus.OrderStatusId == OrderStatuses.New)
    .ToList();
return orders;
Then this data would be passed to a view where we display the Order Number, Customer Name, and list the Product Names along with the Quantity from the associated OrderLine. The view can be provided these entities and extract the necessary fields, however we have loaded and tracked all fields of the related entities into the DbContext and into memory.
The alternative is to project to a ViewModel, a POCO containing just the fields we need:
[Serializable]
public class OrderSummaryViewModel
{
    public int OrderId { get; set; }
    public string OrderNumber { get; set; }
    public int CustomerId { get; set; }
    public string CustomerName { get; set; }
    public IEnumerable<OrderLineSummaryViewModel> OrderLines { get; set; } = new List<OrderLineSummaryViewModel>();
}

[Serializable]
public class OrderLineSummaryViewModel
{
    public int OrderLineId { get; set; }
    public int ProductId { get; set; }
    public string ProductName { get; set; }
    public int Quantity { get; set; }
}
Now when we query the data, instead of returning IEnumerable<Order> we return IQueryable<Order>. We can also do away with all the complexity around specifying what to eager load, filtering, sorting, and pagination; consumers can refine the query as they see fit. The repository method becomes simply:
IQueryable<Order> GetOrdersByStatus(params OrderStatuses[] orderStatuses)
{
    var query = context.Orders
        .Where(x => orderStatuses.Contains(x.OrderStatus.OrderStatusId));
    return query;
}
Then the caller projects this result into the ViewModel using Select or Automapper's ProjectTo:
var orders = Repository.GetOrdersByStatus(OrderStatuses.New)
    .Select(x => new OrderSummaryViewModel
    {
        OrderId = x.OrderId,
        OrderNumber = x.Number,
        CustomerId = x.Customer.CustomerId,
        CustomerName = x.Customer.Name,
        OrderLines = x.OrderLines
            .Select(ol => new OrderLineSummaryViewModel
            {
                OrderLineId = ol.OrderLineId,
                ProductId = ol.Product.ProductId,
                ProductName = ol.Product.Name,
                Quantity = ol.Quantity
            }).ToList()
    }).ToList();
The key difference here is that the DbContext no longer loads or tracks all of the associated orders, order lines, customers, products, etc. Instead it generates a query that pulls back just the fields projected into the view model. The memory footprint only grows by the amount of data we actually need rather than by all of the entities and details we don't need.
Loading entities should be reserved for cases where we want to update data. For cases where we just want to read data, projection is very useful for reducing the memory footprint. One reason I believe many teams load and pass around entities with longer-lived DbContexts, or put up with the hassle of detaching and reattaching entities, is to leverage caching and avoid DB round-trips. IMO this is a poor reason for loading and passing around entities, as it results in more memory usage, leaves systems vulnerable to tampering, easily leads to performance issues with lazy loading, and can also result in stale data overwrites, since cached entities in the DbContext are not automatically refreshed from the underlying data.
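To make that last point concrete, a small illustrative sketch (EF6 DbContext API; the Orders set and orderId variable are assumptions): a tracked entity does not see later database changes unless you refresh it yourself, which is easy to forget with long-lived contexts.
var order = context.Orders.Find(orderId);  // cached/tracked instance
// ... time passes, another user updates the row in the database ...
context.Entry(order).Reload();             // must be called explicitly to re-read the current values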

WebApi2 GET always says error has occurred

I've got a simple GET method in my Web API 2 project which queries my Microsoft SQL Server database via Entity Framework and always returns an error. If I step through it in the debugger the exception is NOT hit. It actually looks like it leaves the method cleanly. I'm very confused.
[Route("ar")]
public IHttpActionResult GetAuditArs(int auditId)
{
    using (var context = new LabSOREntities())
    {
        try
        {
            var ars = from r in context.SMBWA_Audit_AR
                      where r.SMBWA_Audit_Id == auditId
                      select r;
            var ret = Ok(ars.ToArray());
            return ret;
        }
        catch (Exception ex)
        {
            return BadRequest($"Something went wrong: {ex.Message}");
        }
    }
}
There's one row in the database, and I can see my ars.ToArray() has a single element in it. How can I debug this, since it has already left my method when it blows up?
If I just hit that endpoint via the browser I get:
<Error>
<Message>An error has occurred.</Message>
</Error>
The issue will be that you are returning entities from your API call. Behind the scenes Web API has to serialize the data being returned, and as it does this it will hit any lazy-load reference properties and attempt to load them. Since you are instantiating a scoped DbContext within a using block, the entities will be orphaned from the context prior to the serialization, so EF will throw exceptions that the DbContext is not available.
An option to verify the behaviour:
- Eager-load all references in your "SMBWA_Audit_AR" class. This should eliminate the error and confirm the lazy-load serialization issue.
var ars = context.SMBWA_Audit_AR
    .Include(x => x.Reference1)
    .Include(x => x.Reference2) // etc., where Reference1/2 are entities related to your audit record. If you have a lot of references, that is a lot of Includes...
    .Where(x => x.SMBWA_Audit_Id == auditId)
    .ToArray();
To avoid issues like this, and the cost/time of eager-loading everything, I recommend using a POCO DTO/ViewModel to return the details about these audit records. Then you can .Select() just the fields needed for the POCO. This avoids the lazy-load serialization issue and also optimizes your EF queries to return just the data that is needed, not the entire object graph.
For example, if you need the Audit #, Name, and a Notes field to display in a list of audit summaries:
public class AuditSummary
{
    public int AuditID { get; set; }
    public string AuditorName { get; set; }
    public string Notes { get; set; }
    // You can add whatever fields are needed from the Audit or related entities, including collections of other DTOs for related entities, or summary details like counts, etc.
}
var ars = context.SMBWA_Audit_AR
    .Where(x => x.SMBWA_Audit_Id == auditId)
    .Select(x => new AuditSummary
    {
        AuditID = x.AuditId,
        AuditorName = x.AuditedBy.Name, // example of getting reference details
        Notes = x.Notes
    }).ToArray();
Return models that reflect what the consumer will need. This avoids issues with EF and ensures your queries are efficient.
Scope the DbContext to the request using an IoC container (Unity/Autofac, etc.). This can seem like a viable option but it isn't recommended. While it will avoid the error, as the serializer iterates over your entity your DbContext will query each individual dependency one at a time by ID. You can see this behaviour by running a profiler against the database while the application is running to detect the lazy-load calls. It will work in the end, but it will be quite slow.
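For completeness, a rough sketch of what that request-scoped registration might look like, assuming Autofac with its Web API integration package and that config is your HttpConfiguration (again, this is the not-recommended path):
// Requires the Autofac and Autofac.WebApi2 packages.
// One LabSOREntities instance per HTTP request, disposed when the request ends.
var builder = new ContainerBuilder();
builder.RegisterApiControllers(Assembly.GetExecutingAssembly());
builder.RegisterType<LabSOREntities>().InstancePerRequest();

var container = builder.Build();
config.DependencyResolver = new AutofacWebApiDependencyResolver(container);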
As a general rule, don't return entities from Web API or MVC controller methods to avoid errors and performance issues with serializers.

How to avoid N+1 queries with Spring Data REST Projections?

I have been prototyping my new application in Spring Data REST backed by Spring Data JPA & Hibernate, which has been a fantastic productivity booster for my team, but as the data model becomes more complex the performance is going down the tubes. Looking at the executed SQL, I see two separate but related problems:
When using a Projection with only a few properties to reduce the size of my payload, SDR is still loading the entire entity graph, with all the overhead that incurs. EDIT: filed DATAREST-1089
There seems to be no way to specify eager loading using JPA, as SDR auto-generates the repository methods so I can't add @EntityGraph to them. (and per DATAREST-905 below, even that doesn't work) EDIT: addressed in Cepr0's answer below, though this can only be applied to each finder method once. See DATAJPA-749
I have one key model that I use several different projections of depending on the context (list page, view page, autocomplete, related item page, etc.), so implementing one custom ResourceProcessor doesn't seem like a solution.
Has anyone found a way around these problems? Otherwise anyone with a non-trivial object graph will see performance deteriorate drastically as their model grows.
My research:
How do I avoid n+1 queries with Spring Data Rest? (from 2013)
https://jira.spring.io/browse/DATAJPA-466 (Add support for lazy loading configuration via JPA 2.1 fetch-/loadgraph.)
https://jira.spring.io/browse/DATAREST-905 (No way to avoid loading all child relations in spring-data-rest?) (2016, unanswered)
To fight the 1+N issue I use the following two approaches:
@EntityGraph
I use the @EntityGraph annotation in the repository for the findAll method. Just override it:
@Override
@EntityGraph(attributePaths = {"author", "publisher"})
Page<Book> findAll(Pageable pageable);
This approach is suitable for all "reading" methods of Repository.
Cache
I use a cache to reduce the impact of the 1+N issue for complex projections.
Suppose we have a Book entity to store the book data and a Reading entity to store information about the number of readings of a specific Book and its reader rating. To get this data we can make a projection like this:
@Projection(name = "bookRating", types = Book.class)
public interface WithRatings {
    String getTitle();
    String getIsbn();

    @Value("#{@readingRepo.getBookRatings(target)}")
    Ratings getRatings();
}
Where readingRepo.getBookRatings is the method of ReadingRepository:
@RestResource(exported = false)
@Query("select avg(r.rating) as rating, count(r) as readings from Reading r where r.book = ?1")
Ratings getBookRatings(Book book);
It also returns a projection that stores the "rating" info:
@JsonSerialize(as = Ratings.class)
public interface Ratings {

    @JsonProperty("rating")
    Float getRating();

    @JsonProperty("readings")
    Integer getReadings();
}
A request to /books?projection=bookRating will cause the invocation of readingRepo.getBookRatings for every Book, which will lead to N redundant queries.
To reduce the impact of this we can use the cache:
Preparing the cache in the SpringBootApplication class:
@SpringBootApplication
@EnableCaching
public class Application {
    //...
    @Bean
    public CacheManager cacheManager() {
        Cache bookRatings = new ConcurrentMapCache("bookRatings");
        SimpleCacheManager manager = new SimpleCacheManager();
        manager.setCaches(Collections.singletonList(bookRatings));
        return manager;
    }
}
Then add the corresponding annotation to the readingRepo.getBookRatings method:
@Cacheable(value = "bookRatings", key = "#a0.id")
@RestResource(exported = false)
@Query("select avg(r.rating) as rating, count(r) as readings from Reading r where r.book = ?1")
Ratings getBookRatings(Book book);
And implement cache eviction when the book's rating data is updated:
@RepositoryEventHandler(Reading.class)
public class ReadingEventHandler {

    private final @NonNull CacheManager cacheManager;

    @HandleAfterCreate
    @HandleAfterSave
    @HandleAfterDelete
    public void evictCaches(Reading reading) {
        Book book = reading.getBook();
        cacheManager.getCache("bookRatings").evict(book.getId());
    }
}
Now all subsequent requests to /books?projection=bookRating will get the rating data from our cache and will not cause redundant requests to the database.
More info and working example is here.

Delete a child from an aggregate root

I have a common Repository with Add, Update, Delete.
We'll name it CustomerRepository.
I have an entity (POCO) named Customer, which is an aggregate root, with Addresses.
public class Customer
{
    public List<Address> Addresses { get; set; }
}
I am in a detached entity framework 5 scenario.
Now, let's say that after getting the customer, I choose to delete a client address.
I submit the Customer aggregate root to the repository, by the Update method.
How can I save the modifications made to the addresses?
If the address id is 0, I can suppose that the address is new.
For the rest of the addresses, I can choose to attach them all and mark them as updated no matter what.
For deleted addresses I can see no workaround...
We could say this solution is incomplete and inefficient.
So how should updates to an aggregate root's children be done?
Do I have to complete the CustomerRepository with methods like AddAddress, UpdateAddress, DeleteAddress ?
It seems like it would kind of break the pattern though...
Do I put a Persistence state on each POCO:
public enum PersistanceState
{
    Unchanged,
    New,
    Updated,
    Deleted
}
And then have only one method in my CustomerRepository, Save?
In this case it seems that I am reinventing the Entity "Non-POCO" objects, and adding data access related attribute to a business object...
First, you should keep your repository with Add, Update, and Delete methods, although I personally prefer Add, indexer set, and Remove so that the repository looks like an in memory collection to the application code.
Secondly, the repository should be responsible for tracking persistence states. I don't even clutter up my domain objects with
object ID { get; }
like some people do. Instead, my repositories look like this:
public class ConcreteRepository : List<AggregateRootDataModel>, IAggregateRootRepository
The AggregateRootDataModel class is what I use to track the IDs of my in-memory objects as well as track any persistence information. In your case, I would put a property of
List<AddressDataModel> Addresses { get; }
on my CustomerDataModel class which would also hold the Customer domain object as well as the database ID for the customer. Then, when a customer is updated, I would have code like:
public class ConcreteRepository : List<AggregateRootDataModel>, IAggregateRootRepository
{
    public Customer this[int index]
    {
        set
        {
            //Lookup the data model
            AggregateRootDataModel model = (from AggregateRootDataModel dm in this
                                            where dm.Customer == value
                                            select dm).SingleOrDefault();
            //Inside the setter for this property, run your comparison
            //and mark addresses as needing to be added, updated, or deleted.
            model.Customer = value;
            SaveModel(model); //Run your EF code to save the model back to the database.
        }
    }
}
The main caveat with this approach is that your Domain Model must be a reference type and you shouldn't be overriding GetHashCode(). The main reason for this is that when you perform the lookup for the matching data model, the hash code can't be dependent upon the values of any changeable properties because it needs to remain the same even if the application code has modified the values of properties on the instance of the domain model. Using this approach, the application code becomes:
IAggregateRootRepository rep = new ConcreteRepository([arguments that load the repository from the db]);
Customer customer = rep[0]; //or however you choose to select your Customer.
customer.Addresses = newAddresses; //change the addresses
rep[0] = customer;
The easy way is using Self-Tracking Entities (see "What is the purpose of self tracking entities?"). I don't like it, because tracking is a different responsibility.
The hard way: you take the original collection and compare it with the submitted one :-/ (a rough sketch of that comparison is shown after the links below).
Update relationships when saving changes of EF4 POCO objects
Another way may be event tracking?
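As referenced above, a rough sketch of the compare step in a detached EF 5 scenario might look like the following, assuming _context is your DbContext, Customer has a List<Address>, and both types expose an Id; all names here are illustrative:
public void Update(Customer detached)
{
    // Load the current state of the aggregate, including its children.
    var original = _context.Customers
        .Include("Addresses")
        .Single(c => c.Id == detached.Id);

    // Remove addresses that are no longer present on the submitted customer.
    foreach (var address in original.Addresses.ToList())
    {
        if (!detached.Addresses.Any(a => a.Id == address.Id))
            _context.Addresses.Remove(address);
    }

    // Add new addresses (Id == 0) and copy values onto existing ones.
    foreach (var address in detached.Addresses)
    {
        if (address.Id == 0)
            original.Addresses.Add(address);
        else
            _context.Entry(original.Addresses.Single(a => a.Id == address.Id))
                    .CurrentValues.SetValues(address);
    }

    // Finally copy the scalar values of the customer itself and save.
    _context.Entry(original).CurrentValues.SetValues(detached);
    _context.SaveChanges();
}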

How to prepare data for display on a silverlight chart using WCF RIA Services + Entity Framework

I've used WCF RIA Services with Entity Framework to build a simple application which can display and update data about school courses. This was done by following the Microsoft tutorials. Now I would like to have a chart which shows how many courses are in each key stage.
Example:
Key Stage 3 - 20 courses
Key Stage 4 - 32 courses
Key Stage 5 - 12 courses
Displayed on any form of chart. I have no problem binding data to the chart in XAML. My problem is that I do not know the correct way of getting the data into that format. The generated CRUD methods are basic.
I have a few thoughts about possible ways, but don't know which is correct, they are:
Create a View in SQL server and map this to a separate Entity in the Entity Data Model. Generating new CRUD methods for this automatically.
Customise the read method in the existing DomainService using .Select(), .Distinct(), etc. I don't know this syntax very well (lambda expressions/LINQ??? what is it?). Any good quickstarts on it?
Create a new class to store only the data required and create a read method for it. Tried this but didn't know how to make it work without a matching entity in the entity model.
Something I am not aware of.
I'm very new to this and struggling with the concepts so if there are useful blogs or documentation I've missed feel free to point me towards them. But I'm unsure of the terminology to use in my searches at the moment.
One way is to build a model class. A model is a class that represents the data you wish to display. For example, I might have a table with 10 fields but only need to display 2; create a model with those two properties and return that from your data layer.
You can use Entity Framework to pump data into a new class like so:
Model Class:
public class Kitteh
{
    public string Name { get; set; }
    public int Age { get; set; }
}
Entity Query:
public IQueryable<Kitteh> getKittehz()
{
    var result = from x in Data.TblCats
                 select new Kitteh
                 {
                     Name = x.Name,
                     Age = x.Age
                 };
    return result;
}
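For the key stage counts specifically, a small sketch of the grouping query, assuming the same Data context exposes a Courses set with a KeyStage property; the names and the KeyStageCount class are illustrative, not generated code:
// Group courses by key stage and project each group into a small display model,
// so only the key stage and its course count are sent to the client.
public IQueryable<KeyStageCount> GetCourseCountsByKeyStage()
{
    return from c in Data.Courses
           group c by c.KeyStage into g
           select new KeyStageCount
           {
               KeyStage = g.Key,
               CourseCount = g.Count()
           };
}

public class KeyStageCount
{
    [Key] // RIA Services generally requires a key on types returned from query methods
    public int KeyStage { get; set; }
    public int CourseCount { get; set; }
}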
If you are interested in the best practices approach to building silverlight applications I would suggest you research the MVVM pattern.
http://www.silverlight.net/learn/videos/silverlight-4-videos/mvvm-introduction/
http://www.silverlight.net/learn/tutorials/silverlight-4/using-the-mvvm-pattern-in-silverlight-applications/
I am attempting a similar piece of work.
I will tell you the approach I am going to use and maybe that can help you.
I am going to create a class in the Silverlight project to describe the chart item: it will have two string properties, Key and Value.
Then create a collection object... In your case, this could be a class that has one property of type Dictionary<string,string> myCollection... or ObservableCollection<ChartItem> myCollection.
The next step is to do a ForEach loop on the data coming back from the server and Add to your Collection.
myCollection.Add(new chartItem{ Key= "Key Stage 3", Value = "20 Courses" });
myCollection.Add(new chartItem{ Key= "Key Stage 4", Value = "60 Courses" });
myCollection.Add(new chartItem{ Key= "Key Stage 5", Value = "10 Courses" });
... more to follow if you are still looking for an answer
There is no easy way to include views in Entity Framework, as it does not allow any table/view to be included without a "Key" (primary key), which means more effort: you will have to map the view manually in the EDMX and then map the keys, etc.
Now we have found an alternative approach:
Create View called ChartItems in your DB
Create LinqToSQL file ViewDB
Drag View ChartItems in ViewDB
Create a ChartItem[] GetChartItems method in your RIA Domain Service class as follows:
public ChartItem[] GetChartItems(..parameters...)
{
    ViewDB db = new ViewDB();
    return db.ChartItems.Where(...query mapping...).ToArray();
}
A RIA Domain Service class can contain any arbitrary method that you can invoke directly from the client with parameters; it is as simple as calling a web service. You have to return an array because IQueryable may or may not work in some cases, and we prefer an array. You can try IQueryable, but it may not work correctly against LINQ to SQL.