Entity Framework: multiple DbContexts or not? And a few other performance-related questions - entity-framework-core

I'm building a calendar/entry/statistics application using quite complex models with a large number of relationships between them.
In general I'm concerned about performance, and I'm considering different strategies and looking for input before implementing the application.
I'm completely new to DbContext pooling, so please excuse any stupid questions. Does DbContext pooling have anything to do with the use of multiple DbContext classes, or is it about improved performance regardless of whether there is one DbContext or several?
I might end up with a large number of DbSets; should I avoid that? I'm considering creating multiple DbContext classes for simplicity, but will this reduce memory use and improve performance? Would it be better/smarter to split the application into smaller projects?
Is there any performance difference between IEnumerable and ICollection? I'm avoiding the use of lists as much as possible. Or would it be even better to use IAsyncEnumerable?

Most of your performance pain points will come from a complex architecture that you have not simplified. Managing a monolithic application results in lots of unnecessary 'compensating' logic when one use case treads on the toes of another and your relationships are deeply intertwined.
Optimisations such as whether to use Context Pooling or IEnumerable vs ICollection can come later. They should not affect the architecture of your solution.
If your project is as complex as you suggest, then I'd recommend you read up on Domain Driven Design and Microservices and break your application up into several projects (or groups of projects).
https://learn.microsoft.com/en-us/dotnet/architecture/microservices/microservice-ddd-cqrs-patterns/
Each project (or group of projects) will have its own DbContext to administer the entities within that project.
Further, each DbContext should start off by exposing only Aggregate Roots through its DbSets. This can mean more database activity than is strictly necessary for a particular use case, but it's best to start with a clean architecture and squeeze out that last ounce of performance (sometimes at the cost of architectural clarity) if and when needed.
For example, if you want to add an attendee to an appointment, it can be appealing to attack the Attendee table directly. But to keep things clean, and considering that an attendee cannot exist without an appointment, you should make Appointment the aggregate root and expose only appointments as the entry point for the outside world. An appointment can be retrieved from the database with its attendees; you then ask the appointment to add the attendees, and save the appointment graph by calling SaveChanges on the DbContext.
In summary, your Appointment is responsible for the functionality within its graph. You should ask Appointment to add an Attendee to the list instead of adding an Attendee to the Appointment's list of attendees. A subtle shift in thinking that can reduce complexity of your solution an awful lot.
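To make that concrete, here is a minimal sketch; Appointment and Attendee are the names from the example, while AddAttendee and the private backing list are illustrative rather than a prescribed API.
using System;
using System.Collections.Generic;

public class Appointment
{
    private readonly List<Attendee> _attendees = new();

    public int Id { get; private set; }
    public IReadOnlyCollection<Attendee> Attendees => _attendees;

    // The aggregate root is asked to change its own graph; outside code
    // never manipulates the attendee collection directly.
    public void AddAttendee(Attendee attendee)
    {
        if (attendee is null) throw new ArgumentNullException(nameof(attendee));
        _attendees.Add(attendee);
    }
}

public class Attendee
{
    public int Id { get; private set; }
    public string Name { get; set; } = "";
}
A use case then loads the appointment with its attendees, calls AddAttendee, and persists the whole graph with one SaveChanges call.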
The art is deciding where those boundaries between microservices/contexts should lie. There can be pros and cons to two different architectures, with no clear winner.
To your other questions:
DbContext Pooling is about maintaining a pool of ready-to-go instantiated DbContexts. It saves the overhead of repeated DbContext instantiation. Probably not worth it, unless you have an awful lot of separate requests coming in and your profiling shows that this is a pain point.
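If profiling does point that way, enabling pooling in EF Core is a one-line change at registration time. A minimal sketch, assuming an ASP.NET Core host and a hypothetical CalendarContext:
// Program.cs: AddDbContextPool instead of AddDbContext enables pooling.
// CalendarContext and the connection string name are placeholders.
builder.Services.AddDbContextPool<CalendarContext>(options =>
    options.UseSqlServer(builder.Configuration.GetConnectionString("Calendar")));
One caveat: pooled instances are reset and handed back out, so the context class must not carry per-request state of its own.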
The number of DbSets required is alluded to above.
As for IEnumerable vs ICollection vs IList, it depends on what functionality you require. Here's a nice simple summary: https://medium.com/developers-arena/ienumerable-vs-icollection-vs-ilist-vs-iqueryable-in-c-2101351453db
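To make the practical difference concrete, here is a hedged sketch (context and Entries are placeholder names). Kept as IQueryable, the filter is translated into the SQL statement; dropped to IEnumerable first, every row is fetched and filtered in memory.
var cutoff = DateTime.UtcNow.AddDays(-7);

// Stays IQueryable: the Where clause becomes part of the generated SQL.
var recent = context.Entries
    .Where(e => e.CreatedOn >= cutoff)
    .ToList();

// AsEnumerable() ends SQL translation: every row of Entries is fetched
// and the filter runs in application memory.
var recentInMemory = context.Entries
    .AsEnumerable()
    .Where(e => e.CreatedOn >= cutoff)
    .ToList();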
Would it be better/smarter to split the application into smaller projects?
Yes, absolutely! Start with architectural clarity and then tweak for performance benefits where and when required. Don't start with performance as the goal (unless you're building a millisecond-sensitive solution).

Related

Best Data Structure for an Entity-Component-System Framework

I've been reading a lot about entity frameworks and now I want to implement one in my game. An entity framework is based on making the game entities simple containers of Components, where a Component contains a certain characteristic of an Entity (and all the variables/accessors which describe this characteristic).
The game logic is then modularized by creating Systems. Each System implements and runs a certain aspect of the game logic (eg. Collisions, Rendering, Animation). Each System has to be able to access every Entity which has some certain combination of Components (eg. RenderSystem has to get only Entities which have PositionComponent and AnimationComponent).
My question regards the best data structure for achieving such functionality.
My current idea is to create a Vector (with N cells, where N is the number of possible components) of List of Entity. So whenever I create (instantiate and add certain Components) an Entity, I would also reference this Entity from each List for each Component it contains. "Killing" an Entity would require removing each reference from each List. The problem would be querying which entities have to be processed by a certain System, because the search-key would be a combination of Components, and not a single Component, adding overhead to the operation (many searches and comparisons would have to be done).
Is my idea good? Is there any better data structure I could use? Note that everything in the game is supposed to be an Entity, summing up to thousands of Entities in a single level (I could possibly use some space partitioning).
There are two ways of doing it:
A purely data-oriented system would lead you to have no Entity class at all, just components sharing an ID. In this case, a vector or a hashmap for every system wouldn't be a problem, as searches in these data structures are fast. If you want several components per system per entity, you can aggregate your components into one data structure adapted to each system.
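As an illustration of that ID-sharing layout, a minimal sketch (all names are invented):
using System.Collections.Generic;

// One store per component type; entities exist only as IDs shared across
// stores, with no Entity class at all.
public readonly record struct EntityId(int Value);

public class ComponentStore<T> where T : struct
{
    private readonly Dictionary<EntityId, T> _data = new();

    public void Set(EntityId id, T component) => _data[id] = component;
    public bool TryGet(EntityId id, out T component) => _data.TryGetValue(id, out component);
    public void Remove(EntityId id) => _data.Remove(id);

    // A system iterates only over the entities that own this component.
    public IEnumerable<KeyValuePair<EntityId, T>> All() => _data;
}
A RenderSystem holding a position store and an animation store would walk one store and TryGet from the other, which is exactly the fast hashmap lookup mentioned above.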
The problem is that a purely data-oriented system can be less usable than a more pragmatic approach in which you keep all the features of the system described above, but also keep an Entity class that holds references to its components (or aggregated component structures) in every system. Processing an entity (deleting or inspecting it) becomes much easier, because you still have one place where all the information about what the entity is (what it is made of, not what state it is in) can be found, instead of having to query every system.
In your case, the best thing is to try. It's quite easy and fast to implement a rough engine both ways, and once you've played with the two you'll be able to decide which one suits you better.
This article is valuable in that it suggests four iterations of the data structure, though in my opinion none of them is a good solution. I still recommend reading it, because it contains a detailed analysis of the problem, nice estimates in terms of memory, and other good material.

ORM Entities vs. Domain Entities under Entity Framework 6.0

I stumbled upon two articles, First and Second, in which the author states, in summary, that ORM entities and domain entities shouldn't be mixed up.
I face exactly this problem at the moment as I code with EF 6.0 using the Code First approach. I use my POCO classes both as EF entities and as my domain/business objects. But I frequently find myself defining a property as public, or a navigation property as virtual, only because EF forces me to.
I don't know what to take as the bottom line of the two articles. Should I really create, for example, a CustomerEF class for Entity Framework and a CustomerD class for my domain? Then create a repository which consumes CustomerD, maps it to CustomerEF, does some queries, and then maps the received CustomerEF back to CustomerD? I thought EF was all about mapping my domain entities to the data.
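To make what I mean concrete, here is a rough sketch of that split; CustomerEF and CustomerD are my names from above, while MyDbContext and the mapping details are just my guess at how it would look.
using System.Data.Entity;   // EF 6

// Persistence entity: shaped to keep EF happy (public setters, etc.).
public class CustomerEF
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Domain entity: shaped by business rules, free of EF requirements.
public class CustomerD
{
    public CustomerD(int id, string name) { Id = id; Name = name; }
    public int Id { get; }
    public string Name { get; private set; }
}

public class MyDbContext : DbContext   // hypothetical context
{
    public DbSet<CustomerEF> Customers { get; set; }
}

// Repository that owns the mapping in both directions.
public class CustomerRepository
{
    private readonly MyDbContext _context;
    public CustomerRepository(MyDbContext context) { _context = context; }

    public CustomerD Find(int id)
    {
        var row = _context.Customers.Find(id);
        return row == null ? null : new CustomerD(row.Id, row.Name);
    }

    public void Save(CustomerD customer)
    {
        var row = _context.Customers.Find(customer.Id);
        if (row == null)
        {
            row = new CustomerEF();
            _context.Customers.Add(row);
        }
        row.Name = customer.Name;   // map domain -> persistence
        _context.SaveChanges();
    }
}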
So please give me some advice. Am I overlooking something important that EF can provide me with? Or is this a problem that EF cannot completely solve? In the latter case, what is a good way to manage it?
I agree with the general idea of these posts. An ORM class model is part of a data access layer first and foremost (even if it consists of so-called POCOs). If any conflict of interests arises between persistence and business logic (or any other concern), decisions should always be made in favor of persistence.
However, as software developers we always have to balance between purism and pragmatism. Whether or not to use the persistence model as a domain model depends on a number of factors:
The size/coherence of the development team. When the whole team knows that properties can be public just because of ORM requirements, but should not be set all over the place, it may not be a big deal. If everybody knows (and obeys) the rule that an ID property is not to be used in business logic, having IDs may not be a big deal. A scattered, inexperienced or undisciplined team may need more stringent segregation of code.
The overlap between business logic concerns and persistence concerns. Object oriented design thrives when a class model sticks to SOLID principles. But these principles are not necessarily at odds with persistence concerns. I mean that although the concerns are different, in the end their resultant requirements may be quite similar. For instance, both concerns may require valid object state and correct associations.
There can be use cases, however, in which objects temporarily need to be in a state that absolutely shouldn't be stored. This may be a reason to work with dedicated domain classes. Another reason may be that the entity model just can't fulfill the best segmentation of responsibilities. For instance, a business process "blacklisting customer" may require data that is scattered over so many entity objects that new domain classes must be designed that can encapsulate the data and the methods working on them. In other words: doing this by entities would violate the Tell Don't Ask principle.
The need for layering. For instance, if the data access layer targets different database vendors it may have to consist of interchangeable parts that are vendor-specific (e.g. to account for subtle differences in data types between Oracle and Sql Server or to exploit vendor-specific features). Using the persistence model as domain model would probably bleed vendor-specific implementations into the business logic. That would be really bad. There the data access layer should be precisely that, a layer.
(Very trivial) The amount of data. Creating objects takes time and resources. When "many" objects are involved in a business case it may just be too expensive to build both entity objects and domain objects.
And more, undoubtedly.
So I would always try to be a pragmatist. If entity classes do a decent job, go for it. If the mismatch is too large, create a business domain for appropriate parts of the business logic. I would not slavishly follow a (any) design pattern just because it is a good pattern. Contrary to what is said in the post, it requires a lot of maintenance to map an entity model onto a business model. When you find yourself creating myriads of business classes that are almost identical to entity classes it's time to rethink what you're doing.

Entity Framework, Projections, and Relations (oh my?)

So, I'm working on new software, but I have no choice but to brownfield the database. I would like to use Entity Framework where it makes sense.
Here's my dilemma:
Since the tables are very wide, and I can't change this, I will probably make heavy use of projection to limit the width of the datasets that I query.
I do want to make use of navigation properties where it makes sense
From what I've seen, a lot of people use a model where there is a single DbContext class for the whole project.
So,
I'm weighing these pros and cons, and I'm wondering what the established best practices might be:
Use 1 DbContext.
There could be A LOT of "pollution" here, with bunches of projections of the data inside of the 1 context class. This sounds like it could become a maintenance nightmare.
Don't make my projections DbSets at all -- just make them plain old objects and select new MyProject {..} into them.
This offers the benefit of keeping my projections in module-specific assemblies and namespaces, but now I get NO navigation/lazy loading/etc. (a sketch of this option appears after the list).
Be evil?? and use multiple DbContexts?
I'm not really sure what the maintenance story looks like here, but I'm kind of starting to lean in this direction. My biggest problem with it is that it feels like I'm swimming against the current -- not many people seem to do this, but for a large system, it seems like it could be the best option.
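To illustrate option 2, something like this is what I have in mind; MyProject is my name from above, while WideThings and its columns are invented placeholders.
// Plain projection class in a module-specific assembly; no DbSet exists
// for it, so it never pollutes the context.
public class MyProject
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// In some query method: only the projected columns leave the database.
var narrow = context.WideThings              // hypothetical wide entity set
    .Where(w => w.IsActive)
    .Select(w => new MyProject { Id = w.Id, Name = w.Name })
    .ToList();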
Thoughts?
I think you should use POCOs or DTOs for data transfer between the different layers of the application, and a ViewModel to send data to the View.
Consider using the Repository pattern and Unit of Work to get a better, more efficient architecture in this scenario. Limit the use of navigation properties to the repository layer; otherwise they make entities heavy when transferred across layers (use POCOs or DTOs instead).
If you do the above, then I do not think using multiple DbContexts would give you any benefit. Thanks.

What are the benefits of ORM lazy loading?

I'm researching data-layer underpinnings for a new web-based reporting system and have spent a lot of time evaluating ORMs over the last few days. That said, I've never dealt with "lazy loading" before and am confused about why it's the default setting for LINQ queries in Entity Framework. It seems like it creates a lot of network traffic and unnecessarily tasks the database with additional queries that could otherwise be resolved with joins.
Can someone describe a scenario in which lazy loading would be beneficial?
Some meta:
The new system will be working against a database with hundreds of tables and many terabytes of data in a production environment with over 3,000 concurrent users on the system 24 hours a day. They will be retrieving large datasets continuously. Is it possible that an ORM just isn't the right solution for our needs, especially since the app will be web-based?
When we talk about lazy loading we are talking about navigation properties (how we follow foreign keys). What lazy loading does for us is populate an entity from a remote table as we attempt to access it. For example, if we have a model like this
public class TestEntity
{
    public int Id { get; set; }

    // Navigation property; declared virtual so the EF proxy can
    // intercept access and lazy-load the related row.
    public virtual AnotherEntity RemoteEntity { get; set; }
}
And call the following
var something = WhateverContext.TestEntities.First().RemoteEntity;
We will get 2 database calls, one for WhateverContext.TestEntities.First() and one for loading the remote entity.
I'm a web guy (and more specifically an MVC guy), and for web work I don't think there is ever a good reason for wanting this: one database call is always going to be quicker than two if we require the same set of data.
The situation where lazy loading is actually worth considering is when, at the time of your first query, you don't know whether you will need the second entity at all. In my opinion this is much more relevant for Windows applications, where a user performs actions in real time (rather than stateless MVC, where users request whole pages at once). For example, I think lazy loading shines when we have a list of data with a details link; we then don't load the details until the user decides they want to see them.
I don't feel this extends to paging, sorting and filtering, IMO there should be one specifically crafted database query per page of data you are displaying, which returns exactly the data set required to display that page.
In terms of your performance question, I feel that EF (or another ORM) can probably meet your needs here but you want to be careful with how you are retrieving large datasets due to the way EF tracks entities. Check out my EF performance tuning cheat sheet, and read up on DetectChanges and AsNoTracking if you do decide to use EF with large queries.
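For those large read-only queries, an AsNoTracking call looks like this (entity and variable names are invented):
// Read-only reporting query: AsNoTracking skips the change tracker, so EF
// keeps no per-entity snapshots for the thousands of rows returned.
var report = context.Orders
    .AsNoTracking()
    .Where(o => o.CreatedOn >= since)
    .ToList();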
Most ORMs will give you the option, when you're building up your object selections, to say "don't be lazy, go ahead and join", so if you're worried about it from an efficiency perspective, don't be. You can make it work (usually).
There are 2 particular cases I know of where lazy loading helps:
Chaining commands
What if you want to create a basic select, but then run it through sort and filter functions based on user input? You can simply pass the ORM query object around and attach the sorting and filtering functionality to it. Instead of being evaluated each time, it is evaluated only when it's actually used.
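In LINQ terms this chaining is deferred execution on IQueryable; a sketch with invented names:
// Nothing touches the database while the query is composed.
IQueryable<Order> query = context.Orders;
if (!string.IsNullOrEmpty(nameFilter))
    query = query.Where(o => o.CustomerName.Contains(nameFilter));
if (sortByDate)
    query = query.OrderBy(o => o.CreatedOn);
// Only here is a single SQL statement generated and executed.
var results = query.ToList();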
Avoiding huge, deep, highly-relational queries
What if you just need the IDs of some related fields? If it loads lazily, you don't have to worry about it joining a whole bunch of data and tables that you don't need, potentially slowing down the query and overusing bandwidth. Of course, if you DID want everything else, then you'll need to be explicit, or you may run into a problem where it lazily runs a query for each detail record. Like I mentioned at the outset, that's easily overcome in any ORM worth using.
A simple case is a result set of N records which you do not want to bring to the client all at once. The benefit is that you can lazily load only what the client currently demands, for sorting, filtering, and so on. An example would be a paging view where one can page through records and sort them accordingly; the client only needs a page's worth of rows at any given time.
When you perform the LINQ query, it is translated into SQL commands on the server side that provide only what is needed in the given context. It boils down to offloading work to the database and minimizing what you need to send back to the client.
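Paging is the concrete case. In the sketch below (names invented), Skip and Take end up inside the generated SQL, so only one page of rows ever crosses the wire:
// Server-side paging: the sort, skip and take are all translated to SQL,
// and the client receives exactly pageSize rows.
var page = context.Records
    .OrderBy(r => r.Name)
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();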
Some will argue that ORM-based lazy loading is wrong; however, that debate descends into semantics fairly quickly, and it should be more about your approach to design than about what is right and wrong.

Rules of thumb for writing "queries" using ADO.NET Entity Framework

I'm currently working on a prototype of a medium-size web application, and I thought it would be good to also experiment with Entity Framework. The problem is that the major part of the application is not the data layer and logic, so I don't have much time to play with Entity Framework. On the other hand, the database schema is quite simple.
One of the problems I’m facing is that I cannot find a consistent way to "write queries". As far as I can tell, there are four "interfaces" for the job:
LINQ to Entities
LINQ to Entities using LINQ extension methods
Entity SQL
Query builder
OK, the first two are essentially the same, but it’s good to use just one for maintenance and consistency.
I'm mostly puzzled by the fact that no single one of them seems to be complete and fully general. I often find myself cornered, using some ugly-looking combination of several of them. My guess is that Entity SQL is the most general one, but writing queries as strings feels like a step back. The main reason I'm experimenting with something like Entity Framework is that I like compile-time checking.
Some other random thoughts/issues:
I often also use the ObjectQuery.Include() method, but again it takes a string. Is this the only way?
When to use ObjectQuery.Execute() (vs. ToList())? Does it actually execute the query?
Should I execute queries as soon as possible (e.g. using ToList()), or should I not care and just leave execution to the first enumeration that comes along?
Are ObjectQuery.Skip() and ObjectQuery.Take() available only as extension methods? Is there a better way to do paging? It’s 2009 and almost every web application deals with paging.
Overall, I understand there are many difficulties in implementing an ORM, and one often has to compromise. On the other hand, direct database access (e.g. ADO.NET) is plain and simple and has a well-defined interface (tabular results, data readers), so all code, no matter who writes it and when, is consistent. I don't want to be faced with too many choices whenever I write a database query. It's too tedious, and more than likely different developers will come up with different ways.
What are your rules of thumb?
I use LINQ to Entities as much as possible. I also try to standardise on the lambda (method) syntax, as opposed to the SQL-style query syntax. I have to admit to having had problems enforcing relationships and making compromises on efficiency just to expedite coding of our application (e.g. master->child tables may need to be loaded manually), but all in all, EF is a good product.
I do use EF's .Include() method for eager loading, which, as you say, requires a string input. I find no problem with this, other than identifying the string to use, which is relatively simple. I guess if you're keen on compile-time checking of such relations, a model similar to Parent.GetChildren() might be more appropriate.
My application does require some "dynamic" queries to be performed, though. I have two ways of meeting this need:
a) I create a mediator object, e.g. ClientSearchMediator, which "knows" how to search for clients by name, etc. I can then pass it through a SearchHandler.Search(ISearchMediator[] mediators) call (for example). This can be used to target specific data structures and sort results accordingly using LINQ to Entities; a rough sketch follows below.
b) For a looser experience, possibly where a user designs their own query (using the high-level tools our application provides), eSQL is ideal. It can be made injection-safe.
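A rough shape for option (a); ClientSearchMediator, SearchHandler and ISearchMediator are the names used above, while the member signatures and the Client class are invented for this sketch.
using System.Collections.Generic;
using System.Linq;

public class Client
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Each mediator knows how to narrow one kind of search.
public interface ISearchMediator
{
    IQueryable<Client> Apply(IQueryable<Client> query);
}

public class ClientSearchMediator : ISearchMediator
{
    private readonly string _name;
    public ClientSearchMediator(string name) { _name = name; }

    public IQueryable<Client> Apply(IQueryable<Client> query)
    {
        return query.Where(c => c.Name.Contains(_name));
    }
}

// The handler composes whatever mediators the caller supplies and then
// runs the combined query once.
public class SearchHandler
{
    public List<Client> Search(IQueryable<Client> source, ISearchMediator[] mediators)
    {
        foreach (var m in mediators)
            source = m.Apply(source);
        return source.ToList();
    }
}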
I don't have enough knowledge to address all of this, but I'll at least take a few stabs.
I don't know why you think ADO.NET is more consistent than Entity Framework. There are many different ways to use ADO.NET and I've definitely seen inconsistency within a single code base.
Entity Framework is currently a 1.0 release and it suffers from many 1.0 type problems (incomplete & inconsistent API, missing features, etc.).
Regarding Include, I assume you are referring to eager loading. Several people (outside of Microsoft) have developed solutions for getting "type safe" includes (try googling something like: Entity Framework ObjectQueryExtension Include). That said, Include is more of a hint than anything: you can't force eager loading, and you always have to remember to call the IsLoaded() method to see whether your request was fulfilled. As far as I know, the way Include works is not changing at all in the next version of Entity Framework (4.0, to ship with VS 2010).
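Those community solutions work roughly like this: convert a lambda into the dotted path string that the built-in Include(string) overload expects. A minimal, single-level sketch assuming EF 1.0's ObjectQuery<T>:
using System;
using System.Data.Objects;          // home of ObjectQuery<T> in EF 1.0
using System.Linq.Expressions;

public static class ObjectQueryExtensions
{
    // Allows q.Include(c => c.Orders) instead of q.Include("Orders").
    // Handles a single property level only; real versions walk whole paths.
    public static ObjectQuery<T> Include<T>(
        this ObjectQuery<T> query, Expression<Func<T, object>> path)
    {
        // Value-typed members arrive wrapped in a Convert node; unwrap it.
        var body = path.Body is UnaryExpression u ? u.Operand : path.Body;
        var member = (MemberExpression)body;
        return query.Include(member.Member.Name);
    }
}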
As for executing the LINQ query as soon as it's built vs. at the last possible moment, that decision is situational. Personally, I would probably execute it as soon as it's built unless there were a compelling reason not to, but I can see other people going the opposite direction.
There are more mature ORMs on the market and Entity Framework isn't necessarily your best option. For the most part, you can bend Entity Framework to your will, but you may end up rolling your own implementation of features that come out of the box with other ORMs.