Entity Framework, repository and LINQ? Load just an entity or also child nodes? - entity-framework

I am new to Entity Framework and I want some pointers on how EF, LINQ, POCOs and repositories fit together.
We have a working solution with a repository that uses EF and POCOs to access the database. We do all our queries in LINQ through the context. Since the database and tables already exist, the mappings live in mapping classes that are loaded at application start.
Suppose I have a business case where I need to calculate, for a specific company, the number of toys bought by the employees for their children.
How would I build up the repository / repositories?
A: Use one repository to fetch all employees of a company, then, in the service layer, call another repository for every employee to get the children, and so on?
B: Use one repository that returns the company with all its employees, children and toys?
A seems much cleaner to me and I can reuse the repositories more often. But B seems more efficient, though less reusable: fewer repositories, and the queries would get bigger and bigger.
That is just a small example... but we have much larger business cases. What is the best architectural approach in this case?
class Company
{
    public List<Employee> Employees { get; set; }
}

class Employee
{
    public List<Child> Children { get; set; }
}

class Child
{
    public List<Toy> Toys { get; set; }
}

You don't need to call a repository to get the company, employees, children and toys!
I need to calculate for a specific Company the number of Toys bought by the employees for their children
So your business case is to produce a single number, or maybe a number per toy or per employee. You don't need to load all those entities and compute it in your application. You just need to write an aggregation query (either in LINQ or SQL). The whole computation is supposed to run in the database.
If you need to hide the query behind the repository simply choose one where this business case belongs and expose the query as a new method for the repository.
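As a rough sketch (the context, DbSet and navigation property names below are assumptions, not taken from the question), such an aggregation in LINQ could look like this:

using System.Linq;

// Counts the toys for one company entirely in the database:
// no Employee, Child or Toy entities are materialized in memory.
public int GetToyCountForCompany(MyDbContext context, int companyId)
{
    return context.Companies
                  .Where(c => c.Id == companyId)
                  .SelectMany(c => c.Employees)
                  .SelectMany(e => e.Children)
                  .SelectMany(ch => ch.Toys)
                  .Count();
}

Exposed as a method like this on the company repository, the computation stays in the database while the query is still hidden behind the repository.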

While there is no hard and fast rule, using a single unit of work for each of your aggregate roots (in your example, Company) seems to me to work most consistently: it keeps things organized and prevents concurrency errors, since it handles all the wiring up and managing of your objects. There is a great MSDN post on what a unit of work is. An excerpt from that article:
In a way, you can think of the Unit of Work as a place to dump all transaction-handling code. The responsibilities of the Unit of Work are to:
Manage transactions.
Order the database inserts, deletes, and updates.
Prevent duplicate updates. Inside a single usage of a Unit of Work object, different parts of the code may mark the same Invoice object as changed, but the Unit of Work class will only issue a single UPDATE command to the database.
The value of using a Unit of Work pattern is to free the rest of your code from these concerns so that you can otherwise concentrate on business logic.
There are several blog posts about this, but the best one I've found on how to implement it is here. There are some others that have been referred to from this site here, and here.
A seems much cleaner to me and I can reuse the repositories more often. But B seems more efficient, though less reusable: fewer repositories, and the queries would get bigger and bigger.
Generic repositories address your concern here: they make it easy to create a repo for each of your data objects while keeping the code reusable and easily testable. Your business logic can then be handled in a service layer inside your unit of work, ensuring that you don't run into concurrency issues.
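As a minimal sketch, a generic repository over an EF DbContext might look like this (interface and method names are illustrative, not a prescribed API; the shared DbContext plus SaveChanges acts as the unit of work):

using System.Data.Entity; // EF 5/6; for EF Core use Microsoft.EntityFrameworkCore
using System.Linq;

public interface IRepository<T> where T : class
{
    T GetById(int id);
    IQueryable<T> Query();
    void Add(T entity);
    void Remove(T entity);
}

public class EfRepository<T> : IRepository<T> where T : class
{
    private readonly DbContext _context;

    public EfRepository(DbContext context)
    {
        _context = context;
    }

    public T GetById(int id) { return _context.Set<T>().Find(id); }
    public IQueryable<T> Query() { return _context.Set<T>(); }
    public void Add(T entity) { _context.Set<T>().Add(entity); }
    public void Remove(T entity) { _context.Set<T>().Remove(entity); }
}

The service layer composes these repositories and calls SaveChanges on the shared context when the business operation is complete.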

Related

Entity Framework: multiple DbContexts or not? And a few other performance-related questions

I'm building a calendar/entry/statistics application using quite complex models with a large number of relationships between them.
In general I'm concerned about performance, and I am considering different strategies and looking for input before implementing the application.
I'm completely new to DbContext pooling, so please excuse any stupid questions. Does DbContext pooling have anything to do with the use of multiple DbContext classes, or is it about improved performance regardless of whether you use a single or multiple DbContexts?
I might end up implementing a large number of DbSets, or should I avoid that? I'm considering creating multiple DbContext classes for simplicity, but will this reduce memory use and improve performance? Would it be better/smarter to split the application into smaller projects?
Is there any performance difference in using IEnumerable vs ICollection? I’m avoiding the use of lists as much as possible. Or would it be even better to use IAsyncEnumerable?
Most of your performance pain points will come from a complex architecture that you have not simplified. Managing a monolithic application results in lots of unnecessary 'compensating' logic when one use case treads on the toes of another and your relationships are so intertwined.
Optimisations such as whether to use Context Pooling or IEnumerable vs ICollection can come later. They should not affect the architecture of your solution.
If your project is as complex as you suggest, then I'd recommend you read up on Domain Driven Design and Microservices and break your application up into several projects (or groups of projects).
https://learn.microsoft.com/en-us/dotnet/architecture/microservices/microservice-ddd-cqrs-patterns/
Each project (or group of projects) will have its own DbContext to administer the entities within that project.
Further, each DbContext should start off by only exposing aggregate roots through its DbSets. This can mean more database activity than is strictly necessary for a particular use case, but it is best to start with a clean architecture and squeeze out that last ounce of performance (sometimes at the cost of architectural clarity) if and when needed.
For example, if you want to add an attendee to an appointment, it can be appealing to attack the Attendee table directly. But to keep things clean, and considering that an attendee cannot exist without an appointment, you should make Appointment the aggregate root and only expose Appointment as an entry point for the outside world to attack. The appointment can be retrieved from the database with its attendees; then ask the appointment to add the attendee, and save the appointment graph by calling SaveChanges on the DbContext.
In summary, your Appointment is responsible for the functionality within its graph. You should ask the Appointment to add an Attendee to its list instead of adding an Attendee to the Appointment's list of attendees yourself. A subtle shift in thinking that can reduce the complexity of your solution an awful lot.
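To illustrate (the class shapes and the Attendee constructor here are assumptions, just a sketch of the idea):

using System;
using System.Collections.Generic;
using System.Linq;

public class Attendee
{
    public int PersonId { get; private set; }
    public Attendee(int personId) { PersonId = personId; }
}

public class Appointment
{
    public int Id { get; private set; }
    private readonly List<Attendee> _attendees = new List<Attendee>();
    public IReadOnlyCollection<Attendee> Attendees { get { return _attendees; } }

    // The aggregate root owns the rule: attendees are only added through the appointment.
    public void AddAttendee(Attendee attendee)
    {
        if (_attendees.Any(a => a.PersonId == attendee.PersonId))
            throw new InvalidOperationException("This person is already an attendee.");
        _attendees.Add(attendee);
    }
}

// Usage sketch: load the root with its children, mutate through the root, save the graph.
// var appointment = context.Appointments.Include(a => a.Attendees).Single(a => a.Id == id);
// appointment.AddAttendee(new Attendee(personId));
// context.SaveChanges();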
The art is deciding where the boundaries between microservices/contexts should lie. There can be pros and cons to two different architectures, with no clear winner.
To your other questions:
DbContext Pooling is about maintaining a pool of ready-to-go instantiated DbContexts. It saves the overhead of repeated DbContext instantiation. Probably not worth it, unless you have an awful lot of separate requests coming in and your profiling shows that this is a pain point.
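For reference, pooling is something you opt into when registering the context in EF Core; a sketch (the context type and connection string name are assumptions):

// In ASP.NET Core service registration.
// Pooled context instances are reset and reused between requests,
// so the context should not carry per-request state of its own.
services.AddDbContextPool<CalendarContext>(options =>
    options.UseSqlServer(Configuration.GetConnectionString("Default")));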
The number of DbSets required is alluded to above.
As for IEnumerable or ICollection or IList, it depends on what functionality you require. Here's a nice simple summary ... https://medium.com/developers-arena/ienumerable-vs-icollection-vs-ilist-vs-iqueryable-in-c-2101351453db
Would it be better/smarter to split the application into smaller projects?
Yes, absolutely! Start with architectural clarity and then tweak for performance benefits where and when required. Don't start with performance as the goal (unless you're building a millisecond sensitive solution).

Entity Framework - What is the service layer's role in the repository pattern?

Currently, I'm working on an ASP.NET MVC 4 project using EF5 with the repository pattern.
I have just joined this project.
In this project, we have implemented many repository classes. These repositories are responsible for searching, updating, deleting etc. using the DbContext; they also return DTO classes, and in the service layer we use those repositories to get the DTOs and then convert them to view models.
Every time I want to do some logic with the entities, I go to the repositories and write the code there. So I wonder why we need the service layer and the repositories at the same time, when we could write the logic directly in the service layer or use the repositories directly in the controller.
I don't see any advantage here, since our source code is too complicated and we need so many classes (DTOs, view models...), and I think the performance will not be as good compared to using the repositories or services directly.
Can you point out the key here? Thank you.
It's very simple:
Repositories are for data access
Services are for business logic
But once you've started to instill business concepts into repositories it becomes very hard to turn the tide.
An example of how easily business concepts mingle with data access concerns is soft deletes. Let's say that there is a table journal_voucher from which rows should never be deleted, only inactivated. So there is a boolean (bit) field IsActive that's set to false if a row should be off the record.
Now it seems obvious to have a Delete method in JournalRepository that sets the IsActive flag instead of deleting an entity. Likewise, any retrieval methods may automatically filter out inactive records.
Wrong. Being active or inactive is a business concept. For a data access layer the content of any database field is meaningless. It's only supposed to read and write it properly.
Now see what happens: other entities will probably just be hard-deleted. Maybe yet others can't ever be deleted, or, why not, never be created. If one repository has this active/inactive responsibility, the next obvious step is to implement these other CRUD rules in the appropriate repositories as well. Then a business requirement emerges that only records of the current year are interesting... Oh, and we have to check whether a journal_voucher can even be inactivated... And so on and so forth.
You end up with a host of very different repository classes and scattered business logic.
I believe that if you decide to use your own repositories on top of Entity Framework's repositories (DbSets) they should be generic repositories. That is: for each entity class they do exactly the same thing. It's even arguable whether they should return DTOs instead of EF entity objects (I'd vote for the latter).
Everything else is done in services. So there will probably be a JournalService that inactivates journal_vouchers, with proper checking. The service decides that IsActive is set to false and instructs the repository to update the entity. (In fact a unit of work should do that, but that's a different story).
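As a sketch of that split (the names and the minimal repository interface below are assumptions for illustration):

using System;

public class JournalVoucher
{
    public int Id { get; set; }
    public bool IsActive { get; set; }
}

public interface IRepository<T> where T : class
{
    T GetById(int id);
    void Update(T entity);
}

public class JournalService
{
    private readonly IRepository<JournalVoucher> _vouchers;

    public JournalService(IRepository<JournalVoucher> vouchers)
    {
        _vouchers = vouchers;
    }

    public void Inactivate(int voucherId)
    {
        var voucher = _vouchers.GetById(voucherId);

        // The business rule lives here, not in the repository.
        if (!voucher.IsActive)
            throw new InvalidOperationException("Voucher is already inactive.");

        voucher.IsActive = false;   // the business decision
        _vouchers.Update(voucher);  // the repository (or unit of work) just persists it
    }
}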
This distinction has many benefits:
The rest of the world only communicates with services.
Therefore, repositories can safely return IQueryable. The services limit the amount of retrieved data.
It's much easier to decide where business logic involving multiple entities belongs (i.e. almost all business logic).
It is much easier with dependency injection.
The repositories can be mocked relatively easily and the services can be readily unit tested without duplicating business rules in mocked repositories.
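For example, a unit test for the JournalService sketched above might look like this (using Moq and NUnit; same assumed names as above):

using Moq;
using NUnit.Framework;

[TestFixture]
public class JournalServiceTests
{
    [Test]
    public void Inactivate_SetsIsActiveToFalse_AndUpdatesTheRepository()
    {
        var voucher = new JournalVoucher { Id = 1, IsActive = true };
        var repo = new Mock<IRepository<JournalVoucher>>();
        repo.Setup(r => r.GetById(1)).Returns(voucher);

        var service = new JournalService(repo.Object);
        service.Inactivate(1);

        Assert.IsFalse(voucher.IsActive);
        repo.Verify(r => r.Update(voucher), Times.Once);
    }
}

No business rules are duplicated in the mock; the repository is only stubbed to hand back an entity.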

ASP.NET MVC - I think I am going about this wrong

Or I don't understand this at all.
I have started my ASP.NET MVC application using the Controller --> ViewModel --> Service --> Repository pattern.
Does every type of object (Customer, Product, Category, Invoice, etc.) need to have its own repository and service? If so, how do you bring common items together?
I mean, a lot of the time a few of these things will be displayed on the same page. So I don't think I'm getting this.
So I was thinking I need a ShopController, which has a ShopViewModel, which could have categories, subcategories, products, etc. But the problem, for me, is that it just does not seem to mesh well.
Maybe ASP.NET WebForms were for people like me :)
Edit
So would an aggregate consist of say:
Category, SubCategory, Product, ChildProduct, ProductReview, with Product being the aggregate root?
Then in the view models, you would access the Product to get at its child products, reviews, etc.
I am using Entity Framework 4, so how would you implement lazy loading using the repository/service pattern?
Does every type of object (Customer, Product, Category, Invoice, etc.) need to have its own repository
You should have a repository per aggregate root in your domain. See this question for more information on what an aggregate root is.
In the example you give I could see a CustomerRepository, which would handle retrieving all pertinent customer data (a Customer has Orders; an Order has a Customer), and a ProductRepository that handles retrieving product information.
and service? If so, how do you bring common items together?
A service layer is nice, but only if it adds value. If your service simply passes calls straight through to the repository, it might not be needed. However, if you need to perform certain business logic on a Product, a ProductService might make sense.
This might not make sense
public void UpdateProduct(Product product)
{
    _repo.Update(product);
}
But if you have logic, this layer makes sense as a place to encapsulate your business rules for products.
public void UpdateProduct(Product productToUpdate)
{
    // Perform some sort of business logic on productToUpdate, raise domain events, ...
    _repo.Update(productToUpdate);
}
So I was thinking I need a ShopController, which has a ShopViewModel, which could have categories, subcategories, products, etc. But the problem, for me, is that it just does not seem to mesh well.
If the domain is fleshed out, the view model ends up making sense:
public ActionResult Index()
{
    ShopViewModel shopViewModel = new ShopViewModel();
    shopViewModel.Products = _productRepo.GetAll();
    // other stuff on the view model...
    return View(shopViewModel);
}
Update
What happens when you also need to provide data unobtainable from an aggregate root? For example, say I have a create Customer view and in that view, I also need to provide the user with a collection of Companies to choose from to associate a new customer with. Does the collection of Companies come from CustomerRepository or would you also need a CompanyRepository?
If a Company can live by itself (e.g. you edit, update and delete companies), I would suggest a Company is also an aggregate root for your domain (a Customer has a Company and a Company has a list of Customers). However, if a Company is only obtainable via a Customer, I would treat a Company as a value type/value object. If that is the case, I would create a method on the customer repository to retrieve all company names.
_repo.GetAllCompanyNames();
Repositories are indispensable; just go with them. They hide away the data implementation. Used with an ORM, you can pretty much forget about core db activity (CRUD). You'll generally find there's a 1:1 map between an object and a repository, but nothing stops a repository from returning anything it likes. Typically, though, you will be acting upon an instance. Create non-object-specific repositories for queries that don't naturally fit into an existing one.
You will find a lot of conflicting arguments on the "Services" part of it, which some people like to split between Domain Services (I'd call these business rules that don't comfortably fit into a core Domain Object) and Application Services (logical groupings of operations on Domain Objects). I've actually gone for one separate project called [ProjectName].Core.Operations that lives in my [ProjectName].Core solution folder. Core + Operations = Domain.
An operation might be something that returns a DTO of all the information a View requires, built via a number of repository calls and actions on the Domain. Some people (myself included) prefer to hide Repositories completely from Presentation and instead use Operations (Services) as a facade to them. Just go with gut feeling on naming and don't be afraid; refactoring is healthy. Nothing wrong with a HomePageOperations class with a method GetEveryThingINeedForTheHomepage that returns a ThingsINeedForTheHomePage class.
Keep your controllers as lightweight as possible. All they do is map data to views and views to data, talk to "Services" and handle application flow.
Download and have a look at S#arp architecture or the Who Can Help Me projects. The latter really shows a good architecture IMHO.
Lastly don't forget one of the major concerns of tiers is pluggability/testability, so I advise getting your head around a good IoC container (I'm a fan of Castle.Windsor). Again S#arp architecture is a good place to find about this.
You can pass more than one type of repository to the controller (I'm assuming you're using some kind of IoC container and constructor injection). You may then decide to compose some type of service object from all of the passed repositories.
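A sketch of that (the repository interfaces, their GetAll methods and the Categories property are assumptions; the IoC container wires up the constructor):

using System.Web.Mvc;

public class ShopController : Controller
{
    private readonly IProductRepository _products;
    private readonly ICategoryRepository _categories;

    public ShopController(IProductRepository products, ICategoryRepository categories)
    {
        _products = products;
        _categories = categories;
    }

    public ActionResult Index()
    {
        var model = new ShopViewModel
        {
            Products = _products.GetAll(),
            Categories = _categories.GetAll()
        };
        return View(model);
    }
}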

DDD: Persisting aggregates

Let's consider the typical Order and OrderItem example. Assuming that OrderItem is part of the Order aggregate, it can only be added via Order. So, to add a new OrderItem to an Order, we have to load the entire aggregate via the repository, add the new item to the Order object and persist the entire aggregate again.
This seems to have a lot of overhead. What if our Order has 10 OrderItems? Then, just to add a new OrderItem, not only do we have to read 10 OrderItems, we also have to re-insert all 10 of them again. (This is the approach Jimmy Nilsson takes in his DDD book: every time he wants to persist an aggregate, he clears all the child objects and then re-inserts them. This can cause other issues, as the IDs of the children change every time because of the IDENTITY column in the database.)
I know some people may suggest applying the Unit of Work pattern at the aggregate root so it keeps track of what has changed and only commits those changes. But this violates the Persistence Ignorance (PI) principle, because persistence logic leaks into the Domain Model.
Has anyone thought about this before?
Mosh
This doesn't have to be a problem; some ORMs support lazy lists.
e.g.
You could load the order entity and add items to the Details collection w/o actually materializing all of the other entities in that list.
I think NHibernate supports this.
If you are writing your own entity persistence code without any ORM, then you are pretty much out of luck; you would have to re-implement the same dirty-tracking machinery that OR mappers give you for free.
The entire aggregate must be loaded from the database because DDD assumes that aggregate roots ensure consistency within the boundaries of their aggregates. For these rules to be checked, all necessary data must be loaded. If there is a requirement that an order can be worth no more than $100,000 for a particular customer, the aggregate root (Order) must check this rule before persisting changes. This does not imply that all the existing items must be loaded and their value summed up: the Order can maintain a pre-calculated sum of the existing items, updated whenever new ones are added. This way, checking the business rule requires only the Order's own data to be loaded when adding new items.
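A sketch of that idea (property names and the limit are illustrative only):

using System;
using System.Collections.Generic;

public class OrderItem
{
    public decimal Price { get; set; }
}

public class Order
{
    private readonly List<OrderItem> _items = new List<OrderItem>();

    // Maintained incrementally, so the invariant can be checked
    // without loading the existing items from the database.
    public decimal TotalValue { get; private set; }

    public void AddItem(OrderItem item)
    {
        if (TotalValue + item.Price > 100000m)
            throw new InvalidOperationException("Order total would exceed the customer's limit.");

        TotalValue += item.Price;
        _items.Add(item);
    }
}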
I'm not 100% sure about this approach, but I think applying the unit of work pattern could be the answer. Keeping in mind that any transaction should be done in application or domain services, you could populate the unit of work class/object with the objects from the aggregate that you have changed. After that, let the UoW class/object do the magic (of course, building a proper UoW might be hard in some cases).
Here is a description of the unit of work pattern from here:
A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.

Entity Framework as Repository and UnitOfWork?

I'm starting a new project and have decided to try to incorporate DDD patterns and also include LINQ to Entities. When I look at EF's ObjectContext, it seems to perform the functions of both the Repository and Unit of Work patterns:
Repository in the sense that the underlying data level interface is abstracted from the entity representation and I can request and save data through the ObjectContext.
Unit Of Work in the sense that I can write all my inserts/updates to the objectContext and execute them all in one shot when I do a SaveChanges().
Isn't it redundant to put another layer of these patterns on top of the EF ObjectContext? It also seems that the model classes can be built directly on top of the EF-generated entities using 'partial class'.
I'm new at DDD so please let me know if I'm missing something here.
I don't think that the Entity Framework is a good implementation of Repository, because:
The object context is insufficiently abstract to do good unit testing of things which reference it, since it is bound to the DB access. Having an IRepository reference instead works much better for creating unit tests.
When a client has access to the ObjectContext, the client can do pretty much anything it cares to. The only real control you have over this at all is to make certain types or properties private. It is hard to implement good data security this way.
On a non-trivial model, the ObjectContext is insufficiently abstract. You may, for example, have both tables and stored procedures mapped to the same entity type. You don't really want the client to have to distinguish between the two mappings.
On a related note, it is difficult to write comprehensive and well-enforced business rules in entity code. Indeed, whether or not this is even a good idea is debatable.
On the other hand, once you have an ObjectContext, implementing the Repository pattern is trivial. Indeed, for cases that are not particularly complex, the Repository is something of a wrapper around the ObjectContext and the Entity types.
I would say that you should look at the ObjectContext as your UnitOfWork, and not as a repository.
An ObjectContext cannot be a repository, IMHO, since it is 'too generic'.
You should create your own Repositories, which have specialized methods (like GetCustomersWithGoldStatus for instance) next to the regular CRUD methods.
So, what I would do is create repositories (one for each aggregate root) and let those repositories use the ObjectContext.
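A sketch of what such a specialized repository might look like (the Customer entity and its Status property are assumptions; with a DbContext the idea is the same, using a DbSet instead of CreateObjectSet):

using System.Collections.Generic;
using System.Data.Objects; // ObjectContext/ObjectSet in EF 4
using System.Linq;

public class CustomerRepository
{
    private readonly ObjectContext _context;

    public CustomerRepository(ObjectContext context)
    {
        _context = context;
    }

    // A specialized query next to the regular CRUD methods.
    public List<Customer> GetCustomersWithGoldStatus()
    {
        return _context.CreateObjectSet<Customer>()
                       .Where(c => c.Status == "Gold")
                       .ToList();
    }
}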
I like to have a repository layer for the following reasons:
EF gotchas
When you look at some of the current tutorials on EF (the Code First version), it is apparent that there are a number of gotchas to be handled, particularly around object graphs (entities containing entities) and disconnected scenarios. I think a repository layer is great for wrapping these up in one place.
A clear picture of data access mechanisms
A repository gives a clear picture of how the BL accesses and updates the data store. It exposes methods that have a single, clear purpose and can be tested independently of the BL. A standard example from the textbooks: Find() to find a single entity. A more application-specific example: Clear() to clear down a db table.
A place for optimizations
Inevitably you come up against performance hits when using vanilla EF. I use the repository to hide the optimization mechanisms from the BL.
Examples:
GetKeys() to project cached keys from the tables (for Insert/Update decisions). Reading only the keys is faster and uses less memory than reading the full entities.
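A sketch of the key projection (the context, set and property names are assumptions):

using System.Collections.Generic;
using System.Linq;

// Reads only the key column; far cheaper than materializing full entities.
public HashSet<int> GetKeys(MyDbContext context)
{
    return new HashSet<int>(context.Products.Select(p => p.Id));
}

The caller can then check the set to decide per row whether to insert or update, without extra round trips.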
Bulk load via SqlBulkCopy. EF will insert by individual SQL statements. If you want a single statement to insert multiple rows, SqlBulkCopy is a good mechanism. The repository encapsulates this and provides metadata for SqlBulkCopy. As well as the Insert method, you need a StartBatch() and EndBatch() method, which is also an argument for a UnitOfWork layer.