Let's consider the typical Order and OrderItem example. Assuming that OrderItem is part of the Order Aggregate, it can only be added via Order. So, to add a new OrderItem to an Order, we have to load the entire Aggregate via the Repository, add the new item to the Order object, and persist the entire Aggregate again.
This seems to have a lot of overhead. What if our Order has 10 OrderItems? This way, just to add a new OrderItem, not only do we have to read 10 OrderItems, but we also have to re-insert all 10 of them again. (This is the approach Jimmy Nilsson takes in his DDD book: every time he wants to persist an Aggregate, he clears all the child objects and then re-inserts them. This can cause other issues, as the IDs of the children change every time because of the IDENTITY column in the database.)
I know some people may suggest applying the Unit of Work pattern at the Aggregate Root so it keeps track of what has changed and commits only those changes. But this violates the Persistence Ignorance (PI) principle, because persistence logic leaks into the Domain Model.
Has anyone thought about this before?
Mosh
This doesn't have to be a problem; some ORMs support lazy lists.
e.g.
You could load the order entity and add items to the Details collection w/o actually materializing all of the other entities in that list.
I think NHibernate supports this.
If you are writing your own entity persistence code w/o any ORM, then you are pretty much out of luck: you would have to re-implement the same dirty-tracking machinery that OR mappers give you for free.
The entire aggregate must be loaded from the database because DDD assumes that aggregate roots ensure consistency within the boundaries of their aggregates. For these rules to be checked, all necessary data must be loaded. If there is a requirement that an order can be worth no more than $100,000 for a particular customer, the aggregate root (Order) must check this rule before persisting changes. This does not imply that all the existing items must be loaded and their value summed up. Order can maintain a pre-calculated sum of existing items which is updated when new ones are added. This way, checking the business rule requires only Order data to be loaded when adding new items.
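A rough sketch of what that could look like (the names and the $100,000 limit are purely illustrative, not from any particular framework):

public class OrderItem
{
    public decimal Price { get; set; }
}

public class Order
{
    // Pre-calculated sum of all existing items, loaded together with the Order row.
    public decimal CurrentTotal { get; private set; }

    private readonly List<OrderItem> newItems = new List<OrderItem>();

    public void AddItem(OrderItem item)
    {
        // The invariant is checked against the running total, so the
        // existing OrderItems never need to be materialized.
        if (CurrentTotal + item.Price > 100000m)
            throw new InvalidOperationException("Order limit exceeded for this customer.");

        CurrentTotal += item.Price;
        newItems.Add(item);
    }
}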
I'm not 100% sure about this approach, but I think applying the Unit of Work pattern could be the answer. Keeping in mind that any transaction should be handled in application or domain services, you could populate the Unit of Work class/object with the objects from the aggregate that you have changed. After that, let the UoW class/object do the magic (of course, building a proper UoW might be hard for some cases).
Here is a description of the Unit of Work pattern from here:
A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.
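As a very rough sketch (the interface and names are made up for illustration, not any specific framework's API), a Unit of Work used from an application or domain service might look like this:

public interface IUnitOfWork
{
    void RegisterNew(object entity);      // schedule an INSERT
    void RegisterDirty(object entity);    // schedule an UPDATE
    void RegisterDeleted(object entity);  // schedule a DELETE
    void Commit();                        // apply all pending changes in one transaction
}

// Usage from a service: only the objects you actually changed are registered.
// unitOfWork.RegisterDirty(order);
// unitOfWork.RegisterNew(newOrderItem);
// unitOfWork.Commit();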
When inserting a lot of interrelated nested objects, is it best practice to first create and persist the inner entities so as to respect the foreign key relationships and then move up the hierarchy, or is it better to create all the objects with their inner objects and persist only the outer object? So far, I have found that when doing the latter, Entity Framework seems to intelligently figure out what to insert first with regard to the relationships. But is there a caveat one should be aware of? The first method seems to me like classical SQL logic, whereas the latter seems to suit the idea of Entity Framework more.
It's all about managing the risk of data corruption and ensuring that the database stays valuable.
When you persist the inner records first, you open yourself up to the possibility of bad data if the outer save fails. Depending on your application needs, orphaned inner records could be anywhere from a major problem to a minor annoyance.
When you save everything together, if a single record fails the entire save fails.
Of course, if the inner records are meaningful without the outer records (again, determined by your business needs) then you might be preventing the application from progressing.
In short: if the inner record is dependent on the outer, save together. If the inner is meaningful on its own, make the decision based on performance / readability.
Examples:
A House has meaning without a Homeowner. It would be acceptable to create a House on its own.
An HSA (Homeowner's Association) does not make sense without Homeowners. These should be created together.
Obviously, these examples are contrived and become trivial when using existing data. For the purpose of this question, we're assuming that both are being created at the same time.
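For example, persisting only the outer object and letting EF work out the insert order might look like this (a sketch; the context, the Hsas/Homeowners sets and the property names are made up for illustration):

var hsa = new Hsa
{
    Name = "Maple Street Association",
    Homeowners = new List<Homeowner>
    {
        new Homeowner { Name = "Alice" },
        new Homeowner { Name = "Bob" }
    }
};

context.Hsas.Add(hsa);   // only the outer entity is added explicitly
context.SaveChanges();   // EF inserts the HSA first, then the Homeowners with the right foreign keys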
Consider the following scenario where JPA is used for persistence.
A student can be associated to different courses with a web form.
So this form displays different entities (student, course).
The Save button is pushed, the business logic modifies some fields of the entities, but the DB operation fails.
Unfortunately the entities in memory reflect the changes made by the business logic, and this may create consistency problems.
Is there a pattern useful in similar scenarios?
Possible solutions I thought of and why I don't like them:
I don't want to revert all the changes made by the business logic in case of a DB exception, because that is an error-prone job.
I don't want to reload the entities after the DB exception to make sure they are aligned with the DB. In fact, that operation may fail too.
Otherwise, I could clone the entities, make the changes, and swap the clone with the original entity after a successful commit.
In any case, I would be more comfortable following a well-established pattern.
The Memento Pattern is a design pattern intended to offer 'rollback' functionality for the state of objects in memory. You need a Caretaker class, that asks the subject for a Memento, then attempts the persistence. The Caretaker would then give the Memento back to the subject, telling it to roll back to the state the Memento describes, if necessary.
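A minimal sketch of that idea for the student/course scenario (the types and the Persist/ApplyBusinessLogic calls are made up for illustration; the original scenario uses JPA, but the shape is the same):

// The Memento captures only the state we may need to roll back.
public class StudentMemento
{
    public string Name { get; set; }
    public List<long> CourseIds { get; set; }
}

public class Student
{
    public string Name { get; set; }
    public List<long> CourseIds { get; set; } = new List<long>();

    public StudentMemento CreateMemento()
    {
        return new StudentMemento { Name = Name, CourseIds = new List<long>(CourseIds) };
    }

    public void Restore(StudentMemento memento)
    {
        Name = memento.Name;
        CourseIds = new List<long>(memento.CourseIds);
    }
}

// Caretaker: take a memento, attempt the persistence, restore on failure.
var memento = student.CreateMemento();
try
{
    ApplyBusinessLogic(student);   // mutates the in-memory entity
    repository.Persist(student);   // the database operation that may fail
}
catch
{
    student.Restore(memento);      // in-memory state matches the database again
    throw;
}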
I'm researching data-layer underpinnings for a new web-based reporting system and have spent a lot of time evaluating ORMs over the last few days. That said, I've never dealt with "lazy loading" before and am confused about why it's the default setting for LINQ queries in the Entity Framework. It seems like it creates a lot of network traffic and unnecessarily tasks the database with additional queries that could otherwise be resolved with joins.
Can someone describe a scenario in which lazy loading would be beneficial?
Some meta:
The new system will be working against a database with hundreds of tables and many terabytes of data in a production environment with over 3,000 concurrent users on the system 24 hours a day. They will be retrieving large datasets continuously. Is it possible that an ORM just isn't the right solution for our needs, especially since the app will be web-based?
When we talk about lazy loading we are talking about navigation properties (how we follow foreign keys). What lazy loading does for us is populate the entity from a related table when we attempt to access that entity. For example, if we have a model like this:
public class TestEntity
{
    public int Id { get; set; }

    // Navigation property; typically marked virtual so EF can create a lazy-loading proxy.
    public virtual AnotherEntity RemoteEntity { get; set; }
}
And call the following
var something = WhateverContext.TestEntities.First().RemoteEntity;
We will get 2 database calls, one for WhateverContext.TestEntities.First() and one for loading the remote entity.
I'm a web guy (and more specifically an MVC guy), and for web stuff I don't think there is ever a good reason for wanting to do this. One database call is always going to be quicker than two if we require the same set of data.
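For example, eager loading with Include gets the same data in a single round trip (a sketch, reusing the made-up TestEntity model from above; the Include extension lives in System.Data.Entity for EF6 or Microsoft.EntityFrameworkCore for EF Core):

// One query with a JOIN instead of two separate queries.
var something = WhateverContext.TestEntities
    .Include(t => t.RemoteEntity)
    .First()
    .RemoteEntity;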
The situation where I think that lazy loading is actually worth considering is when you don't know when you do your first query if you will need the second entity at all. In my opinion this is much more relevant for windows applications where we have a user who is performing actions in real time (rather than stateless MVC where users are requesting whole pages at once). For example I think lazy loading shines when we have a list of data with a details link, then we don't load the details until the user decides they want to see them.
I don't feel this extends to paging, sorting and filtering, IMO there should be one specifically crafted database query per page of data you are displaying, which returns exactly the data set required to display that page.
In terms of your performance question, I feel that EF (or another ORM) can probably meet your needs here but you want to be careful with how you are retrieving large datasets due to the way EF tracks entities. Check out my EF performance tuning cheat sheet, and read up on DetectChanges and AsNoTracking if you do decide to use EF with large queries.
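For instance, a page-sized, read-only query might look something like this (a sketch; pageIndex and pageSize are placeholders):

// Returns exactly one page of data and skips change tracking entirely.
var page = WhateverContext.TestEntities
    .AsNoTracking()
    .OrderBy(t => t.Id)
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();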
Most ORMs will give you the option, when you're building up your object selections, to say "don't be lazy, go ahead and join", so if you're worried about it from an efficiency perspective, don't be. You can make it work (usually).
There are 2 particular cases I know of where lazy loading helps:
Chaining commands
What if you want to create a basic select, but then run it through a sort and a filter function based on user input? You can simply pass the ORM object in and attach the sort and filtering functionality to it. Instead of being evaluated each time, it is only evaluated when it's actually used.
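Roughly like this (a sketch; the context, the Product entity and the filter/sort inputs are placeholders):

// Nothing hits the database yet; we are only composing the query.
IQueryable<Product> query = context.Products;

if (!string.IsNullOrEmpty(nameFilter))
    query = query.Where(p => p.Name.Contains(nameFilter));

if (sortByPrice)
    query = query.OrderBy(p => p.Price);

// The SQL is generated and executed only here, with the filter and sort folded in.
var results = query.ToList();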
Avoiding huge, deep, highly-relational queries
What if you just need the IDs of some related fields? If it loads lazily, you don't have to worry about it joining a whole bunch of data and tables that you don't need, potentially slowing down the query and overusing bandwidth. Of course, if you DID want everything else, then you'll need to be explicit, or you may run into a problem where it lazily runs a query for each detail record. Like I mentioned at the outset, that's easily overcome in any ORM worth using.
A simple case is a result set of N records which you do not want to bring to the client at once. The benefit is that you are able to lazily load only what is needed for the client's demands, such as sorting, filtering, etc. An example would be a paging view where one could page through records and sort them accordingly, so the client only needs N records at a given time.
When you perform the LINQ query it translates that to SQL commands on the server side to provide only what is needed in the given context. It boils down to offloading work to the database and minimizing what you need to send back to the client.
Some will argue that ORM-based lazy loading is wrong; however, that moves into semantics fairly quickly and should be more about your approach to design than about what is right and wrong.
I am new to Entity Framework and I want to get some pointers about the combination of EF, LINQ, POCOs and repositories.
We have a working solution with a repository which uses EF and POCOs to access the database. We do all our queries in LINQ through the context. We put the mappings into mapping classes which are loaded at application start, as the database/tables already exist.
Say I have a business case where I need to calculate, for a specific company, the number of toys bought by the employees for their children.
How would I build up the repository / repositories?
A: use one repository to load all employees of a company, and then, in the service layer, call another repository for each employee to load the children, and so on?
B: call one repository which returns the company with all employees, children and toys?
A seems much cleaner to me and I can reuse the repositories more often. But B seems to be more efficient, yet not as reusable. Fewer repositories, and the queries would get bigger and bigger.
That is just a small example... but we have much larger business cases. What is the best architectural approach in this case?
class Company
{
    List<Employee> employees;
}

class Employee
{
    List<Child> children;
}

class Child
{
    List<Toy> toys;
}
You don't need to call a repository to get the company, employees, children and toys!
I need to calculate for a specific Company the amount of Toys bought by the employees for their children
So your business case is to have a single number, or maybe a number per toy or a number per employee. You don't need to load all those entities and compute it in your application. You just need to write an aggregation query (either in LINQ or SQL). This whole computation is supposed to run in the database.
If you need to hide the query behind a repository, simply choose the one where this business case belongs and expose the query as a new method on that repository.
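Using the Company/Employee/Child/Toy model from the question, a sketch of such a repository method could be (assuming the collections are mapped as navigation properties and Company has an Id key; the context name is a placeholder):

public int CountToysBoughtForChildren(int companyId)
{
    // The whole aggregation runs in the database; no entities are materialized in memory.
    return context.Companies
        .Where(c => c.Id == companyId)
        .SelectMany(c => c.Employees)
        .SelectMany(e => e.Children)
        .SelectMany(ch => ch.Toys)
        .Count();
}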
While there is never a hard and fast rule, to me, using a single unit of work for each of your aggregate roots (in your example, Company) seems to work most consistently: it keeps things organized and prevents concurrency errors, since the unit of work handles all the wiring up and managing of your objects. There is a great MSDN post on what a unit of work is. An excerpt from that article:
In a way, you can think of the Unit of Work as a place to dump all transaction-handling code. The responsibilities of the Unit of Work are to:

Manage transactions.
Order the database inserts, deletes, and updates.
Prevent duplicate updates. Inside a single usage of a Unit of Work object, different parts of the code may mark the same Invoice object as changed, but the Unit of Work class will only issue a single UPDATE command to the database.

The value of using a Unit of Work pattern is to free the rest of your code from these concerns so that you can otherwise concentrate on business logic.
There are several blog posts about this, but the best one I've found on how to implement it is here. There are some other ones which have been referred to from this site, here and here.
A seems to me much cleaner and I can reuse the Repositories more often. But B seems to be more efficient, but not as reusable. Fewer repositories, and the queries would get bigger and bigger.
Generic repositories handle your concerns here, making it easy to create a repo for each of your data objects while keeping them reusable and easily testable. Your business logic can then be handled in a service layer inside your unit of work, ensuring that you don't run into concurrency issues.
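A generic repository in this style is usually just a thin wrapper over the context, roughly like this (a sketch; the exact method set varies from team to team):

public interface IRepository<T> where T : class
{
    IQueryable<T> Query();
    T Find(params object[] keyValues);
    void Add(T entity);
    void Remove(T entity);
}

public class EfRepository<T> : IRepository<T> where T : class
{
    private readonly DbContext context;

    public EfRepository(DbContext context)
    {
        this.context = context;
    }

    public IQueryable<T> Query() { return context.Set<T>(); }
    public T Find(params object[] keyValues) { return context.Set<T>().Find(keyValues); }
    public void Add(T entity) { context.Set<T>().Add(entity); }
    public void Remove(T entity) { context.Set<T>().Remove(entity); }
}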
I'm using a transaction to insert multiple rows into multiple tables, and I would like the rows to be inserted in the order I add them. Upon calling SaveChanges, all the rows are inserted out of order.
When not using a transaction and saving changes after each insertion, the order is kept, but I really need a transaction for all entries.
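What I'm effectively after is something like this sketch: calling SaveChanges per row (which preserves order) inside one explicit transaction (TransactionScope is just one way to get that; the context and rowsInOrder names are placeholders):

using (var scope = new TransactionScope())
{
    foreach (var row in rowsInOrder)
    {
        context.MyEntities.Add(row);   // hypothetical DbSet
        context.SaveChanges();         // one insert at a time, so order is preserved
    }
    scope.Complete();                  // everything still commits or rolls back together
}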
The order in which inserts, updates, and deletes are made by Entity Framework depends on many things inside Entity Framework.
For example, if you insert a new Product in a new Category, we have to add the Category before the Product.
This means that if you have a large set of changes there are local ordering constraints that we must satisfy first, and indeed this is what we do.
The order that you do things on the context can be in conflict with these rules. For example if you do this:
ctx.AddToProducts(
    new Product
    {
        Name = "Bovril",
        Category = new Category { Name = "Food" }
    }
);
the effect is that the Product is added (to the context) first and then when we walk the graph we add the Category too.
i.e. the insert order into the context is:
Product
Category
but because of referential integrity constraints we must re-order like this before attempting to insert into the database:
Category
Product
So this kind of local re-ordering is essentially non-negotiable.
However, if there are no local dependencies like this, you could in theory preserve ordering. Unfortunately we don't currently track 'when' something was added to the context, and for efficiency reasons we don't track entities in order-preserving structures like lists. As a result we can't currently preserve the order of unrelated inserts.
However, we were debating this just recently, so I am keen to hear just how vital this is to you.
Hope this helps
Alex
Program Manager Entity Framework Team
I'm in the process of crossing this bridge. I'm replacing NHibernate with EF, and the issue I'm running into is how lists are inserted into the DB. If I add items to a list like so (in pseudo-code):
list.Add(testObject);
list.Add(testObject1);
I'm not currently guaranteed to get the same order when 'SaveChanges' is run. That's a shame, because my list object (i.e. a linked list) knows the order it was created in. Objects that are chained together using references MUST be saved to the DB in the same order. Not sure why you mentioned that you're "debating" this. =)