JPA and rollback: a pattern to preserve entity consistency

Consider the following scenario where JPA is used for persistence.
A student can be associated with different courses via a web form.
So this form displays different entities (student, course).
The Save button is pressed, the business logic modifies some fields of the entities, but the db operation fails.
Unfortunately the entities in memory reflect the changes made by the business logic, and this may create consistency problems.
Is there a pattern useful in similar scenarios?
Possible solutions I have considered, and why I don't like them:
I don't want to revert all the changes made by the business logic in case of a db exception, because doing that is an error-prone job.
I don't want to reload the entities after the db exception in order to be sure they are aligned with the db; that operation may fail too.
Alternatively, I could clone the entities, make the changes and swap the clone with the original entity after a successful commit.
In any case I would be more comfortable following a well-established pattern.

The Memento Pattern is a design pattern intended to offer 'rollback' functionality for the state of objects in memory. You need a Caretaker class, that asks the subject for a Memento, then attempts the persistence. The Caretaker would then give the Memento back to the subject, telling it to roll back to the state the Memento describes, if necessary.
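A minimal sketch of that arrangement (in C#, for consistency with the code elsewhere on this page; the class and method names are illustrative, and the same shape applies to JPA entities):

using System;
using System.Collections.Generic;

// The subject can snapshot and restore its own state.
public class Student
{
    public string Name { get; set; }
    public List<string> CourseIds { get; } = new List<string>();

    public StudentMemento SaveToMemento() =>
        new StudentMemento(Name, new List<string>(CourseIds));

    public void RestoreFromMemento(StudentMemento m)
    {
        Name = m.Name;
        CourseIds.Clear();
        CourseIds.AddRange(m.CourseIds);
    }
}

// Immutable snapshot of the state we may need to roll back to.
public class StudentMemento
{
    public string Name { get; }
    public IReadOnlyList<string> CourseIds { get; }

    public StudentMemento(string name, IReadOnlyList<string> courseIds)
    {
        Name = name;
        CourseIds = courseIds;
    }
}

// Caretaker: take a snapshot, apply the business logic, persist,
// and restore the snapshot if persistence fails.
public class StudentCaretaker
{
    public void Save(Student student, Action applyBusinessLogic, Action persist)
    {
        var memento = student.SaveToMemento();
        try
        {
            applyBusinessLogic();
            persist();                            // e.g. commit the transaction
        }
        catch (Exception)
        {
            student.RestoreFromMemento(memento);  // keep the in-memory state consistent
            throw;
        }
    }
}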

Related

Entity Framework - What is Service layer's role in repository pattern

Currently, I'm working on an ASP MVC4 project using EF5 with the repository pattern.
I have just joined this project.
In this project, we have implemented many repository classes; these repositories are responsible for search, update, delete etc. using the DbContext, and they return DTO classes. In the service layer we use those repositories to get the DTOs and then convert them to view models.
Every time I want to do some logic with the entities, I go to the repositories and write the code there. So I wonder why we need the service layer and the repositories at the same time; we could write the logic directly in the service layer, or use the repositories in the controllers directly.
I don't see any advantage here, since our source code is complicated and we need so many classes (DTOs, view models...), and I think the performance will not be as good compared to using repositories or services directly.
Can you point out the key here? Thank you.
It's very simple:
Repositories are for data access
Services are for business logic
But once you've started to instill business concepts into repositories it becomes very hard to turn the tide.
An example of how easily business concepts mingle with data access concerns is soft deletes. Let's say that there is a table journal_voucher from which rows should never be deleted, only inactivated. So there is a boolean (bit) field IsActive that's set to false if a row should be off the record.
Now it seems obvious to have a Delete method in JournalRepository that sets the IsActive flag instead of deleting an entity. Likewise, any retrieval methods may automatically filter out inactive records.
Wrong. Being active or inactive is a business concept. For a data access layer the content of any database field is meaningless. It's only supposed to read and write it properly.
Now see what happens: other entities will probably just be hard-deleted. Maybe yet others can't ever be deleted, or, why not, never be created. If one repository has this active/inactive responsibility, the next obvious step is to implement these other CRUD rules in the appropriate repositories as well. Then a business requirement emerges that only records of the current year are interesting... Oh, and we have to check whether a journal_voucher can even be inactivated... And so on and so forth.
You end up with a host of very different repository classes and scattered business logic.
I believe that if you decide to use your own repositories on top of Entity Framework's repositories (DbSets) they should be generic repositories. That is: for each entity class they do exactly the same thing. It's even arguable whether they should return DTOs instead of EF entity objects (I'd vote for the latter).
Everything else is done in services. So there will probably be a JournalService that inactivates journal_vouchers, with proper checking. The service decides that IsActive is set to false and instructs the repository to update the entity. (In fact a unit of work should do that, but that's a different story).
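A minimal hypothetical sketch of that split (the interface, entity and service names here are illustrative, not from any particular library):

using System;

public interface IRepository<T> where T : class
{
    T GetById(int id);
    void Update(T entity);
}

public interface IUnitOfWork
{
    void Commit();
}

public class JournalVoucher
{
    public int Id { get; set; }
    public bool IsActive { get; set; }
    public bool CanBeInactivated() => IsActive;   // placeholder business check
}

// The business rule ("inactivate, never delete, and only when allowed")
// lives in the service; the repository only reads and writes entities.
public class JournalService
{
    private readonly IRepository<JournalVoucher> repository;
    private readonly IUnitOfWork unitOfWork;

    public JournalService(IRepository<JournalVoucher> repository, IUnitOfWork unitOfWork)
    {
        this.repository = repository;
        this.unitOfWork = unitOfWork;
    }

    public void Inactivate(int journalVoucherId)
    {
        var voucher = repository.GetById(journalVoucherId);

        if (!voucher.CanBeInactivated())
            throw new InvalidOperationException("This journal voucher cannot be inactivated.");

        voucher.IsActive = false;          // business decision made here
        repository.Update(voucher);        // data access only
        unitOfWork.Commit();
    }
}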
This distinction has many benefits:
The rest of the world only communicates with services.
Therefore, repositories can safely return IQueryable. The services limit the amount of retrieved data.
It's much easier to decide where business logic involving multiple entities belongs (i.e. almost all business logic).
Dependency injection becomes much easier.
The repositories can be mocked relatively easily and the services can be readily unit tested without duplicating business rules in mocked repositories.
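Continuing the hypothetical sketch above (the IRepository<T>, IUnitOfWork and JournalService from the earlier snippet), the last point can be illustrated with hand-rolled fakes; no database or mocking library is needed to test the business rule:

public class FakeRepository<T> : IRepository<T> where T : class
{
    public T Stored { get; set; }
    public T GetById(int id) => Stored;
    public void Update(T entity) => Stored = entity;
}

public class FakeUnitOfWork : IUnitOfWork
{
    public bool Committed { get; private set; }
    public void Commit() => Committed = true;
}

// Usage in a test (assertion syntax depends on your test framework):
//   var repo = new FakeRepository<JournalVoucher> { Stored = new JournalVoucher { Id = 1, IsActive = true } };
//   var uow  = new FakeUnitOfWork();
//   new JournalService(repo, uow).Inactivate(1);
//   // expect repo.Stored.IsActive == false and uow.Committed == true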

What is the point of the Update function in the Repository EF pattern?

I am using the repository pattern within EF using an Update function I found online
public class Repository<T> : IRepository<T> where T : class
{
    // backing fields assumed by the snippet (an EF DbContext and its DbSet<T>)
    private readonly DbContext context;
    private readonly DbSet<T> dbset;

    public virtual void Update(T entity)
    {
        var entry = this.context.Entry(entity);
        this.dbset.Attach(entity);
        entry.State = System.Data.Entity.EntityState.Modified;
    }
}
I then use it within a DeviceService like so:
public void UpdateDevice(Device device)
{
    this.serviceCollection.Update(device);
    this.uow.Save();
}
I have realised that what this actually does is update ALL of the device's information rather than just the property that changed. This means that in a multi-threaded environment changes can be lost.
After testing I realised I could just change the Device and then call uow.Save(), which both saved the data and didn't overwrite any existing changes.
So my question really is - What is the point in the Update() function? It appears in almost every Repository pattern I find online yet it seems destructive.
I wouldn't call this generic Update method generally "destructive" but I agree that it has limited use cases that are rarely discussed in those repository implementations. If the method is useful or not depends on the scenario where you want to apply it.
In an "attached scenario" (Windows Forms application for instance) where you load entities from the database, change some properties while they are still attached to the EF context and then save the changes the method is useless because the context will track all changes anyway and know at the end which columns have to be updated or not. You don't need an Update method at all in this scenario (hint: DbSet<T> (which is a generic repository) does not have an Update method for this reason). And in a concurrency situation it is destructive, yes.
However, even a "change tracked update" can be destructive. If two users change the same property to different values, the change tracked update for both users would save the new column value and the last one wins. Whether this is OK depends on the application and how strictly it wants changes to be controlled. If the application does not allow editing an object that is no longer the latest version in the database, it cannot let the last save win. It has to stop, force the user to reload the latest version and look at the latest values before entering his changes. To handle this situation concurrency tokens are necessary that detect that someone else changed the record in the meantime. But those concurrency checks work the same way with change tracked updates as when setting the entity state to Modified. The destructive potential of both methods is stopped by concurrency exceptions. However, setting the state to Modified still produces unnecessary overhead in that it writes unchanged column values to the database.
In a "detached scenario" (Web application for example) the change tracked update is not available. If you don't want to set the whole entity to Modified you have to load the latest version from the database (in a new context), copy the properties that came from the UI and save the changes again. However, this doesn't prevent that changes another user has done in the meantime get overwritten, even if they are changes on different properties. Imagine two users load the same customer entity into a web form at the same time. User 1 edits the customer name and saves. User 2 edits the customer's bank account number and saves a few seconds later. If the entity gets loaded into the new context to perform the update for User 2 EF would just see that the customer name in the database (that already includes the change of User 1) is different from the customer name that User 2 sent back (which is still the old customer name). If you copy the customer name value the property will be marked as Modified and the old name will be written to the database and overwrite the change of User 1. This update would be just as destructive as setting the whole entity state to Modified. In order to avoid this problem you would have to either implement some custom change tracking on client side that recognizes if User 2 changed the customer name and if not it just doesn't copy the value to the loaded entity. Or you would have to work with concurrency tokens again.
You didn't mention the biggest limitation of this Update method in your question - namely that it doesn't update any related entities. For example, if your Device entity had a related Parts collection and you would edit this collection in a detached UI (add/remove/modify items) setting the state of the parent Device to Modified won't save any of those changes to the database. It will only affect the scalar (and complex) properties of the parent Device itself. At the time when I used repos of this kind I named the update method FlatUpdate to indicate that limitation better in the method name. I've never seen a generic "DeepUpdate". Dealing with complex object graphs is always a non-generic thing that has to be written individually per entity type and depending on the situation. (Fortunately a library like GraphDiff can limit the amount of code that has to be written for such graph updates.)
To cut a long story short:
For attached scenarios the Update method is redundant as EFs automatic change tracking does all the necessary work to write correct UPDATE statements to the database - including changes in related object graphs.
For detached scenarios it is a comfortable way to perform updates of simple entities without relationships.
Updating object graphs with parent and child entities in a detached scenario can't be done with such a simplified Update method and requires significantly more (non-generic) work.
Safe concurrency control needs more sophisticated tools, like enabling the optimistic concurrency checks that EF provides and handling the resulting concurrency exceptions in a user-friendly way.
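A minimal sketch of the last point, using EF's optimistic concurrency support (the entity and service names are placeholders):

using System.ComponentModel.DataAnnotations;
using System.Data.Entity.Infrastructure;

public class Device
{
    public int Id { get; set; }
    public string Name { get; set; }

    [Timestamp]                                // maps to a SQL Server rowversion column
    public byte[] RowVersion { get; set; }
}

public class DeviceService
{
    public void Save(MyDbContext context)      // MyDbContext is a placeholder name
    {
        try
        {
            context.SaveChanges();
        }
        catch (DbUpdateConcurrencyException)
        {
            // Someone else changed the row in the meantime: reload the current
            // values and let the user merge, instead of letting the last write win.
        }
    }
}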
After Slauma's very profound and practical answer I'd like to zoom in on some basic principles.
In this MSDN article there is one important sentence:
A repository separates the business logic from the interactions with the underlying data source or Web service.
Simple question: what does the business logic have to do with Update?
Fowler defines the Repository pattern as:
Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.
So as far as the business logic is concerned a repository is just a collection. Collection semantics are about adding and removing objects, or checking whether an object exists. The main operations are Add, Remove, and Contains. Check out the ICollection<T> interface: no Update method there.
It's not the business logic's concern whether objects should be marked as 'modified'. It just modifies objects and relies on other layers to detect and persist changes. Exposing an Update method
makes the business layer responsible for tracking and reporting its changes. Soon all kinds of if constructs will creep in to check whether values have changed or not.
breaks persistence ignorance, because the mere fact that storing updates is something else than storing new objects is a data layer detail.
prevents the data access layer from doing its job properly. Indeed, the implementation you show is destructive. While the Data Access Layer may be perfectly capable of perceiving and persisting granular changes, this method marks a whole object as modified and forces a sweeping UPDATE.
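For illustration, a collection-like repository interface in the spirit of Fowler's definition might look like this (a sketch; the member names are illustrative, not taken from any particular library):

using System.Linq;

public interface IRepository<T> where T : class
{
    void Add(T entity);
    void Remove(T entity);
    bool Contains(T entity);
    T FindById(int id);
    IQueryable<T> Query();    // querying domain objects; still no Update
}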

On observing an execution tree of interdependent models in MVC

I've developed on the Yii Framework for a while now (4 months), and so far I have encountered some issues with MVC that I want to share with experienced developers out there. I'll present these issues by listing their levels of complexity.
[Level 1] CR (create/update) form. First off, we have a lot of forms. Each form itself is a model, so each has some validation rules, some attributes, and some operations to perform on the attributes. In a lot of cases, each of these forms does both updating and creating records in the db using a single active record object.
-> So at this level of complexity, a form has to:
when opened,
be able to display the db-friendly data from the db in a human-friendly way
be able to display all the form fields with the attributes of the active record object. Adding, removing, or altering columns in the db table has to affect the display of the form.
when saving, be able to format the human-friendly data into db-friendly data before saving it
when validating, be able to perform the basic validations enforced by the active record object; it also has to perform other validations to fulfill some business rules.
when validation fails, be able to roll back changes made to the attributes as well as changes made to the db, and present the user with their originally entered data.
[Level 2] Extended CR form. A form that can perform creation/update of records from different tables at once. Not just that, whether a form creates/updates one of its records can sometimes depend on other conditions (more business rules), so a form can sometimes update records in tables A and B but not D, and sometimes in A and D but not B.
-> So at this level of complexity, we see a form has to:
be able to satisfy [Level 1]
be able to conditionally create/update certain records, and conditionally create/update certain columns of certain records.
[Level 3] The Tree of Models. The role of a form in an application is, in many ways, a port that lets users interact with your application. To satisfy requests, this port will interact with many other objects which, in turn, interact with many more objects. Some of these objects can be seen as models. An Active Record is a model, but a Mailer can also be a model, and so is a RobotArm. These models use one another to satisfy a user's request. Each model can perform its own operations, and the whole tree has to be able to roll back any changes made in the case of error/failure.
Has anyone out there come across or been able to solve these problems?
I've come up with many ideas, like encapsulating model attributes in ModelAttribute objects to handle their existence throughout the client, server, and db tiers.
I've also thought we should give the tree of models an Observer to observe and notify the observed models to roll back changes when errors occur. But what if multiple observers can exist, or a node uses its parent's observer but gives its children different observers?
Engineers, developers, Rails, Yii, Zend, ASP, JavaEE, any MVC guys, please join this discussion for the sake of science.
--- Update to teresko's response ---
#teresko I actually intended to incorporate the services into the execution inside a unit of work and have the unit of work not worry about new/updated/deleted. Each object inside the unit of work will be responsible for its state and be required to implement its own commit() and rollback(). Once an error occurs, the unit of work will roll back all changes from the newest registered object to the oldest registered object, since we're not only dealing with a database: we can have mailers, publishers, etc. Otherwise, if the tree executes successfully, we call commit() from the oldest registered object to the newest registered object. This way the mailer can save the mail and send it on commit.
Using a data mapper is a great idea, but we still have to make sure columns in the db match the data mapper and domain object. Moreover, an extended CR form, or a model whose attributes depend on other models, has to match their attributes in terms of validation and datatype. So maybe an attribute can be an object and shipped from model to model? An attribute can also tell if it's been modified, what validation should be performed on it, and how it can be human-friendly, application-friendly, and db-friendly. Any update to the db schema will affect this attribute, thereby throwing exceptions that require developers to make changes to the system to satisfy the change.
The cause
The root of your problem is misuse of the active record pattern. AR is meant for simple domain entities with only basic CRUD operations. When you start adding large amounts of validation logic and relations between multiple tables, the pattern starts to break apart.
Active record, at its best, is a minor SRP violation, for the sake of simplicity. When you start piling on responsibilities, you start to incur severe penalties.
Solution(s)
Level 1:
The best option is to separate the business and storage logic. Most often this is done by using domain objects and data mappers:
Domain objects (in other materials also known as business objects or domain model objects) deal with validation and specific business rules and are completely unaware of how (or even "if") the data in them is stored and retrieved. They also let you have objects that are not directly bound to storage structures (like DB tables).
For example: you might have a LiveReport domain object, which represents current sales data. But it might have no specific table in the DB. Instead it can be serviced by several mappers that pull data from Memcache, an SQL database and some external SOAP service. And the LiveReport instance's logic is completely unrelated to storage.
Data mappers know where to put the information from domain objects, but they do not perform any validation or data integrity checks. Though they may handle exceptions that come from low-level storage abstractions, like violation of a UNIQUE constraint.
Data mappers can also perform transactions, but if a single transaction needs to span multiple domain objects, you should look into adding a Unit of Work (more about it below).
In more advanced/complicated cases data mappers can interact with and utilize DAOs and query builders. But this is more for situations where you aim to create ORM-like functionality.
Each domain object can have multiple mappers, but each mapper should work only with a specific class of domain objects (or a subclass of one, if your code adheres to LSP). You should also recognize that a domain object and a collection of domain objects are two separate things and should have separate mappers.
Also, each domain object can contain other domain objects, just like each data mapper can contain other mappers. But in the case of mappers it is much more a matter of preference (I dislike it vehemently).
Another improvement that could alleviate your current mess would be to prevent application logic from leaking into the presentation layer (most often the controller). Instead you would largely benefit from using services that contain the interaction between mappers and domain objects, thus creating a public-ish API for your model layer.
Basically, with services you encapsulate complete segments of your model that can (in the real world, with minor effort and adjustments) be reused in different applications. For example: Recognition, Mailer or DocumentLibrary would all be services.
Also, I should note that not all services have to contain domain objects and mappers. A quite good example would be the previously mentioned Mailer, which could be used either directly by a controller or (more likely) by another service.
Level 2:
If you stop using the active record pattern, this becomes quite a simple problem: you need to make sure that you save only data from those domain objects which have actually changed since the last save.
As I see it, there are two ways to approach this:
Quick'n'Dirty
If something changed, just update it all ...
The way that I prefer is to introduce a checksum variable in the domain object, which holds a hash of all the domain object's variables (of course, with the exception of the checksum itself).
Each time the mapper is asked to save a domain object, it calls a method isDirty() on that domain object, which checks whether the data has changed. Then the mapper can act accordingly. This, with some adjustments, can also be used for object graphs (if they are not too extensive, in which case you might need to refactor anyway). A small sketch follows below.
Also, if your domain object actually gets mapped to several tables (or even different forms of storage), it might be reasonable to have several checksums, one for each set of variables. Since mappers are already written for specific classes of domain objects, it would not strengthen the existing coupling.
For PHP you will find some code examples in this answer.
Note: if your implementation is using DAOs to isolate domain objects from data mappers, then the logic of checksum-based verification would be moved to the DAO.
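The sketch mentioned above (the question is about PHP/Yii, but the idea is language-agnostic; shown here in C# for consistency with the rest of the page, with illustrative property names):

using System;
using System.Security.Cryptography;
using System.Text;

public class Customer
{
    public string Name { get; set; }
    public string BankAccount { get; set; }

    private string checksum;

    // Called by the mapper right after the object has been populated from storage.
    public void MarkClean() => checksum = ComputeChecksum();

    // The mapper calls this before saving and skips the write if nothing changed.
    public bool IsDirty() => checksum != ComputeChecksum();

    private string ComputeChecksum()
    {
        var state = Name + "|" + BankAccount;   // everything except the checksum itself
        using (var sha = SHA256.Create())
            return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(state)));
    }
}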
Unit of Work
This is the "industry standard" for your problem and there is a whole chapter (11th) dealing with it in PoEAA book.
The basic idea is this, you create an instance, that acts like controller (in classical, not in MVC sense of the word) between you domain objects and data mappers.
Each time you alter or remove a domain object, you inform the Unit of Work about it. Each time you load data in a domain object, you ask Unit of Work to perform that task.
There are two ways to tell Unit of Work about the changes:
caller registration: object that performs the change also informs the Unit of Work
object registration: the changed object (usually from setter) informs the Unit of Work, that it was altered
When all the interaction with the domain objects has been completed, you call the commit() method on the Unit of Work. It then finds the necessary mappers and stores all the altered domain objects.
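A minimal sketch of a caller-registration Unit of Work (in C#, with illustrative interface names):

using System;
using System.Collections.Generic;

public class UnitOfWork
{
    private readonly List<object> newObjects = new List<object>();
    private readonly List<object> dirtyObjects = new List<object>();
    private readonly List<object> removedObjects = new List<object>();
    private readonly IMapperRegistry mappers;   // resolves the right mapper per domain class

    public UnitOfWork(IMapperRegistry mappers) { this.mappers = mappers; }

    public void RegisterNew(object o)     => newObjects.Add(o);
    public void RegisterDirty(object o)   => dirtyObjects.Add(o);
    public void RegisterRemoved(object o) => removedObjects.Add(o);

    public void Commit()
    {
        // A real implementation would wrap this in a database transaction
        // and clear the lists afterwards.
        foreach (var o in newObjects)     mappers.For(o.GetType()).Insert(o);
        foreach (var o in dirtyObjects)   mappers.For(o.GetType()).Update(o);
        foreach (var o in removedObjects) mappers.For(o.GetType()).Delete(o);
    }
}

public interface IMapperRegistry
{
    IDataMapper For(Type domainClass);
}

public interface IDataMapper
{
    void Insert(object o);
    void Update(object o);
    void Delete(object o);
}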
Level 3:
At this stage of complexity the only viable implementation is to use a Unit of Work. It would also be responsible for initiating and committing the SQL transactions (if you are using an SQL database), with the appropriate rollback clauses.
P.S.
Read the "Patterns of Enterprise Application Architecture" book. It's what you desperately need. It also would correct the misconception about MVC and MVC-inspired design patters, that you have acquired by using Rails-like frameworks.

Entity framework and inheritance: NotSupportedException

I'm getting
System.NotSupportedException: All objects in the EntitySet 'Entities.Message' must have unique primary keys. However, an instance of type 'Model.Message' and an instance of type 'Model.Comment' both have the same primary key value
but I have no idea what this means.
Using EF4, I have a bunch of entities of type Message. Some of these messages are actually of a subtype, Comment, with table-per-type inheritance. Just
DB.Message.First();
will produce the exception. I have other instances of subtyping where I don't experience problems, but I can't see any discrepancies. Sometimes, though, the problem goes away if I restart the development server, but not always.
Edit:
I've worked out (should have before) that the problem is a fault of the stored procedure fetching my Messages. The way this is currently set up is that all the fields pertaining to Message are fetched; the Comment table is ignored by the sproc. The context then proceeds to muck this up, probably by fetching those Messages that are also Comments again, as you suggested. How to do this properly is the central issue at hand. I've found some indications of a solution at http://social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/bb0bb421-ba8e-4b35-b7a7-950901adb602.
As you infer, it looks like the Context is fetching a Comment as a Message (not knowing that it is a comment). Later, you ask for the actual Comment, so the context fetches the Comment. Now you have two object instances in the Context with the same ID - one is a Message and one is a Comment.
It seems that the exception is not being thrown until after both objects have been loaded (ie when you try to access the Message the second time). If you can find a way to remove the Message from Context when the Comment is loaded, this may solve your problem.
Another option might be to use the Table-per-hierarchy model. This results in a bad database design but at the end of the day you have to use what works.
You might be able to avoid the problem by ensuring that the objects are loaded as Comments first. This way, when you ask for the Message, the Context already knows about it.
Also consider using Composition over Inheritance, such that a Message has 0..1 CommentDetails.
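A minimal sketch of that composition-over-inheritance idea (property names are illustrative):

public class Message
{
    public int Id { get; set; }
    public string Body { get; set; }

    // 0..1 relationship: null for plain messages, populated for comments.
    public virtual CommentDetails CommentDetails { get; set; }
}

public class CommentDetails
{
    public int MessageId { get; set; }        // shared primary key with Message
    public int ParentMessageId { get; set; }  // whatever comment-specific data is needed
}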
The final suggestion is to remove the dependency on the Entity Framework from your Control code, and create a Data Access Layer which references the EF and retrieves your objects. The DAL can turn Entity Framework objects into a different set of Entity objects which are easier to use in code. This approach will produce a lot of code overhead, but may be suitable if you cannot use the Entity Framework to produce an Entity model which represents your Entities in the way you want to work with them.
To summarize, unless MS fix this issue, there is no solution to your problem which does not involve a rethink of your approach. Unfortunately the Entity Framework is not ideal, especially for complex Entity models - you might be better off creating your own DAL and bypassing the EF altogether.
It sounds like you are pulling two records into memory, one as a Message and one as a Comment.
Possible problems:
There are two physical messages with the same id
The same message is being pulled up as a message and a comment
The same message is being pulled up twice into the same context
That the problem sometimes goes away when you restart points to a problem with cleaning up the context. Are you using "using" statements?
Do you have functionality for changing from a message to a comment?
I am not an EF kind of guy (busy working with NHibernate, haven't had time to get up to date with EF yet) so I may be totally wrong here, but could the problem be that the two tables (since you are using inheritance by table-per-type) have primary keys that collide?
If you check the data in both tables, do primary key values collide?

DDD: Persisting aggregates

Let's consider the typical Order and OrderItem example. Assuming that OrderItem is part of the Order Aggregate, it can only be added via Order. So, to add a new OrderItem to an Order, we have to load the entire Aggregate via the Repository, add the new item to the Order object and persist the entire Aggregate again.
This seems to have a lot of overhead. What if our Order has 10 OrderItems? This way, just to add a new OrderItem, not only do we have to read 10 OrderItems, but we should also re-insert all these 10 OrderItems again. (This is the approach that Jimmy Nilsson has taken in his DDD book. Every time he wants to persist an Aggregate, he clears all the child objects and then re-inserts them again. This can cause other issues, as the IDs of the children change every time because of the IDENTITY column in the database.)
I know some people may suggest applying the Unit of Work pattern at the Aggregate Root so it keeps track of what has been changed and only commits those changes. But this violates the Persistence Ignorance (PI) principle because persistence logic leaks into the Domain Model.
Has anyone thought about this before?
Mosh
This doesn't have to be a problem; some ORMs support lazy lists.
e.g.
You could load the order entity and add items to the Details collection without actually materializing all of the other entities in that list.
I think NHibernate supports this.
If you are writing your own entity persistence code without any ORM, then you are pretty much out of luck; you would have to re-implement the same dirty tracking machinery that ORMs give you for free.
The entire aggregate must be loaded from the database because DDD assumes that aggregate roots ensure consistency within the boundaries of the aggregate. For these rules to be checked, all necessary data must be loaded. If there is a requirement that an order can be worth no more than $100,000 for a particular customer, the aggregate root (Order) must check this rule before persisting changes. This does not imply that all the existing items must be loaded and their values summed up. The Order can maintain a pre-calculated sum of the existing items which is updated when new ones are added. This way, checking the business rule requires only the Order data to be loaded when adding new items.
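A minimal sketch of that idea (class and property names are illustrative; the $100,000 figure is the example rule from above):

using System;
using System.Collections.Generic;

public class Order
{
    private readonly List<OrderItem> newItems = new List<OrderItem>();

    // Persisted, pre-calculated sum of all existing items.
    public decimal Total { get; private set; }

    public void AddItem(OrderItem item)
    {
        if (Total + item.Price > 100000m)
            throw new InvalidOperationException("Order total may not exceed $100,000.");

        Total += item.Price;
        newItems.Add(item);    // only the new item needs to be inserted on save
    }
}

public class OrderItem
{
    public decimal Price { get; set; }
}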
I'm not 100% sure about this approach, but I think applying the Unit of Work pattern could be the answer. Keeping in mind that any transaction should be done in application or domain services, you could populate the unit of work class/object with the objects from the aggregate that you have changed. After that, let the UoW class/object do the magic (of course, building a proper UoW might be hard for some cases).
Here is a description of the Unit of Work pattern from here:
A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.