New entity ID in domain event - entity-framework

I'm building an application with a domain model using CQRS and domain events concepts (but no event sourcing, just plain old SQL). There was no problem with events of SomethingChanged kind. Then I got stuck in implementing SomethingCreated events.
When I create some entity which is mapped to a table with identity primary key then I don't know the Id until the entity is persisted. Entity is persistence ignorant so when publishing an event from inside the entity, Id is just not known - it's magically set after calling context.SaveChanges() only. So how/where/when can I put the Id in the event data?
I was thinking of:
Including the reference to the entity in the event. That would work inside the domain but not necessarily in a distributed environment with multiple autonomous systems communicating by events/messages.
Overriding SaveChanges() to somehow update events enqueued for publishing. But events are meant to be immutable, so this seems very dirty.
Getting rid of identity fields and using GUIDs generated in the entity constructor. This might be the easiest but could hit performance and make other things harder, like debugging or querying (where id = 'B85E62C3-DC56-40C0-852A-49F759AC68FB', no MIN, MAX etc.). That's what I see in many sample applications.
Hybrid approach - leave the identity alone and use it mainly for foreign keys and faster joins, but use a GUID as the unique identifier by which I pull the entities from the repository in the application.

Personally I like GUIDs for unique identifiers, especially in multi-user, distributed environments where numeric ids cause problems. As such, I never use database generated identity columns/properties and this problem goes away.
Short of that, since you are following CQRS, you undoubtedly have a CreateSomethingCommand and a corresponding CreateSomethingCommandHandler that actually carries out the steps required to create the new instance and persist the new object using the repository (via context.SaveChanges). I would raise the SomethingCreated event here rather than in the domain object itself.
For one, this solves your problem because the command handler can wait for the database operation to complete, pull out the identity value, update the object then pass the identity in the event. But, more importantly, it also addresses the tricky question of exactly when is the object 'created'?
Raising a domain event in the constructor is bad practice as constructors should be lean and simply perform initialization. Plus, in your model, the object isn't really created until it has an ID assigned. This means there are additional initialization steps required after the constructor has executed. If you have more than one step, do you enforce the order of execution (another anti-pattern) or put a check in each to recognize when they are all done (ooh, smelly)? Hopefully you can see how this can quickly spiral out of hand.
So, my recommendation is to raise the event from the command handler. (NOTE: Even if you switch to GUID identifiers, I'd follow this approach because you should never raise events from constructors.)
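A minimal sketch of that recommendation, written in Python with sqlite3 only to keep it self-contained (the same shape applies to a C# handler around context.SaveChanges()). The table, the CreateSomethingHandler class and the publish callable are assumptions made up for illustration; the point is only that the immutable event is built after the database has assigned the identity value.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SomethingCreated:
    """Immutable event; it is only built once the id is known."""
    something_id: int
    name: str


class CreateSomethingHandler:
    """Hypothetical command handler: persist first, then raise the event."""

    def __init__(self, connection, publish):
        self._connection = connection
        self._publish = publish  # any callable that accepts an event

    def handle(self, name):
        # The connection context manager commits on success
        # and rolls back if the block raises.
        with self._connection:
            cursor = self._connection.execute(
                "INSERT INTO something (name) VALUES (?)", (name,)
            )
            new_id = cursor.lastrowid  # identity value assigned by the database
        # The event is created after the commit, so it never needs mutating.
        self._publish(SomethingCreated(something_id=new_id, name=name))
        return new_id


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE something (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
published = []
handler = CreateSomethingHandler(connection, published.append)
first_id = handler.handle("first")
second_id = handler.handle("second")
```

Because the event is constructed only once the id exists, nothing has to be patched into it later, which was the concern with overriding SaveChanges().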

Related

CQS and updating an existing entity

I'm just trying to get my head around how one goes about updating an entity using CQS. Say the UI allows a user to update several properties of a particular entity, and on submit, in the back-end, an update command is created and dispatched.
The part I'm not quite understanding is:
does the cmd handler receiving the message from the dispatcher then retrieve the existing entity from the DB to then map the received stock item properties to then save? Or
is the retrieval of the existing item done prior to the dispatching of the cmd msg, to which it is then attached (the retrieved entity attached to cmd that is then dispatched)?
My understanding is that CQS allows for an easier transition to CQRS later on (if necessary)? Is that correct?
If that is the case, the problem with 2 above is that queries could be retrieved from a schema looking very different to that from the command/write schema. Am I missing something?
does the cmd handler receiving the message from the dispatcher then retrieve the existing entity from the DB to then map the received stock item properties to then save
Yes.
If you want to understand cqrs, it will help a lot to read up on ddd -- not that they are necessarily coupled, but because a lot of the literature on CQRS assumes that you are familiar with the DDD vocabulary.
But a rough outline of the responsibility of the command handler is
Load the current state of the target of the command
Invoke the command on the target
Persist the changes to the book of record
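The three steps above can be sketched roughly as follows. This is Python used as neutral pseudocode-that-runs; StockItem, RenameStockItem and the in-memory repository are hypothetical names, not something from the question.

```python
from dataclasses import dataclass


@dataclass
class StockItem:
    item_id: int
    name: str
    quantity: int

    def rename(self, new_name):
        # The business rule lives on the target, not in the handler.
        if not new_name:
            raise ValueError("name must not be empty")
        self.name = new_name


@dataclass(frozen=True)
class RenameStockItem:
    """The command carries only identifiers and new values, not the entity."""
    item_id: int
    new_name: str


class InMemoryStockRepository:
    """Hypothetical stand-in for the write-side store (the book of record)."""

    def __init__(self):
        self._items = {}

    def get(self, item_id):
        return self._items[item_id]

    def save(self, item):
        self._items[item.item_id] = item


def handle_rename(command, repository):
    item = repository.get(command.item_id)  # 1. load the current state
    item.rename(command.new_name)           # 2. invoke the command on the target
    repository.save(item)                   # 3. persist the changes


repository = InMemoryStockRepository()
repository.save(StockItem(item_id=7, name="Widget", quantity=3))
handle_rename(RenameStockItem(item_id=7, new_name="Gadget"), repository)
```

Note that the handler does the loading itself, from the write side, which is why option 2 in the question (attaching a queried entity to the command) is not needed.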
My understanding is that CQS allows for an easier transition to CQRS later on (if necessary)?
That's not quite right -- understanding Meyer's distinction between commands and queries makes the CQRS pattern easier to think about, but I'm not convinced that it actually helps in the transition all that much.
If that is the case, the problem with 2 above is that queries could be retrieved from a schema looking very different to that from the command/write schema. Am I missing something?
Maybe - queries typically run off of a schema that is optimized for query; another way of thinking about it is that the queries are returning different representations of the same entities.
Where things can get tricky is when the command representation and the query representation are decoupled -- aka eventual consistency. In a sense, you are always querying state in the past, but dispatching commands to state in the present. So you will need to have some mechanism to deal with commands that incorrectly assume the target is still in some previous state.

CQRS/Event Sourcing: How to enforce data-integrity?

If I implement CQRS and Event Sourcing, how do I maintain the integrity and consistency of the data, assuming the final storage (read storage) of the data is in a RDBMS?
What if an event is published but the RDBMS rejects the data derived from it, because of a check violation or a missing FK reference?
CQRS implies at least 2 models: write and read. Both can be stored in the same db or in different dbs. With ES, you're using an event store, which can itself be implemented on top of an RDBMS (in .NET there is NEventStore, which AFAIK works with many databases, relational or not).
You're saying you have the read model in a rdbms, that's great. Nothing needs to be enforced because it is the read model, nobody outside the model updater touches that. The app clients can ONLY query that model, never modify it. That's why you have 2 models in the first place so that the Domain would work with the 'write' model while the rest of the app works with the 'read' model.
Also, the RDBMS shouldn't really reject anything. An event handler should be idempotent so if, let's say, the handler inserts something with an id that should be unique, the second invocation should simply ignore any unique constraint violation. With CQRS you're using the RDBMS constraints to support idempotency, not to implement some business rule.
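As a rough illustration of using a constraint for idempotency, here is a small sketch in Python with sqlite3. The customer_view table and the event shape are assumptions for the example; the handler simply treats a uniqueness violation as "already handled".

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE customer_view (customer_id INTEGER PRIMARY KEY, name TEXT)"
)


def on_customer_registered(connection, event):
    """Project the event into the read model; safe to run more than once."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "INSERT INTO customer_view (customer_id, name) VALUES (?, ?)",
                (event["customer_id"], event["name"]),
            )
    except sqlite3.IntegrityError:
        # The row already exists, so this event was handled before.
        pass


event = {"customer_id": 42, "name": "Ada"}
on_customer_registered(connection, event)
on_customer_registered(connection, event)  # redelivery is ignored
row_count = connection.execute(
    "SELECT COUNT(*) FROM customer_view"
).fetchone()[0]
```

The primary key here is not enforcing a business rule; it only guarantees that replaying the same event leaves the read model unchanged.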
Also, think of the read model as the 'throw away model', that can be anytime changed or rebuilt.
Our read models are very simple. We don't have foreign keys in them, so that isn't possible. Why would you need foreign keys? You may have foreign key values, but you don't need constraints, as that is enforced by your domain model. You are really only reading and can rebuild the whole read store if required.

What is the point of the Update function in the Repository EF pattern?

I am using the repository pattern with EF, with an Update function I found online:
    public class Repository<T> : IRepository<T> where T : class
    {
        public virtual void Update(T entity)
        {
            // Attach the detached entity and flag every property as changed.
            var entry = this.context.Entry(entity);
            this.dbset.Attach(entity);
            entry.State = System.Data.Entity.EntityState.Modified;
        }
    }
I then use it within a DeviceService like so:
    public void UpdateDevice(Device device)
    {
        this.serviceCollection.Update(device);
        this.uow.Save();
    }
I have realised that what this actually does is update ALL of the device's information rather than just the property that changed. This means that in a multi-threaded environment changes can be lost.
After testing I realised I could just change the Device and then call uow.Save(), which both saved the data and didn't overwrite any existing changes.
So my question really is - What is the point in the Update() function? It appears in almost every Repository pattern I find online yet it seems destructive.
I wouldn't call this generic Update method generally "destructive" but I agree that it has limited use cases that are rarely discussed in those repository implementations. If the method is useful or not depends on the scenario where you want to apply it.
In an "attached scenario" (Windows Forms application for instance) where you load entities from the database, change some properties while they are still attached to the EF context and then save the changes the method is useless because the context will track all changes anyway and know at the end which columns have to be updated or not. You don't need an Update method at all in this scenario (hint: DbSet<T> (which is a generic repository) does not have an Update method for this reason). And in a concurrency situation it is destructive, yes.
However, it is not clear that a "change tracked update" isn't sometimes destructive either. If two users change the same property to different values the change tracked update for both users would save the new column value and the last one wins. If this is OK or not depends on the application and how secure it wants changes to be done. If the application disallows to ever edit an object that is not the last version in the database before the change is saved it cannot allow that the last save wins. It would have to stop, force the user to reload the latest version and take a look at the last values before he enters his changes. To handle this situation concurrency tokens are necessary that would detect that someone else changed the record in the meantime. But those concurrency checks work the same way with change tracked updates or when setting the entity state to Modified. The destructive potential of both methods is stopped by concurrency exceptions. However, setting the state to Modified still produces unnecessary overhead in that it writes unchanged column values to the database.
In a "detached scenario" (Web application for example) the change tracked update is not available. If you don't want to set the whole entity to Modified you have to load the latest version from the database (in a new context), copy the properties that came from the UI and save the changes again. However, this doesn't prevent that changes another user has done in the meantime get overwritten, even if they are changes on different properties. Imagine two users load the same customer entity into a web form at the same time. User 1 edits the customer name and saves. User 2 edits the customer's bank account number and saves a few seconds later. If the entity gets loaded into the new context to perform the update for User 2 EF would just see that the customer name in the database (that already includes the change of User 1) is different from the customer name that User 2 sent back (which is still the old customer name). If you copy the customer name value the property will be marked as Modified and the old name will be written to the database and overwrite the change of User 1. This update would be just as destructive as setting the whole entity state to Modified. In order to avoid this problem you would have to either implement some custom change tracking on client side that recognizes if User 2 changed the customer name and if not it just doesn't copy the value to the loaded entity. Or you would have to work with concurrency tokens again.
You didn't mention the biggest limitation of this Update method in your question - namely that it doesn't update any related entities. For example, if your Device entity had a related Parts collection and you would edit this collection in a detached UI (add/remove/modify items) setting the state of the parent Device to Modified won't save any of those changes to the database. It will only affect the scalar (and complex) properties of the parent Device itself. At the time when I used repos of this kind I named the update method FlatUpdate to indicate that limitation better in the method name. I've never seen a generic "DeepUpdate". Dealing with complex object graphs is always a non-generic thing that has to be written individually per entity type and depending on the situation. (Fortunately a library like GraphDiff can limit the amount of code that has to be written for such graph updates.)
To cut a long story short:
For attached scenarios the Update method is redundant, as EF's automatic change tracking does all the necessary work to write correct UPDATE statements to the database - including changes in related object graphs.
For detached scenarios it is a comfortable way to perform updates of simple entities without relationships.
Updating object graphs with parent and child entities in a detached scenario can't be done with such a simplified Update method and requires significantly more (non-generic) work.
Safe concurrency control needs more sophisticated tools, like enabling the optimistic concurrency checks that EF provides and handling the resulting concurrency exceptions in a user-friendly way.
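To make the concurrency-token idea concrete, here is a sketch of the two-user customer scenario described above. It is plain SQL driven from Python's sqlite3 rather than EF, and the version column, the ALLOWED_COLUMNS set and update_customer are assumptions for illustration; EF's own optimistic concurrency support rests on the same kind of check in the WHERE clause of the UPDATE.

```python
import sqlite3

ALLOWED_COLUMNS = {"name", "bank_account"}  # hypothetical customer columns


class ConcurrencyError(Exception):
    """The row changed (or vanished) after the caller loaded it."""


def update_customer(connection, customer_id, expected_version, **changes):
    # Column names are checked against a fixed set before being put in the SQL.
    if not changes or not set(changes) <= ALLOWED_COLUMNS:
        raise ValueError("changes must name at least one known column")
    assignments = ", ".join(f"{column} = ?" for column in changes)
    with connection:
        cursor = connection.execute(
            f"UPDATE customer SET {assignments}, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (*changes.values(), customer_id, expected_version),
        )
    if cursor.rowcount == 0:
        # Nothing matched: the version the caller loaded is no longer current.
        raise ConcurrencyError(f"customer {customer_id} was modified by someone else")


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, "
    "bank_account TEXT, version INTEGER NOT NULL)"
)
connection.execute("INSERT INTO customer VALUES (1, 'Old Name', 'ACC-1', 1)")
connection.commit()

# Both users loaded the customer at version 1.
update_customer(connection, 1, expected_version=1, name="New Name")  # User 1
try:
    update_customer(connection, 1, expected_version=1, bank_account="ACC-2")  # User 2
    conflict_detected = False
except ConcurrencyError:
    conflict_detected = True

final_row = connection.execute(
    "SELECT name, bank_account, version FROM customer WHERE id = 1"
).fetchone()
```

User 2's save is rejected instead of silently overwriting User 1's change; what to show the user at that point (reload, merge, retry) is the application's decision.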
After Slauma's very profound and practical answer I'd like to zoom in on some basic principles.
In this MSDN article there is one important sentence
A repository separates the business logic from the interactions with the underlying data source or Web service.
Simple question. What has the business logic to do with Update?
Fowler defines a repository pattern as
Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.
So as far as the business logic is concerned a repository is just a collection. Collection semantics are about adding and removing objects, or checking whether an object exists. The main operations are Add, Remove, and Contains. Check out the ICollection<T> interface: no Update method there.
It's not the business logic's concern whether objects should be marked as 'modified'. It just modifies objects and relies on other layers to detect and persist changes. Exposing an Update method
makes the business layer responsible for tracking and reporting its changes. Soon all kinds of if constructs will creep in to check whether values have changed or not.
breaks persistence ignorance, because the mere fact that storing updates is something else than storing new objects is a data layer detail.
prevents the data access layer from doing its job properly. Indeed, the implementation you show is destructive. While the Data Access Layer may be perfectly capable of perceiving and persisting granular changes, this method marks a whole object as modified and forces a sweeping UPDATE.

On observing an execution tree of interdependent models in MVC

I've developed on the Yii Framework for a while now (4 months), and so far I have encountered some issues with MVC that I want to share with experienced developers out there. I'll present these issues by listing their levels of complexity.
[Level 1] CR(create update) form. First off, we have a lot of forms. Each form itself is a model, so each has some validation rules, some attributes, and some operations to perform on the attributes. In a lot of cases, each of these forms does both updating and creating records in the db using a single active record object.
-> So at this level of complexity, a form has to
when opened,
be able to display the db-friendly data from the db in a human-friendly way
be able to display all the form fields with the attributes of the active record object. Adding, removing, altering columns from the db table has to affect the display of the form.
when saving, be able to format the human-friendly data to db-friendly data before getting the data
when validating, be able to perform the basic validations enforced by the active record object; it also has to perform other validations to fulfill some business rules.
when validation fails, be able to roll back changes made to the attributes as well as changes made to the db, and present the user with their originally entered data.
[Level 2] Extended CR form. A form that can create/update records from different tables at once. Not just that: whether a form creates/updates one of its records can sometimes depend on other conditions (more business rules), so a form can sometimes update records in tables A and B but not D, and sometimes update records in A and D but not B.
-> So at this level of complexity, we see a form has to:
be able to satisfy [Level 1]
be able to conditionally create/update certain records, and conditionally create/update certain columns of certain records.
[Level 3] The Tree of Models. The role of a form in an application is, in many ways, that of a port that lets users interact with your application. To satisfy requests, this port will interact with many other objects which, in turn, interact with many more objects. Some of these objects can be seen as models. Active Record is a model, but a Mailer can also be a model, and so is a RobotArm. These models use one another to satisfy a user's request. Each model can perform its own operation, and the whole tree has to be able to roll back any changes made in the case of error/failure.
Has anyone out there come across or been able to solve these problems?
I've come up with several ideas, like encapsulating model attributes in ModelAttribute objects to handle their existence across the tiers of client, server, and db.
I've also thought we should give the tree of models an Observer to observe and notify the observed models to roll back changes when errors occur. But what if multiple observers can exist? What if a node uses its parent's observer but gives its children other observers?
Engineers, developers, Rails, Yii, Zend, ASP, JavaEE, any MVC guys, please join this discussion for the sake of science.
--Update to teresko's response:---
@teresko I actually intended to incorporate the services into the execution inside a unit of work and have the unit of work not worry about new/updated/deleted. Each object inside the unit of work will be responsible for its own state and be required to implement its own commit() and rollback(). Once an error occurs, the unit of work will roll back all changes from the newest registered object to the oldest registered object, since we're not only dealing with a database; we can have mailers, publishers, etc. Otherwise, if the tree executes successfully, we call commit() from the oldest registered object to the newest registered object. This way the mailer can save the mail and send it on commit.
Using a data mapper is a great idea, but we still have to make sure the columns in the db match the data mapper and the domain object. Moreover, an extended CR form, or a model that has its attributes depending on other models, has to match their attributes in terms of validation and datatype. So maybe an attribute can be an object and be shipped from model to model? An attribute can also tell if it's been modified, what validation should be performed on it, and how it can be made human-friendly, application-friendly, and db-friendly. Any update to the db schema will affect this attribute, thereby throwing exceptions that require developers to make changes to the system to accommodate the change.
The cause
The root of your problem is misuse of the active record pattern. AR is meant for simple domain entities with only basic CRUD operations. When you start adding large amounts of validation logic and relations between multiple tables, the pattern starts to break apart.
Active record, at its best, is a minor SRP violation, for the sake of simplicity. When you start piling on responsibilities, you start to incur severe penalties.
Solution(s)
Level 1:
The best option is to separate the business and storage logic. Most often this is done by using domain objects and data mappers:
Domain objects (in other materials also known as business objects or domain model objects) deal with validation and specific business rules and are completely unaware of how (or even "if") the data in them was stored and retrieved. They also let you have objects that are not directly bound to storage structures (like DB tables).
For example: you might have a LiveReport domain object, which represents current sales data. But it might have no specific table in the DB. Instead it can be serviced by several mappers that pull data from Memcache, an SQL database and some external SOAP service. And the LiveReport instance's logic is completely unrelated to storage.
Data mappers know where to put the information from domain objects, but they do not perform any validation or data integrity checks. They can, though, handle exceptions that come from low-level storage abstractions, like a violation of a UNIQUE constraint.
Data mappers can also perform transactions, but if a single transaction needs to be performed for multiple domain objects, you should be looking to add a Unit of Work (more about it below).
In more advanced/complicated cases data mappers can interact with and utilize DAOs and query builders. But this is more for situations where you aim to create ORM-like functionality.
Each domain object can have multiple mappers, but each mapper should work only with a specific class of domain objects (or a subclass of one, if your code adheres to LSP). You should also recognize that a domain object and a collection of domain objects are two separate things and should have separate mappers.
Also, each domain object can contain other domain objects, just like each data mapper can contain other mappers. But in case of mappers it is much more a matter of preference (I dislike it vehemently).
Another improvement that could alleviate your current mess would be to prevent application logic from leaking into the presentation layer (most often the controller). Instead you would largely benefit from using services that contain the interaction between mappers and domain objects, thus creating a public-ish API for your model layer.
Basically, services encapsulate complete segments of your model that can (in the real world, with minor effort and adjustments) be reused in different applications. For example: Recognition, Mailer or DocumentLibrary would all be services.
Also, I think I should note that not all services have to contain domain objects and mappers. A quite good example would be the previously mentioned Mailer, which could be used either directly by a controller or (what's more likely) by another service.
Level 2:
If you stop using the active record pattern, this becomes quite a simple problem: you need to make sure that you save only data from those domain objects which have actually changed since the last save.
As I see it, there are two ways to approach this:
Quick'n'Dirty
If something changed, just update it all ...
The way that I prefer is to introduce a checksum variable in the domain object, which holds a hash of all the domain object's variables (with the exception of the checksum itself, of course).
Each time the mapper is asked to save a domain object, it calls an isDirty() method on this domain object, which checks whether the data has changed. Then the mapper can act accordingly. With some adjustments, this can also be used for object graphs (if they are not too extensive, in which case you might need to refactor anyway).
Also, if your domain object actually gets mapped to several tables (or even different forms of storage), it might be reasonable to have several checksums, one for each set of variables. Since mappers are already written for specific classes of domain objects, it would not strengthen the existing coupling.
For PHP you will find some code examples in this answer.
Note: if your implementation is using DAOs to isolate domain objects from data mappers, then the logic of checksum based verification, would be moved to the DAO.
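A minimal sketch of the checksum approach, in Python rather than PHP. The DomainObject base class, mark_clean() and the CustomerMapper are hypothetical names for illustration, and hashing a JSON dump of the attributes is just one assumed way to compute the checksum.

```python
import hashlib
import json


class DomainObject:
    """Base class that remembers a hash of its state at the last save."""

    def __init__(self):
        self._checksum = None  # None means "never saved"

    def _compute_checksum(self):
        # Hash every attribute except the checksum itself.
        state = {k: v for k, v in vars(self).items() if k != "_checksum"}
        payload = json.dumps(state, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def is_dirty(self):
        return self._checksum != self._compute_checksum()

    def mark_clean(self):
        self._checksum = self._compute_checksum()


class Customer(DomainObject):
    def __init__(self, name, email):
        super().__init__()
        self.name = name
        self.email = email


class CustomerMapper:
    """Hypothetical mapper that only counts the writes it would perform."""

    def __init__(self):
        self.writes = 0

    def save(self, customer):
        if not customer.is_dirty():
            return False          # nothing changed since the last save
        self.writes += 1          # a real mapper would issue INSERT/UPDATE here
        customer.mark_clean()
        return True


mapper = CustomerMapper()
customer = Customer("Ada", "ada@example.com")
saved_new = mapper.save(customer)      # never saved before, so it is dirty
saved_again = mapper.save(customer)    # unchanged, so the write is skipped
customer.email = "ada@example.org"
saved_changed = mapper.save(customer)  # changed, so it is written again
```

For an object mapped to several tables, the same idea extends to one checksum per group of attributes, so the mapper can skip the tables whose group did not change.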
Unit of Work
This is the "industry standard" for your problem, and there is a whole chapter (the 11th) dealing with it in the PoEAA book.
The basic idea is this: you create an instance that acts like a controller (in the classical, not the MVC, sense of the word) between your domain objects and data mappers.
Each time you alter or remove a domain object, you inform the Unit of Work about it. Each time you load data in a domain object, you ask Unit of Work to perform that task.
There are two ways to tell Unit of Work about the changes:
caller registration: object that performs the change also informs the Unit of Work
object registration: the changed object (usually from setter) informs the Unit of Work, that it was altered
When all the interaction with domain objects has been completed, you call the commit() method on the Unit of Work. It then finds the necessary mappers and stores all the altered domain objects.
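A stripped-down sketch of a Unit of Work using caller registration. The register_dirty/register_removed names, the mapper lookup by class and the RecordingMapper are assumptions for the example, not taken from PoEAA or any framework.

```python
class UnitOfWork:
    """Collects changed objects and hands them to their mappers on commit."""

    def __init__(self, mappers):
        self._mappers = mappers  # maps a domain class to its mapper
        self._dirty = []
        self._removed = []

    def register_dirty(self, obj):
        if obj not in self._dirty:
            self._dirty.append(obj)

    def register_removed(self, obj):
        if obj in self._dirty:
            self._dirty.remove(obj)  # no point saving what will be deleted
        if obj not in self._removed:
            self._removed.append(obj)

    def commit(self):
        for obj in self._dirty:
            self._mappers[type(obj)].save(obj)
        for obj in self._removed:
            self._mappers[type(obj)].delete(obj)
        self._dirty.clear()
        self._removed.clear()


class Device:
    def __init__(self, name):
        self.name = name


class RecordingMapper:
    """Hypothetical mapper that records what it was asked to do."""

    def __init__(self):
        self.log = []

    def save(self, obj):
        self.log.append(("save", obj.name))

    def delete(self, obj):
        self.log.append(("delete", obj.name))


mapper = RecordingMapper()
uow = UnitOfWork({Device: mapper})
router, printer = Device("router"), Device("printer")

router.name = "core-router"
uow.register_dirty(router)     # caller registration: the caller reports the change
uow.register_dirty(router)     # registering twice has no further effect
uow.register_removed(printer)
uow.commit()
```

With object registration the register_dirty call would move into the domain object's setters instead of living in the calling code.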
Level 3:
At this stage of complexity the only viable implementation is to use Unit of Work. It also would be responsible for initiating and committing the SQL transactions (if you are using SQL database), with the appropriate rollback clauses.
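The transactional part can be sketched like this, again with Python's sqlite3 standing in for whatever SQL database is used. The tables a and d and the commit_unit_of_work function are made up for the example; it only shows the all-or-nothing behaviour a Unit of Work commit would need.

```python
import sqlite3


def commit_unit_of_work(connection, operations):
    """Run every (sql, params) pair in one transaction; all or nothing."""
    try:
        # The context manager commits if the block succeeds
        # and rolls back if anything inside it raises.
        with connection:
            for sql, params in operations:
                connection.execute(sql, params)
    except sqlite3.Error:
        return False
    return True


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT NOT NULL)")
connection.execute("CREATE TABLE d (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL)")

ok = commit_unit_of_work(connection, [
    ("INSERT INTO a (id, label) VALUES (?, ?)", (1, "first")),
    ("INSERT INTO d (id, a_id) VALUES (?, ?)", (1, 1)),
])
failed = commit_unit_of_work(connection, [
    ("INSERT INTO a (id, label) VALUES (?, ?)", (2, "second")),
    ("INSERT INTO d (id, a_id) VALUES (?, ?)", (2, None)),  # violates NOT NULL
])
rows_in_a = connection.execute("SELECT COUNT(*) FROM a").fetchone()[0]
```

The second batch fails on its last statement, so its first insert is rolled back as well. Non-database participants such as a mailer are outside this transaction and would need their own commit/rollback handling.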
P.S.
Read the "Patterns of Enterprise Application Architecture" book. It's what you desperately need. It would also correct the misconceptions about MVC and MVC-inspired design patterns that you have acquired by using Rails-like frameworks.

Entity framework and inheritance: NotSupportedException

I'm getting
System.NotSupportedException: All objects in the EntitySet 'Entities.Message' must have unique primary keys. However, an instance of type 'Model.Message' and an instance of type 'Model.Comment' both have the same primary key value
but I have no idea what this means.
Using EF4, I have a bunch of entities of type Message. Some of these messages are actually a subtype, Comment, inheritance by table-per-type. Just
DB.Message.First();
will produce the exception. I have other instances of subtyping where I don't experience problems, but I can't see any discrepancies. Sometimes, though, the problem goes away if I restart the development server, but not always.
Edit:
I've worked out (should have before) that the problem is the fault of the stored procedure fetching my Messages. The way this is currently set up is that all the fields pertaining to Message are fetched, while the Comment table is ignored by the sproc. The context then proceeds to muck this up, probably by fetching those Messages that are also Comments again, as you suggested. How to do this properly is the central issue at hand. I've found some indications of a solution at http://social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/bb0bb421-ba8e-4b35-b7a7-950901adb602.
As you infer, it looks like the Context is fetching a Comment as a Message (not knowing that it is a comment). Later, you ask for the actual Comment, so the context fetches the Comment. Now you have two object instances in the Context with the same ID - one is a Message and one is a Comment.
It seems that the exception is not being thrown until after both objects have been loaded (ie when you try to access the Message the second time). If you can find a way to remove the Message from Context when the Comment is loaded, this may solve your problem.
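A toy model of that explanation, in Python. This is emphatically not how the Entity Framework is implemented; IdentityMap, attach() and DuplicateKeyError are invented for illustration and only mimic the idea that a context tracks one object per entity set and key.

```python
class DuplicateKeyError(Exception):
    pass


class IdentityMap:
    """Toy context: tracks one object per (entity set, primary key)."""

    def __init__(self):
        self._tracked = {}

    def attach(self, entity_set, key, obj):
        existing = self._tracked.get((entity_set, key))
        if existing is None:
            self._tracked[(entity_set, key)] = obj
            return obj
        if isinstance(existing, type(obj)):
            # The tracked object already covers the requested type.
            return existing
        raise DuplicateKeyError(
            f"{type(existing).__name__} and {type(obj).__name__} "
            f"share key {key} in set {entity_set!r}"
        )


class Message:
    def __init__(self, message_id):
        self.message_id = message_id


class Comment(Message):
    pass


# The row is first materialized as a plain Message, then again as a Comment.
context = IdentityMap()
context.attach("Message", 5, Message(5))
try:
    context.attach("Message", 5, Comment(5))
    collided = False
except DuplicateKeyError:
    collided = True

# In this model, loading the more specific type first avoids the collision.
other_context = IdentityMap()
comment = other_context.attach("Message", 5, Comment(5))
same_object = other_context.attach("Message", 5, Message(5)) is comment
```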
Another option might be to use the Table-per-hierarchy model. This results in a bad database design but at the end of the day you have to use what works.
You might be able to avoid the problem by ensuring that the objects are loaded as Comments first. This way, when you ask for the Message, the Context already knows about it.
Also consider using Composition over Inheritance, such that a Message has 0..1 CommentDetails.
The final suggestion is to remove the dependency on the Entity Framework from your Control code, and create a Data Access Layer which references the EF and retrieves your objects. The DAL can turn Entity Framework objects into a different set of Entity objects which are easier to use in code. This approach will produce a lot of code overhead, but may be suitable if you cannot use the Entity Framework to produce an Entity model which represents your Entities in the way you want to work with them.
To summarize, unless MS fix this issue, there is no solution to your problem which does not involve a rethink of your approach. Unfortunately the Entity Framework is not ideal, especially for complex Entity models - you might be better off creating your own DAL and bypassing the EF altogether.
It sounds like you are pulling two records into memory, one into a message and one into a comment.
Possible problems:
There are two physical messages with the same id
The same message is being pulled up as a message and a comment
The same message is being pulled up twice into the same context
The fact that the problem sometimes goes away when you restart points to a problem with cleaning up the context. Are you using "using" statements?
Do you have functionality for changing from a message to a comment?
I am not an EF kind of guy (busy working with NHibernate, haven't had time to get up to date with EF yet) so I may be totally wrong here, but could the problem be that the two tables (since you are using inheritance by table-per-type) have primary keys that collide?
If you check the data in both tables, do primary key values collide?