I'm on a crusade to learn CQRS (à la Greg Young). I'm currently reading CQRS The Example by Mark Nijhof and working with his example for the book. The first thing I'm getting confused by is domain events, and replaying those events to arrive at any state of any object at any point. As Mark states in his book, only the final state of a domain object should be recorded, because business logic changes: you might not arrive at the same answer on replay, and you also don't want to kick off external domain events again. So you're essentially decoupling the business logic from the actual data changes themselves, and you serialize the event object (which is essentially a DTO) into a datastore.
I can see this working really well if your data/object schema never changes, but what do you do when it does? A simple, contrived example would be an address. Say that when you first developed your app it was localized to the United States. At some point you decide to internationalize it, so you add a "Country" field and make the field required. If you're working with a statically typed language (like Java or C#), then as soon as you try to deserialize an old object for event replay it will blow up. The only ways I can see of getting around this would be to either store different versions of your objects (which seems very messy) or store your events as something more unstructured (like XML). Of course this would probably work really well with a dynamic language. I guess in C# you could probably use the DLR (dynamic), but I think this could get messy too. Is there any other way?
The thing to remember is that what has passed is history. Events are history. Just because your current view of the world has changed doesn't mean that something that happened in some form in the past didn't happen.
In terms of storage, yes, some form of serialised storage is usually handy. In fact, having loose type-tags that let you interpret the serialised events (rather than straight deserialisation) can be quite helpful when it comes to versioning. As with any form of communication (in this case, your ability to communicate across versions), remember Postel's Law: http://en.wikipedia.org/wiki/Robustness_principle . Favour the tolerant reader approach. In other words, consider your past events when hydrating. If an event seems to change semantic meaning, create a different event. Commands only act on the present state of the aggregate. The aggregate hydrates itself from its previous events. As such, your current version gets to choose how it will interpret past events (perhaps by assuming a default value for past events, e.g. any event without a country code would correspond to the UK, or something similar).
Lastly, only the state necessary to act on commands should be part of aggregates. They're not meant for querying purposes. They just need to emit enough information in the form of events so that downstream consumers can build the data they need (possibly correlating data from multiple streams / sources).
Does that help?
Events are history. They are persisted data, just like a file on disk, or column values in a database. You have to deal with upgrades and schema changes. There are a variety of methods to deal with this. In your particular example, you'd clearly need to make Country an optional field, and figure out what to do with the old events, for instance assuming they were all in the USA.
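To make the "optional Country plus a default for old events" idea concrete, here is a minimal Python sketch of a tolerant reader. The field names, the JSON encoding, and the "US" default are assumptions for illustration only, not something the original posts prescribe.

```python
import json
from dataclasses import dataclass

# Assumption: every event stored before internationalization was a US address.
DEFAULT_COUNTRY = "US"


@dataclass(frozen=True)
class Address:
    street: str
    post_code: str
    country: str


def read_address_added(raw: str) -> Address:
    """Interpret a stored AddressAdded payload instead of deserializing it strictly."""
    data = json.loads(raw)
    return Address(
        street=data["street"],
        post_code=data["post_code"],
        # Old events have no country, so the current code decides what that means.
        country=data.get("country", DEFAULT_COUNTRY),
    )


# An old event (no country) and a newer one (with a field this reader ignores).
old_event = '{"street": "1 Main St", "post_code": "90210"}'
new_event = '{"street": "2 Rue Cler", "post_code": "75007", "country": "FR", "floor": 3}'

print(read_address_added(old_event))
print(read_address_added(new_event))
```

The reader only picks out the fields it understands, so missing fields get a default and unknown fields are ignored rather than causing a failure.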
The only ways I can see of getting around this would be to either store different versions of your objects (which seems very messy) or store your events as something more unstructured (like XML). Of course this would probably work really well with a dynamic language. I guess in C# you could probably use the DLR (dynamic), but I think this could get messy too. Is there any other way?
You can and should version your events; that is how we get away with those changes. You would end up with two versions of the same event:
public class AddressAdded : IEvent
{
    public string Street { get; set; }
    public string Number { get; set; }
    public string PostCode { get; set; }
    public virtual string Region { get; set; }
}

public class AddressAddedV1 : AddressAdded
{
    [Obsolete]
    public override string Region { get; set; }
}
A new version of the event might add or remove properties, so you can use the "Obsolete" attribute to make sure old properties are not used anymore.
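Another way to live with versioned events, sketched here in Python, is to store a version tag next to each payload and "upcast" old payloads when they are read. The registry, the version numbers, and the rule that version 2 drops Region and adds a Country defaulting to "US" are all hypothetical; they only illustrate the mechanism.

```python
from typing import Callable, Dict, Tuple

Payload = Dict[str, object]

# Hypothetical registry: (event type, version) -> function producing the next version.
UPCASTERS: Dict[Tuple[str, int], Callable[[Payload], Payload]] = {}


def upcaster(event_type: str, from_version: int):
    """Register a function that upgrades a payload by exactly one version."""
    def register(fn: Callable[[Payload], Payload]) -> Callable[[Payload], Payload]:
        UPCASTERS[(event_type, from_version)] = fn
        return fn
    return register


@upcaster("AddressAdded", 1)
def address_added_v1_to_v2(payload: Payload) -> Payload:
    # Assumed change: v2 no longer carries Region and requires Country.
    upgraded = {key: value for key, value in payload.items() if key != "Region"}
    upgraded["Country"] = "US"  # assumed default for events recorded before v2
    return upgraded


def upcast(event_type: str, version: int, payload: Payload) -> Tuple[int, Payload]:
    """Apply registered upcasters until no newer version is known."""
    while (event_type, version) in UPCASTERS:
        payload = UPCASTERS[(event_type, version)](payload)
        version += 1
    return version, payload


stored = {"Street": "Main St", "Number": "1", "PostCode": "90210", "Region": "CA"}
print(upcast("AddressAdded", 1, stored))
```

With this shape the stored events are never rewritten; only the code that reads them knows about the newest version.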
After reading dozens of articles and watching hours of videos, I don't seem to get an answer to a simple question:
Should static data be included in the events of the write/read models?
Let's take the oh-so-common "orders" example.
In all examples you'll likely see something like:
class OrderCreated(Event):
....
class LineAdded(Event):
itemID
itemCount
itemPrice
But in practice, you will also have lots of "static" data (products, locations, categories, vendors, etc).
For example, we have a STATIC products table, with their SKUs, description, etc. But in all examples, the STATIC data is never part of the event.
What I don't understand is this:
Command side: should the STATIC data be included in the event? If so, which data? Should the entire "product" record be included? But a product also has a category and a vendor. Should their data be in the event as well?
Query side: should the STATIC data be included in the model/view? Or can/should it be JOINED with the static table when an actual query is executed.
If static data is NOT part of the event, then the projector cannot add it to the read model, which implies that the query MUST use joins.
If static data IS part of the event, then let's say we change something in the products table (e.g. typo in the item description), this change will not be reflected in the read model.
So, what's the right approach to using static data with ES/CQRS?
Should static data be included in the events of the write/read models?
"It depends".
First thing to note is that ES/CQRS are a distraction from this question.
CQRS is simply the creation of two objects where there was previously only one. -- Greg Young
In other words, CQRS is a response to the idea that we want to make different trade offs when reading information out of a system than when writing information into the system.
Similarly, ES just means that the data model should be an append only sequence of immutable documents that describe changes of information.
Storing snapshots of your domain entities (be that a single document in a document store, or rows in a relational database, or whatever) has to solve the same problems with "static" data.
For data that is truly immutable (the ratio of a circle's circumference and diameter is the same today as it was a billion years ago), pretty much anything works.
When you are dealing with information that changes over time, you need to be aware that the answer changes depending on when you ask the question.
Consider:
Monday: we accept an order from a customer
Tuesday: we update the prices in the product catalog
Wednesday: we invoice the customer
Thursday: we update the prices in the product catalog
Friday: we print a report for this order
What price should appear in the report? Does the answer change if the revised prices went down rather than up?
Recommended reading: Helland 2015
Roughly, if you are going to need now's information later, then you need to either (a) write the information down now or (b) write down the information you'll need later to look up now's information (ex: id + timestamp).
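The two options can be sketched side by side in Python. Here the event carries the price at order time (option a) and also the order date (option b), which can be used to look up the price that was in effect then. The catalog history, the SKU, and the concrete dates are made up for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LineAdded:
    item_id: str
    item_count: int
    item_price: int   # option (a): price in cents, written down when the order was taken
    ordered_on: date  # option (b): enough information to look the price up later


# Hypothetical catalog history: (effective_from, price in cents), sorted by date.
# Prices change on Tuesday 2024-01-09 and Thursday 2024-01-11.
PRICE_HISTORY = {
    "sku-1": [(date(2024, 1, 1), 1000), (date(2024, 1, 9), 1200), (date(2024, 1, 11), 900)],
}


def price_as_of(item_id: str, when: date) -> int:
    """Return the catalog price that was in effect on the given day."""
    history = PRICE_HISTORY[item_id]
    effective_dates = [effective_from for effective_from, _ in history]
    index = bisect_right(effective_dates, when) - 1
    if index < 0:
        raise LookupError(f"no price for {item_id} on {when}")
    return history[index][1]


monday_order = LineAdded("sku-1", 2, 1000, date(2024, 1, 8))
print(price_as_of(monday_order.item_id, monday_order.ordered_on))  # price on Monday
print(price_as_of(monday_order.item_id, date(2024, 1, 12)))        # price on Friday
```

Option (b) only works if the catalog keeps its history; if it overwrites prices in place, the event has to carry the price itself.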
Furthermore, in a distributed system, you'll need to think about the implications when part of the system is unavailable (ex: what happens if we are trying to invoice, but the product catalog is unavailable? can we cache the data ahead of time?)
Sometimes, this sort of thing can turn into a complete tangle until you discover that you are missing some domain concept (the invoice depends on a price from a quote, not the catalog price) or that you have your service boundaries drawn incorrectly (Udi Dahan talks about this often).
So the "easy" part of the answer is that you should expect time to be a concept you model in your solution. After that, it gets context sensitive very quickly, and discovering the "right" answer may involve investigating subtle questions.
Can I construct a value object in the event handler or should I pass the parameters to the aggregate to construct the value object itself? Seller is the aggregate and offer is the value object. Will it be better for the aggregate to pass the value object in the event?
public async Task HandleAsync(OfferCreatedEvent domainEvent)
{
    var seller = await this.sellerRepository.GetByIdAsync(domainEvent.SellerId);
    var offer = new Offer(domainEvent.BuyerId, domainEvent.ProductId, seller.Id);
    seller.AddOffer(offer);
}
should I pass the parameters to the aggregate to construct the value object itself?
You should probably default to passing the assembled value object to the domain entity / root entity.
The supporting argument is that we want to avoid polluting our domain logic with plumbing concerns. Expressed another way, new is not a domain concept, so we'd like that expression to live "somewhere else".
Note that by passing the value to the domain logic, you protect that logic from changes to the construction of the values; for instance, how much code has to change if you later discover that there should be a fourth constructor argument?
That said, I'd consider this to be a guideline - in cases where you discover that violating the guideline offers significant benefits, you should violate the guideline without guilt.
Will it be better for the aggregate to pass the value object in the event?
Maybe? Let's try a little bit of refactoring....
// WARNING: untested code ahead
public async Task HandleAsync(OfferCreatedEvent domainEvent)
{
    var seller = await this.sellerRepository.GetByIdAsync(domainEvent.SellerId);
    Handle(domainEvent, seller);
}

static void Handle(OfferCreatedEvent domainEvent, Seller seller)
{
    var offer = new Offer(domainEvent.BuyerId, domainEvent.ProductId, seller.Id);
    seller.AddOffer(offer);
}
Note the shift - where HandleAsync needs to be aware of async/await constructs, Handle is just a single threaded procedure that manipulates two local memory references. What that procedure does is copy information from the OfferCreatedEvent to the Seller entity.
The fact that Handle here can be static, and has no dependencies on the async shell, suggests that it could be moved to another place; another hint being that the implementation of Handle requires a dependency (Offer) that is absent from HandleAsync.
Now, within Handle, what we are "really" doing is copying information from OfferCreatedEvent to Seller. We might reasonably choose:
seller.AddOffer(domainEvent);
seller.AddOffer(domainEvent.offer());
seller.AddOffer(new Offer(domainEvent));
seller.AddOffer(new Offer(domainEvent.BuyerId, domainEvent.ProductId, seller.Id));
seller.AddOffer(domainEvent.BuyerId, domainEvent.ProductId, seller.Id);
These are all "fine" in the sense that we can get the machine to do the right thing using any of them. The tradeoffs are largely related to where we want to work with the information in detail, and where we prefer to work with the information as an abstraction.
In the common case, I would expect that we'd use abstractions for our domain logic (therefore: Seller.AddOffer(Offer)) and keep the details of how the information is copied "somewhere else".
The OfferCreatedEvent -> Offer function can sensibly live in a number of different places, depending on which parts of the design we think are most stable, how much generality we can justify, and so on.
Sometimes, you have to do a bit of war gaming: which design is going to be easiest to adapt if the most likely requirements change happens?
I would also advocate for passing an already assembled value object to the aggregate in this situation. In addition to the reasons already mentioned by @VoiceOfUnreason, this also fits more naturally with the domain language. Also, when reading code and method APIs you can then focus on domain concepts (like an offer) without being distracted by details until you really need to know them.
This becomes even more important if you need to pass in more than one value object (or entity). Passing in all the values required for construction as separate parameters instead not only makes the API less resilient to refactoring but also burdens the reader with more details.
The seller is receiving an offer.
Assuming this is what is meant here, fits better than something like the following:
The seller receives some buyer id, product id, etc.
This most probably would not be found in conversations using the ubiquitous language. In my opinion, code should be as readable as possible and express the behaviour and business logic as close to human language as possible. You compile code for machines to execute, but you write it for humans to understand easily.
Note: I would even consider using factory methods on value objects in certain cases to unburden the client code of knowing what else might be needed to assemble a valid value object, for instance, if there are different valid constellations and ways of constructing the same value objects where some values need reasonable default values or values are chosen by the value object itself. In more complex situations a separate factory might even make sense.
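As a rough Python sketch of that factory-method idea: the value object below chooses one of its own values, so the client only supplies what it actually knows. The expiry field and the 14-day rule are invented for the example; the original Offer only takes a buyer, a product, and a seller.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Offer:
    buyer_id: str
    product_id: str
    seller_id: str
    expires_on: date  # hypothetical extra field, not part of the original example

    @classmethod
    def standard(cls, buyer_id: str, product_id: str, seller_id: str, today: date) -> "Offer":
        # Assumed business rule: a standard offer stays valid for 14 days.
        return cls(buyer_id, product_id, seller_id, today + timedelta(days=14))


class Seller:
    def __init__(self, seller_id: str) -> None:
        self.id = seller_id
        self.offers = []

    def add_offer(self, offer: Offer) -> None:
        """The aggregate receives a finished value object, not loose parameters."""
        if offer.seller_id != self.id:
            raise ValueError("offer belongs to a different seller")
        self.offers.append(offer)


seller = Seller("seller-1")
seller.add_offer(Offer.standard("buyer-1", "product-1", seller.id, date(2024, 1, 1)))
print(seller.offers)
```

If the construction rules change later, only the factory method changes; the signature of add_offer stays the same.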
There is one thing about CQRS I do not get: How to update the read model when the raised event does not contain the details needed for updating the read model.
Unfortunately, this is a quite common scenario.
Example: I add a user to a group, so I send an addUserToGroup(userId, groupId) command. This is received and handled by the command handler, and the userAddedToGroup event is created, stored and published.
Now an event handler receives this event with both IDs. There is supposed to be a view that lists all users with the names of the groups they're in. To update the read model for that view, we need the user id (which we have) and the group name (which we don't have; we only have its id).
So the question is: How do I handle this scenario?
Currently, four options come to my mind, all with their specific disadvantages:
1) The read model asks the domain. => Forbidden, and not even possible, as the domain only has behavior, no (public) state.
2) The read model reads the group name from another table in the read model. => Works, but what if there is no matching table?
3) Add the necessary data to the event. => Does not work, as this means that I would have to update all previous events as well, and I cannot foresee which data I may need one day.
4) Do not handle the event via a "usual" event handler, but start an ETL process in the background that deals with the event store, creates the necessary data and writes the read model. => Works, but to me this seems like way too much overhead for such a simple scenario.
So, the question is: How do I deal with this scenario correctly?
There are two common solutions.
1) "Event Enrichment" is where you indeed put information on the event that reflects the information you are mentioning, e.g. the group name. Doing this is somewhere between modeling your domain properly and cheating. If you know, for instance, that group names change, emitting the name at the moment of the change is not a bad idea. Imagine when you create a line item on a quote or invoice, you want to emit the price of the good sold on the invoice created event. This is because you must honor that price, even if it changes later.
2) Project several streams at once. Write a projector which watches information from the various streams and joins them together. You might watch user and group events as well as your user added to group event. Depending on the ordering of events in your system, you may know that a user is in a group before you know the name of the group, but you should know the general properties of your event store before you get going.
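A minimal Python sketch of the second solution might look like the following. The projector consumes group events as well as the membership event and joins them when queried. The event names GroupCreated and GroupRenamed, the dict-shaped events, and the placeholder for not-yet-known groups are assumptions.

```python
from collections import defaultdict


class UserGroupsProjector:
    """Builds a 'users with their group names' view from several event streams."""

    def __init__(self) -> None:
        self.group_names = {}                # group_id -> latest known name
        self.memberships = defaultdict(set)  # user_id -> set of group_ids

    def handle(self, event: dict) -> None:
        kind = event["type"]
        if kind in ("GroupCreated", "GroupRenamed"):  # assumed group events
            self.group_names[event["group_id"]] = event["name"]
        elif kind == "UserAddedToGroup":
            self.memberships[event["user_id"]].add(event["group_id"])

    def groups_of(self, user_id: str) -> list:
        # The join happens at read time, so a group name that arrives late
        # (or changes later) is still reflected in the view.
        group_ids = self.memberships.get(user_id, set())
        return sorted(self.group_names.get(g, "<unknown group>") for g in group_ids)


projector = UserGroupsProjector()
projector.handle({"type": "UserAddedToGroup", "user_id": "u1", "group_id": "g1"})
projector.handle({"type": "GroupCreated", "group_id": "g1", "name": "Admins"})
print(projector.groups_of("u1"))
```

Note that the membership event is processed before the group is known here, which is the ordering caveat mentioned above.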
Events don't necessarily represent a one-to-one mapping of the commands that have initiated the process in the first place. For instance, if you have a command:
SubmitPurchaseOrder
Shopping Cart Id
Shipping Address
Billing Address
The resulting event might look like the following:
PurchaseOrderSubmitted
Items (Id, Name, Amount, Price)
Shipping Address
Shipping Provider
Our Shipping Cost
Shipping Cost billed to Customer
Billing Address
VAT %
VAT Amount
First Time Customer
...
Usually the information is available to the domain model (either by being provided by the command or as being known internal state of the concerned aggregate or by being calculated as part of processing.)
Additionally the event can be enriched by querying the read model or even a different BC (e.g. to retrieve the actual VAT % depending on state) during processing.
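To illustrate how a lean command can turn into a richer event, here is a small Python sketch. The cart contents, the VAT lookup function, and the reduced set of event fields are assumptions; in a real system the cart would be aggregate state and the VAT rate might come from a read model or another bounded context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (id, name, amount, price in cents)
Item = Tuple[str, str, int, int]


@dataclass(frozen=True)
class SubmitPurchaseOrder:
    shopping_cart_id: str
    shipping_address: str
    billing_address: str


@dataclass(frozen=True)
class PurchaseOrderSubmitted:
    items: Tuple[Item, ...]
    shipping_address: str
    billing_address: str
    vat_percent: int
    vat_amount: int


def handle_submit(
    command: SubmitPurchaseOrder,
    carts: Dict[str, List[Item]],
    vat_percent_for: Callable[[str], int],  # hypothetical lookup, e.g. a read model
) -> PurchaseOrderSubmitted:
    """Build the event from command data, known state, and looked-up data."""
    items = tuple(carts[command.shopping_cart_id])
    net_total = sum(amount * price for _, _, amount, price in items)
    vat_percent = vat_percent_for(command.billing_address)
    return PurchaseOrderSubmitted(
        items=items,
        shipping_address=command.shipping_address,
        billing_address=command.billing_address,
        vat_percent=vat_percent,
        vat_amount=net_total * vat_percent // 100,
    )


carts = {"cart-1": [("p1", "Pen", 2, 500), ("p2", "Book", 1, 1000)]}
command = SubmitPurchaseOrder("cart-1", "Berlin", "Berlin")
print(handle_submit(command, carts, lambda address: 20))
```

The event records item names, prices, and the VAT figures as they were at submission time, so consumers do not have to look them up again.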
You're correctly assuming that events can (and probably will) change over time. This basically doesn't matter at all if you employ versioning: add the new event (e.g. PurchaseOrderSubmittedV2) and add an appropriate event handler to all the classes that are supposed to consume it. There is no need to change the old event; it can still be consumed, since you don't modify the interface, you extend it. This basically comes down to a very good example of the Open/Closed Principle in practice.
Option 2 would be fine; your concern about there being no matching group-name table in the read model wouldn't apply. No data should be deleted; it should only be marked as invalid when a corresponding event (say, a group deletion) is emitted. In the end the row in the groups table is still there, so you can read the group name without any problem. The only apparent problem could be differences in processing speed, but that's another issue: events should be processed in order no matter how fast they are being processed.
I understand the difference between commands and events but in a lot of cases you end up with redundancy and mapping between 2 classes that are essentially the same (ThingNameUpdateCommand, ThingNameUpdatedEvent). For these simple cases can you / do you use the event also as a command? Do people serialise to a store all commands as well as all events? Just seems to be a little redundant to me.
In general, a lot of this redundancy is there for a reason, and you want to avoid using the same message for two different purposes for a number of reasons:
Sourced events must be versioned when they change since they are stored and re-used (deserialized) when you hydrate an aggregate root. It will make things a bit awkward if the class is also being used as a message.
Coupling is increased, the same class is now being used by command handlers, the domain model and event handlers now. De-coupling the command side from the event can simplify life for you down the road.
Finally clarity. Commands are issued in a language that asks something to be done (imperative generally). Events are representations of what has happened (past-tense generally). This language gets muddled if you use the same class for both.
In the end these are just data classes, it isn't like this is "hard" code. There are ways to actually avoid some of the typing for simple scenarios like code-gen. For example, I know Greg has used XML and XSD transforms to create all the classes needed for a given domain in the past.
I'd say for a lot of simple cases you may want to question if this is really domain (i.e. modeling behavior) or just data. If it is just data consider not using event sourcing here. Below is a link to a talk by Udi Dahan about breaking up your domain model so that not all of it requires event-sourcing. I'm kind of in line with this way of thinking now myself.
http://skillsmatter.com/podcast/design-architecture/talk-from-udi-dahan
After working through some examples and especially the Greg Young presentation (http://www.youtube.com/watch?v=JHGkaShoyNs) I've come to the conclusion that commands are redundant. They are simply events from your user: they did press that button. You should store these in exactly the same way as other events, because they are data and you don't know whether you will want to use them in a future view. Your user did add and then later remove that item from the basket, or at least attempted to. You may want to use this information to remind the user of it at a later date.
Assume that Book and Author are Aggregate Roots in my model.
In the read model I have a table AuthorsAndBooks, which is a list of Authors and Books joined by Book.AuthorId.
When the BookAdded event is fired I want to receive Author data to create a new AuthorsAndBooks row.
Because Book is an Aggregate Root, information about the Author isn't included in the BookAdded event. And I cannot include it, because the Author root doesn't have getters (according to the guidelines of all the examples and posts about CQRS and Event Sourcing).
I usually receive two types of answers to this question:
Enrich your domain event with all the data you need in event handlers. But as I said, I cannot do that for Aggregate Roots.
Use available data from the View Model, i.e. load the Author from the View Model and use it to build the AuthorsAndBooks row.
The last one has some problems with concurrency. Author data may not be available in the View Model at the time the BookAdded event is handled.
What approach do you use to solve this? Thank you.
As general advice, let the event handlers be idempotent and make sure you can deal with out-of-order message handling (either by re-queuing or by building in mechanisms to fill in missing data).
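Here is one possible Python sketch of an idempotent handler that fills in missing data later. It assumes every event carries a unique event_id and that an AuthorRegistered event exists; both are assumptions made for the example.

```python
class AuthorsAndBooksView:
    """Read model that tolerates duplicate and out-of-order events."""

    def __init__(self) -> None:
        self.seen_event_ids = set()
        self.author_names = {}  # author_id -> name
        self.rows = {}          # book_id -> row dict

    def handle(self, event: dict) -> None:
        # Idempotency: an event delivered twice is applied only once.
        if event["event_id"] in self.seen_event_ids:
            return
        self.seen_event_ids.add(event["event_id"])

        if event["type"] == "AuthorRegistered":  # hypothetical event name
            self.author_names[event["author_id"]] = event["name"]
            # Fill in rows that were written before the author was known.
            for row in self.rows.values():
                if row["author_id"] == event["author_id"]:
                    row["author_name"] = event["name"]
        elif event["type"] == "BookAdded":
            self.rows[event["book_id"]] = {
                "title": event["title"],
                "author_id": event["author_id"],
                # None marks "author not known yet"; it is filled in later.
                "author_name": self.author_names.get(event["author_id"]),
            }


view = AuthorsAndBooksView()
book_added = {"event_id": "e1", "type": "BookAdded", "book_id": "b1",
              "title": "Dune", "author_id": "a1"}
view.handle(book_added)
view.handle(book_added)  # duplicate delivery, ignored
view.handle({"event_id": "e2", "type": "AuthorRegistered", "author_id": "a1",
             "name": "Frank Herbert"})
print(view.rows)
```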
On the other hand, do question why author and book are such disparate aggregate roots. Maybe you should copy from the author upon adding a book (what the f* is "adding a book", how's that a command?). The problem is all these made-up examples. Descend to the real world; I doubt your problem exists.
Your question is missing some context, for example what is the user scenario that leads to this event and what is the state you are starting from? If you were writing the BDD tests for this case, what would they look like? Knowing this would help a lot in answering your question.
How you solve the problem of relating a book to an author is domain dependent. First, we are assuming it makes sense for your domain to have an aggregate for Author and an aggregate for Book. For example, if I were writing a library system, I doubt I would have an aggregate for authors, since I don't care about an author without his/her book; what I care about is books.
As for the lack of getters, it's worth mentioning that aggregate roots don't have getters because of a preference for a tell-don't-ask style of OOP. However, you can tell one AR to do something, which then tells something to another AR if you need that. Part of what is important is that the AR tells the others about itself, rather than you writing code where you ask it and then pass the answer along.
Finally, I have to ask why you don't have the author's ID at the time you are adding the book? How would you even know who the author is then? I would assume you could just do the following (my code assumes you are using a fluent interface for creation of AR, but you can substitute factories, constructors, whatever you use):
CreateNew.Book()
.ForAuthor(command.AuthorId)
.WithContent(command.Content);
Now perhaps the scenario is you are adding a book along with a brand new author. I would either handle this as two separate commands (which may make more sense for your domain), or handle the command the following way:
var author = CreateNew.Author()
.WithName(command.AuthorName);
var book = CreateNew.Book()
.ForAuthor(author.Id)
.WithContent(command.Content);
Perhaps the problem is you have no getter on the aggregate root Id, which I don't believe is necessary or common. However, assuming Id encapsulation is important to you, or your BookAdded event needs more information about the author than the Id alone can provide, then you could do something like this:
var author = CreateNew.Author()
    .WithName(command.AuthorName);
var book = author.AddBook(command.Content);

// In the Author aggregate: adds a new book belonging to this Author
public Book AddBook(BookContent content) {
    return CreateNew.Book()
        .ForAuthor(this.Id)
        .WithContent(content);
}
Here we are telling the author to add a book, at which point it creates the aggregate root for the book and passes its Id to the book. Then we can have the event BookAddedForAuthor, which will have the id of the author.
The last one has downsides though: it creates a command that must act through multiple aggregate roots. As much as possible I would try to figure out why the first example isn't working for you.
Also, I can't stress enough how the implementation you are looking for is dictated by your specific domain context.
IMHO, populate the read model from author/book events, using reordering to handle cases where events arrive out of order (the view handler is within its own consistency boundary and should handle ordering/deduplication cases anyway).
The first thing I would ask is why there are concurrency issues in the read model. If the client is sending a reference to the author aggregate inside the AddBook command, where did it get the information from? If the book and author are created at the same time, then your event can probably be enriched. Let me know if I'm missing something here.
The last one has some problems with concurrency. Author data may not be available in the View Model at the time the BookAdded event is handled.
What about "handling the event later"? So you simply put it to the back of the queue until this data is available (maybe with a limit of x tries and x time between each try).
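A minimal Python sketch of that "back of the queue" idea follows. It only covers the retry limit; the waiting time between tries is left out, and the handler signature (return False when required data is missing) is an assumption.

```python
from collections import deque
from typing import Callable, Iterable, List


def drain(events: Iterable[dict],
          try_handle: Callable[[dict], bool],
          max_attempts: int = 3) -> List[dict]:
    """Process events in order, moving events that cannot be handled yet
    to the back of the queue. Returns the events that were given up on."""
    pending = deque((event, 1) for event in events)
    given_up = []
    while pending:
        event, attempt = pending.popleft()
        if try_handle(event):
            continue
        if attempt >= max_attempts:
            given_up.append(event)  # e.g. park it for manual inspection
        else:
            pending.append((event, attempt + 1))
    return given_up


authors = {}  # author_id -> name
rows = []     # (book_id, author_name)


def try_handle(event: dict) -> bool:
    if event["type"] == "AuthorRegistered":  # hypothetical event name
        authors[event["author_id"]] = event["name"]
        return True
    if event["author_id"] not in authors:
        return False  # author data not available yet, retry later
    rows.append((event["book_id"], authors[event["author_id"]]))
    return True


failed = drain(
    [
        {"type": "BookAdded", "book_id": "b1", "author_id": "a1"},
        {"type": "AuthorRegistered", "author_id": "a1", "name": "Ann"},
    ],
    try_handle,
)
print(rows, failed)
```

In a real system the retry would normally go through the message queue itself, with a delay between attempts, rather than an in-memory loop.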