In a CQRS event store, does an "aggregate" contain a summarized view of the events or simply a reference to the boundary of those events? (group id)
A projection is a view or representation of events, so in the case of an aggregate representing a boundary that would make sense to me; whereas if the aggregate contained the current summarized state, I'd be confused about the duplication between the two.
In a CQRS event store, does an "aggregate" contain a summarized view of the events or simply a reference to the boundary of those events? (group id)
Aggregates don't exist in the event store
Events live in the event store
Aggregates live in the write model (the C of CQRS)
Aggregate, in this case, still has the same basic meaning that it had in the "Blue Book"; it's the term for a boundary around one or more entities that are immediately consistent with each other. The responsibility of the aggregate is to ensure that writes (commands) to the book of record respect the business invariant.
It's typically convenient, in an event store, to organize the events into "streams"; if you imagine a RDBMS schema, the stream id will just be some identifier that says "these events are all part of the same history."
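As a rough illustration (the names here are my own, not taken from any particular event store), a stream is nothing more than an ordered list of events keyed by a stream id:

from collections import defaultdict

class InMemoryEventStore:
    def __init__(self):
        # stream_id -> ordered list of events: "these events are all part of the same history"
        self._streams = defaultdict(list)

    def append(self, stream_id, events):
        self._streams[stream_id].extend(events)

    def read(self, stream_id):
        return list(self._streams[stream_id])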
It will usually be the case that one aggregate -> one stream, but "usually" isn't "always"; there are some exceptional cases you may need to handle when you change your model. Greg Young covers some of these in his eBook on event versioning.
So it's possible that the same data structure might exist in the aggregate and query side store (duplicated view used for different purposes).
Yes, and no. It's absolutely the case that the data structures used when validating a write match those used to support a query. But the storage doesn't usually match. Put another way, aggregates don't get stored (the state of the aggregate does); whereas it is fairly common that the query view gets cached (again, not the data structure itself, but a representation that can be used to repopulate the data structure without necessarily needing to replay all of the events).
Any chance you have an example of an aggregate state data structure (RDBMS)? Every example I've found is trimmed down to a few columns like id, source_id, version, making it difficult to visualize what the scope of an aggregate is.
A common example would be that of a trading book (an aggregate responsible for matching "buy" and "sell" orders).
In a traditional RDBMS store, that would probably look like a row in a books table, with a unique id for the book, information about what item that book is tracking, date information about when that book is active, and so on. In addition, there's likely to be some sort of orders table, with unique ids, a trading book id, order type, transaction numbers, prices and volumes (in other words, all of the information the aggregate needs to know to satisfy its invariant).
In a document store, you'd see all of that information in a single document -- perhaps a json document with the information about the root object, and two lists of order objects (one for buys, one for sells).
In an event store, you'd see the individual OrderPlaced, TradeOccurred, and OrderCancelled events.
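To make the contrast concrete, here is a rough sketch of the same trading book as a single document versus as a stream of events (the field names are my own illustrations, not a canonical schema):

# Illustrative only -- field names are assumptions, not a canonical schema.
book_as_document = {
    "book_id": "book-42",
    "item": "ACME",
    "active_from": "2024-01-01",
    "buys":  [{"order_id": "o-1", "price": 101, "volume": 50}],
    "sells": [{"order_id": "o-2", "price": 103, "volume": 25}],
}

book_as_events = [
    {"type": "OrderPlaced", "book_id": "book-42", "order_id": "o-1", "side": "buy", "price": 101, "volume": 50},
    {"type": "OrderPlaced", "book_id": "book-42", "order_id": "o-2", "side": "sell", "price": 103, "volume": 25},
    {"type": "OrderCancelled", "book_id": "book-42", "order_id": "o-2"},
]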
It seems that the aggregate is computed using the entire set of events, unless it gets large enough to warrant a snapshot.
Yes, that's exactly right. If you are familiar with a "fold function", then event sourcing is just a fold from some common initial state. When a snapshot is available, we'll fold from that state (with a corresponding reduction in the number of events that get folded in)
In an event sourced environment with "snapshots", you might see a combination of the event store and the document store (where the document would include additional meta information indicating where in the event stream it had been assembled).
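A minimal sketch of that fold, assuming a per-aggregate apply(state, event) function and an optional snapshot that records the state and the version it was taken at (names are illustrative):

# Event sourcing as a fold, with an optional snapshot as the starting point.
from functools import reduce

def replay(events, apply, initial_state, snapshot=None):
    if snapshot is not None:
        # Start from the snapshot's state and only fold in the events
        # that were recorded after it was taken.
        state = snapshot["state"]
        events = events[snapshot["version"]:]
    else:
        state = initial_state
    return reduce(apply, events, state)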
After reading dozens of articles and watching hours of videos, I don't seem to get an answer to a simple question:
Should static data be included in the events of the write/read models?
Let's take the oh-so-common "orders" example.
In all examples you'll likely see something like:
class OrderCreated(Event):
    ...

class LineAdded(Event):
    itemID: str
    itemCount: int
    itemPrice: float
But in practice, you will also have lots of "static" data (products, locations, categories, vendors, etc).
For example, we have a STATIC products table, with their SKUs, description, etc. But in all examples, the STATIC data is never part of the event.
What I don't understand is this:
Command side: should the STATIC data be included in the event? If so, which data? Should the entire "product" record be included? But a product also has a category and a vendor. Should their data be in the event as well?
Query side: should the STATIC data be included in the model/view? Or can/should it be JOINED with the static table when an actual query is executed.
If static data is NOT part of the event, then the projector cannot add it to the read model, which implies that the query MUST use joins.
If static data IS part of the event, then let's say we change something in the products table (e.g. typo in the item description), this change will not be reflected in the read model.
So, what's the right approach to using static data with ES/CQRS?
Should static data be included in the events of the write/read models?
"It depends".
First thing to note is that ES/CQRS are a distraction from this question.
CQRS is simply the creation of two objects where there was previously only one. -- Greg Young
In other words, CQRS is a response to the idea that we want to make different trade offs when reading information out of a system than when writing information into the system.
Similarly, ES just means that the data model should be an append only sequence of immutable documents that describe changes of information.
Storing snapshots of your domain entities (be that a single document in a document store, or rows in a relational database, or whatever) has to solve the same problems with "static" data.
For data that is truly immutable (the ratio of a circle's circumference and diameter is the same today as it was a billion years ago), pretty much anything works.
When you are dealing with information that changes over time, you need to be aware of the fact that the answer changes depending on when you ask it.
Consider:
Monday: we accept an order from a customer
Tuesday: we update the prices in the product catalog
Wednesday: we invoice the customer
Thursday: we update the prices in the product catalog
Friday: we print a report for this order
What price should appear in the report? Does the answer change if the revised prices went down rather than up?
Recommended reading: Helland 2015
Roughly, if you are going to need now's information later, then you need to either (a) write the information down now or (b) write down the information you'll need later to look up now's information (ex: id + timestamp).
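Applied to the order/pricing question above, the two options might look something like this (the event shapes are my own, invented for illustration):

# Option (a): write the information down now -- copy the price into the event.
line_added_with_copy = {
    "type": "LineAdded",
    "itemID": "SKU-123",
    "itemCount": 2,
    "itemPrice": "19.99",  # the price as it was when the order was accepted
}

# Option (b): write down what you need to look up "now's" information later.
line_added_with_reference = {
    "type": "LineAdded",
    "itemID": "SKU-123",
    "itemCount": 2,
    "priceAsOf": "2024-03-04T12:00:00Z",  # id + timestamp into the catalog's history
}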
Furthermore, in a distributed system, you'll need to think about the implications when part of the system is unavailable (ex: what happens if we are trying to invoice, but the product catalog is unavailable? can we cache the data ahead of time?)
Sometimes, this sort of thing can turn into a complete tangle until you discover that you are missing some domain concept (the invoice depends on a price from a quote, not the catalog price) or that you have your service boundaries drawn incorrectly (Udi Dahan talks about this often).
So the "easy" part of the answer is that you should expect time to be a concept you model in your solution. After that, it gets context sensitive very quickly, and discovering the "right" answer may involve investigating subtle questions.
I'm trying to use event sourcing, ddd and cqrs.
I can't work out whether I have to create two databases (or tables) (1: JSON, 2: a normalized database) or one database (just JSON).
And if I do create two databases (or tables), do I have to save the data to both (JSON and normalized) atomically, in one transaction, or not?
Best regards
DDD
We're making an assumption here that you fully understand using DDD and the implications. Specifically related to Event Sourcing, it's a matter of defining Aggregate boundaries and the events that become their state.
CQRS
Again we're making an assumption that you fully understand the implications. CQRS merely allows you to write code in vertical slices (i.e. from UI to database) for handling "commands" separately from code that handles "queries". That's all. While it's true that you can then take this further, by storing data in a "read model" that might even be in a different database, let alone table, it's not a requirement of implementing CQRS.
As CQRS pertains to Event Sourcing - it's a good fit because the data model you tend to end up with in Event Sourcing is not conducive to complex queries. It's typically limited to "get the Aggregate by its ID". Therefore having "projections" to store the data in other ways that are more appropriate for querying and loading into UIs is the typical approach.
Event Sourcing
If you implement a Domain Model in such a way that every command handled by an Aggregate (i.e. every use-case/task carried out by a user) generates one or more events, then Event Sourcing is the principle where you store that list of events in an append-only style against the Aggregate's ID, rather than storing a snapshot of the Aggregate after the command was successfully handled.
To load an aggregate from the event store, you load all of its previous events and replay them in memory on the Aggregate object, again rather than loading a single row/document in as a snapshot/memento.
A document database is therefore an excellent choice for event stores, because a single document represents the event stream for a given Aggregate. However if you want to store your event streams in SQL, that's fine, but you might store it in two tables:
create table Aggregate (Id int not null...);
create table AggregateEvent(AggregateId int not null FK..., Version int not null, eventBody nvarchar(max));
The actual event body would typically be the event itself, serialised to a text format like JSON.
Projections and Read Stores
If you take the events generated by the handling of commands by aggregates, and write code that consumes them by writing to a separate data store (SQL, pre-calculated ViewModels, etc), then you can call that a "projection". It's "projecting" the data that's in one shape into another shape fit for a different purpose. The result is a "read store", which you can then query however you need to.
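A minimal sketch of such a projection, assuming events arrive as simple dicts and the read store is just an in-memory map keyed by order id (names are illustrative, not from any framework):

# A projector: consume events, maintain a query-friendly view of orders.
read_store = {}  # order_id -> row shaped for the UI

def project(event):
    if event["type"] == "OrderCreated":
        read_store[event["orderID"]] = {"status": "created", "lines": []}
    elif event["type"] == "LineAdded":
        read_store[event["orderID"]]["lines"].append(
            {"itemID": event["itemID"], "itemCount": event["itemCount"]}
        )

In practice the read store would be a separate SQL table, document collection, or cache rather than a dict, but the shape of the code is the same: one handler per event type, writing the data in whatever form the queries need.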
I can't understand that I have to create two database (or table) ( 1-json 2-normalize database) or one database (just json)
It's possible to get by with just an event store and nothing else.
"get by" isn't necessarily pleasant, however. Event stores are, as a rule, really good at "append new information", but not particularly good at "query". Thus, the usual answer is to deploy processes that copy information from your event store to something that has nicer query support.
I have to save data in databases (json and normalize) as atomic in one transaction or not?
It's a common pattern to update the event storage only, and then later invoke the process to copy the information from the event storage to your query support. Of course, that also means that your queries may end up showing old/out of date information (here is the answer to your question as-of five minutes ago).
If you store your query friendly data model with the event storage (tables in the same relational database, for instance), then you can arrange for at least some of your updates to the query friendly model to be synchronized with the events.
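For example, if the events and the query-friendly rows live in the same relational database, a sketch along these lines keeps them in one transaction (using sqlite3 purely for illustration; the AggregateEvent and OrderSummary table names, and the event's "status" field, are assumptions):

import json
import sqlite3

def append_and_project(conn: sqlite3.Connection, aggregate_id: int, version: int, event: dict):
    # One transaction: append the event AND update the query-friendly row together.
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "insert into AggregateEvent (AggregateId, Version, eventBody) values (?, ?, ?)",
            (aggregate_id, version, json.dumps(event)),
        )
        conn.execute(
            "update OrderSummary set status = ? where OrderId = ?",
            (event["status"], aggregate_id),
        )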
In other words, you get trade offs, not a single cookie cutter pattern that is used everywhere.
I'm having a hard time understanding the shape of the state that's derived by applying that entity's events vs a projection of that entity's data.
Is an Aggregate's state ONLY used for determining whether or not a command can successfully be applied? Or should that state be usable in other ways?
An example - I have a Post entity for a standard blog post. I might have events like postCreated, postPublished, postUnpublished, etc. For my projections that I'll be persisting in my read tables, I need a projection for the base posts (which will include all posts, regardless of status, with lots of detail) as well as a published_posts projection (which will only represent posts that are currently published, with only the information necessary for rendering).
In the situation above, is my aggregate state ONLY supposed to be used to determine, for example, if a post can be published or unpublished, etc? If this is the case, is the shape of my state within the aggregate purely defined by what's required for these validations? For example, in my base post projection, I want to have a list of all users that have made a change to the post. In terms of validation for the aggregate/commands, I couldn't care less about the list of users that have made changes. Does that mean that this list should not be a part of my state within my aggregate?
TL;DR: yes - limit the "state" in the aggregate to that data that you choose to cache in support of data change.
In my aggregates, I distinguish two different ideas:
the history, aka the sequence of events that describes the changes in the lifetime of the aggregate
the cache, aka the data values we tuck away because querying the event history every time kind of sucks.
There's not a lot of value in caching results that we are never going to use.
One of the underlying lessons of CQRS is that we don't need aggregates everywhere
An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes. -- Evans, 2003
If we aren't changing the data, then we can safely work directly with immutable copies of the data.
The only essential purpose of the aggregate is to determine what events, if any, need to be applied to bring the aggregate's state in line with a command (if the aggregate can be brought so in line). All state that's not needed for that purpose can be offloaded to a read-side, which can be thought of as a remix of the event stream (with each read-side only maintaining the state it needs).
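A sketch of that idea for the Post example (names invented for illustration): the aggregate caches only what its decisions need, and each command handler returns the events, if any, that bring the state in line:

# The aggregate "state" is only what the publish/unpublish decisions need.
class PostAggregate:
    def __init__(self):
        self.exists = False
        self.published = False
        # Deliberately no list of editing users here: no command decision depends on it,
        # so that data can live purely in a read-side projection.

    def apply(self, event):
        if event["type"] == "PostCreated":
            self.exists = True
        elif event["type"] == "PostPublished":
            self.published = True
        elif event["type"] == "PostUnpublished":
            self.published = False

    def publish(self, post_id):
        if not self.exists:
            raise ValueError("cannot publish a post that does not exist")
        if self.published:
            return []  # already published; no new events needed
        return [{"type": "PostPublished", "post_id": post_id}]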
That said, there are, in practice, reasons to use the aggregate state directly, the primary one being a desire for stronger consistency for the aggregate: CQRS is inherently eventually consistent. As with all questions of consistent updates, it's important to recognize that consistency isn't free and very often isn't even cheap; I tend to think of a project as having a consistency budget, and I'm pretty miserly about spending it.
In your case, there's probably no reason to include the list of users changing a post in the aggregate state, unless e.g. there's something like "no single user can modify a given post more than n times".
I'm having some issues to correctly design the domain that I'm working on.
My straightforward use case is the following:
The user (~5000 users) can access a list of ads (~5 million)
He can choose to add/remove some of them as favorites.
He can decide to show/hide some of them.
I have a command which will mutate the aggregate state, to set Favorite to TRUE, let's say.
In terms of DDD, how should I design the aggregates?
How should I design the relationship between a user and his selection of favorite ads?
Considering the large numbers of ads, I cannot duplicate each ad inside a user aggregate root.
Can I design an Ads aggregate root containing a user "collection"?
And finally, how should I handle/perform the read models part?
Thanks in advance
Cheers
Two concepts may help you understand how to model this:
1. Aggregates are Transaction Boundaries.
An aggregate is a cluster of associated objects that are considered as a single unit. All parts of the aggregate are loaded and persisted together.
If you have an aggregate that encloses a 1000 entities, then you have to load all of them into memory. So it follows that you should preferably have small aggregates whenever possible.
2. Aggregates are Distinct Concepts.
An Aggregate represents a distinct concept in the domain. Behavior associated with more than one Aggregate (like Favoriting, in your case) is usually an aggregate by itself with its own set of attributes, domain objects, and behavior.
From your example, User is a clear aggregate.
An Ad has a distinct concept associated with it in the domain, so it is an aggregate too. There may be other entities that will be embedded within the Ad like valid_until, description, is_active, etc.
The concept of favoriting an Ad links the User and the Ad aggregates. Your question seems to be centered around where this linkage should be preserved. Should it be in the User aggregate (a list of Ads), or should an Ad have a collection of User objects embedded within it?
While both are possibilities, IMHO, I think FavoriteAd is yet another aggregate, which holds references to both the User aggregate and the Ad aggregate. This way, you don't burden the concepts of User or the Ad with favoriting behavior.
Those aggregates will also not be required to load this additional data every time they are loaded into memory. For example, if you are loading an Ad object to edit its contents, you don't want the favorites collection to be loaded into memory by default.
These aggregate structures don't matter as far as read models are concerned. Aggregates only deal with the write side of the domain. You are free to rewire the data any way you want, in multiple forms, on the read side. You can have a subscriber just to listen to the Favorited event (raised after processing the Favorite command) and build a composite data structure containing data from both the User and the Ad aggregates.
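A minimal sketch of that third aggregate (attribute names are my own): it holds only references (ids) to the User and the Ad, plus whatever state favouriting itself needs:

# FavoriteAd as its own small aggregate, holding only references by id.
from dataclasses import dataclass

@dataclass
class FavoriteAd:
    favorite_id: str
    user_id: str  # reference to the User aggregate
    ad_id: str    # reference to the Ad aggregate
    hidden: bool = False

    def hide(self):
        self.hidden = True

    def show(self):
        self.hidden = False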
I really like the answer given by Subhash Bhushan and I want to add another approach for you to consider.
If you look closely at your question you will see that you've made the assumption that an aggregate can 'see' everything that the user does when they are interacting with the UI. This doesn't need to be so.
Depending on the requirements of the domain you don't need to hold a list of any Ads in the aggregate to favourite them. Here's what I mean:
For this example, it doesn't matter where the "favourite ad" command sits. It could be on the User aggregate or a specific aggregate for handling the concept of Favouriting. The command just needs to hold the id of the User and the Ad they are favouriting.
You may need to handle what happens if a user or ad is deleted but that would just be a case of an event process manager listening to the appropriate events and issuing compensating commands.
This way you don't need to load up 5 million ads. That's a job for the read model and UI, not the domain.
Just a thought.
Let's say I have an object called document and it has a bunch of children in the form of images, audio, video, etc. So a user of my application can create a document by typing some text, adding an image, a video, etc. From what I understand in DDD, the document is an aggregate, while images and videos are always associated with a document as the root. Based on this understanding, how would I design an app that enables a user to create/edit a document? I could have a REST endpoint to upload the document and all its children in one request, but that's potentially a long-running operation. Alternatively, I could design two REST endpoints: one to upload the document's text body, and another to call repeatedly to upload its children, which essentially means multiple transactions. Is the second approach still DDD? Am I violating the transaction boundary by splitting document creation and update into multiple requests?
Consistency boundaries (I prefer that term over "transaction boundaries") are not a concept that specify the granularity of allowed changes. They tell you what can be changed atomically, and what cannot.
For example, if you design your documents to be separate aggregates from the images, then you should not change both the document and an image in one user operation (even when that's technically possible). This means that aggregates cannot be too small, because that would be overly restrictive for a user. They should however also not be too big, because only one user can change an aggregate at a time, so larger aggregates tend to produce more conflicts.
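For instance, if you did choose to model Document and Image as separate aggregates, a sketch of that separation (names invented for illustration) would link them only by id, with each aggregate saved in its own transaction:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Document:
    document_id: str
    body: str = ""
    image_ids: List[str] = field(default_factory=list)  # references, not embedded objects

@dataclass
class Image:
    image_id: str
    document_id: str  # reference back to the Document aggregate
    url: str = ""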
You should try to design aggregates as small as possible, but still large enough to support your use cases. Thus, you'll have to figure that out yourself for your application with the rules above.
So both approaches that you mention are valid from a DDD point of view.