How many databases or tables are needed for Event Sourcing? - cqrs

I'm trying to use event sourcing, DDD and CQRS.
I can't work out whether I have to create two databases (or tables) (1. JSON, 2. a normalized database) or just one database (just JSON).
Also, if I do have to create two databases (or tables), do I have to save data to both (JSON and normalized) atomically, in one transaction, or not?

DDD
We're assuming here that you fully understand DDD and its implications. Specifically related to Event Sourcing, it's a matter of defining Aggregate boundaries and the events that become their state.
CQRS
Again, we're assuming that you fully understand the implications. CQRS merely allows you to write code in vertical slices (i.e. from UI to database) for handling "commands" separately from code that handles "queries". That's all. While it's true that you can then take this further, by storing data in a "read model" that might live in a different table, or even a different database, it's not a requirement of implementing CQRS.
As CQRS pertains to Event Sourcing, it's a good fit because the data model you tend to end up with in Event Sourcing is not conducive to complex queries. It's typically limited to "get the Aggregate by its ID". Therefore, having "projections" that store the data in other ways more appropriate for querying and loading into UIs is the typical approach.
Event Sourcing
If you implement a Domain Model in such a way that every command handled by an Aggregate (i.e. every use-case/task carried out by a user) generates one or more events, then Event Sourcing is the principle whereby you store that list of events, append-only, against the Aggregate's ID, rather than storing a snapshot of the Aggregate after the command was successfully handled.
To load an aggregate from the event store, you load all its previous events and replay them in memory on the Aggregate object, again rather than loading a single row/document in as a snapshot/memento.
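As a minimal sketch of that replay, with invented aggregate and event names (not any particular framework's API):
using System;
using System.Collections.Generic;

public interface IEvent { }

public sealed class OrderAggregate
{
    public Guid Id { get; }
    public OrderAggregate(Guid id) => Id = id;

    // Mutates in-memory state only; nothing is written to the database during replay.
    public void Apply(IEvent evt) { /* switch on the event type and update fields */ }

    // Rebuild the aggregate by replaying its history onto a blank instance.
    public static OrderAggregate Load(Guid id, IEnumerable<IEvent> history)
    {
        var aggregate = new OrderAggregate(id);
        foreach (var evt in history)   // in the original order
            aggregate.Apply(evt);
        return aggregate;
    }
}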
A document database is therefore an excellent choice for event stores, because a single document represents the event stream for a given Aggregate. However, if you want to store your event streams in SQL, that's fine too; you might store them in two tables:
create table Aggregate (Id int not null primary key, ...);
create table AggregateEvent (AggregateId int not null references Aggregate(Id),
    Version int not null, EventBody nvarchar(max) not null,
    primary key (AggregateId, Version));
The actual event body would typically be the event itself, serialised to a text format like JSON.
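A sketch of appending to that AggregateEvent table, assuming the schema above (the class and parameter names are invented for illustration):
using System.Text.Json;
using Microsoft.Data.SqlClient;

public static class EventAppender
{
    // The unique (AggregateId, Version) key doubles as an optimistic concurrency
    // check: two writers racing on the same aggregate can't both insert the same version.
    public static void Append(SqlConnection conn, int aggregateId, int expectedVersion, object evt)
    {
        using var cmd = new SqlCommand(
            "insert into AggregateEvent (AggregateId, Version, EventBody) values (@id, @version, @body)",
            conn);
        cmd.Parameters.AddWithValue("@id", aggregateId);
        cmd.Parameters.AddWithValue("@version", expectedVersion + 1);
        cmd.Parameters.AddWithValue("@body", JsonSerializer.Serialize(evt));
        cmd.ExecuteNonQuery(); // a duplicate-key error means a concurrent command won the race
    }
}
Reading a stream back is then just a select on AggregateId, ordered by Version.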
Projections and Read Stores
If you take the events generated by the handling of commands by aggregates, and write code that consumes them by writing to a separate data store (SQL, pre-calculated ViewModels, etc), then you can call that a "projection". It's "projecting" the data that's in one shape into another shape fit for a different purpose. The result is a "read store", which you can then query however you need to.
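As a small sketch of such a projection (the event, class, and store shapes here are invented; a real read store would more likely be SQL rows or documents than an in-memory dictionary):
using System;
using System.Collections.Generic;

public sealed record OrderShipped(Guid OrderId, string CustomerName);

public sealed class ShippedOrdersProjection
{
    // The "read store": an in-memory map here, but the same shape works for
    // SQL rows or a pre-calculated ViewModel document.
    private readonly Dictionary<Guid, string> _shipped = new();

    // "Projecting": consume an event and store it in a query-friendly shape.
    public void Handle(OrderShipped evt) => _shipped[evt.OrderId] = evt.CustomerName;

    // Queries read this store directly instead of replaying events.
    public IReadOnlyCollection<string> CustomersWithShippedOrders() => _shipped.Values;
}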

I can't work out whether I have to create two databases (or tables) (1. JSON, 2. a normalized database) or just one database (just JSON).
It's possible to get by with just an event store and nothing else.
"get by" isn't necessarily pleasant, however. Event stores are, as a rule, really good at "append new information", but not particularly good at "query". Thus, the usual answer is to deploy processes that copy information from your event store to something that has nicer query support.
Do I have to save data to both databases (JSON and normalized) atomically, in one transaction, or not?
It's a common pattern to update the event storage only, and then later invoke the process that copies the information from the event storage to your query support. Of course, that also means that your queries may end up showing old/out-of-date information (here is the answer to your question as of five minutes ago).
If you store your query-friendly data model with the event storage (tables in the same relational database, for instance), then you can arrange for at least some of your updates to the query-friendly model to be synchronized with the events.
In other words, you get trade offs, not a single cookie cutter pattern that is used everywhere.
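A sketch of the synchronized variant, where the new event and the query-friendly row are committed in one relational transaction (the table and column names are invented for illustration):
using Microsoft.Data.SqlClient;

public static class SynchronizedWriter
{
    // Both writes become visible together, or neither does.
    public static void Save(SqlConnection conn, int aggregateId, int version,
                            string eventJson, string displayName)
    {
        using var tx = conn.BeginTransaction();

        using (var insertEvent = new SqlCommand(
            "insert into AggregateEvent (AggregateId, Version, EventBody) values (@id, @v, @body)",
            conn, tx))
        {
            insertEvent.Parameters.AddWithValue("@id", aggregateId);
            insertEvent.Parameters.AddWithValue("@v", version);
            insertEvent.Parameters.AddWithValue("@body", eventJson);
            insertEvent.ExecuteNonQuery();
        }

        using (var updateReadModel = new SqlCommand(
            "update ReadModel set DisplayName = @name where AggregateId = @id",
            conn, tx))
        {
            updateReadModel.Parameters.AddWithValue("@name", displayName);
            updateReadModel.Parameters.AddWithValue("@id", aggregateId);
            updateReadModel.ExecuteNonQuery();
        }

        tx.Commit();
    }
}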

Related

DDD, Event Sourcing, and the shape of the Aggregate state

I'm having a hard time understanding the shape of the state that's derived by applying an entity's events, versus a projection of that entity's data.
Is an Aggregate's state ONLY used for determining whether or not a command can successfully be applied? Or should that state be usable in other ways?
An example - I have a Post entity for a standard blog post. I might have events like postCreated, postPublished, postUnpublished, etc. For my projections that I'll be persisting in my read tables, I need a projection for the base posts (which will include all posts, regardless of status, with lots of detail) as well as a published_posts projection (which will only represent posts that are currently published, with only the information necessary for rendering).
In the situation above, is my aggregate state ONLY supposed to be used to determine, for example, if a post can be published or unpublished, etc? If this is the case, is the shape of my state within the aggregate purely defined by what's required for these validations? For example, in my base post projection, I want to have a list of all users that have made a change to the post. In terms of validation for the aggregate/commands, I couldn't care less about the list of users that have made changes. Does that mean that this list should not be a part of my state within my aggregate?
TL;DR: yes - limit the "state" in the aggregate to that data that you choose to cache in support of data change.
In my aggregates, I distinguish two different ideas:
the history, aka the sequence of events that describes the changes in the lifetime of the aggregate
the cache, aka the data values we tuck away because querying the event history every time kind of sucks.
There's not a lot of value in caching results that we are never going to use.
One of the underlying lessons of CQRS is that we don't need aggregates everywhere:
An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes. -- Evans, 2003
If we aren't changing the data, then we can safely work directly with immutable copies of the data.
The only essential purpose of the aggregate is to determine what events, if any, need to be applied to bring the aggregate's state in line with a command (if the aggregate can be brought so in line). All state that's not needed for that purpose can be offloaded to a read-side, which can be thought of as a remix of the event stream (with each read-side only maintaining the state it needs).
That said, there are, in practice, reasons to use the aggregate state directly, the primary one being a desire for stronger consistency for the aggregate: CQRS is inherently eventually consistent. As with all questions of consistent updates, it's important to recognize that consistency isn't free and very often isn't even cheap; I tend to think of a project as having a consistency budget, and I'm pretty miserly about spending it.
In your case, there's probably no reason to include the list of users changing a post in the aggregate state, unless e.g. there's something like "no single user can modify a given post more than n times".
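A minimal sketch of what that leaves inside the aggregate, using the Post example from the question (the event and member names are invented; a single published flag is all the publish/unpublish decisions need):
using System;
using System.Collections.Generic;

public sealed record PostPublished;
public sealed record PostUnpublished;

public sealed class Post
{
    // The whole cached "state": just enough to validate the next command.
    // The list of users who edited the post lives in a read-side projection instead.
    private bool _published;

    public IEnumerable<object> Publish()
    {
        if (_published)
            throw new InvalidOperationException("Post is already published.");
        return new object[] { new PostPublished() };
    }

    public void Apply(PostPublished evt) => _published = true;
    public void Apply(PostUnpublished evt) => _published = false;
}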

Understanding Lagom's persistent read side

I read through the Lagom documentation and have already written a few small services that interact with each other. But because this is my first foray into CQRS, I still have a few conceptual issues about the persistent read side that I don't really understand.
For instance, I have a user-service that keeps a list of users (as aggregates) and their profile data like email addresses, names, addresses, etc.
The questions I have now are:
If I want to retrieve a user's profile given a certain email address, should I query the read side for the user's ID, and then query the event store using this ID for the profile data? Or should the read side already keep all profile information?
If the read side has all the information, what is the reason for the event store? If it's truly write-only, it's not really useful, is it?
Should I design my system so that I can use the event store as much as possible, or should I have a read side for everything? What are the scalability implications?
If the user model changes (for instance, the profile now includes a description of the profile) and I use a read side that contains all profile data, how do I update this read side in Lagom to now also contain this description?
Following that question, should I keep different read-side tables for different fields of the profile, instead of one table containing the whole profile?
If a different service needs access to the data, should it always ask the user-service, or should it keep its own read side as needed? In the latter case, doesn't that violate the CQRS principle that the service that owns the data should be the only one reading and writing that data?
As you can see, this whole concept hasn't really 'clicked' yet, and I am thankful for answers and/or some pointers.
If I want to retrieve a user's profile given a certain email address, should I query the read side for the user's ID, and then query the event store using this ID for the profile data? Or should the read side already keep all profile information?
You should use a specially designed ReadModel for searching profiles using the email address. You should query the Event-store only to rehydrate the Aggregates, and you rehydrate the Aggregates only to send them commands, not queries. In CQRS an Aggregate may not be queried.
If the read side has all the information, what is the reason for the event store? If it's truly write-only, it's not really useful, is it?
The Event-store is the source of truth for the write side (the Aggregates). It is used to rehydrate the Aggregates (they rebuild their internal, private state based on the previously emitted events) before they process commands, and to persist the new events. So the Event-store is append-only, but it is also used to read the event stream (the events emitted by an Aggregate instance). The Event-store ensures that an Aggregate instance (that is, identified by a type and an ID) processes only one command at a time.
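A sketch of that write-side round trip, with an invented event-store interface and aggregate (Lagom's actual API differs):
using System;
using System.Collections.Generic;

public interface IEventStore
{
    IReadOnlyList<object> LoadStream(Guid id);
    // The expected-version check is what enforces "one command at a time".
    void AppendToStream(Guid id, int expectedVersion, IEnumerable<object> newEvents);
}

public sealed record EmailChanged(string Email);

public sealed class UserAggregate
{
    public int Version { get; private set; }
    private string? _email;

    // Rehydrate: rebuild internal, private state from the previously emitted events.
    public static UserAggregate Rehydrate(IEnumerable<object> history)
    {
        var user = new UserAggregate();
        foreach (var evt in history) user.Apply(evt);
        return user;
    }

    // Handle a command: validate against current state, emit new events.
    public IEnumerable<object> ChangeEmail(string newEmail)
    {
        if (newEmail == _email)
            return Array.Empty<object>(); // nothing changed, no event
        return new object[] { new EmailChanged(newEmail) };
    }

    private void Apply(object evt)
    {
        if (evt is EmailChanged e) _email = e.Email;
        Version++;
    }
}

public static class ChangeEmailHandler
{
    public static void Handle(IEventStore store, Guid userId, string newEmail)
    {
        var user = UserAggregate.Rehydrate(store.LoadStream(userId)); // rehydrate
        var newEvents = user.ChangeEmail(newEmail);                   // command
        store.AppendToStream(userId, user.Version, newEvents);       // persist
    }
}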
If the user model changes (for instance, the profile now includes a description of the profile) and I use a read side that contains all profile data, how do I update this read side in Lagom to now also contain this description?
I don't use any other framework but my own, but I guess that you rewrite the projection (to use the newly added field on the events) and rebuild the ReadModel.
Following that question, should I keep different read-side tables for different fields of the profile, instead of one table containing the whole profile?
You should have a separate ReadModel (with its own table(s)) for each use case. The ReadModel should be blazing fast, which means it should be as small as possible, with only the fields needed for that particular use case. This is very important; it is one of the main benefits of using CQRS.
If a different service needs access to the data, should it always ask the user-service, or should it keep its own read side as needed? In the latter case, doesn't that violate the CQRS principle that the service that owns the data should be the only one reading and writing that data?
This depends on you, the architect. It is preferable that each ReadModel owns its data, that is, it should subscribe to the right events and not depend on other ReadModels. But this can lead to a lot of code duplication. In my experience I've seen a desire to have some canonical ReadModels that own some data but can also share it on demand. For this, in CQRS, there is also the term query. Just like commands and events, queries can travel through your system, but only from ReadModel to ReadModel.
Queries should not be sent during a client's request. They should be sent only in the background, as an asynchronous synchronization mechanism. This is an important aspect that influences the resilience and responsiveness of your system.
I've also used live queries, which are pushed from the authoritative ReadModels to the subscribed ReadModels in real time, when the answer changes.
In the latter case, doesn't that violate the CQRS principle that the service that owns the data should be the only one reading and writing that data?
No, it does not. CQRS does not specify how the R (read side) is updated, only that the R should not process commands and the C should not be queried.

How to store sagas’ data?

From what I read aggregates must only contain properties which are used to protect their invariants.
I also read sagas can be aggregates which makes sense to me.
Now I have modeled a registration process using a saga: on a RegistrationStarted event it sends a ReserveEmail command, which will trigger an EmailReserved or EmailReservationFailed event depending on whether the email is free or not. A listener will then either send a validation link or a message telling the user an account already exists.
I would like to use data from the RegistrationStarted event in this listener (say the IP and user-agent). How should I do it?
Storing these data in the saga? But they’re not used to protect invariants.
Pushing them through the ReserveEmail command and the resulting event? Sounds tedious.
Projecting the saga to the read model? What about eventual consistency?
Another way?
Rinat Abdullin wrote a good overview of sagas / process managers.
The usual answer is that the saga has copies of the events that it cares about, and uses the information in those events to compute the command messages to send.
List[Command] processManager(List[Event] events)
Pushing them through the ReserveEmail command and the resulting event?
Yes, that's the usual approach; we get a list [RegistrationStarted], and we use that to calculate the result [ReserveEmail]. Later on, we'll get [RegistrationStarted, EmailReserved], and we can use that to compute the next set of commands (if any).
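A sketch of that shape in code, using the event and command names from the question (everything else is invented for illustration):
using System;
using System.Linq;
using System.Collections.Generic;

public sealed record RegistrationStarted(string Email, string Ip, string UserAgent);
public sealed record EmailReserved(string Email);
public sealed record EmailReservationFailed(string Email);
public sealed record ReserveEmail(string Email, string Ip, string UserAgent);

public static class RegistrationProcess
{
    // Pure function: given everything that has happened so far, decide what to send next.
    public static IReadOnlyList<object> ProcessManager(IReadOnlyList<object> events)
    {
        var started = events.OfType<RegistrationStarted>().FirstOrDefault();
        var answered = events.OfType<EmailReserved>().Any()
                    || events.OfType<EmailReservationFailed>().Any();

        // Registration has started but the email question is still open:
        // reserve it, copying the IP and user-agent forward in the command.
        if (started is not null && !answered)
            return new object[] { new ReserveEmail(started.Email, started.Ip, started.UserAgent) };

        return Array.Empty<object>();
    }
}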
Sounds tedious.
The data has to travel between the two capabilities somehow. So you are either copying the data from one message to another, or you are copying a correlation identifier from one message to another and then allowing the consumer to decide how to use the correlation identifier to fetch a copy of the data.
Storing these data in the saga? But they’re not used to protect invariants.
You are typically going to be storing events in the sagas (to keep track of what has happened). That gives you a copy of the data provided in the event. You don't have an invariant to protect because you are just caching a copy of a decision made somewhere else. You won't usually have the process manager running queries to collect additional data.
What about eventual consistency?
By their nature, sagas are always going to be "eventually consistent"; the "state" of an instance of a saga is just cached copies of data controlled elsewhere. The data is probably nanoseconds old by the time the saga sees it, there's no point in pretending that the data is "now".
If I understand correctly I could model my saga as a Registration aggregate storing all the events whose correlation identifier is its own identifier?
Udi Dahan, writing about CQRS:
Here’s the strongest indication I can give you to know that you’re doing CQRS correctly: Your aggregate roots are sagas.

CQRS event store aggregate vs projection

In a CQRS event store, does an "aggregate" contain a summarized view of the events or simply a reference to the boundary of those events? (group id)
A projection is a view or representation of events, so in the case of an aggregate representing a boundary that would make sense to me; whereas if the aggregate contained the current summarized state, I'd be confused about the duplication between the two.
In a CQRS event store, does an "aggregate" contain a summarized view of the events or simply a reference to the boundary of those events? (group id)
Aggregates don't exist in the event store
Events live in the event store
Aggregates live in the write model (the C of CQRS)
Aggregate, in this case, still has the same basic meaning that it had in the "Blue Book"; it's the term for a boundary around one or more entities that are immediately consistent with each other. The responsibility of the aggregate is to ensure that writes (commands) to the book of record respect the business invariant.
It's typically convenient, in an event store, to organize the events into "streams"; if you imagine a RDBMS schema, the stream id will just be some identifier that says "these events are all part of the same history."
It will usually be the case that one aggregate maps to one stream, but that isn't always so; there are some exceptional cases you may need to handle when you change your model. Greg Young covers some of these in his new eBook on event versioning.
So it's possible that the same data structure might exist in the aggregate and query side store (duplicated view used for different purposes).
Yes, and no. It's absolutely the case that the data structures used when validating a write match those used to support a query. But the storage doesn't usually match. Put another way, aggregates don't get stored (the state of the aggregate does); whereas it is fairly common that the query view gets cached (again, not the data structure itself, but a representation that can be used to repopulate the data structure without necessarily needing to replay all of the events).
Any chance you have an example of an aggregate state data structure (RDBMS)? Every example I've found is trimmed down to a few columns, with something like id, source_id, version, making it difficult to visualize what the scope of an aggregate is.
A common example would be that of a trading book (an aggregate responsible for matching "buy" and "sell" orders).
In a traditional RDBMS store, that would probably look like a row in a books table, with a unique id for the book, information about what item that book is tracking, date information about when that book is active, and so on. In addition, there's likely to be some sort of orders table, with unique ids, a trading book id, order type, transaction numbers, prices and volumes (in other words, all of the information the aggregate needs to know to satisfy its invariant).
In a document store, you'd see all of that information in a single document -- perhaps a json document with the information about the root object, and two lists of order objects (one for buys, one for sells).
In an event store, you'd see the individual OrderPlaced, TradeOccurred, and OrderCancelled events.
it seems that the aggregate is computed using the entire set of events unless it gets large enough to warrant a snapshot.
Yes, that's exactly right. If you are familiar with a "fold function", then event sourcing is just a fold from some common initial state. When a snapshot is available, we'll fold from that state (with a corresponding reduction in the number of events that get folded in)
In an event sourced environment with "snapshots", you might see a combination of the event store and the document store (where the document would include additional meta information indicating where in the event stream it had been assembled).
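A sketch of that fold, with an optional snapshot as the starting point (the helper is invented; any state and event types can be plugged in):
using System;
using System.Linq;
using System.Collections.Generic;

public static class EventFold
{
    // Event sourcing as a fold: start from the initial state (or a snapshot)
    // and fold in only the events that happened after that point.
    public static TState Rebuild<TState>(
        TState startingPoint,                 // blank initial state, or a snapshot
        IEnumerable<object> eventsSinceThen,  // an empty history folds to the starting point
        Func<TState, object, TState> apply)   // the fold function: state + event -> state
    {
        return eventsSinceThen.Aggregate(startingPoint, apply);
    }
}
A snapshot only changes the starting point and shortens eventsSinceThen; the events remain the source of truth.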

CQRS, Event-Sourcing and Web-Applications

As I am reading some CQRS resources, there is a recurring point I don't get. For instance, let's say a client emits a command. This command is integrated by the domain, so it can refresh its domain model (DM). On the other hand, the command is persisted in an Event-Store. That is the most common scenario.
1) When we say the DM is refreshed, I suppose data is persisted in the underlying database (if any). Am I right? Otherwise, we would be dealing with a memory-transient model, which, I suppose, would not be a good thing? (State is not supposed to remain in memory on the server side outside a client request.)
2) If data is persisted, I suppose the read model that relies on it is automatically updated, as each client that requests it generates a new "state/context" in the application (in the case of a web application or a RESTful architecture)?
3) If the command is persisted, does that mean we are dealing with Event-Sourcing (by construction when we use CQRS)? Does Event-Sourcing invalidate the database update process? (As state is reconstructed from the Event-Store, maintaining the database seems useless.)
Does CQRS only apply to multi-database systems (where data is propagated across separate databases), and, if it deals with memory-transient models, does that fit well with web applications or RESTful services?
1) As already said, the only things that are really stored are the events.
The only things that commands do are consistency checks prior to raising events. In pseudo-code:
public void BorrowBook(BorrowableBook dto)
{
    // Consistency check only; state changes happen exclusively via events.
    if (IsValid(dto))
        RaiseEvent(new BookBorrowedEvent(dto));
    else
        throw new InvalidOperationException("The book cannot be borrowed.");
}

public void Apply(BookBorrowedEvent evt)
{
    // Called when the event is first raised, and again on every replay.
    this.aProperty = evt.aProperty;
    // ...copy the rest of the event's data onto the aggregate's state...
}
Current state is retrieved by applying the events sequentially. Because of this, you have to pay great attention in the design phase, as there are common pitfalls to avoid (maybe you have already read it, but let me suggest this article by Martin Fowler).
So far so good, but this is just Event Sourcing. CQRS comes into play if you decide to use a different database to persist the state of an aggregate.
In my project we have a projection that, every x minutes, applies the new events (from the event store) to the aggregate and saves the results to a separate instance of MongoDB (the presentation layer accesses this DB for reading). This model is clearly eventually consistent, but in this way you really separate Commands (writes) from Queries (reads).
2) If you have decided to divide the write model from the read model, there are various options you can use to keep them synchronized:
Every x seconds, apply the events since the last checkpoint (some solutions offer snapshots to avoid reapplying a heavy history of events)
A projection that subscribes to events and updates the read model as soon as an event is raised
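A sketch of the first, checkpoint-based option (the feed and updater interfaces are invented to keep the example self-contained):
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Collections.Generic;

public interface IEventFeed
{
    // All events after the given checkpoint, each paired with its global position.
    Task<IReadOnlyList<(long Position, object Event)>> ReadSinceAsync(long checkpoint);
}

public interface IReadModelUpdater
{
    Task<long> LoadCheckpointAsync();
    Task ApplyAsync(object evt);
    Task SaveCheckpointAsync(long checkpoint);
}

public static class PollingProjection
{
    public static async Task RunAsync(IEventFeed feed, IReadModelUpdater updater, CancellationToken ct)
    {
        long checkpoint = await updater.LoadCheckpointAsync();
        while (!ct.IsCancellationRequested)
        {
            foreach (var (position, evt) in await feed.ReadSinceAsync(checkpoint))
            {
                await updater.ApplyAsync(evt);  // update the read model
                checkpoint = position;          // remember how far we've got
            }
            await updater.SaveCheckpointAsync(checkpoint);
            await Task.Delay(TimeSpan.FromSeconds(5), ct); // "every x seconds"
        }
    }
}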
3) The only things stored are the events. In fact, we have an event-store, not a command store :)
Is the database useless? It depends! How many events do you need to reapply to take the aggregate to its current state?
Three? Then maybe you don't need a database for the read model.
The thing to grok is that the ONLY thing stored is the events*. The domain model is rebuilt from the events.
So yes, the domain model is memory-transient, as you say, in that no representation of the domain model is stored*, only the events which happened to the domain to put the model in its current state.
When an element from the domain model is loaded, what happens is that a new instance of the element is created and then the events that affect that instance are replayed one after the other, in the right order, to put the element into the correct state.
You could keep instances of your domain objects around, subscribed to new events so that they can be kept up to date without being rebuilt from all the events every time, but usually it's quick enough just to load all the events from the database and apply them every time, in the same way that you might load the instance from the database on every call to your web service.
*Unless you have snapshots of your domain objects to reduce the number of events you need to load/process.
Persistence of data is not strictly needed. It might be sufficient to have enough copies in enough different locations (GigaSpaces). So no, a database is not required. This is (or at least was, a few years ago) used in production by the Dutch equivalent of eBay.