Is there a hard definition of how CQRS should be applied and CQRS questions - cqrs

I have some trouble understanding what the CQRS pattern really is, on its core, meaning what are the red lines that, when crossed, we are not implementing the CQRS pattern.
I clearly understand the CQS pattern.
Suppose that we implement microservices with CQRS, without event sourcing.
1) First question is, does the CQRS pattern only apply to the client I/O? Meaning, hoping I get this right, that for example the client updates using controllers that write to database A, but read by querying controllers that write to database B, (B is eventually updated and may be aggregating information from multiple models using events sent by controller of A).
Or, this is not about the client, but anything, for example another microservice reading / writing? And if the latter, what is really the borderline that defines who is the reader / writer that causes the segregation?
Does this maybe have to do with the domains in DDD?
This is an important question in my mind, because without it, CQRS is just an interdependence, of model B being updated by model A, but not the reverse. And why wouldn't this be propagated from a model B to a model C for example?
I have also read articles stating that some people implement CQRS by having one Command Service and one Query Service, which even more complicates this.
2) Similar to the first question, why do some references speak of events as if they are the Commands of CQRS? This complicates CQRS in my mind, because, technically, with one request event we can ask a service "Hey please give me the information X" and the service can respond with an event that contains the payload, effectively doing a query. Is this a hard rule, or just an example to state that, we can update using events and we can query using REST?
3) What if, in most cases I write to model A, and I read from model B, but in some cases I read directly from model A?
Am I breaking CQRS?
What if my querying needs are very simple, should I duplicate model A in this situation?
4) What if, as stated in question 1), I read from model A to emit events, to produce model B, but then I like to read some information from model B because it's valuable because it is aggregated, in order to produce model C?
Am I breaking CQRS?
What is the controller that populates model B doing in that case, e.g. if it also emits events to populate model C? Is it simply a Command anyway because it is not the part that queries, so we still apply CQRS?
Additionally, what if, we read from model A to emit events, to produce model B, but while we produce model B, we emit events, to send client notifications. Is that still CQRS?
5) When is CQRS broken?
If I have a controller that reads from model B, but also emits a message that updates model A, is that it?
Finally, if that controller, e.g. a REST controller, reads from model B and simultaneously emits a message to update model A, but without containing any information from what was read from model B, (so the operation is two in one, but it does not use information from B to update A), is that still CQRS?
And, what if a REST controller, that updates model A, also returns some information to the client, that has been read from A, does that break CQRS? What if this is just an id? And what if the id is not read from A, but it is just a reference number that is randomly generated? Is there a problem in that case because the REST controller updates, but also returns something to the user?
I will really appreciate your patience for replying as it can be seen that I'm still quite confused and that I'm in the process of learning!

Is there a hard definition of how CQRS should be applied and CQRS questions
Yes, start with Greg Young.
CQRS is simply the creation of two objects where there was previously only one. The separation occurs based upon whether the methods are a command or a query (the same definition that is used by Meyer in Command and Query Separation, a command is any method that mutates state and a query is any method that returns a value). -- Greg Young 2010
It's "just a pattern", born of the fact that data representations which are effective for queries are frequently not the patterns that are effective for tracking change. For example, using an RDBMS for storing business data may be our best choice for maintaining data integrity, but for certain kinds of queries we might want to use a replicate of that data in a graph database.
why do some references speak of events as if they are the Commands of CQRS
HandleEvent is a command. CommandReceived is an event. It's very easy for readers (and authors!) to confuse the contexts that are being described. They are all "just" messages, the semantics of one or the other really depend on the direction the message is traveling relative to the authority for the information in the message.
For example, if I fill out a form on an e-commerce site and submit; is the corresponding message an OrderSubmitted event? or is it a PlaceOrder command? Either spelling could be the correct one, depending on how you choose to model the ordering process.
What if, in most cases I write to model A, and I read from model B, but in some cases I read directly from model A? Am I breaking CQRS?
The CQRS police are not going to come after you if you read from write models. In many architectures, the business logic is executed in a stateless component, and will depend on reading the "current" state from some storage appliance -- in other words, to support write often requires a read.
Pessimizing a write model to support read use cases is the thing we are trying to avoid.
Also: horses for courses. It's entirely appropriate to restrict the use of CQRS to those cases where you can profit from it. When GET/PUT semantics of a single model work, you should prefer them to separate models for reads and writes.

Related

Designing event-based architecture for the customer service

Being a developer with solid experience, i am only entering the world of microservices and event-driven architecture. Things like loose coupling, independent scalability and proper implementation of asynchronous business processes is something that i feel should get simplified as compared with traditional monolith approach. So giving it a try, making a simple PoC for myself.
I am considering making a simple application where user can register, login and change the customer details. However, i want to react on certain events asynchronously:
customer logs in - we send them an email, if the IP address used is new to the system.
customer changes their name, we send them an email notifying of the change.
The idea is to make a separate application that reacts on "CustomerLoggedIn", "CustomerChangeName" events.
Here i can think of three approaches, how to implement this simple functionality, with each of them having some drawbacks. So, when a customer submits their name change:
Store change name Changed name is stored in the DB + an event is sent to Kafkas when the DB transaction is completed. One of the big problems that arise here is that if a customer had 2 tabs open and almost simultaneously submits a change from initial name "Bob" to "Alice" in one tab and from "Bob" to "Jim" in another one, on a database level one of the updates overwrites the other, which is ok, however we cannot guarantee the order of the events to be the same. We can use some checks to ensure that DB update is only done when "the last version" has been seen, thus preventing the second update at all, so only one event will be emitted. But in general case, this pattern will not allow us to preserve the same order of events in the DB as in Kafka, unless we do DB change + Kafka event sending in one distributed transaction, which is anti-pattern afaik.
Change the name in the DB, and use Debezium or similar DB CDC to capture the event and stream it. Here we get a single event source, so ordering problem is solved, however what bothers me is that i lose the ability to enrich the events with business information. Another related drawback is that CDC will stream all the updates in the "customer" table regardless of the business meaning of the event. So, in this case, i will probably need to build a Kafka Streams application to convert the DB CDC events to business events and decouple the DB structure from event structure. The potential benefit of this approach is that i will be able to capture "direct" DB changes in the same manner as those originated in the application.
Emit event from the application, without storing it in the DB. One of the subscribers might to the DB persistence, another will do email sending, etc. The biggest problem i see here is - what do i return to the client? I cannot say "Ok, your name is changed", it's more like "Ok, you request has been recorded and will be processed". In case if the customer quickly hits refresh - he expects to see his new name, as we don't want to explain to the customers what's eventual consistency, do we? Also the order of processing the same event by "email sender" and "db updater" is not guaranteed, so i can send an email before the change is persisted.
I am looking for advices regarding any of these three approaches (and maybe some others i am missing), maybe the usecases when one can be preferrable over others?
It sounds to me like you want event sourcing. In event sourcing, all you need to store is the event: the current state of a customer is derived from replaying the events (either from the beginning of time, or since a snapshot: the snapshot is just an optional optimization). Some other process (there are a few ways to go about this) can then project the events to Kafka for consumption by interested parties. Since every event has a sequence number, you can use the sequence number to prevent concurrent modification (alternatively, the more actor modely event-sourcing implementations can use techniques like cluster sharding in Akka to achieve the same ends).
Doing this, you can have a "write-side" which processes the updates in a strongly consistent manner and can respond to queries which only involve a single customer having seen every update to that point (the consistency boundary basically makes customer in this case an aggregate in domain-driven-design terms). "Read-sides" consuming events are eventually consistent: the latencies are typically fairly short: in this case your services sending emails are read-sides (as would be a hypothetical panel showing names of all customers), but the customer's view of their own data could be served by the write-side.
(The separation into read-sides and write-side (the pluralization is significant) is Command Query Responsibility Segregation, which sometimes gets interpreted as "reads can only be served by a read-side". This is not totally accurate: for one thing a write-side's model needs to be read in order for the write-side to perform its task of validating commands and synchronizing updates, so nearly any CQRS-using project violates that interpretation. CQRS should instead be interpreted as "serve reads from the model that makes the most sense and avoid overcomplicating a model (including that model in the write-side) to support a new read".)
I think I qualify to answer this, having extensively used debezium for simplifying the architecture.
I would prefer Option 2:
Every transaction always results in an event emitted in correct order
Option 1/3 has a corner case, what if transaction succeeds, but application fails to emit the event?
To your point:
Another related drawback is that CDC will stream all the updates in
the "customer" table regardless of the business meaning of the event.
So, in this case, i will probably need to build a Kafka Streams
application to convert the DB CDC events to business events and
decouple the DB structure from event structure.
I really dont think that is a roadblock. The benefit you get is potentially other usecases may crop up where another consumer to this topic may want to read other columns of the table.
Option 1 and 3 are only going to tie this to your core application logic, and that is not doing any favor from simplifying PoV. With option 2, with zero code changes to core application APIs, a developer can independently work on the events, with no need to understand that core logic.

DDD, Event Sourcing, and the shape of the Aggregate state

I'm having a hard time understanding the shape of the state that's derived applying that entity's events vs a projection of that entity's data.
Is an Aggregate's state ONLY used for determining whether or not a command can successfully be applied? Or should that state be usable in other ways?
An example - I have a Post entity for a standard blog post. I might have events like postCreated, postPublished, postUnpublished, etc. For my projections that I'll be persisting in my read tables, I need a projection for the base posts (which will include all posts, regardless of status, with lots of detail) as well as published_posts projection (which will only represent posts that are currently published with only the information necessary for rendering.
In the situation above, is my aggregate state ONLY supposed to be used to determine, for example, if a post can be published or unpublished, etc? If this is the case, is the shape of my state within the aggregate purely defined by what's required for these validations? For example, in my base post projection, I want to have a list of all users that have made a change to the post. In terms of validation for the aggregate/commands, I couldn't care less about the list of users that have made changes. Does that mean that this list should not be a part of my state within my aggregate?
TL;DR: yes - limit the "state" in the aggregate to that data that you choose to cache in support of data change.
In my aggregates, I distinguish two different ideas:
the history , aka the sequence of events that describes the changes in the lifetime of the aggregate
the cache, aka the data values we tuck away because querying the event history every time kind of sucks.
There's not a lot of value in caching results that we are never going to use.
One of the underlying lessons of CQRS is that we don't need aggregates everywhere
An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes. -- Evans, 2003
If we aren't changing the data, then we can safely work directly with immutable copies of the data.
The only essential purpose of the aggregate is to determine what events, if any, need to be applied to bring the aggregate's state in line with a command (if the aggregate can be brought so in line). All state that's not needed for that purpose can be offloaded to a read-side, which can be thought of as a remix of the event stream (with each read-side only maintaining the state it needs).
That said, there are in practice, reasons to use the aggregate state directly, with the primary one being a desire for a stronger consistency for the aggregate: CQRS is inherently eventually consistent. As with all questions of consistent updates, it's important to recognize that consistency isn't free and very often isn't even cheap; I tend to think of a project as having a consistency budget and I'm pretty miserly about spending it.
In your case, there's probably no reason to include the list of users changing a post in the aggregate state, unless e.g. there's something like "no single user can modify a given post more than n times".

CQRS Event Sourcing valueobject & entities accepting commands

Experts,
I am in the evaluating moving a nice designed DDD app to an event sourced architecture as a pet project.
As can be inferred, my Aggregate operations are relatively coarse grained. As part of my design process, I now find myself emitting a large number of events from a small subset of operations to satisfy what I know to be requirements of the read model. Is this acceptable practice ?
In addition to this, I have distilled a lot of the domain complexity via use of ValueObjects & entities. Can VO's/ E accept commands and emit events themselves, or should I expose state and add from the command handler lower down the stack ?
In terms of VO's note that I use mutable operations sparingly and it is a trade off between over complicating other areas of my domain.
I now find myself emitting a large number of events from a small subset of operations to satisfy what I know to be requirements of the read model. Is this acceptable practice ?
In most cases, you should have 1 event by command. Remember events describe the user intent so you have to stay close to the user action.
Can VO's/ E accept commands and emit events themselves
No only aggregates emit events and yes you can come up with very messy aggregates if you have a lot of events, but there is solutions for that like delegating the work to the commands and events themselves. I blogged about this technique here: https://dev.to/maximegel/cqrs-scalable-aggregates-731
In terms of VO's note that I use mutable operations sparingly
As long as you're aware of the consequences. That's fine. Trade-off are part of the job, but be sure you team is aware of that since it's written everywhere that value objects are immutable you expose yourselves to confusion and pointer issues.

How to store sagas’ data?

From what I read aggregates must only contain properties which are used to protect their invariants.
I also read sagas can be aggregates which makes sense to me.
Now I modeled a registration process using a saga: on RegistrationStarted event it sends a ReserveEmail command which will trigger an EmailReserved or EmailReservationFailed given if the email is free or not. A listener will then either send a validation link or a message telling an account already exists.
I would like to use data from the RegistrationStarted event in this listener (say the IP and user-agent). How should I do it?
Storing these data in the saga? But they’re not used to protect invariants.
Pushing them through ReserveEmail command and the resulting event? Sounds tedious.
Project the saga to the read model? What about eventual consistency?
Another way?
Rinat Abdullin wrote a good overview of sagas / process managers.
The usual answer is that the saga has copies of the events that it cares about, and uses the information in those events to compute the command messages to send.
List[Command] processManager(List[Event] events)
Pushing them through ReserveEmail command and the resulting event?
Yes, that's the usual approach; we get a list [RegistrationStarted], and we use that to calculate the result [ReserveEmail]. Later on, we'll get [RegistrationStarted, EmailReserved], and we can use that to compute the next set of commands (if any).
Sounds tedious.
The data has to travel between the two capabilities somehow. So you are either copying the data from one message to another, or you are copying a correlation identifier from one message to another and then allowing the consumer to decide how to use the correlation identifier to fetch a copy of the data.
Storing these data in the saga? But they’re not used to protect invariants.
You are typically going to be storing events in the sagas (to keep track of what has happened). That gives you a copy of the data provided in the event. You don't have an invariant to protect because you are just caching a copy of a decision made somewhere else. You won't usually have the process manager running queries to collect additional data.
What about eventual consistency?
By their nature, sagas are always going to be "eventually consistent"; the "state" of an instance of a saga is just cached copies of data controlled elsewhere. The data is probably nanoseconds old by the time the saga sees it, there's no point in pretending that the data is "now".
If I understand correctly I could model my saga as a Registration aggregate storing all the events whose correlation identifier is its own identifier?
Udi Dahan, writing about CQRS:
Here’s the strongest indication I can give you to know that you’re doing CQRS correctly: Your aggregate roots are sagas.

Difference between CQRS and CQS

I am learning what is CQRS pattern and came to know there is also CQS pattern.
When I tried to search I found lots of diagrams and info on CQRS but didn't found much about CQS.
Key point in CQRS pattern
In CQRS there is one model to write (command model) and one model to read (query model), which are completely separate.
How is CQS different from CQRS?
CQS (Command Query Separation) and CQRS (Command Query Responsibility Segregation) are very much related. You can think of CQS as being at the class or component level, while CQRS is more at the bounded context level.
I tend to think of CQS as being at the micro level, and CQRS at the macro level.
CQS prescribes separate methods for querying from or writing to a model: the query doesn't mutate state, while the command mutates state but does not have a return value. It was devised by Bertrand Meyer as part of his pioneering work on the Eiffel programming language.
CQRS prescribes a similar approach, except it's more of a path through your system. A query request takes a separate path from a command. The query returns data without altering the underlying system; the command alters the system but does not return data.
Greg Young put together a pretty thorough write-up of what CQRS is some years back, and goes into how CQRS is an evolution of CQS. This document introduced me to CQRS some years ago, and I find it a still very useful reference document.
This is an old question but I am going to take a shot at answering it. I do not often answer questions on StackOverflow, so please forgive me if I do something outside the bounds of community in terms of linking to things, writing a long answer, etc.
There are many differences between CQRS and CQS however CQRS uses CQS inside of its definition! Let's start with defining the two and then we can discuss differences.
CQS defines two types of messages depending on their return value: no return value (void) specifies this is a Command; a return value (non-void) specifies this method is a Query.
Commands change information
Queries return information
Commands change state. Queries do not.
Now for CQRS, which uses the same definition as CQS for Commands and Queries. What CQRS says is that we do not want one object with Command and Query methods. Instead we want two objects: one with all the Commands and one with all the Queries.
The idea overall is very simple; it's everything after doing this where things become interesting. There are numerous talks online, of me discussing some of the attributes associated (sorry way too much to type here!).
CQS is about Command and Queries. It doesn't care about the model. You have somehow separated services for reading data, and others for writing data.
CQRS is about separate models for writes and reads. Of course, usage of write model often requires reading something to fulfill business logic, but you can only do reads on read model. Separate Databases are state of the art. But imagine single DB with separate models for read and writes modeled in ORM. It's very often good enough.
I have found that people often say they practice CQRS when they have CQS.
Read the inventor Greg Young's answer
I think, like "Dependency Injection" the concepts are so simple and taken for granted that the fact that they have fancy names seems to drive people to think they're something more than they are, especially as CQRS is often quoted alongside Event Sourcing.
CQS is the separation of methods that read to those that change state; don't do both in a single method. This is micro level.
CQRS extends this concept into a higher level for machine-machine APIs, separation of the message models and processing paths.
So CQRS is a principle you apply to the code in an API or facade.
I have found CQRS to essentially be a very strong S in SOLID, pushing the separation deeply into the psyche of developers to produce more maintainable code.
I think web applications are a bad fit for CQRS since the mutation of state via representation transfer means the command and query are two sides of the same request-response. The representation is a command and the response is the query.
For example, you send an order and receive a view of all your orders.
Imagine if the code of a website was factored into a command side and query side. The route action handling code would need to fall into one of those sides, but it does both.
Imagining a stronger segregation, if the code was moved into two different compilable codebases, then the website would accept a POST of a form, but the user would have to browse to another website URL to see the impact of the action. This is obviously crazy. One workaround would be to always redirect, though this wouldn't really be RESTful since the ideal REST application is where the next representation contains hypertext to drive the next state transition and so on.
Given that a website is a REST API between human and machine (or machine and machine), this also includes REST APIs, though other types of HTTP message passing API may be a perfect fit for CQRS.
A service or facade within the bounds of the website may obviously work well with CQRS, though the action handlers would sit outside this boundary.
See CQS on Wikipedia
The biggest difference is CQRS uses separate data stores for commands and queries. A query store can use a different technology like a document database or just be a denormalized schema in the same database that makes querying the data easier.
The data between databases is usually copied asynchronously using something like a service bus. Therefore, data in the query store is eventually consistent (is going to be there at some point of time). Applications need to account for that. While it is possible to use the same transaction (same database or a 2-phase commit) to write in both stores, it is usually not recommended for scalability reasons.
CQS architecture reads and writes from the same data store/tables.