Query vs Transaction - CQRS

In this picture, we can see that Saga is the one that implements transactions and CQRS implements queries. As far as I know, a transaction is a set of queries that follows ACID properties.
So can I consider CQRS as an advanced version of saga which increases the speed of reads?

Note that this diagram is not meant to explain what Sagas and CQRS are. In fact, looking at it this way, it is quite confusing. What this diagram is telling you is what patterns you can use to read and write data that spans multiple microservices. It is saying that in order to write data (somehow transactionally) across multiple microservices you can use Sagas, and in order to read data which belongs to multiple microservices you can use CQRS. But that doesn't mean that Sagas and CQRS have anything in common. They are two different patterns to solve completely different problems (reads and writes). To make an analogy, it's like saying that to make pizzas (Write) you can use an oven and to view the pizza menu (Read) you can use a tablet.
On the specific patterns:
Sagas: you can see them as a process manager or state machine. Note that they do not implement transactions in the RDBMS sense. Basically, they allow you to create a process that will take care of telling each microservice to do a write operation, and if one of the operations fails, it'll take care of telling the other microservices to roll back (or compensate) the action that they did. So, these "transactions" won't be atomic, because while the process is running some microservices will have already modified the data and others won't. And it is not guaranteed that whatever has succeeded can successfully be rolled back or compensated.
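To make the compensation idea concrete, here is a minimal orchestrator sketch in Python. The step names and "services" are made up for illustration; real sagas are usually message-driven, with retries, timeouts and persisted state.

```python
# A minimal saga orchestrator sketch (hypothetical services, not a real framework).
# Each step has a corresponding compensation; if a step fails, previously completed
# steps are compensated in reverse order. Note this is NOT an ACID transaction.

class SagaStep:
    def __init__(self, action, compensation):
        self.action = action               # callable that performs the write
        self.compensation = compensation   # callable that undoes / compensates it

def run_saga(steps):
    completed = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            # Roll back what already succeeded, in reverse order.
            for done in reversed(completed):
                try:
                    done.compensation()
                except Exception:
                    pass  # compensation can also fail; needs retries/alerting in practice
            return False
    return True

# Hypothetical usage: an "order" saga spanning three microservices.
saga = [
    SagaStep(lambda: print("reserve inventory"), lambda: print("release inventory")),
    SagaStep(lambda: print("charge payment"),    lambda: print("refund payment")),
    SagaStep(lambda: print("create shipment"),   lambda: print("cancel shipment")),
]
run_saga(saga)
```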
CQRS (Command Query Responsibility Segregation): suggests the separation of Commands (writes) and Queries (reads). The reason for that is, as I was saying before, that reads and writes are two very different operations. Therefore, by separating them, you can implement each with the patterns that better fit that scenario. The reason why CQRS is shown in your diagram as a solution for reading data that comes from multiple microservices is that one way of implementing queries is to listen to Domain Events coming from multiple microservices and store the information in a single database, so that when it's time to query the data, you can find it all in a single place. An alternative to this would be Data Composition, which would mean that when the query arrives, you submit queries to multiple microservices at that moment and build the response by composing their responses.
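As a rough illustration of the first option (building a single read store from domain events), here is a small Python sketch; the event types, fields and services are invented for the example.

```python
# A minimal sketch of "listen to domain events and build a single read store".
# Event names and fields are made up for illustration.

read_store = {}  # order_id -> denormalized view combining data from several services

def handle_event(event):
    view = read_store.setdefault(event["order_id"], {})
    if event["type"] == "OrderPlaced":          # from the Orders service
        view["status"] = "placed"
        view["total"] = event["total"]
    elif event["type"] == "PaymentReceived":    # from the Payments service
        view["paid"] = True
    elif event["type"] == "OrderShipped":       # from the Shipping service
        view["status"] = "shipped"

def query_order(order_id):
    # The query hits a single place; no calls to the other services at read time.
    return read_store.get(order_id)

handle_event({"type": "OrderPlaced", "order_id": "42", "total": 10})
handle_event({"type": "PaymentReceived", "order_id": "42"})
print(query_order("42"))
```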
So can I consider CQRS as an advanced version of saga which increases the speed of reads?
Personally I would not mix the concepts of CQRS and Sagas. I think this can really confuse you. Consider both patterns as two completely different things and try to understand them both independently.

Related

Event Sourcing - How to query inside a command?

We would like to be able to read state inside a command use case.
We could get the state from the event store for the specific aggregate, but what about querying aggregates by field (not id) or performing more complicated queries that are not suited to the event store?
The approach we were thinking of was to use our read model for those cases as well, and not only for query use cases.
This might be inconsistent, so a solution could be to have the latest version of the aggregate stored in both write/read models, in order to be able to tell if the state is correct or stale.
Does this make sense, and if yes, when we need to get state by id, should we use the event store or the read model?
If you want the absolute latest state of an event-sourced aggregate, you're going to have to read the latest snapshot (assuming that you are snapshotting) and then replay events since that snapshot from the event store. You can be aggressive about snapshotting (conceivably even saving a snapshot after every command), but you're giving away some write performance to make the read faster.
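A minimal sketch of that load path, assuming an in-memory event store and invented event types (real stores would persist both the events and the snapshots):

```python
# A sketch of "latest snapshot + replay events since that snapshot" for an
# event-sourced aggregate. Event shapes and the store are made up for illustration.

event_store = {}     # aggregate_id -> list of events (append-only)
snapshots = {}       # aggregate_id -> (version, state)

def apply(state, event):
    # Hypothetical domain logic: a simple account balance.
    if event["type"] == "Deposited":
        return {**state, "balance": state.get("balance", 0) + event["amount"]}
    if event["type"] == "Withdrawn":
        return {**state, "balance": state.get("balance", 0) - event["amount"]}
    return state

def load_aggregate(aggregate_id):
    version, state = snapshots.get(aggregate_id, (0, {}))
    events = event_store.get(aggregate_id, [])
    for event in events[version:]:      # replay only events newer than the snapshot
        state = apply(state, event)
    return state

event_store["acc-1"] = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
]
snapshots["acc-1"] = (1, {"balance": 100})   # snapshot taken after the first event
print(load_aggregate("acc-1"))               # {'balance': 70}
```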
Updating the read model directly is conceivably possible, though that level of coupling is something that should be considered very carefully. Note also that you will very likely need some sort of two-phase commit to ensure that the read model is only updated when the write model is updated and vice versa. I strongly suggest considering why you're using CQRS/ES in this project, because you are quite possibly undermining that reason by doing this sort of thing.
In general, if you need a query for processing a particular command, it's likely that query will generally be the same, i.e. you don't need free-form query support. In that case, you can often have a read model that's tuned for exactly that query and which only cares about events which could affect that query: often a fairly small subset of the events. The finer-grained the read model, the easier it is to keep in sync (if it ignores 99% of events, for instance, it can't really fall that far behind).
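For example, a read model that supports a single command-side check might look like the following sketch; the command, event and field names are hypothetical.

```python
# A sketch of a narrow, command-supporting read model: it answers exactly one
# question ("is this email already taken?") and listens to exactly one event type.

emails_in_use = set()   # the entire read model: tiny and easy to keep in sync

def on_event(event):
    if event["type"] == "UserRegistered":
        emails_in_use.add(event["email"])
    # every other event type is ignored, so this model can't fall far behind

def handle_register_user(command):
    if command["email"] in emails_in_use:
        raise ValueError("email already registered")
    return {"type": "UserRegistered", "email": command["email"]}

on_event({"type": "UserRegistered", "email": "a@example.com"})
print(handle_register_user({"email": "b@example.com"}))
```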
Needing to make complex queries as part of command processing could also be a sign that your aggregate boundaries aren't right and could do with a re-examination.
Does this make sense
Maybe. Let's start with
This might be inconsistent
Yup, they might be. So what?
We typically respond to a query by sending an unlocked copy of the answer. In other words, it's possible that the actual information in the write model will change after this response is dispatched but before the response arrives at its destination. The client will be looking at a copy of the answer taken from the past.
So we might reasonably ask how much better it is to get information no more than one minute old compared to information no more than five minutes old. If the difference in value is pennies, then you should probably deploy the five minute version. If the difference is millions of dollars, then you're in a good position to negotiate a real budget to solve the problem.
For processing a command in our own write model, that kind of inconsistency isn't usually acceptable or wise. But neither of the two common answers requires keeping the read and write models synchronized. The most common answer is to just work with the write model alone. The less common answer is to grab a snapshot out of a cache, and then apply any additional events to it to bring it up to date. The latter approach is "just" a performance optimization (first rule: don't.)
The variation that trips everyone up is trying to process a command somewhere else, enforcing a consistency rule on our data here. Once again, you need a really clear picture of how valuable the consistency is to the business. If it's really important, that may be a signal that the information in question shouldn't be split into two different piles - you may be working with the wrong underlying data model.
Possibly useful references
Pat Helland Data on the Outside Versus Data on the Inside
Udi Dahan Race Conditions Don't Exist

Understanding CQRS and EventSourcing

I read several blogs and watched video about usefulness of CQRS and ES. I am left with implementation confusion.
CQRS: we use separate tables, one for "Write, Update and Delete" and another for Read operations. So how does the data sync from the write table to the read table? Do we need to use a cron job to sync data from the write table to the read-only table, or are there other options?
Event Sourcing: Do we store only the immutable, sequential record of every operation that happened to a record after it was created, all in one store? Or do we also store a mutable record, i.e. the same record updated in place, in another store?
And please explain where RDBMS, NoSQL and Messaging fit into this.
we use separate tables, one for "Write, Update and Delete" and another for Read operations. So how does the data sync from the write table to the read table?
You design an asynchronous process that understands how to transform the data from its "write" representation to its "read" representation, and you design a scheduler to decide when that asynchronous process runs.
Part of the point is that it's just plumbing, and you can choose whatever plumbing you want that satisfies your operational needs.
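As a sketch of that plumbing, here is a trivial poll-based sync in Python; the table layouts, the projection function and the scheduler are all placeholders for whatever your operational needs dictate.

```python
# A sketch of the asynchronous "write model -> read model" sync, with a trivial
# scheduler (a polling loop). Layouts and field names are illustrative.
import time

write_table = []   # normalized rows written by the command side
read_table = {}    # denormalized view served to queries
last_synced = 0

def project(row):
    # transform the "write" representation into the "read" representation
    return {"label": f'{row["name"]} ({row["qty"]} pcs)'}

def sync_once():
    global last_synced
    for row in write_table[last_synced:]:
        read_table[row["id"]] = project(row)
    last_synced = len(write_table)

def run_scheduler(interval_seconds=5, iterations=3):
    # stand-in for cron, a message-driven trigger, or whatever plumbing you prefer
    for _ in range(iterations):
        sync_once()
        time.sleep(interval_seconds)

write_table.append({"id": 1, "name": "widget", "qty": 3})
sync_once()
print(read_table)
```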
Event Sourcing
On the happy path, each "event stream" is an append-only sequence of immutable events. In the case where you are enforcing a domain invariant over the contents of the stream, you'll normally have a "first writer wins" conflict policy.
But "the" stream is the authoritative copy of the events. There may also be non-authoritative copies (for instance, events published to a message bus). They are typically all immutable.
In some domains, where you have to worry about privacy and "the right to be forgotten", you may need affordances that allow you to remove information from a previously stored event. Depending on your design choices, you may need mutable events there.
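As an illustration of the "first writer wins" policy mentioned above, here is a toy append operation guarded by an expected-version check (optimistic concurrency); everything is in-memory and invented for the example.

```python
# A sketch of an append-only stream with a "first writer wins" conflict policy,
# implemented as an expected-version check. Illustrative only.

class ConcurrencyError(Exception):
    pass

streams = {}   # stream_id -> list of immutable events

def append(stream_id, expected_version, events):
    stream = streams.setdefault(stream_id, [])
    if len(stream) != expected_version:
        # someone else appended first; the later writer loses and must re-read and retry
        raise ConcurrencyError(f"expected {expected_version}, stream is at {len(stream)}")
    stream.extend(events)
    return len(stream)

append("order-1", 0, [{"type": "OrderPlaced"}])
try:
    append("order-1", 0, [{"type": "OrderCancelled"}])   # stale expected version
except ConcurrencyError as e:
    print("conflict:", e)
```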
RDBMS
For many sorts of queries, especially those which span multiple event streams, being able to describe the desired results in terms of relations makes the programming task much easier. So a common design is to have asynchronous processes that read information from the event streams and update the RDBMS. The usual derived benefit is that you get low latency queries (but the data returned by those queries may be stale).
RDBMS can also be used as the core of the design of the event store / message store itself. Events are commonly written as blob data, with interesting metadata exposed as additional columns. The message store used by eventide-project is based on PostgreSQL.
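A rough sketch of that layout using sqlite3; the schema below is invented for illustration (the actual eventide message store schema is richer).

```python
# A sketch of using an RDBMS as the event/message store itself: the event body is
# stored as a blob/text column, with interesting metadata exposed as ordinary columns.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        global_position INTEGER PRIMARY KEY AUTOINCREMENT,
        stream_name     TEXT NOT NULL,
        stream_position INTEGER NOT NULL,
        event_type      TEXT NOT NULL,
        data            TEXT NOT NULL,          -- the serialized event payload
        UNIQUE (stream_name, stream_position)   -- enforces one writer per position
    )
""")

def append_event(stream, position, event_type, payload):
    conn.execute(
        "INSERT INTO events (stream_name, stream_position, event_type, data) VALUES (?, ?, ?, ?)",
        (stream, position, event_type, json.dumps(payload)),
    )

append_event("account-1", 0, "Deposited", {"amount": 100})
print(conn.execute("SELECT event_type, data FROM events WHERE stream_name = ?", ("account-1",)).fetchall())
```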
NoSQL
Again, can potentially be used as your cache of readable views, or as your message store, depending on your needs. Event Store would be an example of a NoSQL message store.
Messaging
Messaging is a pattern for temporal decoupling; the ability to store/retrieve messages in a stable central area affords the ability to shut down a message producer without blocking the message consumer, and vice versa. Message stores also afford some abstraction - the producer of a message doesn't necessarily know all of the consumers, and the consumer doesn't necessarily know all of the producers.
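A toy illustration of that decoupling, with an in-memory list standing in for the durable message store; each consumer tracks its own position, so producer and consumer never need to be running at the same time.

```python
# A sketch of temporal decoupling via a message store: the producer appends messages
# and consumers read them at their own pace using their own cursors.

message_store = []          # stands in for a durable broker / message table

def produce(message):
    message_store.append(message)      # producer neither knows nor waits for consumers

class Consumer:
    def __init__(self):
        self.position = 0               # each consumer tracks its own read position

    def poll(self):
        new = message_store[self.position:]
        self.position = len(message_store)
        return new

produce({"type": "OrderPlaced", "order_id": "42"})
billing = Consumer()
print(billing.poll())    # the consumer can start long after the message was produced
```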
My question is about Event Sourcing. Do we need to store only the immutable sequence of events, and where should they be stored?
In event sourcing, the authoritative representation of the state is the sequence of events - your durable copy of that event sequence is the book of truth.
As for where they go? Well, that is going to depend on your architecture and storage choices. You could manage files on disk yourself, you could write them into your own RDBMS; you could use an RDBMS designed by somebody else, you could use a NoSQL document store, you could use a dedicated message store.
There could be multiple stores -- for instance, in a micro service architecture, the service that accepts orders might be different from the service that tracks order fulfillment, and they could each be writing events into different storage appliances.

CQRS Write database

In our company we are developing a microservice-based system and we apply the CQRS pattern. In CQRS we separate Commands and Queries, and because of that we have to develop 2 microservices. Currently I have been assigned to enhance the CQRS pattern to save events in a separate database (event sourcing). I understand that having a separate event database is very important, but do we really need a separate Write Database? What is the actual use of the Write database?
If you have an event database, it is your Write database. It is the system-of-record and contains the transactionally-consistent state of the application.
If you have a separate Read database, it can be built off of the event log in either a strongly-consistent or eventually-consistent manner.
I understand that having a separate event database is very important but do we really need a separate Write Database? What is the actual use of the Write database?
The purpose of the write database is to stand as your book of record. The write database is the persisted representation that you use to recover on restart. It's the synchronization point for all writes.
It's "current Truth" as your system understands it.
In a sense, it is the "real" data, where the read models are just older/cached representations of what the real data used to look like.
It may help to think in terms of an RDBMS. When traffic is small, we can serve all of the incoming requests from a single database. As traffic increases, we want to start offloading some of that traffic. Since we want the persisted data to be in a consistent state, we can't offload the writes -- not if we want to be resolving conflicts at the point of the write. But we can shed reads onto other instances, provided that we are willing to admit some finite interval of time between when a write happens and when the written data is available on all systems.
So we send all writes to the leader, who is responsible for organizing everything into the write ahead log; changes to the log can then be replicated to the other instances, which in turn build out local copies of the data structures used to support low latency queries.
If you look very carefully, you might notice that your "event database" shares a lot in common with the "write ahead log".
No, you don't necessarily need a separate write database. The core of CQRS segregation is at the model (code) level. Going all the way to the DB might be beneficial or detrimental to your project, depending on the context.
As with many orthogonal architectural decisions surrounding the use of CQRS (Event Sourcing, Command Bus, etc.), the pros and cons should be considered carefully prior to adoption. Below some amount of concurrent access, separating read and write DBs might not be worth the effort.

Data Synchronization in a Distributed system

We have a REST-based application built on the Restlet framework which supports CRUD operations. It uses a local file to store the data.
Now the requirement is to deploy this application on multiple VMs, and any update operation on one VM needs to be propagated to the application instances running on the other VMs.
Our idea to solve this was to send multiple POST messages (to all other application instances) when an update operation happens on a given VM.
The assumption here is that each application has a list/URLs of all other applications.
Is there a better way to solve this?
Consistency is a deep topic, and a hard thing to get right. The trouble comes when two nearly-simultaneous changes occur to the same data: conflicting updates can arrive in one order on one server, and in another order on another. This is a problem, since the two servers no longer agree on what the data is, and it isn't clear who is "right".
The short story: get your favorite RDBMS (for example, MySQL is popular) and have your app servers connect to it in what is called the three-tier model. Be sure to perform complex updates in transactions, which will provide an acceptable consistency model.
The long story: The three-tier model serves well for small-to-medium scale web sites/services. You will eventually find that the single database becomes the bottleneck. For services whose read traffic is substantially larger than write traffic, a common optimization is to create a single-master, many-slave database replication arrangement, where all writes go to the single master (required for consistency with non-distributed transactions), but the more common reads can go to any of the read slaves.
For services with evenly mixed read/write traffic, you may be better served by dropping some of the conveniences (and accompanying restrictions) that formal SQL provides and instead using one of the various "nosql" data stores that have recently emerged. Their relative merits and fitness for various problems is a deep topic in itself.
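To make the "complex updates in transactions" advice from the short story concrete, here is a small sketch using sqlite3 as a stand-in for your favourite RDBMS; table and column names are invented.

```python
# Both updates commit together, or neither does, so other readers never observe
# a half-applied "complex update".
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

try:
    with conn:   # sqlite3 commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 25 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 25 WHERE name = 'bob'")
except sqlite3.Error:
    pass         # the transaction was rolled back; the data stays consistent

print(conn.execute("SELECT * FROM accounts").fetchall())
```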
I can see 7 major options for now. You should find out more details and decide whether the facilities / trade-offs are appropriate for your purpose:
1) Perform the CRUD operations on a common RDBMS. Simplest and most consistent.
2) Perform the CRUD operations on a common RDBMS that runs as a fast in-memory RDBMS, e.g. TimesTen from Oracle.
3) Perform the CRUD operations on a distributed cache, or on your own home-cooked distributed hash table that can guarantee synchronization, e.g. Hazelcast/Ehcache and others.
4) Use a fast common state server like Redis/memcached, perform your updates on it in a synchronized manner, and write out the successful operations to a DB lazily if required.
5) Distribute your REST servers such that the CRUD operations on a single entity are only performed by a single master (see the sketch after this list). Once this is done, the details about the changes can be communicated to everyone else using a reliable message bus or a distributed database (e.g. Postgres) that runs underneath and syncs all of your updates fairly fast.
6) Target eventual consistency and use a distributed data store like Cassandra, which lets you target the consistency level you require.
7) Use distributed consensus algorithms like Paxos or Raft, or an implementation of them (recommended) such as ZooKeeper or etcd respectively, and take ownership of the item you want to change from each REST server before you perform the CRUD operation. This might be a bit slow, though, and is similar to what Cassandra might give you.
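As a sketch of option 5, here is one way to pick a single owning instance per entity by hashing its id; the instance URLs and the forwarding logic are purely illustrative.

```python
# Route all CRUD for a given entity to a single owning instance, so conflicting
# updates for the same entity are always serialized in one place.
import hashlib

instances = ["http://app-1:8080", "http://app-2:8080", "http://app-3:8080"]

def owner_of(entity_id):
    digest = int(hashlib.sha256(entity_id.encode()).hexdigest(), 16)
    return instances[digest % len(instances)]

def handle_update(entity_id, payload, my_url):
    owner = owner_of(entity_id)
    if owner != my_url:
        # forward the request to the owning instance instead of updating locally
        return ("forward", owner)
    # ... apply the update locally, then publish the change on the message bus ...
    return ("applied", my_url)

print(owner_of("order-42"))
print(handle_update("order-42", {"qty": 2}, "http://app-1:8080"))
```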

Recovery or failover strategies when NoSQL data becomes inconsistent

NoSQL emphasizes availability over consistency. Sometimes, this would cause the data in your NoSQL datastore to become inconsistent.
1) What are strategies to recover from such a situation?
2) What are strategies to prevent such a situation if possible?
3) What are the specific strategies for the popular NoSQL vendors, such as MongoDB, CouchDB, Cassandra, and HBase?
I think in asking point #3 you are mixing 2 different problems:
A. The database becomes unreadable, i.e. its data files are corrupted and the data is not accessible or only partially accessible.
B. Application data stored in the NoSQL database becomes inconsistent (e.g. some key mismatch happened) and the application starts to behave weirdly.
Problem A is a database maintainability issue and each database handles it in a specific way (e.g., MongoDB). And truly speaking, it's not only a NoSQL problem. But in general this kind of situation is rather an emergency and shouldn't happen if your database engine is solid and runs on good enough hardware.
Problem B is purely application-specific, and I think the main strategy here is to make your application expect that data might be inconsistent at some point and try to work around that if possible. There can also be a background process that finds inconsistencies in the data. In any case, it purely depends on your data model.
EDIT: Updates to the data in a NoSQL database are not transactional, but in general are atomic. So, if one tuple is updated by 2 different processes, you will not get part of the tuple from one and another part from the other; you will get the whole tuple from whichever process the engine considers "last". But if your application updates several "dependent" tuples, then the result across several updating threads is not predictable, of course, because there is no transaction around those multiple updates. Unless, of course, all processes put the same data into the database. But if you have too many dependencies between different types of tuples/objects, then I would say that your application is using NoSQL in the wrong way.
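A small illustration of that distinction using pymongo (this assumes a MongoDB server is reachable; the collection and field names are invented for the example):

```python
# A single-document update is atomic: concurrent writers each apply a whole update.
# Two "dependent" updates on different documents are NOT atomic as a pair here.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]

# Atomic: the whole document is modified by exactly one writer at a time.
db.products.update_one({"_id": "sku-1"}, {"$inc": {"stock": -1}}, upsert=True)

# Not atomic as a pair: another process may observe the first update without the
# second, or one may fail while the other succeeds.
db.products.update_one({"_id": "sku-1"}, {"$inc": {"reserved": 1}}, upsert=True)
db.orders.update_one({"_id": "order-9"}, {"$set": {"status": "reserved"}}, upsert=True)
```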
EDIT: There is also an interesting discussion here.