CQRS Write database - cqrs

In our company we are developing a microservice-based system and we apply the CQRS pattern. In CQRS we separate Commands and Queries, so we have to develop two microservices. I have been assigned to enhance our CQRS implementation to save events in a separate database (event sourcing). I understand that having a separate event database is very important but do we really need a separate Write Database? What is the actual use of the Write database?

If you have an event database, it is your Write database. It is the system-of-record and contains the transactionally-consistent state of the application.
If you have a separate Read database, it can be built off of the event log in either a strongly-consistent or eventually-consistent manner.

I understand that having a separate event database is very important but do we really need a separate Write Database? What is the actual use of the Write database?
The purpose of the write database is to stand as your book of record. The write database is the persisted representation that you use to recover on restart. It's the synchronization point for all writes.
It's "current Truth" as your system understands it.
In a sense, it is the "real" data, where the read models are just older/cached representations of what the real data used to look like.
It may help to think in terms of an RDBMS. When traffic is small, we can serve all of the incoming requests from a single database. As traffic increases, we want to start offloading some of that traffic. Since we want the persisted data to be in a consistent state, we can't offload the writes -- not if we want to be resolving conflicts at the point of the write. But we can shed reads onto other instances, provided that we are willing to admit some finite interval of time between when a write happens and when the written data is available on all systems.
So we send all writes to the leader, who is responsible for organizing everything into the write ahead log; changes to the log can then be replicated to the other instances, which in turn build out local copies of the data structures used to support low latency queries.
If you look very carefully, you might notice that your "event database" shares a lot in common with the "write ahead log".
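To make the analogy concrete, here is a minimal in-memory sketch of an append-only event log acting as the write database, with a read model rebuilt from it. The event names ("deposited", "withdrawn") and the account example are assumptions made up for illustration; a real system would persist the log durably.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical event names: "deposited" or "withdrawn"
    account: str
    amount: int


@dataclass
class EventLog:
    """Append-only log: the system of record (the 'write database')."""
    events: list = field(default_factory=list)

    def append(self, event: Event) -> int:
        self.events.append(event)
        return len(self.events)  # 1-based position of the event just written


def build_balances(events) -> dict:
    """Read model: a derived view that can always be rebuilt from the log."""
    balances = {}
    for e in events:
        delta = e.amount if e.kind == "deposited" else -e.amount
        balances[e.account] = balances.get(e.account, 0) + delta
    return balances


log = EventLog()
log.append(Event("deposited", "alice", 100))
log.append(Event("withdrawn", "alice", 30))

# A replica that has only seen the first event serves an older view.
stale_view = build_balances(log.events[:1])
current_view = build_balances(log.events)
```

The stale view is not wrong, it is just what the data used to look like; replaying the rest of the log brings it up to date.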

No, you don't necessarily need a separate write database. The core of CQRS segregation is at the model (code) level. Going all the way to the DB might be beneficial or detrimental to your project, depending on the context.
As with many orthogonal architectural decisions surrounding the use of CQRS (Event Sourcing, Command Bus, etc.), the pros and cons should be considered carefully prior to adoption. Below some amount of concurrent access, separating read and write DBs might not be worth the effort.

Related

Query vs Transaction

In this picture, we can see saga is the one that implements transactions and cqrs implements queries. As far as I know, a transaction is a set of queries that follow ACID properties.
So can I consider CQRS as an advanced version of saga which increases the speed of reads?
Note that this diagram is not meant to explain what Sagas and CQRS are. In fact, looked at that way, it is quite confusing. What the diagram is telling you is which patterns you can use to read and write data that spans multiple microservices. It is saying that in order to write data (somewhat transactionally) across multiple microservices you can use Sagas, and in order to read data which belongs to multiple microservices you can use CQRS. But that doesn't mean that Sagas and CQRS have anything in common. They are two different patterns that solve completely different problems (writes and reads, respectively). To make an analogy, it's like saying that to make pizzas (Write) you can use an oven and to view the pizza menu (Read) you can use a tablet.
On the specific patterns:
Sagas: you can see them as a process manager or state machine. Note that they do not implement transactions in the RDBMS sense. Basically, they allow you to create a process that tells each microservice to perform a write operation and, if one of the operations fails, tells the other microservices to roll back (or compensate for) the action they performed. So these "transactions" won't be atomic, because while the process is running some microservices will have already modified their data and others won't. And it is not guaranteed that whatever has succeeded can successfully be rolled back or compensated.
CQRS (Command Query Responsibility Segregation): suggests the separation of Commands (writes) and Queries (reads). The reason, as I said before, is that reads and writes are two very different operations. By separating them, you can implement each with the patterns that best fit it. The reason why CQRS is shown in your diagram as a solution for reading data that comes from multiple microservices is that one way of implementing queries is to listen to Domain Events coming from multiple microservices and store the information in a single database, so that when it's time to query the data you can find it all in a single place. An alternative to this would be Data Composition: when the query arrives, you submit queries to multiple microservices at that moment and compose the response from their responses.
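The saga behaviour described above (run each step, and compensate the completed steps if a later one fails) can be sketched as follows. This is only an illustration under assumptions: `run_saga` and the three stand-in "services" are hypothetical, and everything runs in one process rather than across real microservices.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, run the compensations of the steps that already
    succeeded, in reverse order. Returns (succeeded, log_of_what_happened).
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"done:{name}")
            completed.append((name, compensate))
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                try:
                    undo()
                    log.append(f"compensated:{done_name}")
                except Exception:
                    # Compensation is not guaranteed to succeed either.
                    log.append(f"compensation_failed:{done_name}")
            return False, log
    return True, log


# Hypothetical in-memory stand-ins for three microservices.
state = {"stock": 5, "charged": 0}


def reserve_stock():
    state["stock"] -= 1


def release_stock():
    state["stock"] += 1


def charge_card():
    state["charged"] += 10


def refund_card():
    state["charged"] -= 10


def ship_order():
    raise RuntimeError("carrier unavailable")


ok, saga_log = run_saga([
    ("reserve_stock", reserve_stock, release_stock),
    ("charge_card", charge_card, refund_card),
    ("ship_order", ship_order, lambda: None),
])
```

Between the first step and the last compensation, other readers could observe the reserved stock and the charge, which is why a saga is not atomic in the RDBMS sense.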
So can I consider CQRS as an advanced version of saga which increases the speed of reads?
Personally I would not mix the concepts of CQRS and Sagas. I think this can really confuse you. Consider both patterns as two completely different things and try to understand them both independently.

Understanding CQRS and EventSourcing

I have read several blogs and watched videos about the usefulness of CQRS and ES, but I am still confused about the implementation.
CQRS: when we use separate tables, one for "Write, Update and Delete" and another for Read operations, how is the data synced from the write table to the read table? Do we need a cron job to sync data from the write table to the read-only table, or are there other options?
Event Sourcing: do we store only an immutable, sequential record of every operation that happened to an entity after it was created, in one storage? Or do we also store a mutable record, meaning the same record is updated in another storage?
Also, please explain which RDBMS, NoSQL and Messaging technologies should be used and where they fit in.
When we use separate tables, one for "Write, Update and Delete" and another for Read operations, how is the data synced from the write table to the read table?
You design an asynchronous process that understands how to transform the data from its "write" representation to its "read" representation, and you design a scheduler to decide when that asynchronous process runs.
Part of the point is that it's just plumbing, and you can choose whatever plumbing you want that satisfies your operational needs.
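As one possible shape for that plumbing, here is a sketch of a process that copies new rows from the "write" representation into a "read" representation and remembers how far it has got. The `Projector` class, the row layout and the "total spent per customer" view are assumptions for illustration; whatever triggers `run_once` (cron, a timer, a message) plays the role of the scheduler.

```python
class Projector:
    """Copies changes from a 'write' representation to a 'read' representation.

    It remembers how far it has read (a checkpoint), so each run only
    processes rows written since the previous run.
    """

    def __init__(self, write_rows, read_view):
        self.write_rows = write_rows   # list of (order_id, customer, total)
        self.read_view = read_view     # dict: customer -> total spent
        self.checkpoint = 0            # number of write rows already applied

    def run_once(self) -> int:
        new_rows = self.write_rows[self.checkpoint:]
        for _order_id, customer, total in new_rows:
            self.read_view[customer] = self.read_view.get(customer, 0) + total
        self.checkpoint += len(new_rows)
        return len(new_rows)


write_rows = [(1, "alice", 20), (2, "bob", 5)]
read_view = {}
projector = Projector(write_rows, read_view)

first_run = projector.run_once()      # catches up on the two existing rows
write_rows.append((3, "alice", 7))    # a new write; the read view is now stale
view_before = dict(read_view)
second_run = projector.run_once()     # the scheduler fires again
```

How often `run_once` is called determines how stale the read side can get; a cron job is one valid choice, not the only one.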
Event Sourcing
On the happy path, each "event stream" is an append-only sequence of immutable events. In the case where you are enforcing a domain invariant over the contents of the stream, you'll normally have a "first writer wins" conflict policy.
But "the" stream is the authoritative copy of the events. There may also be non-authoritative copies (for instance, events published to a message bus). They are typically all immutable.
In some domains, where you have to worry about privacy and "the right to be forgotten", you may need affordances that allow you to remove information from a previously stored event. Depending on your design choices, you may need mutable events there.
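The "first writer wins" policy mentioned above is often implemented with an expected-version check on append. The following is a minimal in-memory sketch under that assumption; `EventStream`, `ConcurrencyError` and the seat-reservation events are hypothetical names, not the API of any particular event store.

```python
class ConcurrencyError(Exception):
    """Raised when another writer appended to the stream first."""


class EventStream:
    def __init__(self):
        self._events = []

    @property
    def version(self) -> int:
        return len(self._events)

    def append(self, event: dict, expected_version: int) -> int:
        # The write succeeds only if nobody else has appended since the
        # caller last read the stream.
        if expected_version != self.version:
            raise ConcurrencyError(
                f"expected version {expected_version}, stream is at {self.version}"
            )
        self._events.append(dict(event))  # store a copy of the caller's dict
        return self.version


stream = EventStream()
seen_by_a = stream.version   # writer A reads the stream at version 0
seen_by_b = stream.version   # writer B reads the same version

stream.append({"type": "SeatReserved", "by": "A"}, expected_version=seen_by_a)

try:
    stream.append({"type": "SeatReserved", "by": "B"}, expected_version=seen_by_b)
    b_won = True
except ConcurrencyError:
    b_won = False   # B must re-read the stream and decide again
```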
RDBMS
For many sorts of queries, especially those which span multiple event streams, being able to describe the desired results in terms of relations makes the programming task much easier. So a common design is to have an asynchronous process that reads information from the event streams and updates the RDBMS. The usual derived benefit is that you get low latency queries (but the data returned by those queries may be stale).
An RDBMS can also be used as the core of the design of the event store / message store itself. Events are commonly written as blob data, with interesting metadata exposed as additional columns. The message store used by eventide-project is based on postgresql.
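To illustrate the "blob plus metadata columns" layout, here is a small sketch using SQLite from the Python standard library. The table and column names are assumptions chosen for this example; this is not the schema of Eventide's message store or of any other product.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        global_position INTEGER PRIMARY KEY AUTOINCREMENT,
        stream_name     TEXT    NOT NULL,
        position        INTEGER NOT NULL,   -- position within the stream
        type            TEXT    NOT NULL,
        data            TEXT    NOT NULL,   -- event payload as a JSON blob
        UNIQUE (stream_name, position)
    )
    """
)


def write_event(stream_name, position, event_type, data):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (stream_name, position, type, data) "
            "VALUES (?, ?, ?, ?)",
            (stream_name, position, event_type, json.dumps(data)),
        )


def read_stream(stream_name):
    rows = conn.execute(
        "SELECT type, data FROM messages WHERE stream_name = ? ORDER BY position",
        (stream_name,),
    )
    return [(event_type, json.loads(data)) for event_type, data in rows]


write_event("account-1", 0, "Deposited", {"amount": 100})
write_event("account-1", 1, "Withdrawn", {"amount": 30})

try:
    # A second writer trying to claim position 1 is rejected by the
    # UNIQUE constraint.
    write_event("account-1", 1, "Withdrawn", {"amount": 999})
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The unique constraint on (stream_name, position) is one way to let the database itself reject a conflicting append.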
NoSQL
Again, can potentially be used as your cache of readable views, or as your message store, depending on your needs. Event Store would be an example of a NoSQL message store.
Messaging
Messaging is a pattern for temporal decoupling; the ability to store/retrieve messages in a stable central area affords the ability to shut down a message producer without blocking the message consumer, and vice versa. Message stores also afford some abstraction - the producer of a message doesn't necessarily know all of the consumers, and the consumer doesn't necessarily know all of the producers.
My question is about Event Sourcing. Are we required to store only an immutable sequence of events, and where should it be stored?
In event sourcing, the authoritative representation of the state is the sequence of events - your durable copy of that event sequence is the book of truth.
As for where they go? Well, that is going to depend on your architecture and storage choices. You could manage files on disk yourself; you could write them into your own RDBMS; you could use an RDBMS designed by somebody else; you could use a NoSQL document store; you could use a dedicated message store.
There could be multiple stores -- for instance, in a microservice architecture, the service that accepts orders might be different from the service that tracks order fulfillment, and they could each be writing events into different storage appliances.

How to write data to both NoSQL and RDBMS simultaneously and efficiently

Let’s assume a setup where a mobile application is communicating with its backend via an API, and data resulting from this communication (e.g. JSON-based transaction writes, among others) is written into and read from a MongoDB instance.
Now since I would like to perform some heavy analytics on data stored in mongo, should I rather:
save data directly to RDBMS at the same time as I write to Mongo (so the backend service calls Mongo and after successful write also calls RDBMS)
perform read from Mongo (with some intervals) and load fresh data into RDBMS
I am afraid that both of those solutions require also re-engineering theoretically schema-less Mongo to be in constant agreement with relations and schema in RDBMS. Does it really require more planning for any document structure changes in Mongo? I intuitively say yes, but I look for real world examples. I hope my point is clear enough.
Maybe the CQRS pattern would be a good fit for you.
See: https://martinfowler.com/bliki/CQRS.html
You can use an RDBMS for the Write Model and Mongo for the Read Model.
After every write operation to the RDBMS, you should update your Read Model (the MongoDB document) based on data from the Write Model.
There are a few constraints that need to be understood before you embark on a solution here. The most relevant of these is latency. How out-of-date can your data be?
You are almost definitely looking at some kind of write-behind solution here, taking data out of MongoDB, and writing it to your data warehouse. The question is, how far behind your MongoDB can your data warehouse be? Many solutions based on an extract-transform-load model (ETL) work on a nightly basis, so as to minimize impact on the online system. Some can do the same on an hourly basis, but will have more potential impact on the live system.
Transaction-by-transaction support is likely not needed for an analysis system. You really want to avoid this if you can, as it puts far more load on both systems than is usually justified.
To answer your second question, yes, once you start depending on a schema, it needs to be stable. It doesn't have to be synced up with your target schema necessarily, but your ETL process will have to be aware of both, and will have to be modified any time either one materially changes. Being "schema-less" doesn't mean there isn't a schema, it just means that the schema is not enforced by the software, instead it is enforced by the dependencies on the system.
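As a rough illustration of why the ETL step has to know both schemas, here is a sketch that maps documents to relational rows. The documents are plain dicts standing in for what a document store might return, the warehouse is an in-memory SQLite database, and the field names, the `to_row` helper and the default currency are all assumptions made up for this example.

```python
import sqlite3

# Hypothetical documents. The second one was written before the
# "currency" field existed; the third lacks a required field.
documents = [
    {"_id": "t1", "customer": "alice", "amount": 12.5, "currency": "EUR"},
    {"_id": "t2", "customer": "bob", "amount": 3},
    {"_id": "t3", "customer": "carol"},
]

REQUIRED = ("_id", "customer", "amount")
DEFAULT_CURRENCY = "USD"   # assumption: what older documents implicitly meant


def to_row(doc):
    """Map one document to the relational schema, or return None to reject it."""
    if any(key not in doc for key in REQUIRED):
        return None
    return (
        doc["_id"],
        doc["customer"],
        float(doc["amount"]),
        doc.get("currency", DEFAULT_CURRENCY),
    )


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE payments ("
    "id TEXT PRIMARY KEY, customer TEXT NOT NULL, "
    "amount REAL NOT NULL, currency TEXT NOT NULL)"
)

rows, rejected = [], []
for doc in documents:
    row = to_row(doc)
    if row is None:
        rejected.append(doc.get("_id"))
    else:
        rows.append(row)

with warehouse:
    warehouse.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", rows)

total = warehouse.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
```

Every change to the document shape (a renamed field, a new required value) means revisiting `to_row`, which is the planning cost the question asks about.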
I think the option with the least engineering effort is to use a Kafka connector for MongoDB, such that the connector reads the MongoDB changes from the oplog in near-real time and writes the events to Kafka. Then from Kafka you can write the data to a relational DB using a stream processor.
Dual write from the UI is not a good option as it can introduce latency, complexity and operational overhead. What if the write to one DB fails?

Data Synchronization in a Distributed system

We have a REST-based application built on the Restlet framework which supports CRUD operations. It uses a local file to store the data.
Now the requirement is to deploy this application on multiple VMs, and any update operation in one VM needs to be propagated to the application instances running on the other VMs.
Our idea to solve this was to send multiple POST messages (to all other applications) when an update operation happens in a given VM.
The assumption here is that each application has a list/URLs of all other applications.
Is there a better way to solve this?
Consistency is a deep topic, and a hard thing to get right. The trouble comes when two nearly-simultaneous changes occur to the same data: conflicting updates can arrive in one order on one server, and in another order on another. This is a problem, since the two servers no longer agree on what the data is, and it isn't clear who is "right".
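A tiny simulation shows the ordering problem. It assumes each server simply applies updates in the order it receives them; the function names and the single "price" key are made up for illustration. The second function sketches one common remedy, a single writer that assigns sequence numbers, which is an assumption here rather than something the question's design already has.

```python
def apply_updates(updates):
    """Apply writes in arrival order: the last update received wins."""
    store = {}
    for key, value in updates:
        store[key] = value
    return store


update_from_vm1 = ("price", 10)
update_from_vm2 = ("price", 12)

# The same two updates, delivered in a different order to each server.
server_a = apply_updates([update_from_vm1, update_from_vm2])
server_b = apply_updates([update_from_vm2, update_from_vm1])


def apply_in_sequence(numbered_updates):
    """Apply writes ordered by a sequence number assigned by one writer."""
    store = {}
    for _seq, key, value in sorted(numbered_updates):
        store[key] = value
    return store


numbered = [(1, "price", 10), (2, "price", 12)]
ordered_a = apply_in_sequence(numbered)
ordered_b = apply_in_sequence(list(reversed(numbered)))
```

With arrival-order application the two servers end up disagreeing; with a single agreed order they converge regardless of delivery order.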
The short story: get your favorite RDBMS (for example, MySQL is popular) and have your app servers connect to it in what is called the three-tier model. Be sure to perform complex updates in transactions, which will provide an acceptable consistency model.
The long-story: The three-tier model serves well for small-to-medium scale web sites/services. You will eventually find that the single database becomes the bottleneck. For services whose read traffic is substantially larger than write traffic, a common optimization is to create a single-master, many-slave database replication arrangement, where all writes go to the single master (required for consistency with non-distributed transactions), but the more-common reads could go to any of the read slaves.
For services with evenly-mixed read/write traffic, you may be better served by dropping some of the conveniences (and accompanying restrictions) that formal SQL provides and instead using one of the various "nosql" data stores that have recently emerged. Their relative merits and fitness for various problems is a deep topic in itself.
I can see 7 major options for now. You should find out more details and decide whether the facilities / trade-offs are appropriate for your purpose.
1) Perform the CRUD operations on a common RDBMS. Simplest and most consistent.
2) Perform the CRUD operations on a common RDBMS which runs as a fast in-memory RDBMS, e.g. TimesTen from Oracle.
3) Perform the CRUD operations on a distributed cache, or your own home-cooked distributed hash table, which can guarantee synchronization, e.g. Hazelcast, ehcache and others.
4) Use a fast common state server like Redis or memcached, perform your updates in a synchronized manner on it, and write out the successful operations to a DB lazily if required.
5) Distribute your REST servers such that the CRUD operations on a single entity are only performed by a single master. Once this is done, the details about the changes can be communicated to everyone else using a reliable message bus or a distributed database (e.g. postgres) that runs underneath and syncs all of your updates fairly fast.
6) Target eventual consistency and use a distributed data store like Cassandra, which lets you choose the consistency level you require.
7) Use a distributed consensus algorithm like Paxos or Raft, or an existing implementation of such an algorithm (recommended) like ZooKeeper or etcd, and take ownership of the item you want to change from each REST server before you perform the CRUD operation. This might be a bit slow, though, and Cassandra might give you much the same thing.
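For the option where each entity has a single master, one simple way to decide which server owns an entity is to hash its identifier. This is only a sketch under assumptions: the server names and both helper functions are hypothetical, and plain modulo hashing reassigns most entities whenever the server list changes, so a real deployment would likely need something more stable.

```python
import hashlib

SERVERS = ["vm-1", "vm-2", "vm-3"]   # hypothetical instance names


def owner_of(entity_id: str, servers=SERVERS) -> str:
    """Pick the one server responsible for all writes to this entity."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def handle_write(local_server: str, entity_id: str) -> str:
    """Describe what a server should do with a write it has received."""
    owner = owner_of(entity_id)
    if owner == local_server:
        return "apply locally, then publish the change"
    return f"forward to {owner}"
```

Because every instance computes the same owner for the same entity, conflicting writes to one entity are serialized on a single server.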

Recovery or failover strategies when NoSQL data becomes inconsistent

NoSQL emphasizes availability over consistency. Sometimes, this would cause the data in your NoSQL datastore to become inconsistent.
1) What are strategies to recover from such a situation?
2) What are strategies to prevent such a situation if possible?
3) What are the specific strategies for the popular NoSQL vendors, such as MongoDB, CouchDB, Cassandra, and HBase?
I think with asking point #3 you are mixing 2 different problems:
A. The database becomes unreadable, i.e. its data files are corrupted and the data is not accessible or only partially accessible.
B. Application data stored in the NoSQL database becomes inconsistent (e.g. some key mismatch happened), so the application cannot use it properly and starts to behave weirdly.
Problem A is a database maintainability issue and each database handles it in a specific way (e.g., MongoDB). And truly speaking it's not only a NoSQL problem. But in general this kind of situation is rather an emergency and shouldn't happen if your database engine is solid and has good, sufficient hardware.
Problem B is purely application-specific, and I think the main strategy here is to make your application expect that data might be inconsistent at some point and try to work around that if possible. There can also be some background process that finds inconsistencies in the data. In any case it purely depends on your data model.
EDIT: Updates on the data in a NoSQL database are not transactional, but in general are atomic. So, if one tuple is updated by 2 different processes you will not get part of the tuple from one and another part from the other; you will get the whole tuple from whichever process is considered "last" by the engine. But if your application updates several "dependent" tuples, then the result for several updating threads is not predictable, of course, because there is no transaction around those multiple updates. Unless, of course, all processes put the same data into the database. But if you have too many dependencies between different types of tuples/objects then I would say that your application is using NoSQL in the wrong way.
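The difference between atomic single-tuple updates and non-transactional multi-tuple updates can be shown with a deterministic simulation. The in-memory `store`, the key names and the chosen interleaving are assumptions for illustration; no real database is involved.

```python
store = {}


def put(key, value):
    """A single-key write: atomic on its own, last writer wins."""
    store[key] = dict(value)


# Two processes each want to update a pair of dependent records so that
# order["customer"] always matches customer["name"].
process_1 = [("order:1", {"customer": "Ann"}), ("customer:1", {"name": "Ann"})]
process_2 = [("order:1", {"customer": "Bob"}), ("customer:1", {"name": "Bob"})]

# One possible interleaving when there is no transaction around each pair.
interleaved = [process_1[0], process_2[0], process_2[1], process_1[1]]
for key, value in interleaved:
    put(key, value)

order, customer = store["order:1"], store["customer:1"]
consistent = order["customer"] == customer["name"]
```

Each record is whole (it came entirely from one process), yet the pair no longer agrees, which is the kind of inconsistency a background checker would have to look for.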
EDIT: There is also an interesting discussion here.