CQRS: Synchronizing the Write and Read databases

CQRS: Synchronizing the Write and Read databases - cqrs

Can anyone please give me some direction in regards to various ways to
synchronize the Write and Read databases?
What are different technologies out there, and how do you evaluate each, in
terms of realiability, performance, cost to implement, etc.

Typically in CQRS, the write DB is used to store transitional data for long running processes (sagas). If you are synchronizing the read and write DB (I'm assuming you mean both ways), you might be doing something wrong.
For a long running process where a service expects multiple messages, it needs a way to temporary store data before the all the messages arrives. An example of this is customer registration where an approval from manager, which takes a week to process, is required. The service needs a way to temporarily store the customer information before the approval arrives. This is where the write DB is used to store this piece of temporary data. Note that before the customer is approved, nothing is written to the read DB yet.
When the approval finally arrives, the service will take the customer information from the write DB, complete the registration process and write it to the read DB. At this time, the temporary customer information in the write DB has done its job and can be removed from the write DB. Notice that there isn't any two-way sync'ing involved.
For simpler process such as change customer first name, the change can be written to the read DB right away. Writing to the write DB is not required because there is no temporary data in this case.

Query model need not be consistent.. it needs to be eventually consistent. Query model is also the view model, i.e. tables are already joined as per requirement of user interface. So you can use even an in memory cache, or like Redis.
Command side is like command objects which contain all relevant information to update database. These objects may fill up a messaging queue. The command objects are processed by a command processor which transactionally updates the query cache and the write database. The write database can be an RDBMS.. but as is apparent, should be write optimized like MongoDB.
You can update read database via a messaging system too.
Some good messaging systems for this purpose are RabbitMQ and 0MQ.

If you, like me see the read store as the db that the Query service use (and its denormalized)
and the write db as the database where the Domain events are stored , then if you need to Synch them to a particular moment then what you can do is just replay the events that you have stored.
In the case you want to be as up to date as possible then you need not to restrict by version
If you are using CQRS, then probably you will have a repository that looks somewhat like this
public interface IRepository<T> where T : AggregateRoot, new()
{
void Save(AggregateRoot aggregate, int expectedVersion);
T GetById(Guid id);
T GetById(Guid id, int version);
}
Hope this helps
Cheers

Related

How to query axon aggregates

Is there a way to see the current state of the aggregates stored in axon?
Our application uses a Oracle backed axon event store.
I tried querying the domainevententry and snapshotevententry tables, but they are empty.

Is there a way to see the current state of the aggregates stored in axon?
In short, yes, although it is not recommended. Granted, if you are planning to employ CQRS. CQRS, or Command-Query Responsibility Separation, dictates that the Command Model and the Query Model are separate.
The aggregate support Axon delivers supplies an easy means to construct a Command Model. As the name suggests, it's intended for commands. On the flip side, you have Query Models, which are designed for queries. AxonIQ has this to say on CQRS; maybe that clarifies some things.
I tried querying the domainevententry and snapshotevententry tables, but they are empty.
That's interesting on its own account! When you publish events in Axon, either through the AggregateLifecycle#apply(Object...) or EventGateway#publish(Object...) method, the published event should end up in your domain_event_entry table. If that's not the case, then either your JPA/JDBC configuration has a misser or some other exceptions occurring in your application.
Would you be able to update your issue with samples of your configuration and/or stack traces that you are seeing?
Replaying production issues locally
What I've done in the past to be able to replay behavior occurring in a production environment is by loading the Aggregate's event stream from that environment into a local dev/test event store. To be able to query this, you only need the aggregate identifier. As the aggregate identifier is indexed, retrieving all events for a specific aggregate (differently named, the aggregate stream) is straightforward.
By doing so, I could run the application locally to flow through the aggregate step-by-step. This gave the benefit of knowing exactly which event caused what state change, leading to the problematic scenario.
However, why your events are not present in your domainevententry is unclear to me. If you're still facing issues with that, I still recommend that you update the question with more specifics on your project.

Relational DB in microservices

I have a monolithic application that currently uses a PostgreSQL DB and the schemas are set up as you would expect for most relational databases with various table data being linked back to the user via FKs on the user_id.
I'm trying to learn more about microservices am trying to migrate my python API to a microservice architecture. I have a reasonable understanding of how I'm going to break up the larger app into smaller parts, however, I'm not entirely clear on how I'm supposed to deal with the data side of things.
I understand that one single large DB is against general design principles of microservices but I'm not clear on what the alternative would be.
My biggest concern is cascading across individual databases that would hold microservice data. In a simple rdb, I can just cascade on delete and the DB will handle the work across the various tables. In the case of microservices, how would that work? Would I need to have a separate service that handles deleting user data across the other service DBs?
I don't really understand how I would migrate a traditional application with a relational DB to a microservice architecture?
EDIT:
To clarify - a specific architectural/design problem I'm facing is as follows:
I have split up my application into a few microservices. The ones that are in my mind still relational are:
Geolocation - A service that checks geometry data, records in PostGIS, and returns certain information. A primary purpose is to record the location of a particular user for referencing later
Image - A simple upload service to upload images and store meta data in the db.
Load-Image - A simple service that returns a random set of images based on parameters such as location, and user profile data such as Age, Gender, etc
Profile - A service that simply manages user data such as Age, Gender, etc
Normally, these three items would have a table each in a larger db rather than their own individual dbs. Filtering images by say location and age is a very simple JOIN and filter.
How would something like that work in a microservice architecture? If the data is held in different dbs entirely how would I setup the logic to filter the data? I could duplicate data that doesn't change often like profile info and add it to a MongoDB document that would contain image data including user_id and profile data - however, location data can change regularly and constant updates doesn't sound practical.
What would be the best approach? Or should I stick with a shared RDBMS for just those few services?

It comes down to the duplication of data, why we want it, and how we manage it.
Early in our careers we were taught about the duplication of data to make it redundant, for example in database replication or backups. We were also taught that data can be modelled in a relational manner, with constraints enforcing the integrity of the model. In fact, the integrity of the model is sacrosanct. Without integrity, how can you have consistency? The answer is that you can't. Kinda.
When you work with distributed systems and service orientation, you do so because you want to minimise interactions thereby reducing coupling between components. However, there is a cost to this. The more distributed your architecture, the less coupling it has, and the more duplication of data will be necessary. This is taken to an extreme with microservices, where effectively the same data may be present in many different places, in varying degrees of consistency.
Instead of being bad, however, in this context data duplication is an essential feature of your system. It is an enabler of an architectural style with many great benefits. Put another way, without duplication of data, you get less distribution, you get more coupling, which makes your system more expensive to build, own, and change.
So, now we understand duplication of data and why we want it, let's move onto how we manage having lots of duplication. Let's try an example:
In a relational database, let's say we have a table called Customers, which contains a customer ID, and customer details, and another table called Orders which contains the order ID, customer ID, and the order details. Let's say we also have an ordering application, which needs to delete all the customer's orders if the customer is deleted for GDPR.
Because we are migrating our system to microservices, we decide to create a service called Customers.
So we create a service with the following operation:
DELETE /customers/{customerId} - deletes a customer
We create another service called Orders with the following operations:
GET /orders/customers/{customerId} - gets all the orders for a customer
DELETE /orders/{orderId} - deletes an order
We build a UX screen for deleting a customer. The UX first calls the orders service to get all the orders for the customer. Then it iterates over the list of orders, calling the orders service to delete the order. Then it calls the customers service to delete the user.
This example is very simplistic, but as you can see, there is no option but to orchestrate the "Delete Customer" operation from the caller, which in this case is the user interface. Of course, what would be a single atomic transaction in a database does not translate to multiple HTTP/s calls, so it is possible that some of the calls may not succeed, leaving the system as a whole in an inconsistent state. In this instance the inconsistency would need to be resolved via some recovery mechanism.

In a microservice architecture, we have both the option, either use database per service or a shared database. There are advantages and disadvantages to both the pattern. Database per service architecture is the best practice but when the monolithic application has lots of function, procedure or database-specific feature on database level then we can use the Shared database approach, I know this is not the best practice if you have time and bandwidth then you should go for database per service.
As your concern is cascading over individual databases, you need to remove cascading from the database and implement global transaction handling in your application and execute all cascading related queries from that transaction.

I'm accessing a mongoDb database using the repository patterns. Where should I check for data Integrity?

I'm kind of new to mongodb and NoSQL data design in general.
I'm building a mongodb database that will have some denormalized data. For exemple, my "User" documents contains a reference (just the id) to zero or more "Article" documents and my Article documents contains references to zero or more users.
Since I'm using the repository pattern, no parts of my Data Access Layer knows about Articles AND Users. Where in my code should I check to make sure that all my documents are consistent with each others? Should I simply let the DAL's users code do the checks?
Would it be a good idea to have a Data Integrity Script run once in a while to check if everything is consistent?

Here is Microsoft's write-up on the Repository Pattern. From that document:
Use a repository to separate the logic that retrieves the data and maps it to the entity model from the business logic that acts on the model.
You have a couple of questions:
Where in my code should I check to make sure that all my documents are consistent with each others?
Based on the statement above, I think it's clear that this logic belongs in the Repository. The relation between these objects only exists at the layer of "business logic", the database cannot enforce these types of rules.
Should I simply let the DAL's users code do the checks?
How could they? As the writer of the repository, you are the DAL user. For MongoDB, the DAL is basically the driver.
You could possibly write a wrapper around the driver that would wrap the multiple writes in some form of transactions. But you would have to write this, MongoDB has no notion of transactions.
Would it be a good idea to have a Data Integrity Script run once in a while to check if everything is consistent?
At the end of the day, whoever writes the repository is going to be responsible for the integrity of the data. Such a script might be useful, but it would definitely suck a lot of CPU cycles.
My suggestion for N:M mappings is to start building some basic blocks for handling the multiple writes that are required to keep these two in sync. One idea is to Queue the changes and let a background job make the updates. This way you don't have to worry about multiple writes and roll-backs causing bad data.

What are the major challenges of building an iPhone application that synchronizes data with a server via web APIs?

I want to build an application that utilizes the data from a server, and it needs to synchronize the data in the application with the data entered by other client applications.
So, there are some questions:
How to design the database schema efficiently? Should it replicate the same database schema on the server or should it add some more fields & entities?
What are the strategies to synchronize the data, on each application start or during some idle state of the application, or something else...
How to handle conflict of the data entered by the user within the application and data enter ed by another client application.
Any response is welcomed.

Well, you've identified the main challenges in your original question. The real answer is that this has little to do with the iPhone - database replication is just really hard.
Here are some rules of thumb I can offer:
one-way replication of data is a million times easier than two-way replication, if you can get away with it.
replication is always easier if the database schema is identical on the client and the server.
to do two-way replication, you either need to store timestamps for each row on each end, or to store the complete contents of one end on the other end. (ie. the server needs to know the client's most recent status, or the client needs to know the server's most recent status).
to allow adding rows from disconnected clients, you need to identify your rows using a GUID (or hash, eg. SHA-1), not an autoincrement field. It's possible to keep new client-added rows as "identifierless" until you sync them with the server, but that way lies madness.
there is no actual good way to do conflict resolution. The imperfect options include last-writer-wins (last person who syncs a modified record gets their copy of the record inserted), three-way-merge (when someone sends a modified record, check which columns they have changed, and change only those columns, thus not overwriting any changes to other columns), split-into-two-records (if two people make changes to the same record, just make two records and assume someone will fix it eventually), and "ask the user" (which is technically the most sound, but requires a lot of UI work and users rarely understand what a conflict even is).

what happens to my dataset in case of unexpected failure

i know this has been asked here. But my question is slightly different. When the dataset was designed keeping the disconnected principle in mind, what was provided as a feature which would handle unexpected termination of the application, say a power failure or a windows hang or system exception leading to restart. Say the user has entered some 100 rows and it is modified at the dataset alone. Usually the dataset is updated at the application close or at a timely period.
In old times which programming using vb 6.0 all interaction used to take place directly with the database, thus each successful transaction was committing itself automatically. How can that be done using datasets?

DataSets are never for direct access to database, they are a disconnected model only. There is no intent that they be able to recover from machine failures.
If you want to work live against the database you need to use DataReaders and issue DbCommands against the database live for changes. This of course will increase your load on the database server though.
You have to balance the two for most applications. If you know a user just entered vital data as a new row, execute an insert command to the database, and put a copy in your local cached DataSet. Then your local queries can run against the disconnected data, and inserts are stored immediately.

A DataSet can be serialized very easily, so you could implement your own regular backup to disk by using serialization of the DataSet to the filesystem. This will give you some protection, but you will have to write your own code to check for any data that your application may have saved to disk previously and so on...
You could also ignore DataSets and use SqlDataReaders and SqlCommands for the same sort of 'direct access to the database' you are describing.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse