How Can I Provide Consistency Between Different Databases? - mongodb

I want to build an application. The application will have three microservices: content-service1, content-service2, and content-service3. Each microservice will have its own database. The application will also have a load balancer whose job is to distribute requests (the first request goes to the first container, the second request goes to the second container, and so on). My question is: how can I provide consistency between the different databases? I looked at topics like partitioning, eventual consistency, and saga, but I don't understand them. Are these the right solutions?
[Image: the system design I want]

It looks like you are mixing up microservices and application instances.
If content-service1, content-service2 and content-service3 are backed by the same application (same code), you only have one application (one service).
If you want high availability, you do need multiple instances of the same application (simply the same application running 3 times on different servers).
In this case, you don't need a database per application instance, because all instances connect to the same database and you won't have inconsistency issues.
If you don't want a single point of failure, you also need to ensure redundancy for your database. Depending on the provider you use, you might have a master/slave or multi-master topology; data replication between database nodes will probably be handled by the database itself.
Microservices are a way to split big applications into smaller ones. Each microservice stores its own data in a dedicated database. Microservices are applications like any other, so they can themselves have multiple instances for high availability.
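To make the "several instances, one database" idea concrete, here is a minimal sketch in Python. It is only an illustration: `SharedDatabase`, `ContentServiceInstance` and `RoundRobinBalancer` are made-up names, and an in-memory dict stands in for the real database.

```python
from itertools import cycle


class SharedDatabase:
    """Stand-in for the single database every instance connects to."""

    def __init__(self):
        self.rows = {}


class ContentServiceInstance:
    """One running copy of the same application code; it keeps no data of its own."""

    def __init__(self, name, db):
        self.name = name
        self.db = db

    def save(self, key, value):
        self.db.rows[key] = value

    def load(self, key):
        return self.db.rows.get(key)


class RoundRobinBalancer:
    """Hands out instances in turn: first request to the first, second to the second..."""

    def __init__(self, instances):
        self._next = cycle(instances)

    def pick(self):
        return next(self._next)


db = SharedDatabase()
instances = [ContentServiceInstance(f"content-service{i}", db) for i in (1, 2, 3)]
balancer = RoundRobinBalancer(instances)

writer = balancer.pick()  # first request
writer.save("article:1", "hello")
reader = balancer.pick()  # second request lands on a different instance
print(writer.name, reader.name, reader.load("article:1"))
```

Because the instances hold no state themselves, it does not matter which one the load balancer picks: the second instance sees what the first one wrote.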

Related

Multiple clusters vs one cluster in MongoDB Atlas

I have multiple web apps that use MongoDB Atlas as their database.
In Atlas, you can create Clusters that hold multiple databases.
For every web app, I usually need one database. However, I am not sure if I should create one cluster for every web app or only one cluster in total holding one database for every web app. Is there a better choice?
If I understand correctly, MongoDB's business model is to limit the capacity of free clusters, which means that it would be better to create a free cluster for every web app, since otherwise the capacity of one cluster is consumed very quickly.
If I understand correctly, MongoDB's business model is to limit the capacity of free clusters, which means that it would be better to create a free cluster for every web app
If this is correct (which seems to me like it is) then creating separate clusters per application is a good idea.
Once you are paying for your databases, it may be cheaper to put multiple databases in the same cluster (since you'll have less overhead per database).
A reason to use separate clusters per application when you are paying for databases is additional security/resilience to accidental database wipes.

Multi region high availability on GKE - what to do with the PostgreSQL database?

Google has this cool tool, kubemci (a command line tool to configure L7 load balancers using multiple Kubernetes clusters), with which you can basically have an HA multi-region Kubernetes setup. Which is kind of cool.
But let's say we have a basic architecture like this:
Front end is implemented as SPA and uses json API to talk to backend
Backend is a set of microservices which use PostgreSQL as a DB storage engine.
So I can create two Kubernetes Clusters on GKE, put both backend and frontend on them (e.g. let's say in London and Belgium) and all looks fine.
Until we think about the database. PostgreSQL is single master only, so it must be placed in only one of the regions. And if the backend in the London region starts to talk to PostgreSQL in the Belgium region, performance will be really poor considering the 6ms+ latency between those regions.
So that whole HA setup kind of doesn't make any sense? Or am I missing something? One option to slightly mitigate the issue would be to have a read-only replica in the "slave" region and direct read-only queries there (is that even possible with PostgreSQL?).
This is a classic architecture scenario that has no easy solution. Making data available in multiple regions is a challenging problem that major companies spend a lot of time and money to solve.
PostgreSQL does not natively support multi-master writes. Your idea of a replica located in the other region, with logic in your app to read from and write to the correct database, would work. This will give you fast local reads, but slower writes in one region. It also means more complicated code in your app and more work to handle failover of the master. Bandwidth and costs can also be problems with heavy updates.
Use 3rd-party solutions for multi-master Postgres (like Postgres-BDR by 2ndQuadrant) to offload the work to the database layer. This can get expensive, and your application still has to manage data conflicts from two regions overwriting the same data at the same time.
Choose another database that supports multi-regional replication with multi-master writes. Cassandra (or ScyllaDB) is a good choice, or hosted options like Google Spanner, Azure CosmosDB, AWS DynamoDB Global Tables, and others. An interesting option is CockroachDB, which supports the PostgreSQL protocol but is a scalable relational database and supports multiple regions.
If none of these options work, you'll have to create your own replication system. Some companies do this with an event-sourced / CQRS architecture where every write is a message sent to a central log, then applied in every location. This is more work but provides the most flexibility. At this point you're also basically building your own database replication system.
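As a rough illustration of the last option (every write goes to a central log and is then applied in every location), here is a minimal in-memory sketch in Python. `CentralLog` and `RegionReplica` are hypothetical names; a real system would use a durable, ordered log and would have to handle failures and conflicts, which this sketch ignores.

```python
class CentralLog:
    """Append-only log; every write from any region becomes one entry."""

    def __init__(self):
        self.entries = []

    def append(self, key, value):
        self.entries.append((key, value))


class RegionReplica:
    """Local copy of the data, built by applying log entries in order."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.applied = 0  # index of the next log entry to apply

    def catch_up(self, log):
        # Apply only the entries this replica has not seen yet.
        for key, value in log.entries[self.applied:]:
            self.state[key] = value
        self.applied = len(log.entries)


log = CentralLog()
london = RegionReplica("london")
belgium = RegionReplica("belgium")

log.append("user:1", "alice")
log.append("user:1", "alice-renamed")
london.catch_up(log)   # london is up to date, belgium lags behind
log.append("user:2", "bob")
london.catch_up(log)
belgium.catch_up(log)  # belgium replays all three entries in one go
print(london.state == belgium.state)
```

Since both replicas apply the same entries in the same order, they end up with the same state, even though one of them lagged behind for a while.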
If you have multi cluster ingress set up on two clusters in different regions, then the multi cluster ingress will only send traffic to the closest region to the user.
If the closest region is down, this is when traffic will be routed to the cluster in the other region.
So, using the example you have provided, if traffic is being sent to the backend and the user is closer to London, then traffic from this user will always be sent to London as long as that region is up and running.
As for latency, you will have to live with it in this case, as you cannot create a read replica in another region.
The benefit of this functionality (multi-cluster ingress) is that if one region goes down, then you have another region to route the traffic to.
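The routing behaviour described in this answer (closest region first, the other region only when the closest one is down) can be sketched roughly like this in Python. `pick_region` is a hypothetical helper and the latency numbers are invented for illustration; this is not how the multi-cluster ingress is actually implemented.

```python
def pick_region(latency_ms, healthy):
    """Return the healthy region with the lowest latency for this user, or None."""
    candidates = [region for region in latency_ms if region in healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda region: latency_ms[region])


# Assumed latencies for a user who is closer to London.
user_latency = {"london": 4, "belgium": 10}

normal = pick_region(user_latency, healthy={"london", "belgium"})
failover = pick_region(user_latency, healthy={"belgium"})
print(normal, failover)
```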

Sharing Reliable Collections across partitions

Is it possible to share data across Service Fabric partitions using Reliable Collections?
What would be the best approach to run an arbitrary number of instances of a CPU/network-bound service that needs to share a small amount of data to be used for a custom partitioning algorithm?
Reliable Collections themselves don't share state across partitions, no. But there are a couple ways you can share data depending on the nature of that data:
If the data you need to share is "dynamic" meaning it can change at runtime (e.g., due to user input), then you'd need to encapsulate that data in a separate service of its own, and provide an API for other services to access it. This would be accessible by any other service or application.
If the data you need to share is "static" meaning it doesn't change at runtime, then you can include it in the service as a data package or config package. These packages can be updated individually and separately from the service code without stopping or restarting the service. The same data/config package is available to all partitions of a service, but it is not directly accessible to other services or applications.

How to store shared-by-same-instances data in spring microservices architecture

The situation is as follows: I am building a system that requires redundant microservices for failover or load balancing. So I am starting two (or more) instances of a service, for example a simple core REST service that provides data.
My question is: how would you store the data? Using two JPA instances to access the same database (both writing and reading) will result in problems, especially with layer 2 caching and consistency. Since the database itself must be redundant (a requirement), it might be possible to have each service instance access its own database, but how would you synchronize them? Is there a common solution for this?
Thanks in advance!
If you truly need a multi-master consistent database, then you will almost definitely need to implement this at the database layer.
I would not cache things that are transactionally sensitive. If you truly need to do this, and cannot specify a reasonable TTL within which content can be stale, then you will need to set up a pub/sub sort of mechanism to expire modified entities. A lot of this really depends on your data: how often does it change, and can you separate cacheable from non-cacheable data? These questions strongly influence your caching decisions.
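To show the two expiry mechanisms mentioned here side by side, this is a minimal Python sketch, assuming an in-process stand-in for the pub/sub channel. `TtlCache` and `InvalidationBus` are made-up names, and the clock is injected only so the example is deterministic; a real setup would use an external broker shared by all service instances.

```python
import time


class TtlCache:
    """Per-instance cache whose entries go stale after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # key -> (value, expiry time)

    def put(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._items[key]  # too old, treat as a miss
            return None
        return value

    def invalidate(self, key):
        self._items.pop(key, None)


class InvalidationBus:
    """In-process stand-in for a pub/sub channel carrying 'entity changed' messages."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, key):
        for callback in self._subscribers:
            callback(key)


now = [0.0]


def fake_clock():
    return now[0]


bus = InvalidationBus()
cache_a = TtlCache(30, fake_clock)  # cache of service instance A
cache_b = TtlCache(30, fake_clock)  # cache of service instance B
bus.subscribe(cache_a.invalidate)
bus.subscribe(cache_b.invalidate)

cache_a.put("user:1", "alice")
cache_b.put("user:1", "alice")
bus.publish("user:1")  # some instance modified user:1
after_publish = (cache_a.get("user:1"), cache_b.get("user:1"))

cache_a.put("user:2", "bob")
now[0] = 31.0  # move the fake clock past the TTL
after_ttl = cache_a.get("user:2")
print(after_publish, after_ttl)
```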
If you don't want to re-invent master-master replication (which will be highly non-trivial), I suggest you choose a DB system that supports this out of the box.
This will not solve all your problems out of the box, but at least it solves the hard part of the problem. You will still need to, for example, define and implement a conflict resolution strategy.
A good choice for a master-master DB system is CouchDB. It is open source and there are also service providers available, in case you don't want to host the DB by yourself. I'm sure there are other DB systems that provide master-master replication as well.
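As one example of what a conflict resolution strategy can look like, here is a "last write wins" sketch in Python. This is an assumption chosen for illustration, not the way CouchDB itself picks a winning revision, and `Version` and `resolve` are hypothetical names. It also relies on the nodes having reasonably synchronized clocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # when the write happened, as reported by the writing node
    node: str         # which node accepted the write


def resolve(a, b):
    """Keep the later write; break ties by node name so every node picks the same winner."""
    return max(a, b, key=lambda v: (v.timestamp, v.node))


left = Version("draft from node A", 100.0, "node-a")
right = Version("draft from node B", 105.0, "node-b")
winner = resolve(left, right)
print(winner.value)
```

The losing write is silently dropped, which is exactly why this strategy is only acceptable for data where that is tolerable.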
There are two completely separate layers in your case:
one for the application servers and another for the database.
If you really need a scalable system (and I think you do, because you mention load balancing), then you should remove all state from your application.
For example, you should not use layer 2 caching in your application instance; instead, you should use an external service like Redis or Memcached.
And you should use just one master database instance for writes and a replica waiting for failover. To do that, we use Amazon RDS Multi-AZ instances. There is just one master database, which is replicated to another instance. In case of a crash, the second database is automatically promoted to master in a couple of seconds.

MongoDB connection overhead on client side

We are evaluating different alternatives for multi-tenancy in our platform. We think that one database per customer is the way to go as data structure and requirements are completely different from one customer to another, and we want to keep them as isolated as possible.
However we are facing the question of how to manage the connection to multiple databases. We don't want to have one app instance per customer. Instead we want to have a pool of app instances handling requests for all our customers and use the correct database depending on the customer.
Our concern is whether keeping connections open to many (maybe thousands of) databases will cause a performance issue. We are mainly worried about memory usage, so we are wondering what the overhead is on the client side when opening a connection to the MongoDB server.
We are also thinking about moving the database access to a different service, which would be responsible for handling the database connections for all customers. In this case, is there an existing tool that allows that kind of "multiplexing" of MongoDB databases?
Some additional notes:
We discarded sharding. It won't fit our needs. We need different databases.
Databases will be on different servers with reserved resources. This means each database runs its own mongod process and we need different connections.
We use the Java driver.
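To make the "pool of app instances that picks the right database per customer" idea more concrete, here is one possible shape for it, sketched in Python rather than with the Java driver. Everything here is hypothetical: `TenantClientRegistry`, the `connect` factory and the `max_open` limit are illustration only, and the sketch does not measure or answer the memory overhead question.

```python
from collections import OrderedDict


class TenantClientRegistry:
    """Opens a client per customer on first use and keeps at most max_open of them."""

    def __init__(self, connect, max_open):
        self._connect = connect        # hypothetical factory: tenant id -> client object
        self._max_open = max_open
        self._clients = OrderedDict()  # least recently used tenant first
        self.closed = []               # tenants whose client was evicted

    def client_for(self, tenant_id):
        if tenant_id in self._clients:
            self._clients.move_to_end(tenant_id)  # mark as recently used
            return self._clients[tenant_id]
        client = self._connect(tenant_id)
        self._clients[tenant_id] = client
        if len(self._clients) > self._max_open:
            evicted_id, _ = self._clients.popitem(last=False)
            self.closed.append(evicted_id)  # a real client would be closed here
        return client


opened = []


def fake_connect(tenant_id):
    """Stand-in for opening a real database connection."""
    opened.append(tenant_id)
    return f"client-for-{tenant_id}"


registry = TenantClientRegistry(fake_connect, max_open=2)
registry.client_for("acme")
registry.client_for("globex")
registry.client_for("acme")     # reused, no new connection
registry.client_for("initech")  # evicts globex, the least recently used
print(opened, registry.closed)
```

Bounding the number of open clients trades memory for the cost of reconnecting to rarely used customers.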