MongoDB connection overhead on client side - mongodb

We are evaluating different alternatives for multi-tenancy in our platform. We think that one database per customer is the way to go as data structure and requirements are completely different from one customer to another, and we want to keep them as isolated as possible.
However we are facing the question of how to manage the connection to multiple databases. We don't want to have one app instance per customer. Instead we want to have a pool of app instances handling requests for all our customers and use the correct database depending on the customer.
Our concern is whether keeping connections open to many (maybe thousands of) databases will cause a performance issue. We are actually worried about memory usage, so we are wondering what the client-side overhead of a connection to a MongoDB server is.
Also, we are thinking about moving database access to a separate service that would be responsible for handling the database connections for all customers. In this case, is there an existing tool that does this kind of "multiplexing" of MongoDB databases?
Some additional notes:
We discarded sharding. It won't fit our needs. We need different databases.
Databases will be on different servers with reserved resources. This means each database runs its own mongod process, so we need separate connections.
We use the Java driver.
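To make the memory concern concrete, one option is to cap how many tenant clients each app instance keeps open and close the least recently used one when the cap is exceeded. The following is only a minimal sketch of that idea, written in Python for brevity. `TenantClientCache`, `connect`, `close`, and `max_open` are hypothetical names, not part of any MongoDB driver; with the Java driver, `connect` would presumably wrap creating a client for that customer's server. The sketch is not thread-safe.

```python
from collections import OrderedDict
from typing import Callable, Generic, List, TypeVar

C = TypeVar("C")


class TenantClientCache(Generic[C]):
    """Keeps at most `max_open` tenant clients open, evicting the least recently used."""

    def __init__(self, connect: Callable[[str], C], close: Callable[[C], None], max_open: int):
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self._connect = connect    # hypothetical factory: tenant id -> client
        self._close = close        # hypothetical cleanup: releases a client's connections
        self._max_open = max_open
        self._clients: "OrderedDict[str, C]" = OrderedDict()

    def get(self, tenant_id: str) -> C:
        if tenant_id in self._clients:
            self._clients.move_to_end(tenant_id)  # mark as most recently used
            return self._clients[tenant_id]
        client = self._connect(tenant_id)
        self._clients[tenant_id] = client
        if len(self._clients) > self._max_open:
            _, evicted = self._clients.popitem(last=False)  # least recently used entry
            self._close(evicted)
        return client

    def open_tenants(self) -> List[str]:
        """Tenant ids with an open client, least recently used first."""
        return list(self._clients)
```

A request handler would call `cache.get(customer_id)` on every request. The trade-off is that a customer whose client was evicted pays the connection setup cost again on its next request.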

Related

What are the differences (CPU, runtime, or otherwise) between using pg pool and pgp as a database connection in an express server

I've created a few apps which utilize a postgres database, but in all of those projects, I've either used the pool or client function from the pg npm package. Recently I came across the pg-promise node package, and was just wondering if there were any drawbacks to using pg-promise over pool or client. I'm just worried about changes in runtime that would affect how many clients the app could service at one time.
pg-promise is "Built on top of node-postgres". You're still using the same pools and clients.
Nothing changes regarding the number of connections your database will be able to handle, and unless you use a different approach to building your application (for example, using transactions where you previously did not, or using individual clients instead of pooling), nothing will change regarding the number of clients your app will be able to serve.

How Can I Provide Consistency Between Different Databases?

I want to build an application. It will have three microservices: content-service1, content-service2, and content-service3. Each microservice will have its own database. The application will also have a load balancer whose job is to distribute requests (the first request goes to the first container, the second request goes to the second container, and so on). My question is: how can I provide consistency between the different databases? I looked at topics such as partitioning, eventual consistency, and sagas, but I don't understand them. Are these the right solutions?
It looks like you are mixing up microservices and application instances.
If content-service1, content-service2 and content-service3 are backed by the same application (same code), you only have one application (one service).
If you want to have high availability you surely need multiple instances of the same application (simply the same application run 3 times on different servers).
In this case, you don't need a database per application instance, because all instances will be connected to the same database and you won't have inconsistency issues.
If you don't want a single point of failure, you also need to ensure redundancy for your database. Depending on the provider you use, you might have a master/slave or multi-master topology; data replication between database nodes will probably be handled by the database itself.
Microservices are a way to split a big application into smaller ones. Each microservice stores its own data in a dedicated database. Microservices are applications like any other, so they can themselves have multiple instances for high availability.

Is there a way to Rate Limit or Throttle a user or a connection in PostgreSql?

We have a setup wherein a Database instance is shared between multiple users.
We are trying to implement some form of throttling or rate limiting for a shared PostgreSQL instance so that one user cannot starve other users by consuming all the resources.
One approach we can think of is adding connection pools and fixing the number of connections that we give each tenant.
But one user can still exhaust all the resources over just a few connections. Is there a way to throttle resource usage per connection or per user in PostgreSQL?
No, the postgres documentation makes it clear that's not possible using Postgres alone.
It's usually a (very) bad sign if your application allows one user to starve resources from others - it suggests you've got a bottleneck in your application, and that bottleneck will appear when you least want it to.
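For the "fixed number of connections per tenant" idea from the question, an application-side cap could look roughly like the sketch below. This is an illustration only: `TenantLimiter`, `TenantBusy`, and `max_concurrent` are hypothetical names, and the code limits how many queries a tenant runs at once. It does not limit how expensive each query is, which is the gap the question points out.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator


class TenantBusy(Exception):
    """Raised when a tenant already holds all of its slots."""


class TenantLimiter:
    """Caps the number of concurrent operations per tenant (hypothetical helper)."""

    def __init__(self, max_concurrent: int):
        self._max = max_concurrent
        self._lock = threading.Lock()
        self._slots: Dict[str, threading.BoundedSemaphore] = {}

    def _slot(self, tenant_id: str) -> threading.BoundedSemaphore:
        # Create each tenant's semaphore lazily, guarded so two threads
        # cannot create different semaphores for the same tenant.
        with self._lock:
            if tenant_id not in self._slots:
                self._slots[tenant_id] = threading.BoundedSemaphore(self._max)
            return self._slots[tenant_id]

    @contextmanager
    def acquire(self, tenant_id: str) -> Iterator[None]:
        slot = self._slot(tenant_id)
        if not slot.acquire(blocking=False):  # fail fast instead of queueing
            raise TenantBusy(tenant_id)
        try:
            yield
        finally:
            slot.release()
```

Wrapping each database call in `with limiter.acquire(tenant_id):` rejects a tenant's extra requests while leaving other tenants unaffected. Whether to reject or queue when a tenant is at its limit is a design choice this sketch does not settle.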

How to store shared-by-same-instances data in spring microservices architecture

The situation is as follows: I am building a system that requires redundant microservices for failover or load balancing. So I am starting two (or more) instances of a service, for example a simple core REST service that provides data.
My question is: how would you store the data? Using two JPA instances to access the same database (both writing and reading) will result in problems, especially with second-level (L2) caching and consistency. Since the database itself must be redundant (a requirement), it might be possible to have each service instance access its own database, but how would you synchronize them? Is there a common solution for this?
Thanks in advance!
If you truly need a multi-master consistent database, then you will almost definitely need to implement this at the database layer.
I would not cache things that are transactionally sensitive. If you truly need to do this, and cannot specify a reasonable TTL within which content can be stale, then you will need to set up some sort of pub/sub mechanism to expire modified entities. A lot of this depends on your data: how often does it change, and can you separate cacheable from non-cacheable data? These questions strongly influence your caching decisions.
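The TTL-plus-invalidation approach described above can be sketched as follows. This is a toy, single-process illustration: `TtlCache` and `InvalidationBus` are hypothetical names, and the in-memory bus merely stands in for a real pub/sub system shared by all service instances.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class InvalidationBus:
    """In-process stand-in for a pub/sub channel carrying 'this key changed' messages."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, key: str) -> None:
        for callback in self._subscribers:
            callback(key)


class TtlCache:
    """Read-through cache: entries expire after `ttl_seconds` or on explicit invalidation."""

    def __init__(self, load: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load      # fetches the current value from the database
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so tests can control time
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]    # still fresh
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        self._entries.pop(key, None)
```

Each service instance would subscribe its cache's `invalidate` method to the bus, and whichever instance writes an entity would publish that entity's key after the write commits. The TTL then only bounds staleness for the cases where an invalidation message is lost.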
If you don't want to re-invent master-master replication (which will be highly non-trivial), I suggest you choose a DB system that supports this out of the box.
This will not solve all your problems out of the box, but at least it solves the hard part. You will still need to, for example, define and implement a conflict resolution strategy.
A good choice for a master-master DB system is CouchDB. It is open source and there are also service providers available, in case you don't want to host the DB by yourself. I'm sure there are other DB systems that provide master-master replication as well.
There are two completely separate layers in your case: one for the application servers and another for the database.
If you really need a scalable system (and I think you do, because you mention load balancing), then you should remove all state from your application.
For example, you should not use second-level (L2) caching inside your application instances; instead, use an external service such as Redis or Memcached.
You should also use just one master database instance for writes, with a replica standing by for failover. To do that, we use Amazon RDS Multi-AZ instances: there is a single master database that is replicated to another instance. If the master crashes, the second database is automatically promoted to master within a couple of seconds.

How to implement real-time replication of MongoDB (or CouchDB) to many remote clients

I'm considering how to design a mechanism for replicating a (potentially large) MongoDB or other NoSQL (CouchDB, etc) database to dozens of clients at once. The clients would function like a replica set, but the replication would be one-way and the remote clients would belong to other parties. Specifically, I am looking for the following features:
real-time: changes to the master database should be pushed out to the clients as quickly as possible
replication to new clients: a new client must be able to connect, automatically sync the majority of existing data, then receive real-time updates.
efficient: both the initial synchronization/transfer of data and tracking of real-time updates ("diffs", if you will) are computationally efficient, with multiple clients connected.
secure: the master database presents an interface to which remote clients (who do not belong to the same owner or system) can connect: i.e., we cannot just add all the clients to the master's replica set.
robust: a temporary connection failure between a client and the master database should be easily and efficiently recoverable.
In some sense, the server is publishing a collection of data and the clients are subscribing to it. I realize that this is a hard software engineering problem, and to my knowledge no piece of software has implemented this exactly yet. However, some approaches have come to mind as close, which I'll list below.
Meteor's DDP protocol: It's designed to do this with Mongo-like collections and exactly implements the model of publishing and subscribing to a set of data (rather than a stream of messages). It manages the initial sync and sends along live changes. However, it's still in development and far from being an industrial-strength solution. Current drawbacks are that the server keeps a copy of every client's state in a possibly inefficient way, and that it is only tested on collections that fit in the memory of a web app. Also, it appears that DDP cannot efficiently sync an out-of-date database without fetching everything from scratch. If anyone can point to some examples of how large a collection can be synced over DDP, that would be great. (See also: https://stackoverflow.com/q/10128430/586086)
Broadcasting the Mongo oplog: Using a high-throughput message bus like Apache Kafka, one may be able to efficiently send the oplog to many clients at once. This tackles some of the system implementation challenges. However, this requires that the clients start with an initial sync that gets them close enough to the current master state somehow and then start replaying the oplog from the appropriate point.
Continuous replication a la CouchDB: I'm not sure how this is implemented and how robust it is, given the sparsity of the documentation. However, it does seem to work over remote database connections. How efficient is this, though, when multiple clients are trying to replicate at the same time? (A similar hack to this would be to make the clients MongoDB Priority 0 replica set members; however, that seems to be far from its intended use. See also: http://guide.couchdb.org/draft/replication.html)
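The "initial sync, then replay the log from the appropriate point" step in the oplog approach can be sketched like this. It is a simplified model, not MongoDB's actual oplog format: `Change`, `Replica`, and the integer `seq` are hypothetical, with `seq` standing in for whatever ordered position the real log provides, and changes are assumed to arrive in ascending `seq` order.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    seq: int               # position in the change log (hypothetical ordering key)
    key: str
    value: Optional[Any]   # None means the key was deleted


class Replica:
    """A client copy built from a snapshot taken at `snapshot_seq`, then kept current by replay."""

    def __init__(self, snapshot: Dict[str, Any], snapshot_seq: int):
        self.data = dict(snapshot)
        self.last_seq = snapshot_seq

    def apply(self, changes: Iterable[Change]) -> int:
        """Applies changes newer than `last_seq`; returns how many were applied."""
        applied = 0
        for change in changes:
            if change.seq <= self.last_seq:
                continue   # already covered by the snapshot or an earlier replay
            if change.value is None:
                self.data.pop(change.key, None)
            else:
                self.data[change.key] = change.value
            self.last_seq = change.seq
            applied += 1
        return applied
```

Because the replica remembers `last_seq`, a client that reconnects after a temporary failure can ask for changes after that position instead of resyncing from scratch, provided the log still retains them.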
Please give pointers to software or pieces of software that already implement parts of this, or suggestions on the algorithms/data structures needed to do this efficiently.
If you are looking specifically for real-time replication, I'd recommend you look into SaaS offerings specifically for this purpose, such as https://www.firebase.com/