MongoDB load balancing

MongoDB load balancing - mongodb

I was looking at best load balancing option for concurrent users with Mongo DB. I have looked at Master Slave replication but don't think this will load balance. Are there any open source DB load balancers for Mongo DB?
I have looked at Sequoia but looks like that project is no longer actively supported.
Please note: The data is not very huge & also not use case for sharding.

both Master Slave and Replica Sets will load balance in MongoDB, if you set slaveOK in your driver.
When slaveOK is enabled MongoDB drivers direct all reads to secondaries/slaves.
This provides relatively effective read balancing; for write balancing your only option.would be sharding.

Related

Mongo DB - difference between standalone & 1-node replica set

I needed to use Mongo DB transactions, and recently I understood that transactions don't work for Mongo standalone mode, but only for replica sets
(Mongo DB with C# - document added regardless of transaction).
Also, I read that standalone mode is not recommended for production.
So I found out that simply defining a replica set name in the mongod.cfg is enough to run Mongo DB as a replica set instead of standalone.
After changing that, Mongo transactions started working.
However, it feels a bit strange using it as replica-set although I'm not really using the replication functionality, and I want to make sure I'm using a valid configuration.
So my questions are:
Is there any problem/disadvantage with running Mongo as a 1-node replica set, assuming I don't really need the replication, load balancing or any other scalable functionality? (as said I need it to allow transactions)
What are the functionality and performance differences, if any, between running as standalone vs. running as a 1-node replica set?
I've read that standalone mode is not recommended for production, although it sounds like it's the most basic configuration. I understand that this configuration is not used in most scenarios, but sometimes you may want to use it as a standard DB on a local machine. So why is standalone mode not recommended? Is it not stable enough, or other reasons?

Is there any problem/disadvantage with running Mongo as a 1-node replica set, assuming I don't really need the replication, load balancing or any other scalable functionality?
You don't have high availability afforded by a proper replica set. Thus it's not recommended for a production deployment. This is fine for development though.
Note that a replica set's function is primarily about high availability instead of scaling.
What are the functionality and performance differences, if any, between running as standalone vs. running as a 1-node replica set?
A single-node replica set would have the oplog. This means that you'll use more disk space to store the oplog, and also any insert/update operation would be written to the oplog as well (write amplification).
So why is standalone mode not recommended? Is it not stable enough, or other reasons?
MongoDB in production was designed with a replica set deployment in mind, for:
High availability in the face of node failures
Rolling maintenance/upgrades with no downtime
Possibility to scale-out reads
Possibility to have a replica of data in a special-purpose node that is not part of the high availability nodes
In short, MongoDB was designed to be a fault-tolerant distributed database (scales horizontally) instead of the typical SQL monolithic database (scales vertically). The idea is, if you lose one node of your replica set, the others will immediately take over. Most of the time your application don't even know there's a failure in the database side. In contrast, a failure in a monolithic database server would immediately disrupt your application.

I think kevinadi answered well, but I still want to add it.
A standalone is an instance of mongod that runs on a single server but is not part of a replica set. Standalone instances used for testing and development, but always recomended to use replica sets in production.
A single-node replica set would have the oplog which records all changes to its data sets . This means that you'll use more disk space to store the oplog, and also any insert/update operation would be written to the oplog as well (write amplification). It also supports point in time recovery.
Please follow Convert a Standalone to a Replica Set if you would like to convert the standalone database to replicaset.
Transactions have been introduced in MongoDB version 4.0. Starting in version 4.0, for situations that require atomicity for updates to multiple documents or consistency between reads to multiple documents, MongoDB provides multi-document transactions for replica sets. The transaction is not available in standalone because it requires oplog to maintain strong consistency within a cluster.

Local MongoDB instance with index in remote server

One of our clients have a server running a MongoDB instance and we have to build an analytical application using the data stored in their MongoDB database which changes frequently.
Clients requirements are:
That we do not connect to their MongoDB instance directly or run another instance of MongoDB on their server but just somehow run our own MongoDB instance on our machine in our office using their MongoDB database directory with read only access remotely.
We've suggested deploying a REST application, getting a copy of their database dump but they did not want that. They just want us to run our own MongoDB intance which is hooked up with the MongoDB instance directory. Is this even possible ?
I've been searching for a solution for the past two days and we have to submit a solution by Monday. I really need some help.

I think this is normal request because analytical queries could cause too much load on the production server. It is pretty normal to separate production and analytical databases.
The easiest option is to use MongoDB replication. Set up MongoDB replica set with production database instance as primary and analytical database instance as secondary, also configure the analytical instance to never become primary.
If it is not possible to use replication - for example client doesn't want this, the servers could not connect directly to each other... - there is another option. You can read oplog from remote database and apply operations to your database instance. This is exactly the low level mechanism how replica set works, but you can do it manually too. For example MMS (Mongo Monitoring Sevice) Backup uses reading oplog for online backups of MongoDB.
Update: mongooplog could be the right tool for real-time application of replication oplog pulled from remote server on local server.

I don't think that running two databases that points to the same database files is possible or even recommended.
You could use mongorestore to restore from their data files directly, but this will only work if their mongod instance is not running (because mongorestore will need to lock the directory).
Another solution will be to do file system snapshots and then restore to your local database.
The downside to this backup/restore solutions is that your data will not be synced all the time.
Probably the best solution will be to use replica sets with hidden members.
You can create a replica set with just two members:
Primary - this will be the client server.
Secondary - hidden, with votes and priority set to 0. This will be your local instance.
Their server will always be primary (because hidden members cannot become primaries). Clients cannot see hidden members so for all intents and purposes your server will be read only.
Another upside to this is that the MongoDB replication will do all the "heavy" work of syncing the data between servers and your instance will always have the latest data.

Deploying large data on mongodb replicaset

Can I deploy large database by copying its files (eg. testing database with files: testing.0,testing.1,testing.ns found on mongodb dbpath) from another server to the target servers (replica set) to avoid usage of communication bandwidth for replication (in case it is only deployed to the primary)? So basically I want to avoid the slow process of replication.
If journaling is enabled, what is the effect on the process?

Yes you can, this is a perfectly valid way of solving having to do tedious and time consuming replication between members of a distanced or latenced network.
If journaling is enabled nothing really happens, copying via the file system goes around MongoDB.

Load Balancing Between Mongos

I have created a sharded environment, I am using two mongos. Is their a way I can load balance between the two "mongos",Because presently I found the Mongo client uses one of the two.Or do I have to write my own load balancer?

The recommendation would be having a mongos per application server and not implementing your own load balancer.
A query may not return the whole result in one batch, in which case the mongos will store some information associated with the cursor. If subsequent requests to iterate using the cursor are not redirected to the same mongos, then you will get errors. A load balancer would need to understand the MongoDB binary wire protocol to guarantee that scenario is handled properly.
See:
http://craiggwilson.com/2013/10/21/load-balanced-mongos/

We had the same question. Looks like Java Mongo Client can do the failover connection by itself.
Please see the answers for this question.
MongoDB load balancing and failover of query routers

mongodb automatic failover / high availability on aws

I need the proper way of failover mechanism for mongodb on aws ec2. I know failover can be accomplished by replica sets, but what is the best way to fire a new mongo installed ubuntu-ec2 ami node and add it to replica set again automatically (with zero manual operation) and return the replica set to it's proper state ?
EBS has some problems, but if I use local instance storage, I will lost the dead nodes data, but does the replica got all the master data and so is replaca is enough to recover everthing (on mongo 1.8 with journaling), or do I have to use only EBS ?
How should I start mongo instances, If I should start with repair option, how can I sperate node's first run from failover restart ?
Regards,

The easiest way to bring up new nodes is to bring up a new node with a recent backup.
So now it's a question of how you do your backup and how you restore from the backup quickly.
The MongoDB site has a write up for backups (in general) and backups on EC2 specifically. There's also a write-up for adding a new set member.
You can do this with instance storage or EBS drives, but you'll need different strategies for each. There's really no single way to do this, so I would check out the docs I've linked to for a primer.

Highly recommend reading Sean Coates' article on mutli-node MongoDB Elections, failover and AWS - specifically, the subtlety on distributed arbiter nodes (e.g., make sure to give yourself a voting majority when an AZ goes down). A similar recommendation can be found in a comment on this (now-closed) MongoDB vs. Cassandra thread.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse