Hosting Replicaset and Config servers MongoDB on other servers? - mongodb

Should I keep the replicasets and config servers on separate servers? Or have one replicaset and one config server on one server? Can I have all replicasets on one server and all config servers on another one server? (Does this defeat the purpose of sharding?)

The purpose of sharding is distributing load on multiple servers. The purpose of replication is (mostly) redundancy by allowing one server to take the place of another when that server goes offline for some reason. Obviously, it does not make much sense in either case to run multiple instances on the same server. So yes, it would defeat the purpose of sharding.
However, when you only have two servers and have to choose between replication and sharding, you can get the best of both worlds by creating two shards where each shard has a secondary which runs on the server of the primary of the other shard. That way you have both the performance-improvement when everything is OK but don't lose access to half your data when one server goes down.
Regarding the config servers: MongoDB recommends to make them a separate replica-set which runs on separate servers. But when you are on a budget, it should technically be possible to put that replica-set on the same hardware which runs the actual database. The config servers are only required when a mongos process (re-)starts or when a chunk migration happens and are relatively idle the rest of the time. Unfortunately a chunk migration is also a phase where the involved shards are very busy, so running the config servers on the same hardware will make performance drops during chunk migrations even worse.

Related

MongoDB replication and failover with just two server instances

So I'm in the process of laying out an architecture for a system that I intend to build myself. One of the features of the system should be, that it comprise redundancy - So that server B can take over in the event that Server A fails.
The problem is, that I know MongoDB supports replication with failover - However, just not when you only have 2 MongoDB instances (Because a single MongoDB instance can't appoint itself as primary).
As I see it, I therefore have 2 options:
Have a small service, that listens to the MongoDB changestream on server A and synchronize with server B at every change event
Use replication and accept that failover is not supported (Write a failover script to manually appoint a primary based on my own rules)
I don't have much practical experience with MongoDB, thus why I would like to hear from you if:
My two solutions are feasible
What caveats I may run into

MongoDB sharding: mongos and configuration servers together?

We want to create a MongoDB shard (v. 2.4). The official documentation recommends to have 3 config servers.
However, the policies of our company won't allow us to get 3 extra servers for this purpose. Since we have already 3 application servers (1 web node, 2 process nodes) we are considering to put the configuration servers in the same application servers, with the mongos. Availability is not critical for us.
What do you think about this configuration? Can we face some problem or is it discouraged for some reason?
Given that Availability is not critical for your use case, I would say it should be fine to place the config servers in the same application servers and mongos.
If one of the process nodes is down, you will lose: 1 x mongos, 1 application server and 1 config server. During this down time, the other two config servers will be read-only , which means there won't be balancing of shards, modification to cluster config etc. Although your other two mongos should still be operational (CRUD wise). If your web-node is down, then you have a bigger problem to deal with.
If two of the nodes are down (2 process nodes, or 1 web server and process node), again, you would have bigger problem to deal with. i.e. Your applications are probably not going to work anyway.
Having said that, please consider the capacity of these nodes to be able to handle a mongos, an application server and a config server. i.e. CPU, RAM, network connections, etc.
I would recommend to test the deployment architecture in a development/staging cluster first under your typical workload and use case.
Also see Sharded Cluster High Availability for more info.
Lastly, I would recommend to check out MongoDB v3.2 which is the current stable release. The config servers in v3.2 are modelled as a replica set, see Sharded Cluster config servers for more info.

mongodb Single Point of Failure

I am aware that mongodb has a master-slave architecture.
Therefore, I was thinking that the master would be the single point of failure in mongoDB since it takes care of all the requests and sends it to the slave nodes. However, when the master fails, then a new master is reelected from the slaves. Therefore I need some clarification on where the single point of failure lies.
Does mongoDB have a single point of failure? Is it in the master node?
Thanks,
MongoDB can be set up in a way that there is no single point of failure (at least none specific to MongoDB).
When you set up replication as suggested (which includes primary, secondary and an arbiter on a 3rd server), the secondary will take the role of the primary when it goes down. Keep in mind that this only works when the applications know both the primary and the secondary (how to make it aware depends on the driver).
When you have a sharded cluster, the mongo router process (mongos) and the config servers becomes additional possible points of failure, but you can also set up reduntant routers and config servers. To send the clients to another mongos server when theirs goes down, you need a 3rd party load-balancing solution.
For a proper production MongoDB setup with clustering, MongoDB Inc. suggests:
At least 2 mongos routers
Exactly 3 config servers
3 servers per shard (primary, secondary and arbiter), where the arbiters do not necessarily need dedicated servers and can share hardware with the routers, config servers, members of a different replica-set or app servers.

Deploying large data on mongodb replicaset

Can I deploy large database by copying its files (eg. testing database with files: testing.0,testing.1,testing.ns found on mongodb dbpath) from another server to the target servers (replica set) to avoid usage of communication bandwidth for replication (in case it is only deployed to the primary)? So basically I want to avoid the slow process of replication.
If journaling is enabled, what is the effect on the process?
Yes you can, this is a perfectly valid way of solving having to do tedious and time consuming replication between members of a distanced or latenced network.
If journaling is enabled nothing really happens, copying via the file system goes around MongoDB.

How do mongos instances work together in a cluster?

I'm trying to figure out how different instances of mongos server work together.
If I have 1 configserver and some shards, for example four, each of them composed by only one node (a master of course), and have four mongos server... do the mongos server communicate between them? Is it possible that one mongos redirect its load to another mongos?
When you have multiple mongos instances, they do not automatically load-balance between each other. They don't even know about each others existence.
The MongoDB drivers for most programming languages allow to specify multiple mongos instances when creating a connection. In that case the driver will usually ping all of them and connect to the one with the lowest latency. This will usually be the one which is closest geographically. When all have the same network distance, the one which is least busy right now will usually respond first. The driver will then stay connected to that one mongos, unless the program explicitely reconnects or the mongos can no longer be reached (in that case the driver will usually automatically pick another one from the initial list).
That means using multiple mongos instances is normally only a valid method for scaling when you have a large number of low-load clients, not one high-load client. When you want your one high-load client to make use of many mongos instances, you need to implement this yourself by creating a separate connection to each mongos instance and implement your own mechanism to distribute queries among them.
Short answer
As of MongoDB 2.4, the mongos servers only provide a routing service to direct read/write queries to the appropriate shard(s). The mongos servers discover the configuration for your sharded cluster via the config servers. You can find out more details in the MongoDB documentation: Sharded Cluster Query Routing.
Longer scoop
I'm trying to figure out how different instances of mongos server work togheter.
The mongos servers do not currently talk directly to each other. They do coordinate activity some activity via your config servers:
reading the sharded cluster metadata
initiating a balancing round (any mongos can start a balancing round, but only one round can be active at a time)
If I have 1 configserver
You should always have 3 config servers in production. If you somehow lose or corrupt your config server, you will have to combine your data and re-shard your database(s). The sharded cluster metadata saved on the config servers is the definitive source for what sharded data ranges should live on each shard.
some shards, for example four, each of them composed by only one node (a master of course)
Ideally each shard should be backed by a replica set if you want optimal uptime. Replica sets provide for auto-failover and can be very useful for administrative purposes (for example, taking backups or adding indexes offline).
Is it possible that one mongos redirect its load to another mongos?
No, the mongos do not perform any load balancing. The typical recommendation is to deploy one mongos per app server.
From an application/driver point of view you can specify multiple mongos in your connect string for failover purposes. The application drivers will generally connect to the nearest available mongos (by network ping time), and attempt to reconnect to in the event the current mongos connection fails.