In a MongoDB replica set, does the master node need to be accessible from clients, or will secondary nodes redirect write queries to the master node?
All your nodes must be accessible from clients. That way, if the primary goes down and a secondary is promoted to primary, your application will continue to work.
Secondary nodes will not proxy write requests to the primary node. To perform writes you need to be directly connected to the master node.
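As a rough illustration (the hostname below is just a placeholder, and the exact error text varies by server version), connecting the shell directly to a secondary and attempting a write gets rejected rather than forwarded:
mongosh mongodb1.example.com:27017
db.orders.insertOne({ item: "test" })
// fails with an error like "not primary" (NotWritablePrimary); the secondary does not forward the write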
The above answers aren't 100% correct.
1) If you are in a sharded environment, the clients need to be able to communicate with the mongos processes, which then communicate with the PRIMARY nodes (and the config servers). There could be a scenario where the application servers are separated from the PRIMARY mongodb server in a replica set, yet they are able to communicate with the mongos processes, which are in turn able to communicate with the PRIMARY mongodb server.
2) Another user noted that "all your nodes must be accessible from clients". While generally true, that is not always the case: in a situation where you have a delayed secondary in a separate data center, only members of the replica set need to be able to communicate with the delayed secondary; the application servers never need to communicate with it.
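For reference, a delayed secondary of that kind is just a replica set member with a delay, priority 0 and hidden set, so clients never read from it. A minimal sketch in the shell, assuming the delayed member sits at index 2 of the configuration (the field is secondaryDelaySecs on MongoDB 5.0+ and slaveDelay on older versions):
cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
cfg.members[2].secondaryDelaySecs = 3600   // stay one hour behind the primary
rs.reconfig(cfg)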
If I make 1 primary and 2 secondary MongoDB nodes for replication, I have 3 endpoints to 3 different DBs, and my apps can only write on the primary DB. What if my primary suddenly shuts down and a secondary DB takes over as the primary? How do I automatically change the endpoint in my apps? Should I use mongos (the mongo router)? But that needs sharding, if I remember correctly.
Thank you.
All nodes in a replica set work together to have identical data. Secondary nodes may lag behind the primary, but you don't get "3 different DB". There is only one database of which copies exist on each node.
All MongoDB drivers know how to monitor replica set members and discover which one is the primary automatically. You need to configure some drivers to do so by providing the replica set name; others do it automatically by default when they connect to a replica set node. Look up "connecting to replica set" in your driver documentation.
In a proper connection string you will provide all three RS members, e.g.
mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017,mongodb2.example.com:27017/?replicaSet=myRepl
The client will detect the PRIMARY and use it. Most drivers should re-connect automatically if the PRIMARY node changes.
Most drivers will detect the PRIMARY automatically if you provide the ReplicaSet name, i.e.
mongodb://mongodb0.example.com:27017/?replicaSet=myRepl
would connect to the PRIMARY even if it is not mongodb0.example.com. However, if mongodb0.example.com is not running, then you don't connect at all. So it is beneficial to provide all ReplicaSet members in the connection string.
See Connection String URI Format
mongos is needed only to connect to a Sharded Cluster.
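For example, with the Node.js driver (a sketch only; the hostnames are the ones from the example above, the database and collection names are made up), the driver discovers the current PRIMARY from the seed list and re-routes writes after a failover:
const { MongoClient } = require("mongodb");

const uri = "mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017,mongodb2.example.com:27017/?replicaSet=myRepl";
const client = new MongoClient(uri);

async function run() {
  await client.connect();                    // discovers the current PRIMARY from the seed list
  const orders = client.db("shop").collection("orders");
  await orders.insertOne({ item: "test" });  // writes always go to the PRIMARY, whichever node that is
  await client.close();
}
run();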
I would like to have Mongo on a server, and then replicate this onto a laptop.
The laptop needs to be able to leave the network and still read/write, and once back on the network, sync these changes with the primary.
At the same time I need the VM (primary) to still be accessible (read/write).
So when the devices are not talking to each other, each of them should make itself primary.
I have set up a very basic replica set, with the primary on a VM and the secondary on the machine running the VM. All the examples I have seen recommend having 3 servers for the replica set, but I only need 2!
A couple of questions:
Is this possible with Mongo? If not then any suggestions!
When I turn off the network adapter on the VM (primary), the secondary doesn't seem to want to become the primary.
Is it possible to run 2 instances of Mongo, and then use the other instance as the 3rd member of the replica set?
Any advice would be great, thanks.
A MongoDB replica set needs an odd number of voting members; in your case that should be one primary, one secondary and one arbiter. The arbiter instance is responsible for helping elect a new primary when the current primary goes down.
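A minimal sketch of that layout, assuming the VM and the host machine already run the two data-bearing members of a replica set called rs0 (ports, paths and hostnames below are placeholders): start a third mongod on the host machine and add it as the arbiter.
mongod --replSet rs0 --port 27018 --dbpath /data/arb

// then, connected to the current primary:
rs.addArb("host-machine.example.com:27018")
rs.status()   // the new member should be listed with stateStr "ARBITER"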
I am aware that mongodb has a master-slave architecture.
Therefore, I was thinking that the master would be the single point of failure in mongoDB, since it takes care of all the requests and sends them to the slave nodes. However, when the master fails, a new master is elected from the slaves. Therefore I need some clarification on where the single point of failure lies.
Does mongoDB have a single point of failure? Is it in the master node?
Thanks,
MongoDB can be set up in a way that there is no single point of failure (at least none specific to MongoDB).
When you set up replication as suggested (which includes a primary, a secondary and an arbiter on a 3rd server), the secondary will take over the role of the primary when the primary goes down. Keep in mind that this only works when the applications know both the primary and the secondary (how to make them aware depends on the driver).
When you have a sharded cluster, the mongo router processes (mongos) and the config servers become additional possible points of failure, but you can also set up redundant routers and config servers. To send the clients to another mongos server when theirs goes down, you need a 3rd-party load-balancing solution.
For a proper production MongoDB setup with clustering, MongoDB Inc. suggests:
At least 2 mongos routers
Exactly 3 config servers
3 servers per shard (primary, secondary and arbiter), where the arbiters do not necessarily need dedicated servers and can share hardware with the routers, config servers, members of a different replica-set or app servers.
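A rough sketch of how the routers are pointed at the config servers (hostnames, ports and paths are placeholders; recent MongoDB versions require the three config servers to form their own replica set, here called cfgRepl):
// on each config server:
mongod --configsvr --replSet cfgRepl --port 27019 --dbpath /data/cfg

// on each router:
mongos --configdb cfgRepl/cfg1.example.com:27019,cfg2.example.com:27019,cfg3.example.com:27019 --port 27017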
I'm trying to figure out how different instances of mongos server work together.
If I have 1 config server and some shards, for example four, each of them composed of only one node (a master of course), and have four mongos servers... do the mongos servers communicate with each other? Is it possible that one mongos redirects its load to another mongos?
When you have multiple mongos instances, they do not automatically load-balance between each other. They don't even know about each other's existence.
The MongoDB drivers for most programming languages allow you to specify multiple mongos instances when creating a connection. In that case the driver will usually ping all of them and connect to the one with the lowest latency. This will usually be the one which is closest geographically. When all have the same network distance, the one which is least busy right now will usually respond first. The driver will then stay connected to that one mongos, unless the program explicitly reconnects or the mongos can no longer be reached (in that case the driver will usually automatically pick another one from the initial list).
That means using multiple mongos instances is normally only a valid method for scaling when you have a large number of low-load clients, not one high-load client. When you want your one high-load client to make use of many mongos instances, you need to implement this yourself by creating a separate connection to each mongos instance and implement your own mechanism to distribute queries among them.
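A sketch of that do-it-yourself approach in Node.js (the hostnames and the naive round-robin scheme are made up for illustration): open one client per mongos and rotate between them.
const { MongoClient } = require("mongodb");

const mongosUris = [
  "mongodb://mongos1.example.com:27017",
  "mongodb://mongos2.example.com:27017",
];
const clients = mongosUris.map(uri => new MongoClient(uri));

let next = 0;
function pickClient() {             // naive round-robin over the mongos instances
  const client = clients[next];
  next = (next + 1) % clients.length;
  return client;
}

async function main() {
  await Promise.all(clients.map(c => c.connect()));
  const doc = await pickClient().db("shop").collection("orders").findOne({});
  console.log(doc);
  await Promise.all(clients.map(c => c.close()));
}
main();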
Short answer
As of MongoDB 2.4, the mongos servers only provide a routing service to direct read/write queries to the appropriate shard(s). The mongos servers discover the configuration for your sharded cluster via the config servers. You can find out more details in the MongoDB documentation: Sharded Cluster Query Routing.
Longer scoop
I'm trying to figure out how different instances of mongos server work together.
The mongos servers do not currently talk directly to each other. They do coordinate some activity via your config servers:
reading the sharded cluster metadata
initiating a balancing round (any mongos can start a balancing round, but only one round can be active at a time)
If I have 1 config server
You should always have 3 config servers in production. If you somehow lose or corrupt your config server, you will have to combine your data and re-shard your database(s). The sharded cluster metadata saved on the config servers is the definitive source for what sharded data ranges should live on each shard.
some shards, for example four, each of them composed of only one node (a master of course)
Ideally each shard should be backed by a replica set if you want optimal uptime. Replica sets provide for auto-failover and can be very useful for administrative purposes (for example, taking backups or adding indexes offline).
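For example, adding a replica-set-backed shard to the cluster uses the replica set name plus a seed list (the names below are placeholders):
sh.addShard("shard0Repl/shard0a.example.com:27018,shard0b.example.com:27018")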
Is it possible that one mongos redirect its load to another mongos?
No, the mongos do not perform any load balancing. The typical recommendation is to deploy one mongos per app server.
From an application/driver point of view you can specify multiple mongos in your connect string for failover purposes. The application drivers will generally connect to the nearest available mongos (by network ping time), and attempt to reconnect in the event that the current mongos connection fails.
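For example, a connect string listing two mongos routers (placeholder hostnames) gives the driver a fallback if one of them becomes unreachable:
mongodb://mongos1.example.com:27017,mongos2.example.com:27017/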
A MongoDB instance can have different roles:
Config server
Router (mongos)
Data server
Arbiter server (for replica sets)
I know that db.serverStatus() can be used to see if an instance is a router, the process value is mongos.
But for config servers, arbiters and data nodes the process value is mongod.
Is there a simple way of distinguishing between these instance types?
I want to bring attention to one particularly important issue with this question: sharding is a horizontal dimension (several replica sets across which data is distributed), while a replica set is a high-availability solution composed of different mongod nodes!
So what you are actually trying to figure out is:
Replica set node roles
Sharded cluster members
In the case of a replica set, what you might be interested in knowing is each node's role. You can easily get this information without needing to connect to all the nodes of the replica set; just run the command:
db.isMaster()
With this you will get the set members and the role of each member.
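The relevant part of the db.isMaster() output looks roughly like this (abridged; the exact fields vary by version, and on newer servers db.hello() is the preferred spelling):
{
  "setName" : "rs0",
  "ismaster" : true,        // false on secondaries and arbiters
  "secondary" : false,
  "primary" : "mongodb0.example.com:27017",
  "hosts" : [ "mongodb0.example.com:27017", "mongodb1.example.com:27017" ],
  "arbiters" : [ "mongodb2.example.com:27017" ]
}
// an arbiter additionally reports "arbiterOnly" : true,
// and a mongos answers with "msg" : "isdbgrid"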
For sharded cluster members, first of all you should never try to connect directly to the config servers. These are there to manage the distribution of chunks, chunk splits and other configuration data, relevant only for the sharded cluster functionality. Avoid using those IPs to connect to from your application.
So if you want a clear view of which members compose your sharded cluster, how many shards you have, etc., you need to run the command:
db.printShardingStatus()
or
sh.status()
Please review the documentation here
Cheers,
N.