MongoDB nodes (AWS EC2 instances) are still responsive even after network partitioning via Security Groups

I have created a MongoDB replica set using 5 EC2 instances on AWS. I added the nodes using the rs.add("[IP_Address]") command.
I want to simulate a network partition in the replica set. To do that, I have set up 2 kinds of security groups: 'SG1' has port 27017 (the MongoDB port) open, while 'SG2' does not expose 27017.
I want to isolate 2 nodes from the replica set. When I apply SG2 to these 2 nodes (EC2 instances), they should ideally stop receiving reads and writes from the primary, since I am blocking port 27017 using security group SG2. But in my case they are still writable: data written on the primary still shows up on the partitioned nodes. Can someone help? Thanks in advance.

Most firewalls, including AWS Security Groups, evaluate their rules when a connection is being opened. Changing the settings affects all new connections, but existing open connections are not re-evaluated when the new rules are applied.
MongoDB maintains long-lived connections between hosts, so traffic would only get blocked after those connections are lost.
On Linux you can restart networking, which resets the established connections. After applying the new rules, run:
/etc/init.d/networking stop && /etc/init.d/networking start
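If restarting networking is not an option, a rough alternative (not part of the answer above, and assuming iptables is available on the instances) is to drop MongoDB traffic directly on the isolated nodes. Rules inserted at the top of the chain also apply to packets belonging to already-established connections, so the replica set traffic is cut off immediately:
sudo iptables -I INPUT 1 -p tcp --dport 27017 -j DROP
sudo iptables -I OUTPUT 1 -p tcp --dport 27017 -j DROP
Remove the rules again with iptables -D when the experiment is over.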

Related

Using MongoClient with multiple mongos instances to connect to a sharded MongoDB cluster

I have a regular sharded cluster consisting of 10 mongos instances, 5 config servers and 10 shards. I use a mongo client to connect to multiple mongos instances.
I have two questions.
The first question: what is the load policy in this situation? Is it round-robin scheduling?
The second one: if one of the mongos instances is down, what will the mongo client do? Will it still try to connect to this mongos instance, or drop it from the list?
Please help with these. Thanks.
The mongos servers provide a routing service to direct read/write queries to the appropriate shard(s).
You are specifying multiple mongos instances to connect to the MongoDB sharded cluster. An available mongos will be used to connect to the server.
The first question: what is the load policy in this situation? Is it round-robin scheduling?
The client will connect to the server through an available mongos. There is no "load policy" and there is no round-robin scheduling; you use multiple mongos instances for high availability.
See: Number of mongos and Distribution
The second one: if one of the mongos instances is down, what will the mongo client do? Will it still try to connect to this mongos instance, or drop it from the list?
If a mongos is down, the client will connect to the server using another available mongos from the list (you have more than one mongos to connect with).
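For illustration (the hostnames below are placeholders, not from the question), a driver connection string listing several mongos routers could look like this; the driver uses an available mongos and fails over to another one from the list if it becomes unreachable:
mongodb://mongos1.example.net:27017,mongos2.example.net:27017,mongos3.example.net:27017/mydb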

How to shift internal communication of nodes in a MongoDB cluster to another network to decrease the load on the main network

I have created an 8-node MongoDB cluster with 2 shards + 2 replicas (1 for each shard) + 3 config servers + 1 mongos.
All of these are on the 192.168.1.x network (eth0) together with the application server, so this network is handling all the traffic.
I have therefore created another network, 192.168.10.x (eth1), which contains only these 8 MongoDB nodes.
Now all eight nodes are part of both networks, with dual IPs.
I want to shift the internal traffic between these MongoDB nodes to the 192.168.10.x network (eth1) to reduce the load on the main 192.168.1.x network (eth0).
How do I bind the ports/nodes for this purpose?
You can use bind_ip as a startup or configuration option. Keep in mind that various nodes need to be accessible in the event of failover.
A notable case here is your single mongos: it would be advisable either to co-locate a mongos with each app server or, depending on requirements, to have a pool of mongos instances available to your driver connection. Preferably both, with a larger instance for each mongos where aggregation operations are used.
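As a rough sketch of what that could look like (the 192.168.10.x address below is a placeholder for your actual eth1 assignment, and the file path may differ on your systems), each mongod could be restricted to its eth1 address either on the command line or in the config file:
mongod --bind_ip 192.168.10.11 ...
# or, in the YAML config file (MongoDB 2.6+):
net:
  bindIp: 192.168.10.11
The shard, replica set and config server members then need to be registered in the cluster by their eth1 addresses (or by hostnames resolving to them) so that inter-node traffic uses that interface.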
I found the solution to the problem I was looking for. I reconfigured the cluster using the IPs of the 192.168.10.x network (eth1).
Now the internal data traffic is going through this network.

How do mongos instances work together in a cluster?

I'm trying to figure out how different instances of mongos server work together.
If I have 1 config server and some shards, for example four, each of them composed of only one node (a master, of course), and four mongos servers... do the mongos servers communicate with each other? Is it possible for one mongos to redirect its load to another mongos?
When you have multiple mongos instances, they do not automatically load-balance between each other. They don't even know about each other's existence.
The MongoDB drivers for most programming languages allow you to specify multiple mongos instances when creating a connection. In that case the driver will usually ping all of them and connect to the one with the lowest latency. This will usually be the one which is closest geographically; when all have the same network distance, the one which is least busy right now will usually respond first. The driver will then stay connected to that one mongos, unless the program explicitly reconnects or the mongos can no longer be reached (in that case the driver will usually automatically pick another one from the initial list).
That means using multiple mongos instances is normally only a valid method for scaling when you have a large number of low-load clients, not one high-load client. When you want your one high-load client to make use of many mongos instances, you need to implement this yourself by creating a separate connection to each mongos instance and implement your own mechanism to distribute queries among them.
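As an illustration only of that last point (the hostnames are hypothetical, and this sketch assumes the official Node.js driver, 4.x or later, rather than any particular client), distributing the work yourself could mean keeping one dedicated client per mongos and rotating between them:
const { MongoClient } = require("mongodb");

// one dedicated client per mongos router (hypothetical hostnames)
const mongosHosts = ["mongos1.example.net:27017", "mongos2.example.net:27017"];
const clients = mongosHosts.map(h => new MongoClient(`mongodb://${h}`));

let next = 0;
function nextClient() {
  // naive round-robin over the mongos connections
  const client = clients[next];
  next = (next + 1) % clients.length;
  return client;
}

async function main() {
  await Promise.all(clients.map(c => c.connect()));   // open all connections up front
  // each operation goes to whichever mongos the rotation picks
  await nextClient().db("test").collection("items").insertOne({ x: 1 });
  await nextClient().db("test").collection("items").insertOne({ x: 2 });
  await Promise.all(clients.map(c => c.close()));
}

main().catch(console.error);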
Short answer
As of MongoDB 2.4, the mongos servers only provide a routing service to direct read/write queries to the appropriate shard(s). The mongos servers discover the configuration for your sharded cluster via the config servers. You can find out more details in the MongoDB documentation: Sharded Cluster Query Routing.
Longer scoop
I'm trying to figure out how different instances of mongos server work together.
The mongos servers do not currently talk directly to each other. They do coordinate some activity via your config servers:
reading the sharded cluster metadata
initiating a balancing round (any mongos can start a balancing round, but only one round can be active at a time)
If I have 1 configserver
You should always have 3 config servers in production. If you somehow lose or corrupt your config server, you will have to combine your data and re-shard your database(s). The sharded cluster metadata saved on the config servers is the definitive source for what sharded data ranges should live on each shard.
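For example, in that (MongoDB 2.4-era) topology each mongos is pointed at all three config servers when it starts up; the hostnames here are hypothetical:
mongos --configdb cfg1.example.net:27019,cfg2.example.net:27019,cfg3.example.net:27019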
some shards, for example four, each of them composed by only one node (a master of course)
Ideally each shard should be backed by a replica set if you want optimal uptime. Replica sets provide for auto-failover and can be very useful for administrative purposes (for example, taking backups or adding indexes offline).
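If each shard is backed by a replica set, the shard is then added to the cluster by its replica set name plus a seed member, for example (names hypothetical):
sh.addShard("shardA/shardA-member1.example.net:27018")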
Is it possible that one mongos redirect its load to another mongos?
No, the mongos do not perform any load balancing. The typical recommendation is to deploy one mongos per app server.
From an application/driver point of view you can specify multiple mongos in your connect string for failover purposes. The application drivers will generally connect to the nearest available mongos (by network ping time), and attempt to reconnect in the event the current mongos connection fails.

How to add new server in replica set in production

I am new to MongoDB replica sets.
According to the Replica Set Ref, this should be the connection string in my application to connect to MongoDB:
mongodb://db1.example.net,db2.example.net,db3.example.net:2500/?replicaSet=test
Suppose this is a production replica set (i.e. I cannot change the application code or stop all the mongo servers), and I want to add another MongoDB instance, db4.example.net, to the test replica set. How will I do that?
How will my application know about the new db4.example.net?
If you are looking for a real-world scenario:
when any of the existing servers goes down due to hardware failure etc., it is natural to add another DB server to the replica set to preserve the redundancy. But how to do that?
The list of replica set hosts in your connection string is a "seed list", and does not have to include all of the members of your replica set.
The MongoDB client driver used by your application will iterate through the seed list until it can successfully connect to a host, and use that host to request the current replica set configuration which will list all current members of the replica set. Per the documentation, it is recommended to include at least two hosts in the connect string so that your driver can still connect in the event the first host happens to be down.
Any changes in replica set configuration (i.e. adding/removing members or election of a new primary) are automatically discovered by your client driver so you should not have to make any changes in the application configuration to add a new member to your replica set.
A change in replica set configuration may trigger an election for a new primary, so your application code should expect to handle transient errors for a few seconds during reconfiguration.
Some helpful mongo shell commands:
rs.conf() - display the current replication configuration
db.isMaster().primary - display the current primary
You should notice a version number in the configuration document returned by rs.conf(). This version is incremented on every configuration change so drivers and replica set nodes can check if they have a stale version of the config.
How will my application know about the new db4.example.net?
Just rs.add("db4.example.net") and your application should discover this host automatically.
In your scenario, if you are replacing an entirely dead host you would likely also want to rs.remove() the original host (after adding the replacement) to maintain the voting majority for your replica set.
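For example (using the hostnames from the question), replacing a dead db3.example.net with a new db4.example.net from a mongo shell connected to the primary might look like:
rs.add("db4.example.net:27017")      // bring the replacement member in first
rs.remove("db3.example.net:27017")   // then drop the dead member
rs.conf()                            // the version field should have incremented after each change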
Alternatively, rather than adding a host with a new name you could replace the dead host with a new server using the same hostname as previously configured. For example, if db3.example.net died, you could replace it with a new db3.example.net and follow the steps to Resync a replica set member.
A way to provide abstraction to your database is to set up a sharded cluster. In that case, the access point between your application and the database are the mongodb routers. What happens behind them is outside of the visibility of the application. You can add shards, remove shards, turn shards into replica-sets and change those replica-sets all you want. The application keeps talking with the routers, and the routers know which servers they need to forward them. You can change the cluster configuration at runtime by connecting to the routers with the mongo shell.
When you have questions about how to set up and administrate MongoDB clusters, please ask on http://dba.stackexchange.com.
But note that in the scenario you described, that wouldn't even be necessary. When one of your database servers has a hardware failure and your system administrators want to replace it without application downtime, they can just assign the same IP and hostname to the new server so the application doesn't even notice that it's a replacement.
When you want to know details about how to do this, you will find help on http://serverfault.com

Does mongodb master node need to be accessible from clients?

In a MongoDB replica set, does the master node need to be accessible from clients? Or will secondary nodes redirect write queries to the master node?
All your nodes must be accessible from clients. That way, if the primary goes down and a secondary is promoted to primary, your application will continue to work.
Secondary nodes will not proxy write requests to the primary node. To perform writes you need to be directly connected to the master node.
The above answers aren't 100% correct.
1) If you are in a sharded environment, then the clients need to be able to communicate with the mongos processes, which in turn communicate with the PRIMARY nodes (and the config servers). There could be a scenario where the application servers are separated from the PRIMARY MongoDB server in a replica set, yet they are able to communicate with the mongos processes, which in turn are able to communicate with the PRIMARY MongoDB server.
2) Another user noted that "all your nodes must be accessible from clients". While generally true, this is not always the case: in a situation where you had a delayed secondary in a separate data center, only the members of the replica set need to be able to communicate with the delayed secondary; the application servers never need to communicate with it.