I need to set up a MongoDB cluster with two shards. Each shard contains three replica set members: a primary, a secondary, and an arbiter. I have already opened firewall rules for mongos to talk to the primary and secondary nodes, but not to the arbiters. When I connect to mongos and run sh.addShard(), everything appears to work properly.
My question, then, is: do we really need to allow mongos to interact with the arbiters as well?
From this link we know that mongos doesn't talk to hidden members, but nothing is mentioned about arbiters.
mongos needs to see all nodes, including arbiters, in order to provide transparent failover.
In normal circumstances, when all three replica set nodes (primary, secondary, and arbiter) are up, mongos doesn't need to see the arbiter. But when only the primary and the arbiter are left, mongos needs to check the arbiter to make sure the election was not affected by a network partition.
Arbiters are there only to vote in the election of a new primary, based on the priorities set in mongod.conf. Arbiters themselves don't store any data, but they do maintain heartbeats with all the mongod servers of a replica set, so they are among the first to know when any mongod server in the set goes down.
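As an illustration of that layout, here is a minimal sketch of initiating such a PSA (primary-secondary-arbiter) set through PyMongo; the hostnames and set name are hypothetical, and you could equally run rs.initiate() with the same document from the shell:

    # Minimal sketch, assuming hypothetical hosts shard0-a/b/c and set name "shard0".
    from pymongo import MongoClient

    # Connect directly to the node that should become the primary.
    client = MongoClient("shard0-a:27017", directConnection=True)

    config = {
        "_id": "shard0",
        "members": [
            {"_id": 0, "host": "shard0-a:27017", "priority": 2},        # preferred primary
            {"_id": 1, "host": "shard0-b:27017", "priority": 1},        # secondary
            {"_id": 2, "host": "shard0-c:27017", "arbiterOnly": True},  # votes only, stores no data
        ],
    }
    client.admin.command("replSetInitiate", config)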
I need to use Kafka Connect to monitor changes to a MongoDB cluster with one primary and two replicas.
I see there is the official MongoDB connector, and I want to understand what the connector's behaviour would be if the primary replica failed. Will it automatically read from whichever secondary replica becomes the new primary? I couldn't find information about this in the official docs.
I've seen this post related to the tasks.max configuration, which I thought might be related to this scenario, but the answer implies that it always defaults to 1.
I've also looked at Debezium's implementation of the connector, which seems to support this scenario automatically:
The MongoDB connector is also quite tolerant of changes in membership
and leadership of the replica sets, of additions or removals of shards
within a sharded cluster, and network problems that might cause
communication failures. The connector always uses the replica set’s
primary node to stream changes, so when the replica set undergoes an
election and a different node becomes primary, the connector will
immediately stop streaming changes, connect to the new primary, and
start streaming changes using the new primary node.
Also, Debezium's version of the tasks.max configuration property states that:
The maximum number of tasks that should be created for this connector.
The MongoDB connector will attempt to use a separate task for each
replica set, [...] so that the work for each replica set can be
distributed by Kafka Connect.
The question is: can I get the same failover behaviour with the official connector as is advertised for the Debezium one? For external reasons, I can't use the Debezium connector for now.
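For reference, a registration for the official connector would look roughly like this sketch (the database, collection, and hosts are placeholders); the relevant detail is that connection.uri lists every replica set member, so the underlying driver can locate a new primary after an election:

    {
      "name": "mongo-source",
      "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0",
        "database": "mydb",
        "collection": "mycoll",
        "tasks.max": "1"
      }
    }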
In a PSS (primary-secondary-secondary) deployment:
If one node is not available, the other two nodes can elect a primary
If two nodes are not available, there can be no primary
The quote you referenced suggests the connector may be using the primary read preference, which means that as long as two nodes are up it will keep working, but if only one node is up it will not retrieve any data.
Therefore, bring down two of the three nodes and observe whether you are able to query.
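A quick way to run that experiment from PyMongo, assuming a hypothetical set rs0 on hosts node1/node2/node3:

    from pymongo import MongoClient
    from pymongo.errors import ServerSelectionTimeoutError

    client = MongoClient(
        "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0",
        readPreference="primary",        # reads must be served by the primary
        serverSelectionTimeoutMS=5000,   # fail fast instead of retrying forever
    )

    try:
        print(client.mydb.mycoll.find_one())
    except ServerSelectionTimeoutError:
        # With two of the three nodes down there is no majority, so no
        # primary can be elected and a "primary" read cannot be served.
        print("no primary available")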
Silly question: when we talk about secondaries in the context of failover behaviour, with regard to master-master/multi-master setups, is a secondary basically any node other than the one we are currently reading from or writing to?
In master-master replication, both nodes act as primary and secondary at the same time. In multi-master replication, every node acts as a secondary, but some or all of them also act as primaries.
Multi-master means there are many database servers that can accept writes. To stay in sync with the other nodes, each server has to read the writes applied everywhere else, and in that respect it behaves as a secondary. In master-slave replication there is only one master and many slaves: the master is the only node that accepts writes, so it never needs to read changes from any other node and behaves purely as a primary.
For example, MySQL 5.6 replication supports master-master replication but not multi-master replication, while MySQL 5.7 also supports multi-master replication. MongoDB supports only master-slave (primary-secondary) replication.
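To make MongoDB's single-primary model concrete, here is a hedged PyMongo sketch (hosts and names are hypothetical): writes always go to the one primary, while reads can optionally be offloaded to secondaries, which is read scaling rather than multi-master:

    from pymongo import MongoClient, ReadPreference

    client = MongoClient("mongodb://node1:27017,node2:27017/?replicaSet=rs0")

    # Writes are always routed to the single primary of the set.
    client.mydb.users.insert_one({"name": "alice"})

    # Reads can be offloaded to secondaries, but secondaries never
    # accept writes, so this is not multi-master.
    secondary_view = client.mydb.get_collection(
        "users", read_preference=ReadPreference.SECONDARY_PREFERRED
    )
    print(secondary_view.find_one({"name": "alice"}))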
PyMongo has a MongoClient for connecting to single nodes, and a MongoReplicaSetClient for connecting to entire replica sets; the latter is able to route reads to secondaries and monitor set health. But what is the difference if I connect to a mongos instead of the replica set nodes? As far as I understand, mongos handles all the routing and monitoring itself.
MongoClient is for connecting to standalones and to mongos, because the driver doesn't have to handle failover in those cases; the mongos handles the routing when it is connected to replica sets. When connecting directly to a single replica set, use MongoReplicaSetClient so that the driver handles autodiscovery of set members, failover to a new primary, and so on. See High Availability and PyMongo.
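A short sketch of that distinction, assuming PyMongo 2.x (where MongoReplicaSetClient still exists; in PyMongo 3+ a single MongoClient handles every topology) and hypothetical hostnames:

    from pymongo import MongoClient, MongoReplicaSetClient
    from pymongo.read_preferences import ReadPreference

    # Standalone or mongos: no client-side failover logic is needed,
    # because mongos does the monitoring and routing itself.
    router = MongoClient("mongos-host:27017")

    # Direct replica set connection: the driver discovers the members,
    # tracks which one is primary, and can route reads to secondaries.
    rs = MongoReplicaSetClient(
        "node1:27017,node2:27017",
        replicaSet="rs0",
        read_preference=ReadPreference.SECONDARY_PREFERRED,
    )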
I have two mongod instances without replication, each having a collection with the same name but different data. I then initialized replication between them. The secondary machine copied all data from the primary machine and lost its original data. Can I recover the original data that was present on the secondary machine?
This is the expected behaviour with MongoDB replica sets: data from the primary is replicated to the secondaries. When you add a server as a new secondary, it performs an "initial sync" that copies the data from the primary. Replica sets are designed for failover and redundancy; your secondary nodes are expected to hold data consistent with the primary, subject to their current replication lag.
If you have overwritten your previous database, your only option is to restore from a backup.
See also:
Backup and Restoration Strategies
Replica Set Internals Part V: Initial Sync
I am working with autosharding, and I had asked whether the data in shard "A" would be available in shard "B". The answer was that data in shard "A" will not be available in shard "B". In that case, how does automatic failover work? For example, if I have three shards and one of them fails, can we still access its data from the other shards? If the data is different on each shard, how can we access the failed shard's data? Can anyone explain this?
Sharding is not about failover but about scalability; failover is achieved with replica sets. That is, each shard runs as a replica set with multiple nodes, and when the master node fails, a new master is elected from among the slave nodes.
Here is how it looks: http://www.infoq.com/resource/news/2010/08/MongoDB-1.6/en/resources/mongodb2.png
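In code, that picture corresponds to something like this sketch (shard names and hosts are hypothetical): each shard is added as a whole replica set, so redundancy lives inside each shard while the shards together partition the data:

    from pymongo import MongoClient

    # Connect to the mongos router, not to the shards directly.
    mongos = MongoClient("mongos-host:27017")

    # Each shard is a replica set; a shard failing over to a new
    # primary is invisible to clients that talk to mongos.
    mongos.admin.command("addShard", "shardA/a1:27017,a2:27017")
    mongos.admin.command("addShard", "shardB/b1:27017,b2:27017")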