Setup Shards: Should I install MongoDB on the following servers - mongodb

Following the O'Reilly Scaling MongoDB book (page 27), I saw the following passage:
Once you’re connected, you can add a shard. There are two ways to add
a shard, depending on whether the shard is a single server or a
replica set. Let’s say we have a single server, sf-02, that we’ve been
using for data. We can make it the first shard by running the addShard
command:
> db.runCommand({"addShard" : "sf-02:27017"})
{ "shardAdded" : "shard0000", "ok" : 1 }
Question 1>: What should be done on the servers of sf-02?
Should I also install MongoDB on it? If any, which package?
For example, if we had a replica set creatively named replica set “rs”
with members rs1-a, rs1-b, and rs1-c, we could say:
> db.runCommand({"addShard" : "rs/rs1-a,rs1-c"})
{ "shardAdded" : "rs", "ok" : 1 }
Question 2>: where is "rs" located?
Question 3>: Does rs1-a, rs1-c share the same machine?

reply 1: you should run mongod with the --shardsvr option to start it as a shard server. Each shard server has to know that it will receive connections from a mongos (the shard router).
reply 2: 'rs' is the name of a replica set. A set is just a group of machines (usually 3), so it is not located on a single machine; it is an abstract entity that represents the group of machines in the set.
reply 3: no. For testing purposes you can run a replica set on a single machine, but the purpose of a replica set is failover. In production you should use a different machine for every member of the set.
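As a hedged sketch of what reply 1 describes, using the hostname sf-02 from the book excerpt (the config server hostname cfg-01, data path, and ports are illustrative, not from the question):

```shell
# On sf-02: start mongod as a shard server (data-bearing node)
mongod --shardsvr --dbpath /data/db --port 27017

# On the router machine: start mongos pointing at the config server(s)
mongos --configdb cfg-01:27019

# From a mongo shell connected to the mongos, register the shard
# (single-server form, as in the book excerpt)
db.adminCommand({ addShard: "sf-02:27017" })
```

Note that on recent MongoDB versions shards and config servers must themselves be replica sets, so the single-server form above only applies to the older releases the book covers.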


Mongodb : error - New and old configurations differ in replica set ID

Scenario
I am working on restoring a backup taken from one replica set to a different replica set; let's call them replica set A and replica set B.
The backup is an AWS EBS snapshot.
The available backup is for set A, and it has to be restored to set B.
I had initially saved the original configuration of a node of set B with cfg = rs.config().
After mounting an EBS volume created from the set A snapshot to a node of set B, I am able to connect to the db. The configuration is that of set A, since the volume was created from the set A backup, which means all hostnames in the existing configuration after restoration are those of set A.
Issue:
While trying to force the saved configuration, I am running into the issue below.
rs.reconfig(cfg, { force: true })
{
    "ok" : 0,
    "errmsg" : "New and old configurations differ in replica set ID; old was 5c4a6ab3b5306ee3ec95dae4, and new is 59dc23bfa547d208144dd564",
    "code" : 103,
    "codeName" : "NewReplicaSetConfigurationIncompatible",
    "operationTime" : Timestamp(1616525693, 4976),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1616573470, 22),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
Question
What is the significance of the replica set ID?
If I make the replica set ID the same in the configuration I am trying to force, what will its side effects be, if any?
How do replica set configs get synced across nodes? (I am not looking for a command but for the underlying details.)
Let me know if more details are needed to add clarity to the question.
Note: the hosts are different in set A and set B, and both follow a replication model with an arbiter node.
MongoDB performs some sanity checks when dealing with replica sets:
the replica set name is specified in the configuration file or startup command line
that name is used as the replica set ID when creating the replica set
the replica set configuration document is copied to each other node when they are added
each node stores a copy of the replica set configuration document in its local database
When starting up, and when a new replica set configuration document is received, mongod checks that the replica set ID matches what it already has, and that its own host name appears in the members list of the new configuration. If anything doesn't match, it transitions to a state that does not accept writes.
This helps to ensure the consistency of the data across replica set members.
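The stored copy mentioned above can be inspected directly; for example, from a mongo shell on any member (a read-only check):

```
// the replica set configuration document lives in the "local" database
use local
db.system.replset.findOne()
// shows the config, including "_id" (the set name)
// and "settings.replicaSetId" (the ID from the error message)
```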
Basic steps to restore a replica set from backups, taken from the docs. For more detail, see
https://docs.mongodb.com/manual/tutorial/restore-replica-set-from-backup/index.html
Obtain backup MongoDB Database files.
Drop the local database if it exists in the backup.
Start a new single-node replica set.
Connect a mongo shell to the mongod instance.
Initiate the new replica set.
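A minimal sketch of those steps (the data path /data/db and the set name rsB are placeholders, not from the docs):

```shell
# 1. Make the restored data files available to a mongod
#    (e.g. mount the volume created from the snapshot at /data/db)

# 2. Start a standalone mongod and drop the old "local" database,
#    which holds the previous replica set's identity
mongod --dbpath /data/db --port 27017
mongo --port 27017 --eval 'db.getSiblingDB("local").dropDatabase()'

# 3. Restart as a single-node replica set and initiate it
mongod --dbpath /data/db --port 27017 --replSet rsB
mongo --port 27017 --eval 'rs.initiate()'
```

Dropping local is what avoids the "configurations differ in replica set ID" error: the restored node starts with a fresh identity instead of set A's.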

MongoDB : How to perform sharding without replication?

I am trying to set up sharding across 2 machines, with a config server, a router, and one shard on machine A and another shard on machine B. I am finding it hard to do this as I am a beginner, and I also can't find much documentation or tutorials online. I have started two mongod instances, one as a config server and another as a shard, but I am clueless on how to proceed.
Below is the sharding configuration in the two mongod (config and shard) conf files:
Config server:
sharding:
  clusterRole: configsvr
Shard:
sharding:
  clusterRole: shardsvr
As per the documentation, the next step is to execute the command rs.initiate(), but I don't require replication. I still tried to execute it just in case and received the error below:
{
    "ok" : 0,
    "errmsg" : "This node was not started with the replSet option",
    "code" : 76,
    "codeName" : "NoReplicationEnabled"
}
Is it mandatory to have replication while sharding? How to do sharding without replication within 2 machines?
That's not possible; see the sharding Options in the documentation:
Note
Setting sharding.clusterRole requires the mongod instance to be
running with replication. To deploy the instance as a replica set
member, use the replSetName setting and specify the name of the
replica set.
But you can have a replica set with just one member; that's no problem. The replica set will have only the primary, which should work.
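A hedged sketch of such a single-member replica-set shard (the set name rsShardA, the data path, and the hostname machine-a are illustrative):

```shell
# Start the shard's mongod with both a cluster role and a replica set name
mongod --shardsvr --replSet rsShardA --dbpath /data/shard-a --port 27018

# Initiate the one-member set from a mongo shell connected to it
mongo --port 27018 --eval 'rs.initiate()'

# Then, from a mongos, add it using the replica-set URL form
mongo --port 27017 --eval 'sh.addShard("rsShardA/machine-a:27018")'
```

Repeating this with a second set name on machine B gives two shards without any actual data redundancy, which is what the question asks for.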

Does a mongod Configserver also contain data (except metadata)?

I am getting started with MongoDB and cannot find the answer to this question.
For test purposes I want to create a cluster with 3 data nodes, but so far I am not sure how many machines I will need to start it. I also want to have 2 routing servers in the cluster.
My current understanding is that I will need 4 machines:
Machine (config server and routing server): runs mongod --configsvr and mongos
Machine (shard and routing server): runs mongod and mongos
Machine (shard): runs only mongod
Machine (shard): runs only mongod
So, in my opinion, a mongod --configsvr cannot be a shard at the same time?
In MongoDB the config server will not store any data other than the metadata for a sharded cluster. If you manually connect to the config server and try to write data, you get this error:
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 14037,
        "errmsg" : "can't create user databases on a --configsvr instance"
    }
})
Regarding the number of servers, each shard should run on its own machine. As you only have two shards, you can get away with 2 machines; however, 4 would be desirable so that each shard can be a replica set with a primary and a secondary. The config server and routing servers can run on any of those machines, so 4 machines are enough.
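The 4-machine layout from the question can be sketched as startup commands (hostnames and paths are illustrative):

```shell
# Machine 1: config server + router
mongod --configsvr --dbpath /data/config --port 27019
mongos --configdb machine-1:27019 --port 27017

# Machines 2-4: data-bearing shards (machines 2 also runs a mongos)
mongod --shardsvr --dbpath /data/shard --port 27018
```

On MongoDB 3.4+ the config servers must themselves form a replica set, so --configdb takes the form setname/host1:port,host2:port,... there.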

MongoDB sharding: host does not belong to replica set

I am a Mongo newbie.
I am trying to spin up a MongoDB cluster with both sharding and replication. The cluster schema which I want to implement is: https://github.com/ansible/ansible-examples/raw/master/mongodb/images/site.png
I am using the server IP as the replica set name, i.e. I am building replica sets with the commands below:
rs.initiate()
rs.add("10.148.28.51:27118")
rs.add("10.148.28.52:27118")
rs.add("10.148.28.53:27118")
Replication is configured correctly, so when I execute rs.status() on the PRIMARY host 10.148.28.51 I get "10.148.28.51" as the replica set name: https://gist.github.com/daniilyar/630bc6fe7723ed06f243
But when I try to add shards at the mongos instance, it gives me 2 seemingly opposite errors (depending on which addShard() syntax variation I use):
mongos> sh.addShard("10.148.28.51:27118")
{
    "ok" : 0,
    "errmsg" : "host is part of set 10.148.28.51, use replica set url format <setname>/<server1>,<server2>,...."
}
mongos> sh.addShard("10.148.28.51/10.148.28.51:27118")
{
    "ok" : 0,
    "errmsg" : "in seed list 10.148.28.51/10.148.28.51:27118, host 10.148.28.51:27118 does not belong to replica set 10.148.28.51"
}
How do I add a shard if Mongo tells me that "host X is part of replica set Y" and that "host X does not belong to replica set Y" at the same time?
Any help would be greatly appreciated
From your description it sounds like you need to tweak the way you are using the rs.add(...) command. You state you are using the IP address as the name of the replica set, but this is not how rs.add(...) interprets its argument.
The argument you pass is the hostname (or IP) and port of the mongod instance you want to add to the replica set, not the replica set name. You set up this configuration when connected via mongo to the primary. The replica set name is set when the primary is started:
mongod --replSet "rs1"
sets the replica set name to rs1.
I'd have a read over http://docs.mongodb.org/manual/tutorial/convert-replica-set-to-replicated-shard-cluster/ as it covers pretty much what you appear to be trying to do.
I'd also consider what you are trying to achieve, as it sounds (from your description) like you may end up with a single replicated shard (!!!) when you most probably want to create multiple shards, each of which has its data replicated.
References:
rs.add command - http://docs.mongodb.org/manual/reference/method/rs.add/
sh.addShard command - http://docs.mongodb.org/manual/reference/method/sh.addShard/
Sharded Cluster -
http://docs.mongodb.org/manual/core/sharded-cluster-components/
Thank you for the good explanation, now I understand. If you use rs.add(IP:port), Mongo adds the replica set member under a name like ip-X-Y-Z-R:<port>. This seems to be Mongo's default behavior. So in my case the solution was to use the command:
sh.addShard("10.148.28.51/**ip-10-148-28-51**:27118")
instead of:
sh.addShard("10.148.28.51/**10.148.28.51**:27118")
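A hedged sketch of the approach the answer recommends, giving the set an explicit name at startup instead of relying on default hostnames (the name rs1 and the data path are illustrative; the IPs and port come from the question):

```shell
# Start each member with an explicit replica set name
mongod --replSet rs1 --shardsvr --dbpath /data/db --port 27118

# Initiate and add members from a mongo shell on the first node
mongo --port 27118 --eval 'rs.initiate()'
mongo --port 27118 --eval 'rs.add("10.148.28.52:27118"); rs.add("10.148.28.53:27118")'

# Add the shard from mongos using the <setname>/<host> form
mongo --eval 'sh.addShard("rs1/10.148.28.51:27118")'
```

With an explicit set name, the seed-list prefix in sh.addShard() matches the name the members report, so the "does not belong to replica set" mismatch cannot occur.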

Setting up distributed MongoDB with 4 servers

I am supposed to set up MongoDB on 4 servers (a school project to benchmark MongoDB vs. others). So I am thinking of using only sharding, without replication. I attempted to set up the first 3 servers to run mongod --configsvr, and the 4th as just a normal mongod instance. Then on all servers I run mongos. I am now at the part where I run sh.addShard("...") and I get:
{
    "ok" : 0,
    "errmsg" : "the specified mongod is a --configsvr and should thus not be a shard server"
}
It seems like I can't have a config server running as a shard too? How then should I set things up?
No, the config server is not a shard. The config server is a special mongod instance that stores the sharded cluster metadata, essentially how the data is distributed across the sharded cluster.
You really only need 1 mongos instance for a simple setup like this. When you start mongos, you will provide it with the hostname(s) of your config server(s).
The Sharded Cluster Deployment Tutorial explains all of the steps that you need to follow.
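Under those constraints, a hedged sketch of one way to lay out the 4 servers (hostnames srv1-srv4 and paths are placeholders):

```shell
# srv1-srv3: config servers (metadata only, never shards)
mongod --configsvr --dbpath /data/config --port 27019

# srv4: the single data-bearing shard
mongod --shardsvr --dbpath /data/shard --port 27018

# One mongos is enough; point it at the config servers
mongos --configdb srv1:27019,srv2:27019,srv3:27019

# From a mongo shell connected to the mongos:
mongo --eval 'sh.addShard("srv4:27018")'
```

Note that this leaves only one shard; for a sharding benchmark it may make more sense to run a single config server and use the freed machines as additional shards. On MongoDB 3.4+ the config servers must also form a replica set, which changes the --configdb argument to setname/host:port,... form.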