Does a mongod Configserver also contain data (except metadata)? - mongodb

I am getting started with MongoDB and cannot find the answer to the question.
For test purposes I want to create a cluster with 3 data nodes, but so far I am not sure how many machines I will need to start it. I also want to have 2 routing servers in the cluster.
My current understanding is that I will need 4 machines:
Machine 1 (config server and routing server): runs mongod --configsvr and mongos
Machine 2 (shard and routing server): runs mongod and mongos
Machine 3 (shard): runs only mongod
Machine 4 (shard): runs only mongod
So, as I understand it, a mongod --configsvr cannot be a shard at the same time?

In MongoDB, the config server will not store any data other than the metadata for a sharded cluster. If you manually connect to the config server and try to write data, you get this error:
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 14037,
        "errmsg" : "can't create user databases on a --configsvr instance"
    }
})
Regarding the number of servers, each shard should run on its own machine. As you only have two shards, you can get away with 2 machines; however, 4 would be desirable so that each shard can be a replica set with a primary and a secondary. The config server and routing servers can run on any of the four machines, so you only need 4 machines.

Related

MongoDB : How to perform sharding without replication?

I am trying to set up sharding across 2 machines, with the config server, the router, and 1 shard on machine A and another shard on machine B. I am finding this hard as I am a beginner and can't find much documentation or many tutorials online. I have started two mongod instances, one as a config server and another as a shard, but I don't know how to proceed.
Below is the sharding configuration in my two mongod (config and shard) conf files:
Config server:
sharding:
  clusterRole: configsvr
Shard:
sharding:
  clusterRole: shardsvr
As per the documentation, the next step is to execute the command rs.initiate(), but I don't require replication. I still tried to execute it just in case and received the error below:
{
    "ok" : 0,
    "errmsg" : "This node was not started with the replSet option",
    "code" : 76,
    "codeName" : "NoReplicationEnabled"
}
Is it mandatory to have replication when sharding? How can I do sharding without replication across 2 machines?
That's not possible; see the sharding options:
Note
Setting sharding.clusterRole requires the mongod instance to be
running with replication. To deploy the instance as a replica set
member, use the replSetName setting and specify the name of the
replica set.
However, you can have a replica set with just one member; that's no problem.
The replica set will then consist of only the primary, which should work.
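As a rough illustration of the single-member approach, the Python sketch below renders minimal config files for the config server and one shard, each with a replSetName. The replica set names (cfgRS, shardA), the ports, and the data paths are placeholders chosen for this example and do not come from the question. After starting each mongod with its file, rs.initiate() would be run once on each instance.

```python
def mongod_config(cluster_role: str, repl_set_name: str, port: int, db_path: str) -> str:
    """Render a minimal mongod YAML config for one sharded-cluster member.

    Assumption: only the settings discussed above are emitted; a real
    deployment would also need network binding, logging, and so on.
    """
    if cluster_role not in ("configsvr", "shardsvr"):
        raise ValueError("cluster_role must be 'configsvr' or 'shardsvr'")
    lines = [
        "sharding:",
        f"  clusterRole: {cluster_role}",
        "replication:",
        f"  replSetName: {repl_set_name}",  # required once clusterRole is set
        "net:",
        f"  port: {port}",
        "storage:",
        f"  dbPath: {db_path}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical names, ports, and paths for a two-machine test setup.
config_server = mongod_config("configsvr", "cfgRS", 27019, "/data/configdb")
shard_a = mongod_config("shardsvr", "shardA", 27018, "/data/shardA")

print(config_server)
print(shard_a)
```

Each replica set here has exactly one member, so there is no redundancy; the replSetName is present only to satisfy the requirement quoted above.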

MongoDB locked after aborted db.repairDatabase()? How to unlock?

I tried doing a db.repairDatabase() command from a mongo shell on a healthy but large MongoDB database. It was running for about 10 hours and it still did not complete. For better or worse, I hit Ctrl-C to cancel it.
It appears that the cluster has been left in some locked state. Commands like "show dbs" all fail with "Operation timed out":
mongos> show dbs
2016-06-10T09:38:10.179-0400 E QUERY [thread1] Error: listDatabases failed:{ "code" : 50, "ok" : 0, "errmsg" : "Operation timed out" } :
_getErrorWithCode#src/mongo/shell/utils.js:25:13
Mongo.prototype.getDBs#src/mongo/shell/mongo.js:62:1
shellHelper.show#src/mongo/shell/utils.js:760:19
shellHelper#src/mongo/shell/utils.js:650:15
#(shellhelp2):1:1
It has been like this for about 10 more hours now after I killed the db.repairDatabase().
What is the correct way to recover from this?
My cluster info: I am running MongoDB 3.2.5 everywhere. I have 3 config servers, 11 shards, each shard is a replica set consisting of 2 nodes plus an arbiter. And I have about 40 nodes running mongos instances. The 3 config servers are still 3.0-style (not yet upgraded to replica-set).
Well, for what it's worth, I was able to bring the cluster back as follows:
1. Restarted all mongos services.
2. Restarted all mongod arbiters.
3. Restarted mongod for all 3 config servers.
4. Restarted mongod for 1 node from each of my 11 shards' replica sets.
5. Restarted mongod for the other node from each of my 11 shards' replica sets.
Steps 1 through 4 didn't fix anything.
But after I ran step 5, I was once again able to use all the databases. Things seem to be back to normal now.

Setting up distributed MongoDB with 4 servers

I am supposed to set up MongoDB on 4 servers (a school project to benchmark MongoDB against other databases), so I am thinking of using only sharding without replication. I attempted to set up the first 3 servers to run mongod --configsvr, and the 4th as just a normal mongod instance. Then I ran mongos on all servers. Now I am at the part where I run sh.addShard("...") and I get
{
    "ok" : 0,
    "errmsg" : "the specified mongod is a --configsvr and should thus not be a shard server"
}
It seems I can't have a config server running as a shard too? How should I set things up, then?
No, the config server is not a shard. The config server is a special mongod instance that stores the sharded cluster metadata, essentially how the data is distributed across the sharded cluster.
You really only need 1 mongos instance for a simple setup like this. When you start mongos, you will provide it with the hostname(s) of your config server(s).
The Sharded Cluster Deployment Tutorial explains all of the steps that you need to follow.
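To make the point about giving mongos the config server hostnames concrete, here is a small Python sketch that assembles the mongos command line and the sh.addShard() call as text. The hostnames (server1 to server4) and ports are hypothetical. The optional replica set prefix is an assumption for deployments where the config servers run as a replica set; without it, the plain comma-separated form is produced.

```python
from typing import List, Optional


def mongos_argv(config_hosts: List[str],
                config_repl_set: Optional[str] = None,
                port: int = 27017) -> List[str]:
    """Build the argument list for starting one mongos router."""
    hosts = ",".join(config_hosts)
    # Config servers deployed as a replica set are addressed as "name/host,host".
    configdb = f"{config_repl_set}/{hosts}" if config_repl_set else hosts
    return ["mongos", "--configdb", configdb, "--port", str(port)]


def add_shard_call(shard_host: str) -> str:
    """Return the shell call that registers a data-bearing mongod as a shard."""
    return f'sh.addShard("{shard_host}")'


# Hypothetical layout: three config servers, one separate shard server.
config_servers = ["server1:27019", "server2:27019", "server3:27019"]
print(" ".join(mongos_argv(config_servers)))
print(add_shard_call("server4:27017"))
```

The shard passed to sh.addShard() is a different mongod from any of the config servers, which is exactly what the error message in the question asks for.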

Where is the mongos config database string being stored?

I made a mistake in my mongo sharding setup: I had an error in my config database string. I tried to clean this up by deleting all the data on the config database servers and restarting all the mongod services. However, even after restarting mongos, I still get an error.
When I run sh.status(), I get:
mongos specified a different config database string : stored : <old string here>
Where is this string actually being stored? I tried looking for it in the config databases themselves and also on the members of the shard, but I can't seem to find it.
As of MongoDB 2.4, the --configdb string specified for the mongos servers is also cached in memory on the shard mongod servers. In order for mongos servers to join a sharded cluster, they must have an identical config string (including the order of hostnames).
There is a tutorial in the MongoDB manual to Migrate Config Servers with Different Hostnames which covers the full process, including migrating a config server to a new host (which isn't applicable in your case).
If you are still seeing the "different config database string" error after restarting everything, it's likely that you had at least one rogue mongod or mongos running during the process.
To resolve this error:
1. Shut down all the mongod processes (for the shards).
2. Shut down the mongos processes.
3. Restart the mongod processes (for the shards).
4. Restart the mongos with the correct --configdb string.
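The "identical, including order" rule for the config string (the value passed to mongos via --configdb) can be illustrated with a short Python sketch. This only models the rule as described above and is not MongoDB's own comparison code; the hostnames are hypothetical, and trimming whitespace around entries is a simplification made for this example.

```python
from typing import List


def split_hosts(config_string: str) -> List[str]:
    """Split a comma-separated config server list into trimmed host entries."""
    return [h.strip() for h in config_string.split(",") if h.strip()]


def is_identical(stored: str, given: str) -> bool:
    """True only if both strings list the same hosts in the same order."""
    return split_hosts(stored) == split_hosts(given)


def differs_only_by_order(stored: str, given: str) -> bool:
    """True if the host entries match but appear in a different order."""
    a, b = split_hosts(stored), split_hosts(given)
    return a != b and sorted(a) == sorted(b)


stored = "cfg1:27019,cfg2:27019,cfg3:27019"  # hypothetical cached string
given = "cfg2:27019,cfg1:27019,cfg3:27019"   # same hosts, reordered

print(is_identical(stored, given))           # False
print(differs_only_by_order(stored, given))  # True
```

A check like this can help spot a mongos that was started with the right hosts in the wrong order before restarting the whole cluster.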

Setup Shards: Should I install MongoDB on the following servers

While following the O'Reilly book Scaling MongoDB (page 27), I saw the following command:
Once you’re connected, you can add a shard. There are two ways to add
a shard, depending on whether the shard is a single server or a
replica set. Let’s say we have a single server, sf-02, that we’ve been
using for data. We can make it the first shard by running the addShard
command:
> db.runCommand({"addShard" : "sf-02:27017"})
{ "shardAdded" : "shard0000", "ok" : 1 }
Question 1: What should be done on the server sf-02?
Should I also install MongoDB on it? If so, which package?
For example, if we had a replica set creatively named replica set “rs”
with members rs1-a, rs1-b, and rs1-c, we could say:
> db.runCommand({"addShard" : "rs/rs1-a,rs1-c"})
{ "shardAdded" : "rs", "ok" : 1 }
Question 2: Where is "rs" located?
Question 3: Do rs1-a and rs1-c share the same machine?
Reply 1: You should run mongod with the --shardsvr option to start it as a shard server. Each shard server has to know that it will receive connections from a mongos (the shard router).
Reply 2: 'rs' is the name of a replica set. A set is just a group of machines (usually 3), so it is not located on a single machine; it is an abstract entity that represents the group of machines in the set.
Reply 3: No. For testing purposes you can run a replica set on a single machine, but the purpose of a replica set is failover. In production you should use a different machine for every member of the set.
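To show how the two addShard forms quoted from the book differ, the following Python sketch splits a shard string into a replica set name and a list of member hosts. The function name parse_shard_string is made up for this illustration; it mirrors the "name/host,host" convention from the quoted commands and is not part of any MongoDB driver.

```python
from typing import List, Optional, Tuple


def parse_shard_string(shard: str) -> Tuple[Optional[str], List[str]]:
    """Split an addShard target into (replica set name or None, member hosts)."""
    name, sep, hosts = shard.partition("/")
    if not sep:
        # No "/" means a single standalone server, e.g. "sf-02:27017".
        return None, [shard]
    # With a "/", the part before it names the set and the rest lists members.
    return name, hosts.split(",")


print(parse_shard_string("sf-02:27017"))     # (None, ['sf-02:27017'])
print(parse_shard_string("rs/rs1-a,rs1-c"))  # ('rs', ['rs1-a', 'rs1-c'])
```

In the second case "rs" is only a label for the group, while rs1-a and rs1-c are the hosts that actually have to run mongod.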