Setting up distributed MongoDB with 4 servers - mongodb

I am supposed to set up MongoDB on 4 servers (a school project to benchmark MongoDB vs. others). I am thinking of using only sharding without replication. So I attempted to set up the first 3 servers to run mongod --configsvr, and the 4th as just a normal mongod instance. Then on all servers I run mongos. Then, at the part where I run sh.addShard("..."), I get
{
    "ok" : 0,
    "errmsg" : "the specified mongod is a --configsvr and should thus not be a shard server"
}
It seems like I can't have a config server running as a shard too? How should I set things up then?

No, the config server is not a shard. The config server is a special mongod instance that stores the sharded cluster metadata, essentially how the data is distributed across the sharded cluster.
You really only need 1 mongos instance for a simple setup like this. When you start mongos, you will provide it with the hostname(s) of your config server(s).
The Sharded Cluster Deployment Tutorial explains all of the steps that you need to follow.
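For illustration only, a minimal layout along those lines might look like the commands below. The hostnames (cfg1-cfg3, shard1), ports and dbpaths are placeholders, and on MongoDB 3.4+ the config servers must additionally be started as a replica set and referenced in --configdb as replSetName/host:port,...:

# on the first 3 servers - config servers hold metadata only, never added as shards
mongod --configsvr --port 27019 --dbpath /data/configdb

# on the 4th server - an ordinary mongod that will hold the actual data
mongod --port 27018 --dbpath /data/shard1

# a single mongos is enough; point it at the config servers, not at the shard
mongos --configdb cfg1:27019,cfg2:27019,cfg3:27019 --port 27017

# from a mongo shell connected to the mongos
sh.addShard("shard1:27018")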

Related

Does a mongod config server also contain data (other than metadata)?

I am getting started with MongoDB and cannot find the answer to this question.
For test purposes I want to create a 3-datanode cluster, but so far I am not sure how many machines I will need to start a cluster with 3 datanodes. I want to have 2 routing servers in the cluster.
My current understanding is that I will need 4 machines.
Machine (config server and routing server): runs mongod --configsvr and mongos
Machine (shard and routing server): runs mongod and mongos
Machine (shard): runs only mongod
Machine (shard): runs only mongod
So in my opinion, a mongod --configsvr cannot be a shard at the same time?
In MongoDB the config server will not store any data other than the metadata for a sharded cluster. If you manually connect to the config server and try to write data, you get this error:
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 14037,
        "errmsg" : "can't create user databases on a --configsvr instance"
    }
})
Regarding the number of servers, each shard should run on its own machine. As you only have two shards, you could get away with 2 machines; however, 4 would be desirable so that each shard can have both a primary and a secondary replica set member. The config server and routing servers can run on any of those machines, so you only need 4 machines in total.

Sharding & Replication in mongodb

First of all, I'm a beginner in MongoDB so please be patient with me. I'm using Windows and I created a simple database that contains 4 collections. When I'm working with MongoDB, I first run mongod.exe --dbpath "Path To Data Folder" in a terminal and then connect to the mongod using mongo.exe. What I'm supposed to do is distribute the database, along with its collections, into shards and replica sets to support distributed queries.
I tried to use commands like sh.enableSharding("DATABASE NAME") but it didn't work. I then figured out that I need to run mongos instead of mongod, so I followed this: Sharding in MongoDB, but unfortunately I didn't succeed. I also did some research, but there seems to be a lack of to-the-point guides on sharding and replication. So if you could point me in the right direction, I would really appreciate it.
You can't enable sharding on a single database instance. You need to have at least 3 config server instances, two database (mongod) instances and a router instance (mongos). All of them should be running at the same time (i.e. don't close the terminals in which you started your instances).
A good starting point for you is to read the sharding tutorial in Deploy a Sharded Cluster - MongoDB.org
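As a rough sketch of the step that usually trips people up (the database and collection names below are made up): sh.enableSharding() and sh.shardCollection() must be run from a mongo shell connected to the mongos, not to an individual mongod:

// mongo.exe --port 27017   <- connect to the mongos port, not a mongod
sh.enableSharding("myDatabase")
// the shard key should match your query patterns; a hashed _id is just an example
sh.shardCollection("myDatabase.myCollection", { _id: "hashed" })
// verify shards, databases and chunk distribution
sh.status()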

Where is the mongos config database string being stored?

I made a mistake in my mongo sharding setup - I had an error in my config database string. I tried to clean this up by deleting all the data on the config database servers and restarting all the mongod services. However, even after restarting mongos, when I run:
sh.status()
I still get an error like this:
mongos specified a different config database string : stored : <old string here>
Where is this string actually being stored? I tried looking for it in the config databases themselves and also on the members of the shard, but I can't seem to find it.
As at MongoDB 2.4, the --configdb string specified for the mongos servers is also cached in-memory on the shard mongod servers. In order for mongos servers to join a sharded cluster, they must have an identical config string (including the order of hostnames).
There is a tutorial in the MongoDB manual to Migrate Config Servers with Different Hostnames which covers the full process, including migrating a config server to a new host (which isn't applicable in your case).
If you are still seeing the "different config database string" error after restarting everything, it's likely that you had at least one rogue mongod or mongos running during the process.
To resolve this error:
shut down all the mongod processes (for the shards)
shut down the mongos processes
restart the mongod processes (for the shards)
restart the mongos processes with the correct --configdb string (see the sketch below)
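A sketch of that restart sequence (hostnames, ports and paths are placeholders); the important part is that every mongos gets exactly the same --configdb string, with the hostnames in the same order:

# after stopping every shard mongod and every mongos, bring each shard mongod back up
mongod --dbpath /data/shard1 --port 27018

# then restart each mongos with the identical, corrected config string
mongos --configdb cfg1:27019,cfg2:27019,cfg3:27019 --port 27017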

How will writes be distributed across different shards when I use only mongoimport for write activity?

We are going to use MongoDB for an alert monitoring application.
Our first thought was to write the data to files and then load it into MongoDB using the mongoimport utility. Each file will have 1 million records on average.
My question here is: should we use sharding here?
I guess mongoimport is not aware of sharding.
How does sharding work when the writes happen via mongoimport?
If your collection exists and is sharded and you run mongoimport against a mongos router, then it will respect sharding rules (writes will be distributed according to chunk location).
Footnote
If you have a MongoDB cluster, you have to have mongos daemon(s) in it. mongos reads your cluster configuration from the config servers and knows where to route requests from your app. In a cluster configuration you should never talk to the mongod servers directly, only via mongos. Read more about cluster configuration here.
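To illustrate the point above (the database, collection and file names are placeholders): shard the collection first, then run mongoimport against the mongos rather than a mongod, and the inserts will be routed according to the chunk distribution:

// in a mongo shell connected to the mongos - a hashed _id spreads inserts across shards
sh.enableSharding("alerts")
sh.shardCollection("alerts.events", { _id: "hashed" })

# then run the import against the mongos host and port
mongoimport --host mongos-host --port 27017 --db alerts --collection events --file alerts_0001.json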

Remove inaccessible Mongo shard

I have a MongoDB sharded setup with 3 shards: shard0000, shard0001 and shard0002. The machine that runs shard0002 is down now, which causes all my queries to fail. I'd like to temporarily remove shard0002 from my setup and keep working with the first two shards. That should be doable assuming I only use unsharded collections that reside in the first two shards, right?
What I tried first is db.runCommand({removeshard: 'IP:PORT'}), which obviously doesn't help, because it just puts the shard in draining mode, which will never finish (since the shard is down). Then I tried connecting to my config server and running db.shards.remove({_id: 'shard0002'}) on the config DB, then restarting mongos so it reloads the config. Now whenever I try to do anything I get "can't find shard for: shard0002".
Is there any way to just let Mongo know that I don't care about that shard for now, and then re-enable it later when it becomes available?
I had a different problem, and I manually removed the shard with:
use config
db.shards.remove({"_id":"shard0002"});
Manually modify the shard entry in the config db, then run removeShard
I tried several options to do this in version 4.2.
At the end I ended to these commands to be executed on Config Server:
use config
// re-point databases whose primary shard was the dead shard at a surviving shard
db.databases.updateMany( { primary: "shard0002" }, { $set: { primary: "shard0000" } } )
// remove the dead shard from the shard registry
db.shards.deleteOne( { _id: "shard0002" } )
// reassign its chunks to a surviving shard
db.chunks.updateMany( { shard: "shard0002" }, { $set: { shard: "shard0000" } } )
// chunk history entries may also still reference the old shard
while ( db.chunks.updateMany( { "history.shard": "shard0002" },
                              { $set: { "history.$.shard": "shard0000" } } ).modifiedCount > 0 ) { print("Updated") }
It works to a certain extent, i.e. CRUD operations are working. However, when you run getShardDistribution() you get an error: Collection 'db.collection' is not sharded.
Finally, I see only one reliable & secure solution:
Shut down all mongod and mongos processes in your sharded cluster
Start the available shards as standalone services (see Perform Maintenance on Replica Set Members)
Take a backup from the available shards with mongodump
Drop the data folders on all hosts
Build your cluster newly from scratch and start up all mongod and mongos processes
Load the data into the new cluster with mongorestore (see the sketch after this list)
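A sketch of the dump/restore steps above (hosts, ports and backup paths are placeholders): dump each shard that is still reachable while it runs as a standalone mongod, then restore everything through a mongos of the rebuilt cluster:

# dump each surviving shard while it is running standalone
mongodump --host shard0-host --port 27018 --out /backup/shard0
mongodump --host shard1-host --port 27018 --out /backup/shard1

# after rebuilding the cluster, restore through a mongos of the new cluster
mongorestore --host new-mongos-host --port 27017 /backup/shard0
mongorestore --host new-mongos-host --port 27017 /backup/shard1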
Perhaps for a large cluster you have to shuffle things around a bit, like this:
Deploy config servers and a mongos server, with one empty shard
Start one old shard as a standalone
Take a backup from this old shard
Tear down this old shard
Build a fresh, empty new shard
Add the new shard to your new cluster
Restore the data into the new cluster
The backup can be dropped and the shard can be reused in the new cluster
Repeat the above for each shard you have in your cluster (the broken shard can most likely be skipped)