Easiest way to set up a MongoDB sharded cluster

What is the easiest way to set up a sharded cluster with MongoDB?

Start the mongo shell without connecting to a database:
mongo --nodb
Then start a new sharding test specifying the number of mongos and shards that you would like:
new ShardingTest({ shards: 2, mongos: 1 })
The logging output from each of the processes will be piped to this single shell (so there will be a lot). This makes that shell somewhat unusable but at least it only takes up one window and spares you from having to fork and specify a log path.
Just open up another shell and you are ready to go. Connect to the first mongos on port 30999.
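Once connected, you can try sharding a collection to see the pieces in action. A minimal sketch, assuming an arbitrary example database and collection name:

```javascript
// In the second shell: mongo --port 30999
sh.enableSharding("testdb")                             // allow sharding on this database
sh.shardCollection("testdb.people", { _id: "hashed" })  // hashed key spreads writes evenly
for (var i = 0; i < 1000; i++) {
    db.getSiblingDB("testdb").people.insert({ n: i })
}
sh.status()  // shows how chunks are distributed across the two shards
```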

Related

Sharding & Replication in mongodb

First of all, I'm a beginner with MongoDB, so please be patient with me. I'm using Windows and I created a simple database that contains 4 collections. When I'm working with MongoDB, I first run mongod.exe --dbpath "Path To Data Folder" in a terminal and then connect to the mongod using mongo.exe. What I'm supposed to do is distribute the database and its collections into shards and replica sets to support distributed queries.
I tried to use commands like sh.enableSharding("DATABASE NAME") but it didn't work. I then figured out that I need to run mongos instead of mongod, so I followed this: Sharding in MongoDB, but unfortunately I didn't succeed. I also did some research, but there seems to be a lack of to-the-point guides on sharding and replication. So if you could point me in the right direction, I would really appreciate it.
You can't enable sharding on a single database instance. You need at least 3 config server instances, two shard (mongod) instances, and a router instance (mongos). All of them should be running at the same time (i.e. don't close the terminals in which you started your instances).
A good starting point for you is to read the sharding tutorial in Deploy a Sharded Cluster - MongoDB.org
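As a rough sketch of the processes involved (ports and dbpaths are arbitrary examples, and this uses the older pre-3.4 style of standalone config servers that matches the tutorials above):

```shell
# three config servers
mongod --configsvr --dbpath /data/cfg0 --port 26050
mongod --configsvr --dbpath /data/cfg1 --port 26051
mongod --configsvr --dbpath /data/cfg2 --port 26052

# two shard servers
mongod --shardsvr --dbpath /data/sh0 --port 27018
mongod --shardsvr --dbpath /data/sh1 --port 27019

# one router pointing at all the config servers
mongos --configdb localhost:26050,localhost:26051,localhost:26052 --port 27017

# then, from a mongo shell connected to the mongos:
#   sh.addShard("localhost:27018")
#   sh.addShard("localhost:27019")
#   sh.enableSharding("DATABASE NAME")
```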

Setting up distributed MongoDB with 4 servers

I am supposed to set up MongoDB on 4 servers (a school project to benchmark MongoDB vs. others). So I am thinking of using only sharding, without replication. I attempted to set up the first 3 servers to run mongod --configsvr, and the 4th as just a normal mongod instance. Then on all servers I run mongos. Then I got to the part where I run sh.addShard("...") and I get
{
    "ok" : 0,
    "errmsg" : "the specified mongod is a --configsvr and should thus not be a shard server"
}
Seems like I can't have a config server running as a shard too? How then should I set things up?
No, the config server is not a shard. The config server is a special mongod instance that stores the sharded cluster metadata, essentially how the data is distributed across the sharded cluster.
You really only need 1 mongos instance for a simple setup like this. When you start mongos, you will provide it with the hostname(s) of your config server(s).
The Sharded Cluster Deployment Tutorial explains all of the steps that you need to follow.
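For the four-server layout described in the question, the processes might be laid out like this (hostnames are placeholders, and again this assumes pre-3.4 standalone config servers):

```shell
# servers 1-3: config servers
mongod --configsvr --dbpath /data/configdb --port 27019

# server 4: the single shard
mongod --shardsvr --dbpath /data/db --port 27018

# one mongos on any of the servers, pointed at all three config servers
mongos --configdb server1:27019,server2:27019,server3:27019 --port 27017

# from a shell connected to the mongos:
#   sh.addShard("server4:27018")
```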

Where is the mongos config database string being stored?

I made a mistake in my mongo sharding setup - I had an error in my config database string. I tried to clean this up by deleting all the data on the config database servers and restarting all the mongod services. However, even after restarting mongos I still initially get an error like this.
When I run:
sh.status()
I get:
mongos specified a different config database string : stored : <old string here>
Where is this string actually being stored? I tried looking for it in the config databases themselves and also on the members of the shard, but I can't seem to find it.
As at MongoDB 2.4, the --configdb string specified for the mongos servers is also cached in-memory on the shard mongod servers. In order for mongos servers to join a sharded cluster, they must have an identical config string (including the order of hostnames).
There is a tutorial in the MongoDB manual to Migrate Config Servers with Different Hostnames which covers the full process, including migrating a config server to a new host (which isn't applicable in your case).
If you are still seeing the "different config database string" error after restarting everything, it's likely that you had at least one rogue mongod or mongos running during the process.
To resolve this error:
shut down all the mongod processes (for the shards)
shut down the mongos processes
restart the mongod processes (for the shards)
restart the mongos with the correct --configdb string
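A sketch of that restart sequence (hostnames, paths, and the config string are placeholders; the important part is that every mongos gets the same string, in the same hostname order):

```shell
# 1-2. shut everything down (on Linux, mongod can stop itself cleanly)
mongod --dbpath /data/shard0 --shutdown

# 3. restart the shard mongods
mongod --shardsvr --dbpath /data/shard0 --port 27018

# 4. restart each mongos with the corrected, identical config string
mongos --configdb cfg1:27019,cfg2:27019,cfg3:27019 --port 27017
```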

How writes will be distributed across different shards when I use only mongoimport for write activity...?

We are going to use MongoDB for an alert monitoring application.
We thought first to write the data to files and then write them into MongoDB using the mongoimport utility. Each file will have 1 million records on average.
My question here is: "shall we use sharding here...?"
I guess mongoimport is not aware of sharding.
How does sharding work when the writes happen via mongoimport...?
If your collection exists and is sharded and you run mongoimport against a mongos router, then it will respect sharding rules (writes will be distributed according to chunk location).
Footnote
If you have a MongoDB cluster, you have to have mongos daemon(s) in there. mongos reads your cluster configuration from the config servers and knows where to route requests from your app. In a cluster configuration you should never talk to the mongod servers directly, only via mongos. Read more about sharded cluster configuration in the MongoDB manual.
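A sketch of importing through the router (the database, collection, and file names are made up; it assumes the collection was sharded beforehand):

```shell
# shard the target collection first, via the mongos
mongo --host mongos1 --port 27017 --eval '
    sh.enableSharding("alerts");
    sh.shardCollection("alerts.events", { _id: "hashed" });
'

# then point mongoimport at the mongos, never at a shard mongod
mongoimport --host mongos1 --port 27017 \
    --db alerts --collection events --file alerts.json
```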

Remove inaccessible Mongo shard

I have a MongoDB sharded setup with 3 shards: shard0000, shard0001 and shard0002. The machine that runs shard0002 is down now, which causes all my queries to fail. I'd like to temporarily remove shard0002 from my setup and keep working with the first two shards. That should be doable assuming I only use unsharded collections that reside in the first two shards, right?
What I tried first is: db.runCommand({removeshard: 'IP:PORT'}), which obviously doesn't help, because it just puts the shard in draining mode, which will never end (since it's down). Then I tried connecting to my config server, running db.shards.remove({_id: 'shard0002'}) on the config DB, and then restarting mongos so it reloads the config. Now whenever I try to do anything I get "can't find shard for: shard0002".
Is there any way to just let Mongo know that I don't care about that shard for now, then re-enable it later when it becomes available.
I had a different problem, and I manually removed the shard with:
use config
db.shards.remove({"_id":"shard0002"});
Manually modify the shard entry in the config db, then removeshard
I tried several options to do this in version 4.2.
In the end I settled on these commands, to be executed on the config server:
use config
db.databases.updateMany({ primary: "shard0002" }, { $set: { primary: "shard0000" } })
db.shards.deleteOne({ _id: "shard0002" })
db.chunks.updateMany({ shard: "shard0002" }, { $set: { shard: "shard0000" } })
while (db.chunks.updateMany({ "history.shard": "shard0002" },
    { $set: { "history.$.shard": "shard0000" } }).modifiedCount > 0) { print("Updated") }
It works to a certain extent, i.e. CRUD operations are working. However, when you run getShardDistribution() then you get an error Collection 'db.collection' is not sharded.
Finally I see only one reliable & secure solution:
Shut down all mongod and mongos processes in your sharded cluster
Start the available shards as standalone services (see Perform Maintenance on Replica Set Members)
Take a backup from the available shards with mongodump
Drop the data folders on all hosts
Build your cluster newly from scratch; start up all mongod and mongos processes
Load the data into the new cluster with mongorestore
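A sketch of the dump-and-restore path for one shard (the paths, ports, and hostnames are placeholders):

```shell
# start the surviving shard as a standalone (no --shardsvr, no --replSet)
mongod --dbpath /data/shard0000 --port 27018

# dump everything from it
mongodump --port 27018 --out /backup/shard0000

# ... rebuild the cluster from scratch, then restore through the new mongos
mongorestore --host newmongos --port 27017 /backup/shard0000
```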
For a large cluster you may have to shuffle things around a bit, like this:
Deploy the config servers and mongos server, with one empty shard
Start one old shard as standalone
Take a backup from this old shard
Tear down this old shard
Build a fresh, empty new shard
Add the new shard to your new cluster
Restore the data into the new cluster
The backup can be dropped and the shard reused in the new cluster
Repeat the above for each shard in your cluster (the broken shard can most likely be skipped)