Shard Existing Collection - MongoDB

I have a mongo collection on farm1-server1 and I managed to replicate it to farm2-server1 - the db path is /db/data. farm2-server1 is part of a group of 3 servers and I want to shard the collection I just replicated between all 3 servers. In order to do that, I stopped replication on server1, started mongod (on port 27017) and pointed it to the collection I replicated (/db/data) - I also added the directive:
configsvr = true.
I started mongos and added the following directive:
configdb = server1:27017
Then I started the shard processes (mongod) on each one of the server 1-3 with the directive:
shardsvr = true
I expected the collection to be sharded, but what happens is that the old collection I replicated is not recognized in this configuration, hence it cannot be sharded.
I have read that existing collections can be sharded, so I must be doing something wrong here. Any help is appreciated. I can provide configuration files if required.
Thanks, Noam

It sounds like you have missed some steps - you need to
add shards
enable sharding
shard the collection
http://www.mongodb.org/display/DOCS/Configuring+Sharding#ConfiguringSharding-ConfiguringtheShardCluster
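For example, a minimal sketch of those three steps, run from a mongo shell connected to the mongos (the hostnames server1-3, port 27018, the database name mydb, the collection name mycoll and the shard key are placeholders, not values from your setup):
sh.addShard("server1:27018")    // one addShard per shard mongod (the ones started with shardsvr = true)
sh.addShard("server2:27018")
sh.addShard("server3:27018")
sh.enableSharding("mydb")       // allow collections in this database to be sharded
sh.shardCollection("mydb.mycoll", { _id: 1 })   // shard the existing collection on a chosen key
Note that for an existing, non-empty collection the shard key field needs an index before shardCollection will succeed (sharding on _id uses the index that already exists).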

Related

Import documents to a particular/targeted shard in Mongo Shard Cluster

I'm new to MongoDB; however, I have set up a sharded cluster with 3 replica sets. Each replica set has 2 mongod instances. I have a separate config server and 1 mongos instance.
I wanted to try whether it's possible to import documents to a particular/targeted shard, when the collection is not shard enabled. (As far as I know, we can't control in which shard the collection will get saved when importing is done via mongos instance)
Hence, I imported the documents using the below command, and it was successful (followed this):
mongoimport --db NormalTestDB --collection TestDams --host <replSetName>/<hostname1><:port>,<hostname2><:port> --jsonArray --file "<path_to_json>"
(I used a particular replica set name in <replSetName>)
Now when I try this simple query db.TestDams.find().count() in mongos shell (by connecting to NormalTestDB), it returns 0. But if I try the same query by directly connecting to the Primary of the relevant replicaSet, I get 14603.
Is there any way that I could sync, or is there a way to use mongoimport targeting a particular shard?
(Aside: You should generally have 3 nodes in a replica set rather than 2.)
When you enable sharding on a database you can specify the primary shard. Unsharded collections will be stored on that shard. So, you can define one database per shard with each configured to have a distinct shard as the primary shard.
Alternatively if you set up zones you can specify which values of shard keys are to be stored on each shard. Then by inserting appropriate values you'd have the data stored on the shards you want.
You should be performing all operations on your sharded cluster through one of the mongos nodes, not writing directly to the shard mongods as you have done.
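A rough sketch of both options, run against mongos, assuming the replica set/shard is named repl_set_1, the zone is called damsZone, and TestDams would be sharded on a field damId (all of these names are assumptions, not taken from your cluster):
sh.enableSharding("NormalTestDB")
// option 1: make repl_set_1 the primary shard, so unsharded collections are stored there
db.adminCommand({ movePrimary: "NormalTestDB", to: "repl_set_1" })
// option 2: shard the collection and pin its whole key range to one shard via a zone
sh.addShardToZone("repl_set_1", "damsZone")
sh.shardCollection("NormalTestDB.TestDams", { damId: 1 })
sh.updateZoneKeyRange("NormalTestDB.TestDams", { damId: MinKey }, { damId: MaxKey }, "damsZone")
Either way, run the mongoimport against the mongos so that the cluster metadata knows about the data.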

how to get results from different shards and same collection

I created a test MongoDB Sharding with 1 Config Server, 1 Router and 2 Shards.
I created a config server with
mongod --configsvr --dbpath /data/configdb --port 27019
And I ran mongos with mongos --configdb <CONFIG SERVER IP>:27019. I added the shards and set up the database and collections. When I query the sharded collection, the cursor doesn't get results from both shards, it only gets results from 1 shard. According to the MongoDB sharding documentation, I can distribute a collection across different shards, but the results don't bear this out.
What am I doing wrong? What's the correct way to get results from different shards with same query?
The correct way is to ensure that you do NOT have to go cross-shard often. You choose a shard key so that your queries do not end up going across shards, e.g. a user database could have the user id/name as the shard key. Most queries look for a single user, so they would be answered by a single shard.
For queries that cannot be satisfied with results from 1 shard alone, Mongo automatically requests data from all shards.
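As an illustrative sketch (the database mydb, the users collection and the userId/city fields are made up for this example):
sh.shardCollection("mydb.users", { userId: 1 })
db.users.find({ userId: 42 })       // filter contains the shard key: mongos routes this to a single shard
db.users.find({ city: "Ankara" })   // no shard key in the filter: mongos scatter-gathers across all shards and merges the results
Also note that with only a small amount of data all chunks may still sit on one shard until they grow large enough to be split and balanced, which can look like only one shard is answering.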

Sharding & Replication in mongodb

First of all, I'm a beginner in mongoDB so please be patient with me. I'm using windows and I created a simple database that contains 4 collections. When I'm dealing with mongoDB, I first run: mongod.exe --dbpath "Path To Data Folder" in a terminal and then I connect to the mongod using mongo.exe. What I'm supposed to do is to distribute the database along with its collections into shards and replica sets for supporting distributed queries.
I tried to use commands like sh.enableSharding("DATABASE NAME") but it didn't work. I then figured out that I need to run mongos instead of mongod, so I followed this: Sharding in MongoDB but unfortunately I didn't succeed. I also did some research but it seems there is a lack of to-the-point guides on sharding and replication. So if you point me in the right direction, I would really appreciate it.
You can't enable sharding on a single database instance. You need to have at least 3 config server instances, two database (mongod) instances and a router instance (mongos). All of them should be running at the same time (i.e. don't close the terminals in which you started all your instances).
A good starting point for you is to read the sharding tutorial in Deploy a Sharded Cluster - MongoDB.org
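For orientation, a minimal single-machine sketch, assuming an older MongoDB release where standalone config servers and shards were still allowed (current versions require replica sets for both); all paths and ports below are just examples:
mongod.exe --configsvr --dbpath C:\data\config --port 27019
mongod.exe --shardsvr --dbpath C:\data\shard1 --port 27018
mongod.exe --shardsvr --dbpath C:\data\shard2 --port 27020
mongos.exe --configdb localhost:27019 --port 27017
Then, from a mongo.exe shell connected to the mongos on port 27017:
sh.addShard("localhost:27018")
sh.addShard("localhost:27020")
sh.enableSharding("<database>")
sh.shardCollection("<database>.<collection>", { _id: 1 })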

Setting up distributed MongoDB with 4 servers

I am supposed to set up mongodb on 4 servers (a school project to benchmark mongodb vs. others). So I am thinking of using only sharding without replication. So I attempted to set up the first 3 servers to run mongod --configsvr, and the 4th as just a normal mongod instance. Then on all servers I ran mongos. Then I am at the part where I run sh.addShard("...") and I get
{
"ok" : 0,
"errmsg" : "the specified mongod is a --configsvr and should thus not be a shard server"
}
Seems like I can't have a config server running as a shard too? How then should I set things up?
No, the config server is not a shard. The config server is a special mongod instance that stores the sharded cluster metadata, essentially how the data is distributed across the sharded cluster.
You really only need 1 mongos instance for a simple setup like this. When you start mongos, you will provide it with the hostname(s) of your config server(s).
The Sharded Cluster Deployment Tutorial explains all of the steps that you need to follow.
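To make that concrete, one possible layout for four machines named server1-4 (hostnames, ports and paths are placeholders, and this assumes the older mirrored config server setup that matches your 3 standalone config servers):
On server1, server2 and server3:
mongod --configsvr --dbpath /data/configdb --port 27019
On each of server1..server4, a second mongod acting as the shard data node:
mongod --shardsvr --dbpath /data/shard --port 27018
On one machine only:
mongos --configdb server1:27019,server2:27019,server3:27019
Then, from a mongo shell connected to that mongos:
sh.addShard("server1:27018")
sh.addShard("server2:27018")
sh.addShard("server3:27018")
sh.addShard("server4:27018")
That gives you 4 shards with no replication, which matches the benchmark setup you described.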

Remove inaccessible Mongo shard

I have a MongoDB sharded setup with 3 shards: shard0000, shard0001 and shard0002. The machine that runs shard0002 is down now, which causes all my queries to fail. I'd like to temporarily remove shard0002 from my setup and keep working with the first two shards. That should be doable assuming I only use unsharded collections that reside in the first two shards, right?
What I tried first is: db.runCommand({removeshard: 'IP:PORT'}), which obviously doesn't help, because it just puts the shard in draining mode, which will never end (since it's down). Then I tried connecting to my config server, did db.shards.remove({_id: 'shard0002'}) on the config DB, then restarted mongos so it reloads the config. Now whenever I try to do anything I get "can't find shard for: shard0002".
Is there any way to just let Mongo know that I don't care about that shard for now, and then re-enable it later when it becomes available?
I had a different problem, and I manually removed the shard with:
use config
db.shards.remove({"_id":"shard0002"});
Manually modify the shard entry in the config db, then removeshard
I tried several options to do this in version 4.2.
In the end I arrived at these commands, to be executed on the config server:
use config
db.databases.updateMany( {primary: "shard0002"}, {$set: {primary: "shard0000"} })
db.shards.deleteOne({_id : "shard0002" })
db.chunks.updateMany( {shard : "shard0002"}, {$set: {shard: "shard0000"} })
while ( db.chunks.updateMany( {"history.shard" : "shard0002"},
{$set: {"history.$.shard": "shard0000"} }).modifiedCount > 0 ) { print("Updated") }
It works to a certain extent, i.e. CRUD operations are working. However, when you run getShardDistribution() then you get an error Collection 'db.collection' is not sharded.
Finally I see only one reliable & secure solution:
Shut down all mongod and mongos in your sharded cluster
Start the available shards as standalone services (see Perform Maintenance on Replica Set Members)
Take a backup of the available shards with mongodump
Drop the data folders from all hosts
Build your cluster newly from scratch and start up all mongod and mongos processes
Load the data into the new cluster with mongorestore
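For the backup and restore steps, a sketch with placeholder hostnames, ports and paths:
mongodump --host shard0000-host --port 27018 --out /backup/shard0000
mongodump --host shard0001-host --port 27018 --out /backup/shard0001
# later, restore through the mongos of the freshly built cluster
mongorestore --host new-mongos-host --port 27017 /backup/shard0000
mongorestore --host new-mongos-host --port 27017 /backup/shard0001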
Perhaps for a large cluster you have to shuffle things around a bit like this:
Deploy config servers and a mongos server, with one empty shard
Start one old shard as standalone
Take a backup from this old shard
Tear down this old shard
Build a fresh, empty new shard
Add the new shard to your new cluster
Restore the data into the new cluster
The backup can be dropped and the shard can be reused in the new cluster
Repeat the above for each shard in your cluster (the broken shard can most likely be skipped)