MongoDB sharded collection query failed: setShardVersion failed host

I have encountered a problem after adding a shard to a MongoDB cluster.
I did the following operations:
1. Deployed a MongoDB cluster in which the shard named 'shard0002' (10.204.8.155:27010) was the primary shard for all databases.
2. For some reason I removed it and, after the migration finished, added a new shard on a different host (10.204.8.100:27010), which was automatically named shard0002 as well.
3. Then added another shard (the one removed in step 1), which was named 'shard0003'.
4. Executed a query on a sharded collection.
5. The following error appeared:
mongos> db.count.find()
error: {
"$err" : "setShardVersion failed host: 10.204.8.155:27010 { errmsg: \"exception: gotShardName different than what i had before before [shard0002] got [shard0003] \", code: 13298, ok: 0.0 }",
"code" : 10429
}
I tried to rename the shard, but that is not allowed:
mongos> use config
mongos> db.shards.update({_id: "shard0003"}, {$set: {_id: "shard_11"}})
Mod on _id not allowed
I have also tried to remove it; draining started, but the process seems to hang.
What can I do?
------------------------
Last update (24/02/2014 00:29)
I found the answer on Google. Since mongod keeps its own cache of the shard configuration, just restart the sharded mongod process and the problem will be fixed.
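For reference, this is roughly what that restart looks like; it assumes you can afford to stop the shard briefly and that you normally run mongod as a service (adjust the start command to however you launch it):
// connect straight to the shard's mongod (address taken from the error message)
// and shut it down cleanly
mongo 10.204.8.155:27010/admin
> db.shutdownServer()
// then start that mongod again on the host with its usual options, e.g.
sudo service mongod start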

Related

Mongodb created replica set string showing exception

I got this issue while working on replica sets. The server starts successfully, but after executing rs.initiate() and rs.status() I am getting errors:
"info2" : "no configuration explicitly specified -- making one",
"errmsg" : "exception: bad --replSet config string format is: <setname>[host1>,<seedhost2>,...]",
"code" : 13093,
"ok" : 0
I ran into this problem as well. What happened was I configured the replica set in /etc/mongo.conf, went into the mongo client and executed rs.initiate(). What I forgot to do was restart mongo! A simple sudo service mongod restart fixed it.
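For anyone hitting the same thing, this is roughly the sequence, assuming the legacy (pre-YAML) config file format and an example set name of rs0:
# /etc/mongo.conf
replSet = rs0

# restart so mongod actually picks the setting up, then initiate from the shell
sudo service mongod restart
mongo
> rs.initiate()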

How to know the existence of replica set in sharded environment from JAVA client

I want to set
mongoClient.setWriteConcern(WriteConcern.REPLICAS_SAFE);
only if a replica set is present.
But in a sharded environment, when I do:
mongoClient.getReplicaSetStatus();
it returns null even though I have a replica set.
I am passing the mongos IP to the MongoClient.
Most MongoDB drivers, in particular the Java driver you are using, will throw an exception if you try to set the REPLICA_ACKNOWLEDGED write concern when it is not possible to get an acknowledgement from two or more nodes.
From the docs:
WriteConcern.REPLICA_ACKNOWLEDGED Tries to write to two separate nodes. [...] will
throw an exception if two writes are not possible.
See the following for more details:
http://docs.mongodb.org/manual/reference/write-concern/
http://docs.mongodb.org/ecosystem/drivers/java-replica-set-semantics/
In my testing with the mongo shell, if you provide the REPLICA_ACKNOWLEDGED (formerly called REPLICA_SAFE) concern to the 'getlasterror' command, you will get an error when you are not communicating with a replica set. When talking to a mongos process, the error will be:
{
"singleShard" : "localhost:30001",
"n" : 0,
"connectionId" : 3,
"wnote" : "no replication has been enabled, so w=2.0 won't work",
"err" : "norepl",
"ok" : 1
}
It is not the case that the client will hang forever when wtimeout is not specified; that would only happen if there is a replica set but two nodes remain unavailable for writes indefinitely.
Note that using "majority" as the w value for the write concern will work correctly through mongos; note the difference in the getlasterror responses:
mongos> db.coll.insert({}); db.runCommand({getlasterror:1,w:"majority"})
{
"singleShard" : "localhost:30001",
"n" : 0,
"connectionId" : 3,
"err" : null,
"ok" : 1
}
First verify that your replica set has a PRIMARY using the mongo shell command rs.status()
Then if that worked, verify that you are connecting to the database correctly:
MongoClient mongoClient = new MongoClient( "hostname" , 27017 );
If both of those are true, then there should be no reason for mongoClient.getReplicaSetStatus() to return null; it should return a ReplicaSetStatus object.

Can't add member into MongoDB replica-set

I am using MongoDB 2.4.3 and following this tutorial:
http://docs.mongodb.org/manual/tutorial/deploy-replica-set/
But when adding the other members to the replica set, I get the following error:
root#vm3:~# mongo
MongoDB shell version: 2.4.3
connecting to: test
rs1:PRIMARY> rs.add("vm1")
{
"errmsg" : "exception: set name does not match the set name host vm1:27017 expects",
"code" : 13145,
"ok" : 0
}
rs1:PRIMARY> rs.add("vm4")
{
"errmsg" : "exception: set name does not match the set name host vm4:27017 expects",
"code" : 13145,
"ok" : 0
}
vm1, vm3 and vm4 know each other because I configured their /etc/hosts files correctly.
Any idea? I don't understand what this error message means!
After restarting all the VMs, it works now.
root#vm3:~# mongo
MongoDB shell version: 2.4.3
connecting to: test
rs1:PRIMARY> rs.add("vm4")
{ "ok" : 1 }
rs1:PRIMARY> rs.add("vm1")
{ "ok" : 1 }
In my case, just restarting the virtual machines fixed everything.
If you are re-installing a MongoDB instance, the replSet name may be living in your data files on the drive. I had the same problem setting up a new replica set. The problem came from changing the replica set name after bringing up instances with an older replSet name. I deleted the data files, ran my install scripts again, and it worked just fine.
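A rough sketch of that clean-up, assuming a stock install with its data under /var/lib/mongodb (adjust the path to your setup) and assuming you really do want to discard the old data:
sudo service mongod stop
sudo rm -rf /var/lib/mongodb/*     # removes the stale replSet metadata along with all data
sudo service mongod start          # comes back up clean with the replSet name from its config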

How to modify replica set config?

I have a two-node mongo cluster running with this replica set config:
config = {_id: "repl1", members:[
{_id: 0, host: 'localhost:15000'},
{_id: 1, host: '192.168.2.100:15000'}]
}
I have to move both of these nodes onto new servers. I have copied everything from the old servers to the new ones, but I'm running into issues while reconfiguring the replica set config due to the IP change on the second node.
I have tried this:
config = {_id: "repl1", members:[
{_id: 0, host: 'localhost:15000'},
{_id: 1, host: '192.168.2.200:15000'}]
}
rs.reconfig(config)
{
"startupStatus" : 1,
"errmsg" : "loading local.system.replset config (LOADINGCONFIG)",
"ok" : 0
}
It shows the above message, but the change does not happen.
I also tried changing the replica set name while pointing to the same data dirs.
I am getting the following error:
rs.initiate()
{
"errmsg" : "local.oplog.rs is not empty on the initiating member. cannot initiate.",
"ok" : 0
}
What are the right steps to change the IP while keeping the data on the second node, or do I need to recreate/resync the second node?
Well, I had the same problem.
I had to delete the entire replication config and oplog:
use local
db.dropDatabase()
Then restart your mongod with the new set name and re-initiate the set:
config = {_id: "repl1", members:[
{_id: 0, host: 'localhost:15000'},
{_id: 1, host: '192.168.2.100:15000'}]
}
rs.initiate(config)
I hope this works for you too
You can use force option when reconfiguring replica set:
rs.reconfig(config, {force: true})
Note that, as Adam already suggested in the comments, you should have at least 3 nodes: 2 full nodes and 1 arbiter (the minimum supported configuration) or 3 full nodes (the minimum recommended configuration), so that a primary can be elected.
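If you only have the two data-bearing nodes, the cheapest way to reach that minimum is an arbiter on a third machine: start a mongod there with the same --replSet name, then on the primary (the hostname below is just a placeholder):
rs.addArb("arbiter-host:27017")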
I realise this is an old post, but I discovered I was getting this exact same error when trying to change the port used by secondaries in my replica set.
In my case, I needed to stop the secondary whose config I was changing, and bring it up on its new address and port BEFORE applying the changed config on the Primary.
This is in the mongo documentation, but the order in which I had to bring things up and down was something I'd mis-read on the first pass, so for clarity I've repeated that here:
1. Shut down the secondary member of the replica set you are moving.
2. Bring that secondary back up at its new address and port.
3. Make the configuration change on the primary, as detailed in the original post above (see the sketch below).
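A minimal sketch of that last step, run on the primary once the secondary is listening on its new address, reusing the addresses from the question (the member index 1 is illustrative):
cfg = rs.conf()
cfg.members[1].host = "192.168.2.200:15000"   // the secondary's new address
rs.reconfig(cfg)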
You can use rs.reconfig(). First retrieve the current configuration with rs.conf(), modify the configuration document as needed, and then pass the modified document to rs.reconfig().
More info in the docs.

MongoDb DropDatabase Not Working

I have a 200 GB database on a sharded four-node cluster, and I would like to drop the database and delete all the files associated with it from the nodes. I connect to my mongos and call dropDatabase on it. The system comes back with ok, but if I call show dbs, it shows the database again and shows that it is still occupying the 200 GB. What am I doing wrong?
I think you are running into this issue:
https://jira.mongodb.org/browse/SERVER-4804
In most cases it seems like the database is in fact removed but the mongos still reports it as being there. You can verify it is gone by either trying to use the DB and getting an error or by logging into the shards directly and checking.
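For example, something like this, where the shard address is just a placeholder:
// connect to a shard's primary directly, bypassing mongos
mongo shard1.example.com:27018
> show dbs
// the dropped database should no longer appear in the output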
The bug refers to issues with dropping databases while a migration is happening. You can work around the cause of the issue by doing something like this (sub in your own dbname):
mongos> use config
switched to db config
// 1. stop the balancer
mongos> db.settings.update({_id: "balancer"}, {$set: {stopped: true}}, true)
// 2. wait for in-progress migrations to finish, this may take a few seconds
mongos> while (db.locks.findOne({_id: "balancer", state: {$ne: 0}}) != null) { sleep(1000); }
// 3. now you can safely drop the database
mongos> use <dbname>
switched to db <dbname>
mongos> db.dropDatabase()
{ "dropped" : "<dbname>", "ok" : 1 }
You may want to run the flushRouterConfig on the mongoses to refresh the config info:
mongos> use config
switched to db config
mongos> var mongoses = db.mongos.find()
mongos> while (mongoses.hasNext()) { new Mongo(mongoses.next()._id).getDB("admin").runCommand({flushRouterConfig: 1}) }
{ "flushed" : true, "ok" : 1 }
Of course, the real fix will only come along when the fix is committed; it looks like it is targeted for 2.1.
If you are in a broken state, you can try this, but it is tricky:
To "reset" the sharding metadata and recover from this issue, please try to do the following:
First, stop the balancer (as above) and wait for migrations to finish (also as above).
Next, ensure there is no activity from the app servers on the database in question.
Now, ensure that there are no collection entries in config.collections for namespaces beginning with "TestCollection."; if there are, remove those entries through a mongos:
mongos> use config
mongos> db.collections.find({_id: /^TestCollection\./})
// if any records found, delete them
mongos> db.collections.remove({_id: /^TestCollection\./})
Next up, ensure there is no database entry in config.databases for "TestCollection", and if so, remove it through mongos:
mongos> use config
switched to db config
mongos> db.databases.find({_id: "TestCollection"})
// if any records found, delete them
mongos> db.databases.remove({_id: "TestCollection"})
Now, ensure there are no entries in config.chunks for any namespaces in the database (like the default test namespace). If there are any, remove through mongos:
mongos> use config
switched to db config
mongos> db.chunks.find({ns: /^test\./})
// if any records found, delete them
mongos> db.chunks.remove({ns: /^test\./})
Then, flushRouterConfig on all mongoses:
mongos> use config
switched to db config
mongos> var mongoses = db.mongos.find()
mongos> while (mongoses.hasNext()) { new Mongo(mongoses.next()._id).getDB("admin").runCommand({flushRouterConfig: 1}) }
{ "flushed" : true, "ok" : 1 }
...
Finally, manually connect to each shard primary and drop the database on the shards (not all shards may have the database, but it's best to be thorough and issue the dropDatabase() call on all of them).
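Something along these lines, repeated for each shard's primary (the hostname is a placeholder; "TestCollection" is the database name used in this example):
mongo shard1.example.com:27018/TestCollection
> db.dropDatabase()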
Regarding in-progress migrations, you can use this snippet:
// 2. wait for in-progress migrations to finish, this may take a few seconds
mongos> while (db.locks.findOne({_id: "balancer", state: {$ne: 0}}) != null) { sleep(1000); }
When done, don't forget to reenable the balancer:
mongos> use config
switched to db config
mongos> db.settings.update({_id: "balancer"}, {$set: {stopped: false}}, true)
I had this exact same problem and discovered that issuing db.dropDatabase() as a regular user failed silently but doing the same as sudo worked, in case that helps anyone.