How to Verify Sharding? - mongodb

I am trying to shard MongoDB. I am done with the sharding configuration, but I am not sure how to verify that sharding is functional.
How do I check whether my data is getting sharded? Is there a query to verify/validate the shards?

You can also execute a simple command on your mongos router:
> use admin
> db.printShardingStatus();
which should output information about your shards, your sharded databases and your sharded collections, as mentioned in the MongoDB documentation:
sharding version: { "_id" : 1, "version" : 2 }
shards:
      { "_id" : ObjectId("4bd9ae3e0a2e26420e556876"), "host" : "localhost:30001" }
      { "_id" : ObjectId("4bd9ae420a2e26420e556877"), "host" : "localhost:30002" }
      { "_id" : ObjectId("4bd9ae460a2e26420e556878"), "host" : "localhost:30003" }
databases:
      { "name" : "admin", "partitioned" : false,
        "primary" : "localhost:20001",
        "_id" : ObjectId("4bd9add2c0302e394c6844b6") }
            my chunks
      { "name" : "foo", "partitioned" : true,
        "primary" : "localhost:30002",
        "sharded" : { "foo.foo" : { "key" : { "_id" : 1 }, "unique" : false } },
        "_id" : ObjectId("4bd9ae60c0302e394c6844b7") }
            my chunks
            foo.foo { "_id" : { $minKey : 1 } } -->> { "_id" : { $maxKey : 1 } }
                 on : localhost:30002 { "t" : 1272557259000, "i" : 1 }

MongoDB has detailed documentation on Sharding here ...
http://www.mongodb.org/display/DOCS/Sharding+Introduction
To answer your question (I think), see the portion on the config servers:
Each config server has a complete copy of all chunk information. A two-phase commit is used to ensure the consistency of the configuration data among the config servers.
Basically, it is the config servers' job to make sure everything gets sharded correctly.
Also, there are commands you can run and system collections (in the config database) you can query:
db.runCommand( { listshards : 1 } );
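If you want to look at the metadata itself, here is a minimal sketch (assuming you are connected to a mongos) of reading the standard config collections:
// Sketch: inspect the sharding metadata directly from a mongos.
var cfg = db.getSiblingDB("config");
cfg.shards.find()             // shards registered in the cluster
cfg.databases.find()          // databases and whether they are partitioned
cfg.chunks.find().limit(5)    // a few chunk ranges and the shard that owns each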
There is lots of help in the presentations below, too:
http://www.slideshare.net/mongodb/mongodb-sharding-internals
http://www.10gen.com/video/mongosv2010/sharding

If you just want to check whether you are connected to a sharded cluster or not:
db.isMaster() can be used to detect that you are connected to a sharding router (mongos).
If db.isMaster().msg is "isdbgrid", you are connected to a sharded instance.
db.isMaster() can be run without authentication.
For checking the details of the shards, sh.status() also works; it produces the same output as db.printShardingStatus().
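A quick sketch of the isdbgrid check from the shell:
// Sketch: prints whether the current connection is a mongos or a plain mongod.
var hello = db.isMaster();
print(hello.msg === "isdbgrid" ? "connected to a mongos (sharded cluster)" : "connected to a mongod");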

Related

MongoDB Sharding - why isn't my data being distributed across all shards?

I created a MongoDB cluster with 3 shards; each shard contains 3 mongod processes. My cluster also contains 3 mongos and 3 config servers.
In the connection string I put the 3 mongos:
mongodb://user:pass@mongos1:27017,mongos2:27017,mongos3:27017/mydatabase
In the picture you can see that dbShard_2 has 1.19GB of data while the others are almost empty, with only 4KB. But on the charts you can see that there are also read/write operations on all shards. Is everything fine, or have I made some wrong configuration? Should I worry?
I let Cloud Manager do the whole configuration for me; I didn't set anything up manually.
Here you can check my sharding status:
mongos> db.printShardingStatus();
--- Sharding Status ---
  sharding version: {
      "_id" : 1,
      "minCompatibleVersion" : 5,
      "currentVersion" : 6,
      "clusterId" : ObjectId("XXXXX")
  }
  shards:
      { "_id" : "dbShard_0", "host" : "dbShard_0/dbnode-0.x-app.com:27000,dbnode-1.x-app.com:27000,dbnode-2.x-app.com:27000" }
      { "_id" : "dbShard_1", "host" : "dbShard_1/dbnode-0.x-app.com:27001,dbnode-2.x-app.com:27001,dbnode-2.x-app.com:27002" }
      { "_id" : "dbShard_2", "host" : "dbShard_2/dbnode-0.x-app.com:27002,dbnode-1.x-app.com:27001,dbnode-2.x-app.com:27003" }
  balancer:
      Currently enabled:  yes
      Currently running:  no
      Failed balancer rounds in last 5 attempts:  0
      Migration Results for the last 24 hours:
          No recent migrations
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "mydatabase-staging", "partitioned" : false, "primary" : "dbShard_2" }
      { "_id" : "mydatabase", "partitioned" : false, "primary" : "dbShard_2" }
      { "_id" : "test", "partitioned" : false, "primary" : "dbShard_0" }
(screenshot: "Cluster" view in Cloud Manager showing the data size of each shard)
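One hedged way to dig further from the mongos shell (mycollection is a hypothetical collection name; getShardDistribution() only reports on sharded collections, and config.databases shows whether sharding was ever enabled for a database):
// Sketch: check how a collection's data is spread and whether the database is partitioned.
var mydb = db.getSiblingDB("mydatabase");
mydb.mycollection.getShardDistribution()                           // per-shard data size and document counts
db.getSiblingDB("config").databases.find({ _id : "mydatabase" })   // "partitioned" : true/false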

Sharding in a replica set MongoDB

I have a MongoDB replica set: one primary, one secondary, and an arbiter to vote. I'm planning to implement sharding as the data is expected to grow exponentially. I find the MongoDB documentation on sharding difficult to follow. Could someone explain clearly how to set it up? Thanks in advance.
If you managed to set up a replica set, sharding is pretty simple. This is pretty much the MongoDB documentation in fast forward:
Below is a sample setup: 3 config servers and 3 shards.
For the example below you can run all of it on one machine to see it working.
If you need three shards, set up three replica sets. (Assuming the 3 primaries are 127.0.0.1:27000, 127.0.0.1:37000, 127.0.0.1:47000)
Run 3 mongod instances as the three config servers. (Assuming: localhost:27017, localhost:27018, localhost:27019)
Start mongos (note the s in mongos), letting it know where your config servers are. (e.g. 127.0.0.1:27023)
Connect to mongos from the mongo shell and add the three primaries of your 3 replica sets as the shards.
Enable sharding for your DB.
If required, enable sharding for a collection.
Select a shard key if required. (Very important that you do it right the first time!!!)
Check the shard status.
Pump data; connect to the individual mongod primaries and see the data distributed across the three shards. (A small data-pump sketch follows the transcript below.)
# start mongos with the three config servers:
mongos --port 27023 --configdb localhost:27017,localhost:27018,localhost:27019
mongos> sh.addShard("127.0.0.1:27000");
{ "shardAdded" : "shard0000", "ok" : 1 }
mongos> sh.addShard("127.0.0.1:37000");
{ "shardAdded" : "shard0001", "ok" : 1 }
mongos> sh.addShard("127.0.0.1:47000");
{ "shardAdded" : "shard0002", "ok" : 1 }
mongos> sh.enableSharding("db_to_shard");
{ "ok" : 1 }
mongos> use db_to_shard;
switched to db db_to_shard
mongos>
mongos> sh.shardCollection("db_to_shard.coll_to_shard", {collId: 1, createdDate: 1} );
{ "collectionsharded" : "db_to_shard.coll_to_shard", "ok" : 1 }
mongos> show databases;
admin (empty)
config 0.063GB
db_to_shard 0.078GB
mongos> sh.status();
--- Sharding Status ---
  sharding version: {
      "_id" : 1,
      "minCompatibleVersion" : 5,
      "currentVersion" : 6,
      "clusterId" : ObjectId("557003eb4a4e61bb2ea0555b")
  }
  shards:
      { "_id" : "shard0000", "host" : "127.0.0.1:27000" }
      { "_id" : "shard0001", "host" : "127.0.0.1:37000" }
      { "_id" : "shard0002", "host" : "127.0.0.1:47000" }
  balancer:
      Currently enabled:  yes
      Currently running:  no
      Failed balancer rounds in last 5 attempts:  0
      Migration Results for the last 24 hours:
          No recent migrations
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
      { "_id" : "db_to_shard", "partitioned" : true, "primary" : "shard0000" }
          db_to_shard.coll_to_shard
              shard key: { "collId" : 1, "createdDate" : 1 }
              chunks:
                  shard0000  1
              { "collId" : { "$minKey" : 1 }, "createdDate" : { "$minKey" : 1 } } -->> { "collId" : { "$maxKey" : 1 }, "createdDate" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)

MongoDB replicates data to all shards

I have 4 servers in a test environment which I use to test MongoDB replica and distribution:
RepSetA holds RepSetA1 and RepSetA2.
RepSetB holds RepSetB1 and RepSetB2.
All servers act as routers, RepSetA1 acts as a single config server.
I have a "Player" data (10,000 records, the object consists an "Id" and a "Name" fields), and I want it to be sharded (or distributed) between the replica sets, and cloned among the servers in the same replica set. So, just for a plain example:
Player1-5000: Exists in both RepSetA1 and RepSetA2, but not in RepSetB1 and RepSetB2.
Player5000-10000: Exists in both RepSetB1 and RepSetB2, but not in RepSetA1 and RepSetA2.
What I get instead is having all players in all 4 servers.
If I print the sharding status, I get the following:
mongos> db.printShardingStatus();
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
      { "_id" : "RepSetA", "host" : "RepSetA/MongoRepSetA1:27018,MongoRepSetA2:27018" }
      { "_id" : "RepSetB", "host" : "RepSetB/MongoRepSetB1:27018,MongoRepSetB2:27018" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "GamesDB", "partitioned" : true, "primary" : "RepSetA" }
          GamesDB.Player chunks:
              RepSetA  2
              { "_id" : { $minKey : 1 } } -->> { "_id" : 0 } on : RepSetA { "t" : 1000, "i" : 1 }
              { "_id" : 0 } -->> { "_id" : { $maxKey : 1 } } on : RepSetA { "t" : 1000, "i" : 2 }
      { "_id" : "test", "partitioned" : false, "primary" : "RepSetB" }
      { "_id" : "EOO", "partitioned" : false, "primary" : "RepSetB" }
I used the following commands to build the shards:
db.adminCommand( { addShard : "RepSetA/MongoRepSetA1:27018,MongoRepSetA2:27018" } )
db.adminCommand( { addShard : "RepSetB/MongoRepSetB1:27018,MongoRepSetB2:27018" } )
db.runCommand( { enablesharding : "GamesDB" } );
db.runCommand( { shardcollection : "GamesDB.Player", key : { _id :1 } , unique : true} );
What am I doing wrong?
If you connect through a mongos process to your nodes, it will look like all of them contain the data. From your output, it doesn't look like all data is available on all nodes: RepSetA holds 2 chunks and RepSetB holds none. You can verify this by connecting directly to your nodes rather than through mongos.
By the way, if you're using MongoDB's ObjectId as your _id (shard key), consider sharding on another key, as ObjectIds increase monotonically and will cause all inserts to go to a single node.
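For example, a sketch of the hashed alternative (assuming MongoDB 2.4 or newer, where hashed shard keys exist, and a collection that has not been sharded yet):
// Sketch: a hashed _id spreads monotonically increasing ObjectIds across shards.
sh.shardCollection("GamesDB.Player", { _id : "hashed" })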
This is fine. It does not show that all data is on all servers. The output shows that all chunks (data) of GamesDB.Player are on shard RepSetA:
GamesDB.Player chunks:
    RepSetA  2
    { "_id" : { $minKey : 1 } } -->> { "_id" : 0 } on : RepSetA { "t" : 1000, "i" : 1 }
    { "_id" : 0 } -->> { "_id" : { $maxKey : 1 } } on : RepSetA { "t" : 1000, "i" : 2 }
This means that the balancer has not started to balance your chunks yet. The balancer only kicks in when there is an 8-chunk difference between shards.
http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingAdministration-Balancing
You can force balancing by manually splitting chunks (if you want to)
http://www.mongodb.org/display/DOCS/Splitting+Shard+Chunks
Or you can reduce the chunk size if you want to see balancing sooner.
http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingAdministration-ChunkSizeConsiderations
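A rough sketch of both options from the mongos shell (the split point of _id 5000 is just an illustrative value for the question's data; the chunk size is in MB and 32 is only an example):
// Sketch: split the one big chunk by hand at a chosen _id value ...
sh.splitAt("GamesDB.Player", { _id : 5000 })
// ... or lower the cluster-wide chunk size so splitting/balancing starts sooner.
db.getSiblingDB("config").settings.save({ _id : "chunksize", value : 32 })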

Mongodb using db.help() on a particular db command

When I type db.help(), it returns:
DB methods:
db.addUser(username, password[, readOnly=false])
db.auth(username, password)
...
...
db.printShardingStatus()
...
...
db.fsyncLock() flush data to disk and lock server for backups
db.fsyncUnlock() unlocks server following a db.fsyncLock()
I'd like to find out how to get more detailed help for a particular command. The problem was with printShardingStatus, as it returned "too many chunks to print, use verbose if you want to force print".
mongos> db.printShardingStatus()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
      { "_id" : "shard0000", "host" : "localhost:10001" }
      { "_id" : "shard0001", "host" : "localhost:10002" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "dbTest", "partitioned" : true, "primary" : "shard0000" }
          dbTest.things chunks:
              shard0001  12
              shard0000  19
              too many chunks to print, use verbose if you want to force print
I found that for that particular command I can specify a boolean parameter,
db.printShardingStatus(true)
which wasn't shown using db.help().
One way to find out more about a command is to call it without the parentheses to see the JavaScript :)
rs:PRIMARY> db.printShardingStatus
function (verbose) {
    printShardingStatus(this.getSiblingDB("config"), verbose);
}
rs:PRIMARY>
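The sharding-specific helpers also have their own listing; a quick sketch:
// Lists the sh.* helpers (sh.status(), sh.addShard(), sh.enableSharding(), ...) with one-line descriptions.
sh.help()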

Mongo sharding fails to split large collection between shards

I'm having problems with what seems to be a simple sharding setup in mongo.
I have two shards, a single mongos instance, and a single config server set up like this:
Machine A - 10.0.44.16 - config server, mongos
Machine B - 10.0.44.10 - shard 1
Machine C - 10.0.44.11 - shard 2
I have a collection called 'Seeds' that has a shard key 'SeedType' which is a field that is present on every document in the collection, and contains one of four values (take a look at the sharding status below). Two of the values have significantly more entries than the other two (two of them have 784,000 records each, and two have about 5,000).
The behavior I'm expecting to see is that records in the 'Seeds' collection with InventoryPOS will end up on one shard, and the ones with InventoryOnHand will end up on the other.
However, it seems that all records for both of the two larger shard-key values end up on the primary shard.
Here's my sharding status text (other collections removed for clarity):
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
      { "_id" : "shard0000", "host" : "10.44.0.11:27019" }
      { "_id" : "shard0001", "host" : "10.44.0.10:27017" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "TimMulti", "partitioned" : true, "primary" : "shard0001" }
          TimMulti.Seeds chunks:
              { "SeedType" : { $minKey : 1 } } -->> { "SeedType" : "PBI.AnalyticsServer.KPI" } on : shard0000 { "t" : 2000, "i" : 0 }
              { "SeedType" : "PBI.AnalyticsServer.KPI" } -->> { "SeedType" : "PBI.Retail.InventoryOnHand" } on : shard0001 { "t" : 2000, "i" : 7 }
              { "SeedType" : "PBI.Retail.InventoryOnHand" } -->> { "SeedType" : "PBI.Retail.InventoryPOS" } on : shard0001 { "t" : 2000, "i" : 8 }
              { "SeedType" : "PBI.Retail.InventoryPOS" } -->> { "SeedType" : "PBI.Retail.SKU" } on : shard0001 { "t" : 2000, "i" : 9 }
              { "SeedType" : "PBI.Retail.SKU" } -->> { "SeedType" : { $maxKey : 1 } } on : shard0001 { "t" : 2000, "i" : 10 }
Am I doing anything wrong?
Semi-unrelated question:
What is the best way to atomically transfer an object from one collection to another without blocking the entire mongo service?
Thanks in advance,
-Tim
Sharding really isn't meant to be used this way. You should choose a shard key with some variation (or make a compound shard key) so that MongoDB can make reasonable-size chunks. One of the points of sharding is that your application doesn't have to know where your data is.
If you want to manually shard, you should do that: start unlinked MongoDB servers and route things yourself from the client side.
Finally, if you're really dedicated to this setup, you could migrate the chunk yourself (there's a moveChunk command).
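For example, here is a hedged sketch of moving one of the big chunks by hand, using the namespace and shard names from the status output above (run against a mongos as an admin user):
// Sketch: move the chunk containing the InventoryPOS documents to the other shard.
db.adminCommand({
    moveChunk : "TimMulti.Seeds",
    find      : { SeedType : "PBI.Retail.InventoryPOS" },
    to        : "shard0000"
})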
The balancer moves chunks based on how much is mapped in memory (run serverStatus and look at the "mapped" field). It can take a while; MongoDB doesn't want your data flying all over the place in production, so it's pretty conservative.
Semi-unrelated answer: you can't do it atomically with sharding (eval isn't atomic across multiple servers). You'll have to do a findOne, insert, remove.
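A minimal sketch of that pattern (the source and target collection names are placeholders; the move is not atomic, so the document can briefly exist in both collections, or be lost if the process dies between the steps):
// Sketch: non-atomic move of a single document between two collections.
// "source" and "target" are placeholder collection names.
var doc = db.source.findOne();              // or findOne({ _id : <id to move> })
if (doc !== null) {
    db.target.insert(doc);                  // copy first ...
    db.source.remove({ _id : doc._id });    // ... then delete the original
}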