Check configuration of mongodb setup - mongodb

I have setup mongodb for cluster environment. My config server and router is running on one machine, whereas sharding is running on three different machine. I want to know if any command available which I can run on terminal(On which configsvr and router is running) and it will display all routername,configserver(associated with it),Other sharded databases(associated with it).
To simplify it more.
Suppose I run mycommand/piece of code, it display.
router1---> configserver1----> Shardeddb1
----> Shardeddb2
-----> shardeddb3
Modifying to make it more clear.
My router1 and configserver1 are running on single machine(say ip 19.0.0.123), Shardeddb1(say ip 19.0.0.124),Shardeddb2(say ip 19.0.0.125),Shardeddb3(say ip 19.0.0.126).
I want to make Shardeddb1 as a primary and (Shardeddb2,Shardeddb3) as secondary. If I run sh.status(); it show me details but not about which database belong to which machine. So is there any script which can show me more details?
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("545b632e9be3f019d6ef788f")
}
shards:
{ "_id" : "ps1", "host" : "ps1/19.0.0.123:27017","draining" : true }
{ "_id" : "ps2", "host" : "ps2/19.0.0.124:27017" }
{ "_id" : "shard0000", "host" : "19.0.0.125:27017" }
{ "_id" : "shard0001", "host" : "19.0.0.126:27017" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }
{ "_id" : "demo", "partitioned" : true, "primary" : "shard0000" }
{ "_id" : "db", "partitioned" : false, "primary" : "ps1" }
{ "_id" : "mongotestDB", "partitioned" : true, "primary" : "ps1" }
mongotestDB.logcoll
shard key: { "id" : 1 }
chunks:
shard0000 4
shard0001 9
ps2 7
ps1 5
too many chunks to print, use verbose if you want to force print

Since your diagram shows otherwise:
You can either have exactly 1 or 3 config servers.
You should always have your mongos instances have the exact same string for the configdb parameter. And this string has to hold all config servers in the same order. Otherwise, you risk metadata corruption.
All configservers and mongos need to be able to connect to and be connected by all nodes in the cluster.
The easiest way to have an overview on your cluster is the free MongoDB Management Service monitoring.
Running mongos' and configservers on the same machine is possible - as long as you keep an eye on the load. If things get rough, and the configservers have delays in updating the metadata because the mongos consume all the IO, you might worsen things. If chunk splits are delayed (and they are more likely under high load), this might cause JumboChunks, which have to be split manually and - until this is done - can't be migrated. Therefor, it is a Very Bad Idea™ to have mongos instances run on the configservers, imvho. A far better solution is to have the mongos instances run on the application servers, one on each.

Related

Is there a primary shard in DBs in which sh.enableSharding() has not been yet executed?

MongoDB sharding cluster uses a "primary shard" to hold collection data in DBs in which sharding has been enabled (with sh.enableSharding()) but the collection itself has not been yet enabled (with sh.shardCollection()). The mongos process choses automatically the primary shard, except if the user state it explicitly as parameter of sh.enableSharding()
However, what happens in DBs where sh.enableSharding() has not been executed yet? Is there some "global primary" for these cases? How can I know which one it is? sh.status() doesn't show information about it...
I'm using MongoDB 4.2 version.
Thanks!
The documentation says:
The mongos selects the primary shard when creating a new database by picking the shard in the cluster that has the least amount of data.
If enableSharding is called on a database which already exists, the above quote would define the location of the database prior to sharding being enabled on it.
sh.status() shows where the database is stored:
MongoDB Enterprise mongos> use foo
switched to db foo
MongoDB Enterprise mongos> db.foo.insert({a:1})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5eade78756d7ba8d40fc4317")
}
shards:
{ "_id" : "shard01", "host" : "shard01/localhost:14442,localhost:14443", "state" : 1 }
{ "_id" : "shard02", "host" : "shard02/localhost:14444,localhost:14445", "state" : 1 }
active mongoses:
"4.3.6" : 2
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "config", "primary" : "config", "partitioned" : true }
{ "_id" : "foo", "primary" : "shard02", "partitioned" : false, "version" : { "uuid" : UUID("ff618243-f4b9-4607-8f79-3075d14d737d"), "lastMod" : 1 } }
{ "_id" : "test", "primary" : "shard01", "partitioned" : false, "version" : { "uuid" : UUID("4d76cf84-4697-4e8c-82f8-a0cfad87be80"), "lastMod" : 1 } }
foo is not partitioned and stored in shard02.
If enableSharding is called on a database which doesn't yet exist, the database is created and, the primary shard is specified, the specified shard is used as the primary shard. Test code here.

MongoDB Sharding - why my data isn't being distributed on all shards?

I created a MongoDB Cluster with 3 shards, each shards contain 3 mongod-processes. My Cluster also contains 3 mongos and 3 config servers
In the connection string I put the 3 mongos
mongodb://user:pass#mongos1:27017,mongos2:27017,mongos3:27017/mydatabase
In the picture you can see that dbShard_2 has 1.19GB of data while the others are almost empty with only 4KB. But on the charts you can see that there is also read/write operation on all shards. Is everything fine or did I have made some wrong configurations? shall I worry?
I left the Cloud Manager do the whole configuration for me, I didn't set these by myself manually.
Here you can check my sharding status
mongos> db.printShardingStatus();
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("XXXXX")
}
shards:
{ "_id" : "dbShard_0", "host" : "dbShard_0/dbnode-0.x-app.com:27000,dbnode-1.x-app.com:27000,dbnode-2.x-app.com:27000" }
{ "_id" : "dbShard_1", "host" : "dbShard_1/dbnode-0.x-app.com:27001,dbnode-2.x-app.com:27001,dbnode-2.x-app.com:27002" }
{ "_id" : "dbShard_2", "host" : "dbShard_2/dbnode-0.x-app.com:27002,dbnode-1.x-app.com:27001,dbnode-2.x-app.com:27003" }
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "mydatabase-staging", "partitioned" : false, "primary" : "dbShard_2" }
{ "_id" : "mydatabase", "partitioned" : false, "primary" : "dbShard_2" }
{ "_id" : "test", "partitioned" : false, "primary" : "dbShard_0" }
Cluster

Mongos routing with ReadPreference=NEAREST

I'm having trouble diagnosing an issue where my Java application's requests to MongoDB are not getting routed to the Nearest replica, and I hope someone can help. Let me start by explaining my configuration.
The Configuration:
I am running a MongoDB instance in production that is a Sharded ReplicaSet. It is currently only a single shard (it hasn't gotten big enough yet to require a split). This single shard is backed by a 3-node replica set. 2 nodes of the replica set live in our primary data center. The 3rd node lives in our secondary datacenter, and is prohibited from becoming the Master node.
We run our production application simultaneously in both data centers, however the instance in our secondary data center operates in "read-only" mode and never writes data into MongoDB. It only serves client requests for reads of existing data. The objective of this configuration is to ensure that if our primary datacenter goes down, we can still serve client read traffic.
We don't want to waste all of this hardware in our secondary datacenter, so even in happy times we actively load balance a portion of our read-only traffic to the instance of our application running in the secondary datacenter. This application instance is configured with readPreference=NEAREST and is pointed at a mongos instance running on localhost (version 2.6.7). The mongos instance is obviously configured to point at our 3-node replica set.
From a mongos:
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("52a8932af72e9bf3caad17b5")
}
shards:
{ "_id" : "shard1", "host" : "shard1/failover1.com:27028,primary1.com:27028,primary2.com:27028" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard1" }
{ "_id" : "MyApplicationData", "partitioned" : false, "primary" : "shard1" }
From the failover node of the replicaset:
shard1:SECONDARY> rs.status()
{
"set" : "shard1",
"date" : ISODate("2015-09-03T13:26:18Z"),
"myState" : 2,
"syncingTo" : "primary1.com:27028",
"members" : [
{
"_id" : 3,
"name" : "primary1.com:27028",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 674841,
"optime" : Timestamp(1441286776, 2),
"optimeDate" : ISODate("2015-09-03T13:26:16Z"),
"lastHeartbeat" : ISODate("2015-09-03T13:26:16Z"),
"lastHeartbeatRecv" : ISODate("2015-09-03T13:26:18Z"),
"pingMs" : 49,
"electionTime" : Timestamp(1433952764, 1),
"electionDate" : ISODate("2015-06-10T16:12:44Z")
},
{
"_id" : 4,
"name" : "primary2.com:27028",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 674846,
"optime" : Timestamp(1441286777, 4),
"optimeDate" : ISODate("2015-09-03T13:26:17Z"),
"lastHeartbeat" : ISODate("2015-09-03T13:26:18Z"),
"lastHeartbeatRecv" : ISODate("2015-09-03T13:26:18Z"),
"pingMs" : 53,
"syncingTo" : "primary1.com:27028"
},
{
"_id" : 5,
"name" : "failover1.com:27028",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 8629159,
"optime" : Timestamp(1441286778, 1),
"optimeDate" : ISODate("2015-09-03T13:26:18Z"),
"self" : true
}
],
"ok" : 1
}
shard1:SECONDARY> rs.conf()
{
"_id" : "shard1",
"version" : 15,
"members" : [
{
"_id" : 3,
"host" : "primary1.com:27028",
"tags" : {
"dc" : "primary"
}
},
{
"_id" : 4,
"host" : "primary2.com:27028",
"tags" : {
"dc" : "primary"
}
},
{
"_id" : 5,
"host" : "failover1.com:27028",
"priority" : 0,
"tags" : {
"dc" : "failover"
}
}
],
"settings" : {
"getLastErrorModes" : {"ACKNOWLEDGED" : {}}
}
}
The Problem:
The problem is that requests which hit this mongos in our secondary datacenter seem to be getting routed to a replica running in our primary datacenter, not the nearest node, which is running in the secondary datacenter. This incurs a significant amount of network latency and results in bad read performance.
My understanding is that the mongos is deciding which node in the replica set to route the request to, and it's supposed to honor the ReadPreference from my java driver's request. Is there a command I can run in the mongos shell to see the status of the replica set, including ping times to nodes? Or some way to see logging of incoming requests which indicates the node in the replicaSet that was chosen and why? Any advice at all on how to diagnose the root cause of my issue?
While configuring read preference, when ReadPreference = NEAREST the system does not look for minimum network latency as it may decide primary as the nearest, if the network connection is proper. However, the nearest read mode, when combined with a tag set, selects the matching member with the lowest network latency. Even nearest may be any of primary or secondary. Behaviour of mongos when preferences configured , and in terms of network latency is not so clearly explained in the official docs.
http://docs.mongodb.org/manual/core/read-preference/#replica-set-read-preference-tag-sets
hope this helps
If I start mongos with flag -vvvv (4x verbose) then I am presented with request routing information in the log files, including information about the read preference used and the host to which requests were routed. for example:
2015-09-10T17:17:28.020+0000 [conn3] dbclient_rs say
using secondary or tagged node selection in shard1,
read pref is { pref: "nearest", tags: [ {} ] }
(primary : primary1.com:27028,
lastTagged : failover1.com:27028)
Despite the wording, when using nearest, the absolute fastest member isn't necessarily the one chosen. Instead, a random member is chosen out of a pool of members that have a latency lower than the calculated latency window.
The latency window is calculated by taking the fastest member's ping and adding replication.localPingThresholdMs, whose default is 15ms. You can read more about the algorithm here.
So what I do is I combine nearest with tags so that I can specify the member manually that I know is geographically closest.

MongoDB Sharding Policy

I am trying to understand the following behavior displayed by my sharding setup. The data seems to only increase on a single shard as I continuously add data. How does MongoDB shard or distribute data across different servers? Am I doing this correctly? MongoDB version 2.4.1 used here on OS X 10.5.
As requested, sh.status() as follows:
mongos> sh.status()
sharding version: {
"_id" : 1,
"version" : 3,
"minCompatibleVersion" : 3,
"currentVersion" : 4,
"clusterId" : ObjectId("52787cc2c10fcbb58607b07f") }
shards:
{ "_id" : "shard0000", "host" : "xx.xx.xx.xxx:xxxxx" }
{ "_id" : "shard0001", "host" : "xx.xx.xx.xxx:xxxxx" }
{ "_id" : "shard0002", "host" : "xx.xx.xx.xxx:xxxxx" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "newdb", "partitioned" : true, "primary" : "shard0001" }
newdb.prov
shard key: { "_id" : 1, "jobID" : 1, "user" : 1 }
chunks:
shard0000 43
shard0001 50
shard0002 43
Looks like you have chosen a very poor shard key. You partitioned along the values of { "_id" : 1, "jobID" : 1, "user" : 1 } - this will not be a good distribution for inserts since _id values are monotonically increasing since you are using ObjectId() values for _id.
You want to select a shard key that represents how you access the data - it doesn't make sense that you have two more fields after _id - since _id is unique the other two fields are never going to be used to partition the data.
Did you perhaps intend to shard on jobID, user? It's hard to know what the best shard key would be in your case, but it's clear that all the inserts are going into the highest chunk (top value through maxKey) since every new _id is a higher value than the previous one.
Eventually they should be balanced to other shards, but only if the balancer is running, if all your config servers are up and if secondaries are caught up. Best to pick a better shard key and have inserts be distributed evenly across the cluster from the start.

How to Verify Sharding?

I am trying to shard MongoDB. I am done with Sharding configuration, but I am not sure how to verify if sharding is functional.
How do i check whether my data is get sharded? Is there a query to verify/validate the shards?
You can also execute a simple command on your mongos router :
> use admin
> db.printShardingStatus();
which should output informations about your shards, your sharded dbs and your sharded collection as mentioned in the mongodb documentation
sharding version: { "_id" : 1, "version" : 2 }
shards:
{ "_id" : ObjectId("4bd9ae3e0a2e26420e556876"), "host" : "localhost:30001" }
{ "_id" : ObjectId("4bd9ae420a2e26420e556877"), "host" : "localhost:30002" }
{ "_id" : ObjectId("4bd9ae460a2e26420e556878"), "host" : "localhost:30003" }
databases:
{ "name" : "admin", "partitioned" : false,
"primary" : "localhost:20001",
"_id" : ObjectId("4bd9add2c0302e394c6844b6") }
my chunks
{ "name" : "foo", "partitioned" : true,
"primary" : "localhost:30002",
"sharded" : { "foo.foo" : { "key" : { "_id" : 1 }, "unique" : false } },
"_id" : ObjectId("4bd9ae60c0302e394c6844b7") }
my chunks
foo.foo { "_id" : { $minKey : 1 } } -->> { "_id" : { $maxKey : 1 } }
on : localhost:30002 { "t" : 1272557259000, "i" : 1 }
MongoDB has detailed documentation on Sharding here ...
http://www.mongodb.org/display/DOCS/Sharding+Introduction
To anwser you question (I think), see the portion on the config Servers ...
Each config server has a complete copy
of all chunk information. A two-phase
commit is used to ensure the
consistency of the configuration data
among the config servers.
Basically, it is the config server's job to make sure everything get sharded ... correctly.
Also, there are system collections you can query ...
db.runCommand( { listshards : 1 } );
Lots of help in the prez below too ...
http://www.slideshare.net/mongodb/mongodb-sharding-internals
http://www.10gen.com/video/mongosv2010/sharding
If you just want to check whether you are conencted to a sharded cluster or not:
db.isMaster() can be used to detect that you are connected to a sharding router (mongos).
If db.isMaster().msg is "isdbgrid", you are connected to a sharded instance.
db.isMaster() can be run without authentication.
For checking the details of the shards, sh.status() also works, which has the same output as db.printShardingStatus(); works.