Sharded MongoDB cluster uses only one shard - mongodb

I'm trying to create a sharded MongoDB cluster. I have 3 VMs to use and I was trying to get the following setup:
VM1) Node.js Application + mongos instance
VM2) CSRS PRIMARY + Shard
VM3) CSRS SECONDARY + Shard
I'm not interested in replication, but I need sharding because I will have to query over a large number of documents.
To achieve the setup stated above I have done the following:
On both VM2 and VM3
mongod --configsvr --replSet configM4C --bind_ip <ipVM2/3> --port 27040
mongod --shardsvr --replSet shardM4C --bind_ip <ipVM2/3> --port 27041
Then, connected to configM4C on VM2 (mongo --host <ipVM2> --port 27040) and ran
rs.initiate({
  _id: "configM4C",
  configsvr: true,
  members: [
    { _id: 0, host: "<ipVM2>:27040" },
    { _id: 1, host: "<ipVM3>:27040" }
  ]
})
Then, connected to shardM4C on VM2 (mongo --host <ipVM2> --port 27041) and ran
rs.initiate({
  _id: "shardM4C",
  members: [
    { _id: 0, host: "<ipVM2>:27041" },
    { _id: 1, host: "<ipVM3>:27041" }
  ]
})
Started mongos on VM1
mongos --configdb configM4C/<ipVM2>:27040,<ipVM3>:27040 --port 27042
Connected to mongos on VM1 (mongo --port 27042):
sh.addShard("<ipVM2>:27041")
sh.addShard("<ipVM3>:27041")
sh.enableSharding("pings")
sh.shardCollection("pings.pings", {provider:1, from_zone: 1, to_zone: 1})
Anyway at this point if I run db.pings.getShardDistribution() I get:
Shard shardM4C at shardM4C/<ipVM3>:27041,<ipVM2>:27041
data : 19.47MiB docs : 102876 chunks : 1
estimated data per chunk : 19.47MiB
estimated docs per chunk : 102876
Totals
data : 19.47MiB docs : 102876 chunks : 1
Shard shardM4C contains 100% data, 100% docs in cluster, avg obj size on shard : 198B
sh.status():
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5b58a6b4986cc2d94694a3f9")
}
shards:
{ "_id" : "shardM4C", "host" : "shardM4C/10.0.0.4:27041,10.0.0.7:27041", "state" : 1 }
active mongoses:
"3.6.3" : 1
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 5
Last reported error: Could not find host matching read preference { mode: "primary" } for set shardM4C
Time of Reported error: Wed Jul 25 2018 21:44:24 GMT+0000 (UTC)
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "config", "primary" : "config", "partitioned" : true }
config.system.sessions
shard key: { "_id" : 1 }
unique: false
balancing: true
chunks:
shardM4C 1
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shardM4C Timestamp(1, 0)
{ "_id" : "pings", "primary" : "shardM4C", "partitioned" : true }
pings.pings
shard key: { "provider" : 1 }
unique: false
balancing: true
chunks:
shardM4C 1
{ "provider" : { "$minKey" : 1 } } -->> { "provider" : { "$maxKey" : 1 } } on : shardM4C Timestamp(1, 0)
{ "_id" : "test", "primary" : "shardM4C", "partitioned" : false }
And if I run a query and look at the executionStats I get:
...
"queryPlanner" : {
"mongosPlannerVersion" : 1,
"winningPlan" : {
"stage" : "SINGLE_SHARD",
"shards" : [
{
"shardName" : "shardM4C",
"connectionString" : "shardM4C/<ipVM3>:27041,<ipVM2>:27041",
"serverInfo" : {
"host" : "dpiscaglia-2",
"port" : 27041,
"version" : "3.6.3",
"gitVersion" : "9586e557d54ef70f9ca4b43c26892cd55257e1a5"
},
...
Both results make me think that the query actually runs on only one shard, and that the second server is used only for replication and as a backup (which is not what I want). Am I correct?
If yes, could you please help me understand what I did wrong?
Thank you very much in advance for your help.
Have a good day.

Related

Mongos instance can't communicate with the database

So I have a sharded cluster with 2 config servers, 2 shards each with 2 replicas and 2 mongos instances, everything running on different VMs.
However, after configuring all of it, I finally tried to interact with the (still empty) database using a simple show dbs from the mongos instance, but it threw the following error (after hanging for about a minute):
uncaught exception: Error: listDatabases failed:{
"ok" : 0,
"errmsg" : "Could not find host matching read preference { mode: \"primary\" } for set rep",
"code" : 133,
"codeName" : "FailedToSatisfyReadPreference",
"operationTime" : Timestamp(1648722327, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1648722327, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
Everything seems to be well configured and when I do sh.status() from the mongos instance it identifies the shards and replicas as such:
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("62421dd6b5f9640f309faca0")
}
shards:
{ "_id" : "rep", "host" : "rep/192.168.86.136:26000,192.168.86.141:26001", "state" : 1 }
{ "_id" : "repb", "host" : "repb/192.168.86.142:26002,192.168.86.143:26003", "state" : 1 }
active mongoses:
"4.4.8" : 2
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 5
Last reported error: Empty host component parsing HostAndPort from ""
Time of Reported error: Thu Mar 31 2022 11:06:39 GMT+0100 (WEST)
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "config", "primary" : "config", "partitioned" : true }
config.system.sessions
shard key: { "_id" : 1 }
unique: false
balancing: true
chunks:
rep 919
repb 105
too many chunks to print, use verbose if you want to force print
{ "_id" : "testdb", "primary" : "rep", "partitioned" : false, "version" : { "uuid" : UUID("2e584dcd-25ea-4ba4-805c-b40928e26511"), "lastMod" : 1 } }
Maybe a firewall issue.
Every node in your cluster must be able to reach every other node on the corresponding port. See
Simple HTTP/TCP health check for MongoDB
Try this script to check each member of each replica set:
const MONGO_PASSWORD = '*******'
const AUTH_SOURCE = 'admin'
// Reuse the currently authenticated user and walk every replica set listed in the shard map.
const user = db.runCommand({ connectionStatus: 1 }).authInfo.authenticatedUsers.shift().user;
const map = db.adminCommand("getShardMap").map;
for (let rs of Object.keys(map)) {
    let uri = map[rs].split("/");
    let connectionString = `mongodb://${user}:${MONGO_PASSWORD}@${uri[1]}/admin?replicaSet=${uri[0]}&authSource=${AUTH_SOURCE}`;
    let replicaSet = Mongo(connectionString).getDB("admin");
    for (let member of replicaSet.adminCommand({ replSetGetStatus: 1 }).members) {
        // Skip members the set does not advertise for normal reads/writes (arbiters, hidden members).
        if (!replicaSet.hello().hosts.includes(member.name)) continue;
        printjsononeline({ replicaSet: rs, host: member.name, stateStr: member.stateStr, health: member.health });
        if (member.health != 1 || !Array("PRIMARY", "SECONDARY").includes(member.stateStr))
            print(`Member state of ${member.name} is '${member.stateStr}'`);
    }
}
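If that script flags members as unreachable or unhealthy, a quick raw-reachability check from the same shell can confirm whether the problem sits at the network/firewall level. A minimal sketch, assuming the same getShardMap output and that a plain ping needs no further authentication:
// Open a direct connection to every host in the shard map and run a ping;
// failures here usually point at firewalls or bind_ip settings.
const shardMap = db.adminCommand("getShardMap").map;
for (let rs of Object.keys(shardMap)) {
    if (!shardMap[rs].includes("/")) continue;   // skip entries without a host list
    for (let host of shardMap[rs].split("/")[1].split(",")) {
        try {
            const ok = new Mongo(host).getDB("admin").runCommand({ ping: 1 }).ok;
            printjsononeline({ host: host, reachable: ok === 1 });
        } catch (e) {
            printjsononeline({ host: host, reachable: false, error: e.message });
        }
    }
}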
Turns out I configured the replica set wrongly, so all I had to do was recreate the volumes of all VMs and configure it all again from scratch. Now it works as it should.

Is it possible to connect multiple mongo routers to just one shard in a MongoDB sharded cluster?

I'd like to make a mongo cluster with multiple mongo routers (mongos) and just one shard (mongod), as in the figure.
So I made two mongo routers named 'mongorouter-1' and 'mongorouter-2', and also made one shard named 'mongod'.
On mongorouter-1 I added 'mongod' successfully with this command:
sh.addShard("mongod:27017")
It works well, but on mongorouter-2 the same command returns an error:
mongos> sh.addShard("mongod:27017")
{
"ok" : 0,
"errmsg" : "E11000 duplicate key error collection: admin.system.version index: _id_ dup key: { : \"shardIdentity\" }",
"code" : 11000,
"codeName" : "DuplicateKey",
"operationTime" : Timestamp(1558591937, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1558591937, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
On mongorouter-1, sh.status() shows this:
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5ce6683cc490bfc9325389cb")
}
shards:
{ "_id" : "shard0000", "host" : "mongod:27017", "state" : 1 }
active mongoses:
"4.0.6" : 1
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "config", "primary" : "config", "partitioned" : true }
and on mongorouter-2, sh.status() shows this:
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5ce668176a4dcc52fd230ac9")
}
shards:
active mongoses:
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "config", "primary" : "config", "partitioned" : true }
I don't know how to connect multiple mongo routers to just one shard.
If you know the solution, please help me. Thanks in advance.

MongoDB Sharding - why isn't my data being distributed across all shards?

I created a MongoDB cluster with 3 shards; each shard contains 3 mongod processes. My cluster also contains 3 mongos and 3 config servers.
In the connection string I put the 3 mongos:
mongodb://user:pass@mongos1:27017,mongos2:27017,mongos3:27017/mydatabase
In the picture you can see that dbShard_2 has 1.19GB of data while the others are almost empty, with only 4KB. But on the charts you can see that there is also read/write activity on all shards. Is everything fine, or have I made some wrong configuration? Should I worry?
I let Cloud Manager do the whole configuration for me; I didn't set any of this manually.
Here you can check my sharding status
mongos> db.printShardingStatus();
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("XXXXX")
}
shards:
{ "_id" : "dbShard_0", "host" : "dbShard_0/dbnode-0.x-app.com:27000,dbnode-1.x-app.com:27000,dbnode-2.x-app.com:27000" }
{ "_id" : "dbShard_1", "host" : "dbShard_1/dbnode-0.x-app.com:27001,dbnode-2.x-app.com:27001,dbnode-2.x-app.com:27002" }
{ "_id" : "dbShard_2", "host" : "dbShard_2/dbnode-0.x-app.com:27002,dbnode-1.x-app.com:27001,dbnode-2.x-app.com:27003" }
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "mydatabase-staging", "partitioned" : false, "primary" : "dbShard_2" }
{ "_id" : "mydatabase", "partitioned" : false, "primary" : "dbShard_2" }
{ "_id" : "test", "partitioned" : false, "primary" : "dbShard_0" }
[Cluster screenshot]

Sharding in a replicaset MongoDB

I have a MongoDB replica set: one primary, one secondary, and an arbiter to vote. I'm planning to implement sharding as the data is expected to grow exponentially. I find it difficult to follow the MongoDB documentation for sharding. Could someone explain clearly how to set it up? Thanks in advance.
If you can accomplish a replica set, sharding is pretty simple. Pretty much repeating the mongo documentation in fast forward here:
Below is for a sample setup: 3 config servers and 3 shards.
For the example below you can run all of it on one machine to see it all working.
If you need three shards, set up three replica sets; see the startup sketch after this list. (Assuming the 3 primaries are 127.0.0.1:27000, 127.0.0.1:37000, 127.0.0.1:47000)
Run 3 mongod instances as three config servers. (Assuming: 127.0.0.1:27017, 127.0.0.1:27018, 127.0.0.1:27019, as used in the mongos command below)
Start mongos (note the s in mongos) letting it know where your config servers are. (ex: 127.0.0.1:27023)
Connect to mongos from mongo shell and add the three primary mongod's of your 3 replica sets as the shards.
Enable sharding for your DB.
If required enable sharding for a collection.
Select a shard key if required. (Very Important you do it right the first time!!!)
Check the shard status
Pump data; connect to the individual mongod primaries and see the data distributed across the three shards.
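A minimal sketch of steps 1 and 2 (starting the shard members and the config servers), assuming the ports listed above and made-up replica set names and --dbpath directories; note that since MongoDB 3.4 the config servers must themselves form a replica set (CSRS), in which case the --configdb argument below takes the form csReplSetName/host1,host2,host3:
# step 1: one member of each shard replica set shown (repeat per member, then rs.initiate() each set)
mongod --shardsvr --replSet shardA --port 27000 --dbpath /data/shardA
mongod --shardsvr --replSet shardB --port 37000 --dbpath /data/shardB
mongod --shardsvr --replSet shardC --port 47000 --dbpath /data/shardC
# step 2: three config servers (on 3.4+ add --replSet cfgRS to each and rs.initiate() them as well)
mongod --configsvr --port 27017 --dbpath /data/cfg0
mongod --configsvr --port 27018 --dbpath /data/cfg1
mongod --configsvr --port 27019 --dbpath /data/cfg2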
#start mongos with three configs:
mongos --port 27023 --configdb localhost:27017,localhost:27018,localhost:27019
mongos> sh.addShard("127.0.0.1:27000");
{ "shardAdded" : "shard0000", "ok" : 1 }
mongos> sh.addShard("127.0.0.1:37000");
{ "shardAdded" : "shard0001", "ok" : 1 }
mongos> sh.addShard("127.0.0.1:47000");
{ "shardAdded" : "shard0002", "ok" : 1 }
mongos> sh.enableSharding("db_to_shard");
{ "ok" : 1 }
mongos> use db_to_shard;
switched to db db_to_shard
mongos>
mongos> sh.shardCollection("db_to_shard.coll_to_shard", {collId: 1, createdDate: 1} );
{ "collectionsharded" : "db_to_shard.coll_to_shard", "ok" : 1 }
mongos> show databases;
admin (empty)
config 0.063GB
db_to_shard 0.078GB
mongos> sh.status();
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("557003eb4a4e61bb2ea0555b")
}
shards:
{ "_id" : "shard0000", "host" : "127.0.0.1:27000" }
{ "_id" : "shard0001", "host" : "127.0.0.1:37000" }
{ "_id" : "shard0002", "host" : "127.0.0.1:47000" }
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
{ "_id" : "db_to_shard", "partitioned" : true, "primary" : "shard0000" }
db_to_shard.coll_to_shard
shard key: { "collId" : 1, "createdDate" : 1 }
chunks:
shard0000 1
{ "collId" : { "$minKey" : 1 }, "createdDate" : { "$minKey" : 1 } } -->> { "collId" : { "$maxKey" : 1 }, "createdDate" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)

MongoDB replicates data to all shards

I have 4 servers in a test environment which I use to test MongoDB replica and distribution:
RepSetA holds RepSetA1 and RepSetA2.
RepSetB holds RepSetB1 and RepSetB2.
All servers act as routers, RepSetA1 acts as a single config server.
I have a "Player" data (10,000 records, the object consists an "Id" and a "Name" fields), and I want it to be sharded (or distributed) between the replica sets, and cloned among the servers in the same replica set. So, just for a plain example:
Player1-5000: Exists in both RepSetA1 and RepSetA2, but not in RepSetB1 and RepSetB2.
Player5000-10000: Exists in both RepSetB1 and RepSetB2, but not in RepSetA1 and RepSetA2.
What I get instead is having all players in all 4 servers.
If I print the sharding status, I get the following:
mongos> db.printShardingStatus();
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "RepSetA", "host" : "RepSetA/MongoRepSetA1:27018,MongoRepSetA2:27018" }
{ "_id" : "RepSetB", "host" : "RepSetB/MongoRepSetB1:27018,MongoRepSetB2:27018" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "GamesDB", "partitioned" : true, "primary" : "RepSetA" }
GamesDB.Player chunks:
RepSetA 2
{ "_id" : { $minKey : 1 } } -->> { "_id" : 0 } on : RepSetA { "t" : 1000, "i" : 1 }
{ "_id" : 0 } -->> { "_id" : { $maxKey : 1 } } on : RepSetA { "t" : 1000, "i" : 2 }
{ "_id" : "test", "partitioned" : false, "primary" : "RepSetB" }
{ "_id" : "EOO", "partitioned" : false, "primary" : "RepSetB" }
I used the following commands to build the shards:
db.adminCommand( { addShard : "RepSetA/MongoRepSetA1:27018,MongoRepSetA2:27018" } )
db.adminCommand( { addShard : "RepSetB/MongoRepSetB1:27018,MongoRepSetB2:27018" } )
db.runCommand( { enablesharding : "GamesDB" } );
db.runCommand( { shardcollection : "GamesDB.Player", key : { _id :1 } , unique : true} );
What am I doing wrong?
If you connect to your nodes through a mongos process, it will look as if they all contain the data. From your output, it doesn't look like all data is available on all nodes: RepSetA holds 2 chunks and RepSetB should contain none. You can verify this by connecting directly to your nodes rather than through mongos.
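A minimal sketch of that check, assuming the host names and port from your addShard commands (and using setSlaveOk in case you land on a secondary):
// Connect straight to one member of each shard, bypassing mongos, and compare counts.
var connA = new Mongo("MongoRepSetA1:27018");
var connB = new Mongo("MongoRepSetB1:27018");
connA.setSlaveOk();   // allow reads if this member is currently a secondary
connB.setSlaveOk();
print("RepSetA copy of Player: " + connA.getDB("GamesDB").Player.count());
print("RepSetB copy of Player: " + connB.getDB("GamesDB").Player.count());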
By the way, if you're using MongoDB's ObjectId as your _id (shard key), consider sharding on another key, as the ObjectId increases monotonically and all inserts will therefore land on a single node.
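If you want writes spread across shards from the start, one option (available since MongoDB 2.4) is a hashed shard key instead of the plain ascending _id; a minimal sketch for a collection that has not been sharded yet:
// Hashing the key removes the "all inserts hit the last chunk" hotspot,
// at the cost of efficient range queries on _id.
sh.shardCollection("GamesDB.Player", { _id: "hashed" })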
This is fine. It does not show that all data is on all servers. The output shows that all chunks (data) of GamesDB.Player are on shard RepSetA
GamesDB.Player chunks:
RepSetA 2
{ "_id" : { $minKey : 1 } } -->> { "_id" : 0 } on : RepSetA { "t" : 1000, "i" : 1 }
{ "_id" : 0 } -->> { "_id" : { $maxKey : 1 } } on : RepSetA { "t" : 1000, "i" : 2 }
This means that the balancer has not started to balance your chunks. The balancer only kicks in when there is a difference of 8 chunks between the shards.
http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingAdministration-Balancing
You can force balancing by manually splitting chunks (if you want to)
http://www.mongodb.org/display/DOCS/Splitting+Shard+Chunks
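For example, a minimal sketch that splits the single chunk at the midpoint of the _id range from the question and moves the upper half by hand (namespace and split value assumed from the question):
// Split the only chunk of GamesDB.Player at _id 5000, then move the chunk containing
// _id 5000 to RepSetB instead of waiting for the balancer.
sh.splitAt("GamesDB.Player", { _id: 5000 })
sh.moveChunk("GamesDB.Player", { _id: 5000 }, "RepSetB")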
Or you can reduce the chunk size if you want to see balancing sooner.
http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingAdministration-ChunkSizeConsiderations
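A sketch of that setting, run through mongos (the value is in MB and is stored in the config database):
// Lower the cluster-wide chunk size so splits, and therefore balancing, happen with less data.
use config
db.settings.save({ _id: "chunksize", value: 16 })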