How to get results from different shards and the same collection - MongoDB

I created a test MongoDB sharded cluster with 1 config server, 1 router and 2 shards.
I created a config server with
mongod --configsvr --dbpath /data/configdb --port 27019
And I ran mongos with mongos --configdb <CONFIG SERVER IP>:27019. I added the shards and created the database and collections. When I query the sharded collection, the cursor doesn't get results from both shards; it only gets results from one shard. According to the MongoDB sharding documentation, I can distribute a collection across different shards, but my results don't reflect that.
What am I doing wrong? What's the correct way to get results from different shards with the same query?

The correct way is to ensure that you do NOT have to go cross-shard often. You choose a shard key so that your queries do not end up going across shards. E.g. a user database could have the user id/name as the shard key. Most queries look up a single user, so they would be answered from a single shard.
For queries that cannot be satisfied with results from 1 shard alone, Mongo automatically requests data from all shards.
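For example, here is a minimal sketch (the database, collection and field names are illustrative, not taken from the question) showing the difference between a targeted query and a scatter-gather query once a collection is sharded:
sh.enableSharding("app")
// for a non-empty collection, create an index on userId first
sh.shardCollection("app.users", { userId: 1 })
// Targeted: the filter contains the shard key, so mongos routes it to one shard.
db.users.find({ userId: 12345 })
// Scatter-gather: no shard key in the filter, so mongos queries every shard
// and merges the results for you.
db.users.find({ city: "Ankara" })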

Related

Import documents to a particular/targeted shard in Mongo Shard Cluster

I'm new to MongoDB; however, I have set up a sharded cluster with 3 replica sets. Each replica set has 2 mongod instances. I have a separate config server and 1 mongos instance.
I wanted to test whether it's possible to import documents into a particular/targeted shard when the collection is not sharding-enabled. (As far as I know, we can't control which shard the collection will be saved on when the import is done via the mongos instance.)
Hence, I imported the documents using the command below, and it was successful (I followed this):
mongoimport --db NormalTestDB --collection TestDams --host <replSetName>/<hostname1><:port>,<hostname2><:port> --jsonArray --file "<path_to_json>"
(I used a particular replica set name in <replSetName>)
Now when I try this simple query db.TestDams.find().count() in mongos shell (by connecting to NormalTestDB), it returns 0. But if I try the same query by directly connecting to the Primary of the relevant replicaSet, I get 14603.
Is there any way that I could sync them? Or is there a way to use mongoimport targeting a particular shard?
(Aside: You should generally have 3 nodes in a replica set rather than 2.)
When you enable sharding on a database you can specify the primary shard. Unsharded collections will be stored on that shard. So, you can define one database per shard with each configured to have a distinct shard as the primary shard.
Alternatively if you set up zones you can specify which values of shard keys are to be stored on each shard. Then by inserting appropriate values you'd have the data stored on the shards you want.
You should perform all operations on your sharded cluster through one of the mongos nodes, not, as you have done, by writing directly to the shard mongods.
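As a rough sketch of both approaches, run through mongos (the shard name, zone name and shard key below are placeholders, not taken from your cluster):
// Option 1: unsharded collections live on the database's primary shard.
db.adminCommand({ movePrimary: "NormalTestDB", to: "shard0001" })
// Option 2: zones pin ranges of the shard key to specific shards
// (assumes TestDams is, or will be, sharded on damId).
sh.addShardToZone("shard0001", "zoneA")
sh.updateZoneKeyRange("NormalTestDB.TestDams", { damId: MinKey }, { damId: 500 }, "zoneA")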

How to create mongodb oplog for database?

I recently purchased a cluster account from MongoDB Atlas. I have checked the UI and the MongoDB documentation but don't know how to create the oplog db. And how do I create a connection string for the oplog?
The oplog collection is automatically created and maintained by MongoDB under the local database. It is used internally by MongoDB to keep the various nodes of a replica set in sync. Assume you have a three-node replica set, one primary and two secondaries. When a write operation happens on the primary, the event is logged in the oplog collection and used internally to replicate the same change to the secondary nodes.
So to use the oplog you will need a MongoDB server configured as a replica set. Standalone instances of MongoDB do not have an oplog, as there is no need for replication in that case.
However, you can start a standalone instance as a replica set by using the commands below.
Start the mongod server in replica set mode:
mongod --dbpath <path_to_data_file> --replSet rs0
Initiate the replica set (execute from the mongo shell):
rs.initiate();
The oplog is a MongoDB replication feature. If you have chosen a replica set cluster, you have it in the database named "local". See the collections in the "local" database.
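Once the replica set is initiated you can inspect the oplog yourself; a quick sketch from the mongo shell:
use local
// The oplog is a capped collection named oplog.rs in the local database.
db.oplog.rs.find().sort({ $natural: -1 }).limit(5)
// Size of the oplog and the time window it currently covers:
db.printReplicationInfo()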

Restoration of outdated config server

We have a test MongoDB cluster that includes:
1 mongos server
3 config servers
6 shards
Q1. We have tried to restore an outdated config server backup. We can only see that config.chunks has fewer records than before, but we can still query and insert/update data in MongoDB. What is the worst that can happen if we use an outdated config server backup?
Q2. Are there any tools that can rebuild the lost records in the config server from the existing data in each shard?
Answer to Q1
With outdated config server contents, if I recall correctly, there may be an unnoticed, gigantic loss of data. Here is why:
Sharding in MongoDB is based on key ranges. That is, each shard is assigned a range of the shard keys it is responsible for.
For illustration purposes, let's assume you have a shard key of integer numbers running from 1 to infinity. The key ranges could then look like this (boundaries exclusive):
shard0001: -infinity to 100
shard0002: 101 - 200
shard0003: 201 - 300
shard0004: 301 - 400
shard0005: 401 - 500
shard0006: 501 - 600
So how does your mongos know about this distribution? It is stored on the config servers. Now let's assume that the actual distribution has changed and shard0002 now holds the data from 100-500, while your restored metadata still shows the old ranges above. Say you want to retrieve the document with the shard key 450. According to the old metadata, this document has to be on shard0005, if it exists, so the query gets routed to shard0005. An index lookup is done there and the shard finds that it does not have the document. So although the document exists (on shard0002), due to the outdated metadata it is looked up on shard0005, where it does not exist.
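If you want to see what your config servers currently believe, here is a quick sketch for inspecting the chunk ranges (the namespace is a placeholder; in the MongoDB versions this answer targets, chunks are keyed by the ns field):
use config
// One document per chunk: its key range and the shard it is assigned to.
db.chunks.find({ ns: "yourDb.yourCollection" }, { min: 1, max: 1, shard: 1 })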
Answer to Q2
Not as far as I know. What you could do, however, is use the following procedure for MongoDB < 3.0.0.
Disclaimer
I haven't tested this procedure. Make sure you have the backups ready before wiping the data directories, and do not omit the --repair and --objcheck flags.
For maximum security, create filesystem snapshots before using it.
If you don't, please do not blame me for any data loss.
Shut down the whole cluster gracefully
Use mongodump against the data directory
mongodump --repair --dbpath /path/to/your/datafiles -o /path/for/backups/mongo
Do this once for each shard.
Wipe all data directories and recreate your sharded cluster
Connect to a mongos
sh.enableSharding("yourDb")
sh.shardCollection("yourDb.yourShardedCollection", { "yourShardKey": 1 })
From each shard, use mongorestore to write the backups to a mongos
mongorestore -h mongosHost:mongosPort --db yourDb --dir /path/for/backups/mongo/yourDb \
--objcheck --writeConcern "{w:1}"
Note that you should NOT do the restores in parallel, since this might well overload the balancer.
What we basically do is to gather all data from the individual shards, create a new sharded collection within a new database and put the collected data into that database, with the sharded collection being automatically balanced.
Watch the process very carefully and make absolutely positively sure that you do not overload the balancer, otherwise you might run out of disk space on a shard in case you do not have an optimal shard key.
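A small sketch of the shell helpers I would keep running against mongos to keep an eye on the balancer while restoring:
sh.status()              // chunk distribution per shard
sh.isBalancerRunning()   // is a balancing round in progress right now?
sh.getBalancerState()    // is the balancer enabled at all?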
Of course, you can recreate other sharded databases from the backup by using mongorestore accordingly. To restore unsharded databases, simply connect to the replicaset you want to hold the collection instead of connecting to mongos.
Side note:
If you need to restore a config server, simply dump one of the other two and restore the config database to that server.
The reason this works is that the metadata cannot be updated unless all config servers are up, running and in sync.
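A sketch of that, assuming the healthy config server listens on configB:27019 and the one being rebuilt on configC:27019 (both host names are placeholders):
mongodump --host configB:27019 --db config --out /backups/configdb
mongorestore --host configC:27019 --db config /backups/configdb/config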

Shard Existing Collection - MongoDB

I have a mongo collection on farm1-server1 and I managed to replicate it to farm2-server1 - the db path is /db/data. farm2-server1 is part of a group of 3 servers, and I want to shard the collection I just replicated across all 3 servers. In order to do that, I stopped replication on server1, started mongod (on port 27017) and pointed it to the replicated data (/db/data) - I also added the directive:
configsvr = true.
I started mongos and added the following directive
configdb = server1:27017
Then I started the shard processes (mongod) on each one of servers 1-3 with the directive:
shardsvr = true
I expected the collection to be sharded, but what happens is that the old collection I replicated is not recognized in this configuration, hence it cannot be sharded.
I have read that existing collections can be sharded, so I must be doing something wrong here. Any help is appreciated. I can provide configuration files if required.
Thanks, Noam
It sounds like you have missed some steps - you need to
add shards
enable sharding
shard the collection
http://www.mongodb.org/display/DOCS/Configuring+Sharding#ConfiguringSharding-ConfiguringtheShardCluster
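In mongo shell terms, roughly (the host names and the shard key are placeholders for your setup, not values from the question):
// 1. add the shards
sh.addShard("server1:27018")
sh.addShard("server2:27018")
sh.addShard("server3:27018")
// 2. enable sharding on the database
sh.enableSharding("yourDb")
// 3. shard the existing collection on an indexed key
sh.shardCollection("yourDb.yourCollection", { yourShardKey: 1 })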

Remove inaccessible Mongo shard

I have a MongoDB sharded setup with 3 shards: shard0000, shard0001 and shard0002. The machine that runs shard0002 is down now, which causes all my queries to fail. I'd like to temporarily remove shard0002 from my setup and keep working with the first two shards. That should be doable assuming I only use unsharded collections that reside in the first two shards, right?
What I tried first is db.runCommand({removeshard: 'IP:PORT'}), which obviously doesn't help, because it just puts the shard in draining mode, which will never end (since it's down). Then I tried connecting to my config server, did db.shards.remove({_id: 'shard0002'}) on the config DB and then restarted mongos so it reloads the config. Now whenever I try to do anything I get "can't find shard for: shard0002".
Is there any way to just let Mongo know that I don't care about that shard for now, then re-enable it later when it becomes available?
I had a different problem, and I manually removed the shard with:
use config
db.shards.remove({"_id":"shard0002"});
Manually modify the shard entry in the config db, then run removeshard.
I tried several options to do this in version 4.2.
In the end I settled on these commands, to be executed on the config server:
use config
db.databases.updateMany( {primary: "shard0002"}, {$set: {primary: "shard0000"} })
db.shards.deleteOne({_id : "shard0002" })
db.chunks.updateMany( {shard : "shard0002"}, {$set: {shard: "shard0000"} })
while ( db.chunks.updateMany( {"history.shard" : "shard0002"},
{$set: {"history.$.shard": "shard0000"} }).modifiedCount > 0 ) { print("Updated") }
It works to a certain extent, i.e. CRUD operations work. However, when you run getShardDistribution() you get the error Collection 'db.collection' is not sharded.
Finally I see only one reliable & secure solution:
Shut down all mongod and mongos in your sharded cluster
Start available shards as standalone service (see Perform Maintenance on Replica Set Members)
Take a backup from available shards with mongodump.
Drop data folders from all hosts.
Build your application newly from scratch. Start up all mongod and mongos processes
Load the data into the new cluster with mongorestore (a command sketch follows below)
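A rough command-line sketch of the dump and restore steps above (host names and paths are placeholders):
# run once per shard that was started as a standalone
mongodump --host shard0-host:27018 --out /backups/shard0
# after the new cluster is up, load the data back through mongos
mongorestore --host mongos-host:27017 /backups/shard0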
Perhaps for a large cluster you have to shuffle things around a bit, like this:
Deploy Config servers and mongos server, with one empty shard
Start one old shard as standalone
Take backup from this old Shard
Tear down this old shard
Build a fresh, empty new shard
Add the new shard to your new cluster (see the sketch after this list)
Restore the data into the new cluster
The backup can be dropped and the old shard can be reused in the new cluster
Repeat the above for each shard you have in your cluster (the broken shard will most likely have to be skipped)
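For the "add the new shard" step, the mongos command is the usual one (the replica set and host names are placeholders):
sh.addShard("rsNew/newshard-host:27018")
sh.status()   // confirm the new shard shows up before restoring into it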