MongoDB: Cloning a single-server dev database to a production shard cluster

I was playing around on our dev server for a while for a new product, and now it's sort of live. I want to move the existing data from a single machine (mongod, local) to our 6-server shard setup (2 shards, each a 3-member replica set). Is there any way to clone the db to a remote shard?
(Worst case, a simple dump & insert with a shard key example would be very nice!)
Thanks!

You should add your dev server to the sharded environment:
Restart your dev server with the --shardsvr option.
On your mongos, run: db.runCommand( { addshard : "serverhostname[:port]", name : "migration" } );
Use the removeShard command to remove your shard "migration".
When it is done (you get "remove shard completed successfully"), you can stop your dev server; all your data has been migrated from dev to the new cluster.
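For example, assuming the dev server listens on devhost:27017 (hostname and port are placeholders, not from the original answer), the sequence on the mongos would look roughly like this:
// on the mongos, after restarting the dev mongod with --shardsvr
db.adminCommand({ addShard: "devhost:27017", name: "migration" })
// start draining the shard; run the same command again to poll progress
db.adminCommand({ removeShard: "migration" })
// ...keep re-running removeShard until it reports state: "completed"
db.adminCommand({ removeShard: "migration" })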
You don't have to shard your database for the migration, but you do need to if you want to benefit from sharding.
The advantage of this solution is that it requires minimal action on your part (everything is automatic) and there is no downtime (the more load you put on it, however, the slower the operation gets).
However, this is a slow solution (slower than a manual copy).
One more advantage compared to a raw file copy: the transfer will also compact (~ defrag) the data, which is always good :-)

Add your dev server to a replica set as the master, and the other 3 servers as slaves.
Then remove the dev server once the data has been copied to the other servers.
http://www.mongodb.org/display/DOCS/Replica+Set+Commands
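A rough sketch of that approach from the mongo shell (the hostnames and the replica set name rs0 are placeholders; the dev server must be restarted with --replSet first):
// on the dev server, after restarting mongod with --replSet rs0
rs.initiate()
// add the three target servers; each performs an initial sync from the dev server
rs.add("server1.example.net:27017")
rs.add("server2.example.net:27017")
rs.add("server3.example.net:27017")
// once they are in sync, step down the dev server...
rs.stepDown()
// ...and, connected to the new primary, remove it
rs.remove("devhost:27017")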
You can use mongodump to dump out the database, and then load the dump with mongorestore onto the master of each of your replica sets.
man mongodump, man mongorestore
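For example (hostnames, database name and paths here are placeholders, not from the original answer):
mongodump --host devhost --port 27017 --db yourDb --out /backups/devdump
mongorestore --host shard1-primary --port 27017 --db yourDb /backups/devdump/yourDb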
See:
http://www.mongodb.org/display/DOCS/Replica+Set+Internals
http://www.mongodb.org/display/DOCS/Sharding+Introduction
http://www.mongodb.org/display/DOCS/Master+Slave


Mongodb Filesystem backup and restore

I am taking a filesystem backup of MongoDB (including config files).
We are not using sharding in our cluster; we have a 3-node replica set in place.
Primary Cluster: X_host1, X_host2, X_host3
Secondary Cluster: Y_host1, Y_host2, Y_host3
I am taking the filesystem backup from X_host1 and restoring it to Y_host1, Y_host2 and Y_host3 (restoring to different hostnames).
So, how do I re-configure MongoDB to follow the new hostnames? I see the replication nodes are configured inside the DB (not in any editable config files).
Is this the right approach to migrate data from a replicated MongoDB cluster?
Is this the right approach to migrate a MongoDB cluster to new hostnames?
Is there any way to re-configure the new hostnames?
AFAIK, after I restore the filesystem to the new nodes, the data comes from the old nodes and still contains info about the old replica nodes (X_hosts).
How do I point it to the Y_hosts?
Follow Restore a Replica Set from MongoDB Backups.
In principle, do this:
Restore the backup on one new host (just one).
Start MongoDB as a stand-alone, connect to it and drop the local database:
db.getSiblingDB('local').dropDatabase()
Initiate the replica set: rs.initiate()
Add all members to the replica set. An initial sync is triggered.
If your database is large, the initial sync can take a long time to complete. For large databases it might be preferable to copy the database files onto each host instead. For details, have a look at the linked tutorial.
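A minimal sketch of those steps in the mongo shell, using the Y_host names from the question (the replica set name rs0 and the default port 27017 are assumptions):
// 1. with the restored files in place, start mongod on Y_host1 WITHOUT --replSet, then:
db.getSiblingDB('local').dropDatabase()
// 2. restart mongod on Y_host1 with --replSet rs0, then:
rs.initiate()
// 3. add the remaining members; each performs an initial sync
rs.add("Y_host2:27017")
rs.add("Y_host3:27017")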

Restoration of outdated config server

We have 1 test mongodb cluster that includes
1 mongos server
3 config servers
6 shards
Q1. We have tried to restore an outdated config server backup. We can only see that config.chunks has fewer records than before, but we can still query and insert/update data in MongoDB. What is the worst that can happen if we use an outdated config server backup?
Q2. Is there any tool that can rebuild the lost records in the config server from the existing data in each shard?
Answer to Q1
With outdated config server contents, IIRC, there may be unnoticed, massive data loss. Here is why:
Sharding in MongoDB is based on key ranges. That is, each shard is assigned a range of the shard key values it is responsible for.
For illustration purposes, let's assume you have a shard key of integer numbers from 1 to infinity. The key ranges could then look like this (boundaries exclusive):
shard0001: -infinity to 100
shard0002: 101 - 200
shard0003: 201 - 300
shard0004: 301 - 400
shard0005: 401 - 500
shard0006: 501 - 600
So how does your mongos know about this distribution? It is stored on the config servers. Now let's assume that your metadata has changed and shard0002 actually holds the data from 100 to 500, and that you want to retrieve the document with shard key 450. According to the old metadata, this document has to be on shard0005, if it exists at all, so the query gets routed to shard0005. An index lookup is done there and the shard finds that it does not have the document. So although the document exists (on shard0002), due to the outdated metadata it is looked up on shard0005, where it does not exist.
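To see which ranges the router currently believes in, you can inspect the chunk metadata directly; a rough sketch from a mongos shell (the namespace "yourDb.yourCollection" is a placeholder, and the config.chunks layout shown is the older, pre-5.0 one):
use config
db.chunks.find({ ns: "yourDb.yourCollection" }, { shard: 1, min: 1, max: 1 }).sort({ min: 1 })
// sh.status() prints the same range-to-shard mapping in a readable form
sh.status()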
Answer to Q2
Not as far as I know. What you could do, however, is use the following procedure for MongoDB < 3.0.0.
Disclaimer
I haven't tested this procedure. Make sure you have the backups ready before wiping the data directories, and do not omit the --repair and --objcheck flags.
For maximum security, create filesystem snapshots before using it.
If you don't, please do not blame me for any data loss.
Shut down the whole cluster gracefully
Use mongodump against the data directory
mongodump --repair --dbpath /path/to/your/datafiles -o /path/for/backups/mongo
Do this once for each shard.
Wipe all data directories and recreate your sharded cluster
Connect to a mongos
sh.enableSharding("yourDb")
sh.shardCollection("yourDb.yourShardedCollection", { "yourShardKey": 1 })
From each shard, use mongorestore to write the backups to a mongos
mongorestore -h mongosHost:mongosPort --db yourDb --dir /path/for/backups/mongo/yourDb \
--objcheck --writeConcern "{w:1}"
Note that you should NOT do the restores in parallel, since this might well overload the balancer.
What we basically do is gather all the data from the individual shards, create a new sharded collection within a new database, and put the collected data into that database, with the sharded collection being balanced automatically.
Watch the process very carefully and make absolutely sure that you do not overload the balancer, otherwise you might run out of disk space on a shard if you do not have an optimal shard key.
Of course, you can recreate other sharded databases from the backup by using mongorestore accordingly. To restore unsharded databases, simply connect to the replica set that should hold the collection instead of connecting to mongos.
Side note:
If you need to restore a config server, simply dump one of the other two and restore the config database to that server.
The reason this works is that the metadata cannot be updated unless all config servers are up, running and in sync.
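For instance, a rough sketch with placeholder hostnames (the port assumes the --configsvr default of 27019):
# dump the config database from a healthy config server...
mongodump --host cfg2.example.net --port 27019 --db config --out /backups/cfgdump
# ...and restore it onto the rebuilt config server
mongorestore --host cfg3.example.net --port 27019 --db config /backups/cfgdump/config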

Setting up distributed MongoDB with 4 servers

I am supposed to set up MongoDB on 4 servers (a school project to benchmark MongoDB vs. others), so I am thinking of using only sharding, without replication. I attempted to set up the first 3 servers to run mongod --configsvr, and the 4th as just a normal mongod instance. Then on all servers I run mongos. Now I am at the part where I run sh.addShard("...") and I get:
{
"ok" : 0,
"errmsg" : "the specified mongod is a --configsvr and should thus not be a shard server"
}
Seems like I can't have a config server running as a shard too? How then should I set things up?
No, the config server is not a shard. The config server is a special mongod instance that stores the sharded cluster metadata, essentially how the data is distributed across the sharded cluster.
You really only need 1 mongos instance for a simple setup like this. When you start mongos, you will provide it with the hostname(s) of your config server(s).
The Sharded Cluster Deployment Tutorial explains all of the steps that you need to follow.
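Under the constraints in the question (4 servers, sharding only, no replication), one possible layout could look roughly like this; the hostnames, ports and paths are placeholders, and the exact flags depend on the MongoDB version in use:
# server1: config server (clusters of that era used 1 or 3 mirrored config servers)
mongod --configsvr --dbpath /data/configdb --port 27019
# server2..server4: ordinary mongod instances that will become the shards
mongod --shardsvr --dbpath /data/shard --port 27018
# one mongos, pointed at the config server(s)
mongos --configdb server1:27019
# then, from a mongo shell connected to the mongos:
# sh.addShard("server2:27018")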

MongoDB replica set to stand alone backup and restore

For development reasons, I need to back up a production replica-set MongoDB and restore it on a stand-alone test instance on a different machine.
Some docs talk about the opposite direction (stand-alone to replica set), but I cannot find the downgrade/rollback way.
What's the way to go in this case?
No matter how many nodes you have in a replica set, each of them holds the same data.
So getting the data is easy: just use mongodump (preferably against a secondary, for performance reasons) and then mongorestore into a new mongod for your stand-alone development system.
mongodump does not pick up any replication-related collections (they live in a database called local). If you end up taking a filesystem snapshot of a replica-set node rather than using mongodump, be sure to drop the local database when you restore the snapshot onto your stand-alone server, and then restart mongod so that it properly detects that it is not part of a replica set.
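A rough dump-and-restore sketch (hostnames and paths are placeholders):
# dump from a secondary of the production replica set
mongodump --host prod-secondary.example.net --port 27017 --out /backups/prod
# restore into the stand-alone development mongod
mongorestore --host dev-machine.example.net --port 27017 /backups/prod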

Remove inaccessible Mongo shard

I have a MongoDB sharded setup with 3 shards: shard0000, shard0001 and shard0002. The machine that runs shard0002 is down now, which causes all my queries to fail. I'd like to temporarily remove shard0002 from my setup and keep working with the first two shards. That should be doable assuming I only use unsharded collections that reside in the first two shards, right?
What I tried first was db.runCommand({removeshard: 'IP:PORT'}), which obviously doesn't help, because it just puts the shard into draining mode, which will never finish (since the shard is down). Then I tried connecting to my config server and running db.shards.remove({_id: 'shard0002'}) on the config DB, then restarting mongos so it reloads the config. Now whenever I try to do anything I get "can't find shard for: shard0002".
Is there any way to just let Mongo know that I don't care about that shard for now, and then re-enable it later when it becomes available?
I had a different problem, and I manually removed the shard with:
use config
db.shards.remove({"_id":"shard0002"});
Manually modify the shard entry in the config db, then run removeShard.
I tried several options to do this in version 4.2.
In the end I arrived at these commands, to be executed on the config server:
use config
db.databases.updateMany( {primary: "shard0002"}, {$set: {primary: "shard0000"} })
db.shards.deleteOne({_id : "shard0002" })
db.chunks.updateMany( {shard : "shard0002"}, {$set: {shard: "shard0000"} })
while ( db.chunks.updateMany( {"history.shard" : "shard0002"},
{$set: {"history.$.shard": "shard0000"} }).modifiedCount > 0 ) { print("Updated") }
It works to a certain extent, i.e. CRUD operations work. However, when you run getShardDistribution() you get the error Collection 'db.collection' is not sharded.
Finally, I see only one reliable and secure solution:
Shut down all mongod and mongos instances in your sharded cluster
Start the available shards as standalone services (see Perform Maintenance on Replica Set Members)
Take a backup of the available shards with mongodump
Drop the data folders from all hosts
Rebuild your cluster from scratch and start up all mongod and mongos instances
Load the data into the new cluster with mongorestore
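A rough sketch of the dump/restore part (hostnames, ports, database name and paths are placeholders):
# for each shard that is still available, started as a standalone:
mongodump --host shard0-host.example.net --port 27018 --out /backups/shard0
# after rebuilding the cluster, restore the user database through a mongos:
mongorestore --host mongos-host.example.net --port 27017 --db yourDb /backups/shard0/yourDb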
Perhaps for a large cluster you have to shuffle things around a bit, like this:
Deploy config servers and a mongos server, with one empty shard
Start one old shard as a standalone
Take a backup of this old shard
Tear down this old shard
Build a fresh, empty new shard
Add the new shard to your new cluster
Restore the data into the new cluster
The backup can be dropped and the shard can be reused in the new cluster
Repeat the above for each shard in your cluster (the broken shard can most likely be skipped)