MongoDB database listings different between mongos and replica set - mongodb

Please see my query below. I ran [show dbs] on the mongos/router server and it gives me a different result than running the same command on the Replica Set server:
As you can see below, some databases, such as [clients], exist on the RS server but not on the mongos server, and vice versa. Some databases that appear in both listings also differ in size. Why?
I am migrating these databases to AWS, and I am confused about which database(s) I need to migrate, and about the sizes of some of these databases. For instance, the DB [ device1 ] is 0.156GB when listed from the mongos server, but 0.078GB when listed from the RS server.
This is a production system and I am concerned that something is not right. Where should I check to get accurate information about the databases I am migrating to AWS? And how do I know which DBs are being used and which are not?
I am an Oracle DBA and new to Mongo. I'd appreciate any advice and suggestions you can provide to get a clear picture of these databases before I start the migration process. It would also be helpful if I could find detailed instructions on how to migrate replica set DBs to new hardware such as AWS.
**From mongos**
mongos> show dbs
admin (empty)
config 0.063GB
dev-mgt 0.453GB
device1 0.156GB
rtime 0.203GB
zales 83.767GB
site 0.953GB
eedb (empty)
test (empty)
**From RS server**
rs0:PRIMARY> show dbs
admin (empty)
config (empty)
Device1 0.078GB
clients 86.036GB
rtime 0.203GB
zales 83.767GB
test (empty)
Please advise. Thank you!

Not all your collections are sharded. A collection that isn't sharded lives on only one shard (its database's primary shard), so if you look for it on a different shard, you won't find it. Non-sharded collections only ever live on a single replica set. You should always go through your mongos instance to browse your dbs and data.
From documentation:
Primary Shard
Every database has a “primary” shard that holds all the un-sharded
collections in that database.
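The difference between the two listings can be sketched with a toy model. This is illustrative Python, not MongoDB code; the shard names (rs0, rs1) and the database layout are made-up assumptions:

```python
# Toy model: unsharded databases live only on their "primary" shard, so a
# `show dbs` run directly on one replica set shows only that shard's slice,
# while mongos merges the listings of every shard.

# Hypothetical layout (names invented for illustration):
primary_shard = {"clients": "rs0", "device1": "rs1", "zales": "rs0"}
sharded = {"zales"}  # only this database has collections spread over all shards

def show_dbs_on_shard(shard):
    """What `show dbs` on a single replica set would list."""
    return sorted(db for db, home in primary_shard.items()
                  if home == shard or db in sharded)

def show_dbs_on_mongos():
    """mongos aggregates across all shards."""
    return sorted(primary_shard)

print(show_dbs_on_shard("rs0"))   # ['clients', 'zales']
print(show_dbs_on_shard("rs1"))   # ['device1', 'zales']
print(show_dbs_on_mongos())       # ['clients', 'device1', 'zales']
```

So neither single replica set lists every database; only the view through mongos is complete.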

Related

Mongodb : why show dbs does not show my databases?

I have set up MongoDB 64-bit on Windows. I ran the server and client successfully.
But when I type:
show dbs
Output is
local 0.000GB
Why? show dbs is supposed to list all databases, or at least the default one, "test".
Am I wrong?
Although you may be in the test database by default, the database does not actually get created until you insert a document into one of its collections; that first insert implicitly creates both the collection and the database.
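This lazy-creation behavior can be mimicked with a tiny in-memory model. The class and method names are invented for illustration; this is not a real driver API:

```python
# Toy model of MongoDB's lazy database creation: `use <db>` only switches
# context, and the database only materializes on the first insert.

class ToyMongo:
    def __init__(self):
        self.dbs = {"local": {}}   # a fresh server lists only `local`
        self.current = "test"      # the shell's default db, not yet created

    def use(self, name):
        self.current = name        # nothing is created here

    def insert(self, collection, doc):
        # the first write implicitly creates the collection AND the database
        db = self.dbs.setdefault(self.current, {})
        db.setdefault(collection, []).append(doc)

    def show_dbs(self):
        return sorted(self.dbs)

m = ToyMongo()
m.use("test")
print(m.show_dbs())            # ['local']          -- `test` not created yet
m.insert("things", {"x": 1})
print(m.show_dbs())            # ['local', 'test']  -- created on first insert
```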

Restoration of outdated config server

We have 1 test MongoDB cluster that includes:
1 mongos server
3 config servers
6 shards
Q1. We tried to restore an outdated config server backup. All we can see is that config.chunks has fewer records than before, yet we can still query and insert/update data in MongoDB. What is the worst that can happen if we keep using an outdated config server backup?
Q2. Is there any tool that can rebuild the lost records in the config server from the existing data in each shard?
Answer to Q1
With outdated config server contents there may be, if I recall correctly, an unnoticed, gigantic loss of data. Here is why:
Sharding in MongoDB is based on key ranges. That is, each shard is assigned a range of the shard keys it is responsible for.
For illustration purposes, let's assume you have an integer shard key running from 1 to infinity. The key ranges could then look like this (the exact boundary conventions don't matter for the argument):
shard0001: -infinity to 100
shard0002: 101 - 200
shard0003: 201 - 300
shard0004: 301 - 400
shard0005: 401 - 500
shard0006: 501 - 600
So how does your mongos know about this distribution? It is stored on the config servers. Now assume the metadata has changed and shard0002 actually holds the data from 101 to 500, and you want to retrieve the document with shard key 450. According to the old metadata, that document has to be on shard0005, if it exists, so the query gets routed to shard0005. An index lookup is done there, and the shard finds that it does not have the document. So although the document exists (on shard0002), the outdated metadata causes it to be looked up on shard0005, where it does not exist; the query silently returns nothing.
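The routing logic described above can be sketched in a few lines of Python. The shard names and the "stale" ranges follow the example; the "fresh" ranges are an assumed post-migration layout, and none of this is real MongoDB code:

```python
# Range-based chunk routing with stale config metadata: the same shard key
# gets routed to different shards depending on which metadata the router holds.
import bisect

def make_router(upper_bounds, shards):
    """upper_bounds[i] is the highest key handled by shards[i] (inclusive)."""
    def route(key):
        return shards[bisect.bisect_left(upper_bounds, key)]
    return route

shards = ["shard0001", "shard0002", "shard0003",
          "shard0004", "shard0005", "shard0006"]

stale = make_router([100, 200, 300, 400, 500, 600], shards)  # old metadata
fresh = make_router([100, 500, 530, 560, 580, 600], shards)  # after chunk moves

# Where document 450 really lives, according to the current metadata:
print(fresh(450))   # shard0002 (it now holds 101-500)
# Where a router with the outdated metadata sends the query:
print(stale(450))   # shard0005 -- the lookup "succeeds" with zero results
```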
Answer to Q2
Not as far as I know. What you can do, however, is use the following procedure for MongoDB < 3.0.0 (where mongodump still supports --dbpath).
Disclaimer
I haven't tested this procedure. Make sure you have the backups ready before wiping the data directories and do not omit the --repair and --objcheck flags
For maximum security, create filesystem snapshots before using it.
If you don't, please do not blame me for any data loss.
Shut down the whole cluster gracefully
Use mongodump against the data directory
mongodump --repair --dbpath /path/to/your/datafiles -o /path/for/backups/mongo
Do this once for each shard.
Wipe all data directories and recreate your sharded cluster
Connect to a mongos
sh.enableSharding("yourDb")
sh.shardCollection("yourDb.yourShardedCollection", {"yourShardKey": 1})
For each shard's dump, use mongorestore to write the backup back through a mongos:
mongorestore -h mongosHost:mongosPort --db yourDb --dir /path/for/backups/ \
--objcheck --writeConcern "{w:1}"
Note that you should NOT do the restores in parallel, since this might well overload the balancer.
What we basically do is gather all the data from the individual shards, create a new sharded collection within a new database, and put the collected data into that database, with the sharded collection being balanced automatically.
Watch the process very carefully and make absolutely sure you do not overload the balancer; otherwise you might run out of disk space on a shard if you do not have an optimal shard key.
Of course, you can recreate other sharded databases from the backup by using mongorestore accordingly. To restore unsharded databases, simply connect to the replicaset you want to hold the collection instead of connecting to mongos.
Side note:
If you need to restore a config server, simply dump one of the other two and restore the config database to the failed server.
The reason this works is that the metadata cannot be updated unless all config servers are up, running, and in sync.

Sharding & Replication in mongodb

First of all, I'm a beginner with MongoDB, so please be patient with me. I'm using Windows, and I created a simple database that contains 4 collections. When working with MongoDB, I first run mongod.exe --dbpath "Path To Data Folder" in a terminal and then connect to the mongod using mongo.exe. What I'm supposed to do is distribute the database and its collections into shards and replica sets to support distributed queries.
I tried commands like sh.enableSharding("DATABASE NAME") but it didn't work. I then figured out that I need to run mongos instead of mongod, so I followed this: Sharding in MongoDB, but unfortunately I didn't succeed. I also did some research, but there seems to be a lack of to-the-point guides on sharding and replication. If you could point me in the right direction, I would really appreciate it.
You can't enable sharding on a single mongod instance. You need at least 3 config server instances, two shard (mongod) instances, and a router instance (mongos), and all of them should be running at the same time (i.e. don't close the terminals in which you started them).
A good starting point for you is to read the sharding tutorial in Deploy a Sharded Cluster - MongoDB.org

mongodb replicaset new member does not show the correct disk usage on EC2

I have a MongoDB replica set with 2 members: 1 primary and 1 secondary. If I issue show dbs, both of them show the following:
local 24.06640625GB
test 0.203125GB
db1 9.94921875GB
db1test 0.953125GB
Then I issue use db1 -> db.events.count(), and the result is 1003130 documents on both members.
That makes sense: they replicate each other, so db1 and db1test have the same disk usage and the same number of documents in each collection on both DB servers.
Then I decided to add a new member (a new DB server) with an empty /data/db. I start the new server using:
sudo mongod --replSet rs0 --fork --logpath /var/log/mongodb/mongodb.log
Then, on the primary server, I issue:
rs.add('ipOfNewDBServer:27017')
After a few seconds, my new MongoDB server's shell prompt changes from > to STARTUP2 to rs0:SECONDARY, which I think means the initial sync has started.
On the newly added MongoDB server I issue show dbs, and it looks like the following:
local 22.0673828125GB
test 0.203125GB
db1 1.953125GB
db1test 0.453125GB
The disk usage of each database is not the same as on the other two (1 primary and 1 secondary). However, if I issue use db1 -> db.events.count(), the result is 1003130, the same as on the other two, and when I check the other collections in db1, they all match as well.
Why is the database disk usage different while each collection in each database holds the same number of documents? Also, correct me if I did anything wrong syncing the data from the two existing members to the new one. The official MongoDB documentation says this procedure relies on MongoDB's regular process for initial sync. I have no idea; please help. Thanks.
The new member of the replica set has the benefit of no fragmentation, because it performs a full initial sync into empty data files. The existing replica set members very likely have fragmentation due to deletes and to document updates that moved documents around.
In our environment, we periodically take each member of the replica set offline, wipe its data directory, and let it fully resync to drive out fragmentation. It works for us, but our dataset may be "small" relative to other deployments. I think there is a way to do this through the console with some db.runCommand, but I don't know what it is.
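The fragmentation effect described above can be mimicked with a toy allocator. Everything here (the append-only slot model, the numbers) is an invented illustration, not MongoDB's actual storage format:

```python
# Toy storage model: deleting a document frees its slot but never shrinks
# the data files, so an old member carries "holes" that a freshly synced
# member (which copies only live documents) does not.

def apply_ops(ops):
    slots = []   # on-disk slots, append-only
    live = {}    # doc_id -> slot index
    for op, doc_id in ops:
        if op == "insert":
            live[doc_id] = len(slots)
            slots.append(doc_id)
        elif op == "delete":
            slots[live.pop(doc_id)] = None   # leaves a hole behind
    return slots, live

# Insert 10 documents, then delete every other one:
ops = [("insert", i) for i in range(10)] + [("delete", i) for i in range(0, 10, 2)]
slots, live = apply_ops(ops)

fragmented_size = len(slots)   # old member: 10 slots on disk...
document_count = len(live)     # ...but only 5 live documents
compacted_size = len(live)     # a fresh initial sync copies just the 5

print(document_count, fragmented_size, compacted_size)   # 5 10 5
```

Same document count, different disk footprint, which matches what the newly added member shows.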

How to do selective Mongo recovery?

Suppose I have a Mongo replica set (a primary and a few secondaries) with two databases: db1 and db2. One secondary crashed and lost its data. When this secondary restarts, it will recover by copying both db1 and db2 from the primary.
Since such a recovery takes a lot of time, I would like this secondary to copy only db1 (not both db1 and db2) during recovery. Can I do that with Mongo 2.4.6?
MongoDB does not yet have the capacity for selective replication.
Feel free to open a JIRA ticket: https://jira.mongodb.org/secure/Dashboard.jspa (there is probably already one, but a Google search on my part can't bring it up).
Of course, one option here, to speed things up, is to physically copy the data from one location to the other yourself instead of waiting for MongoDB's replication to take hold.
As @Stennie mentions, this is the JIRA ticket for selective replication: https://jira.mongodb.org/browse/SERVER-1559