Mongodb sharding is NOT recovered after power off accident

Mongodb sharding is NOT recovered after power off accident - mongodb

I'm running 4 vms (centos) on a single machine (Windows 2008 R2). The 4 vms are setup as below:
1 mongos
1 mongo configure server
2 mongod as sharding servers
OK, everything was fine before a power off accident. When the power came back, I did manually reboot all the VMs, and found the sharding setting is gone. I mean, the mongos can talk to the configure server, but somehow the sharding data is lost and it show the database is not sharded.
I setup the sharding based on documents from mongodb websites (e.g. running some command in mongo shell to enable sharding for the db and each collection). Do I need to do all the mongo shell commands again after rebooting? Or is it supposed to recover automatically once the sharding is enabled?
Thanks.

Once you've established a sharded cluster, it is certainly supposed to stay configured, even if servers fail, even if they all fail at the same time. Restarting the servers should bring everything up just the way it was before the outage. Based on your description, it is difficult to reason what might have gone wrong. A dump of the config database, and the log files of all the affected servers, would be necessary to analyze what happened. This should perhaps be filed as a ticket with MongoDB support.
(It is, by the way, recommended to run three config servers rather than a single one, for availability reasons. But even so, even a single server should recover just fine after having failed. The three-server recommendation is only to make sure that there's always a live config server even if one of them should fail.)

Related

MongoDb preparing for Sharded Clusters

We are currently setting up our mongodb environment for production. At the moment we only have one dedicated mongodb database server. We will expand this in the near future with a 2nd server and I already indicated to the management that for the ideal situation we should get a 3rd server as well.
Since I already know we're going to use sharding and replication in the near future I want to be prepared for it.
The idea I have now is to start now with the Development Configuration (as mongo's documentation names it).
Whenever our second server comes available I would like to expand this setup to a configuration with 2 configuration servers en 2 shards (replica sets).
And of course when our third server comes available have the fully functional sharded cluster configuration.
While reading mongo's documentation I was getting triggered by the note that de Development setup should not be used in production.
MongoDb Development Configuration
Keeping in mind that we will add more servers soon, would it be a bad idea to already configure the Development Configuration already so we can easily add the 2nd server to the cluster when it comes available?

After setting up the 'development sharded setup' I've found my anwser. Of course i'm happy to share in case anybody runs into the same questions as I do when starting with this.
In my case, it was ok to start with the development setup untill my new servers arrived. It was a temporary situation and when my new servers arived I was able to easily expand my replicasets. There are a number of reasons why this isn't adviced for production:
To state the obvious, there is no replication yet. Since I was running shards on one machine there is a single point of failure. If the machine, or one node goes down, the cluster won't work anymore.
Now this part is interesting. After I added a second server, I did have primary and secondary nodes. Primary nodes were used for writing and secondary for reading. I've eliminated the issue that there was no replication AND my data had a higher availability. However, I noticed with the 2-member replica sets, if one member of the replicaset went down (even is this was a secondary), the primary stepped down to a secondary node as well. This had to do with the voting mechanism that MongoDb uses. See Markus' more detailed answer on this.. Since there are no more primaries in the replicaset, my cluster won't function anymore. Now, if i were to use an arbiter I could eliminate this problem as well.
When you have a 3-member replicataset, automatic failover kicks in. Whenever a node goes down, another primary is assigned automatically and the cluster will continue performing as before.
During my tests I also got to a point where one of my MongoD.exe instances stopped working due to a "Out of memory exception". I was running a cluster with 3 replicasets, meaning every machine had at least 4 mongod.exe processes running (3 for the replicaset shards and one for the configuration server replicaset). Besides having a query which wasn't optimized yet I also noticed that the WiredTiger storage engine by default can use up to 50% of ram minus one gigabyte. Perhaps it wasn't the best choise to have multiple replicaset-shards on one machine but I was able to eliminate the problem by capping the wiredtiger memory usage.
I hope this answer helps anybody who's starting to set up replication and sharding for MongoDb.

Local MongoDB instance with index in remote server

One of our clients have a server running a MongoDB instance and we have to build an analytical application using the data stored in their MongoDB database which changes frequently.
Clients requirements are:
That we do not connect to their MongoDB instance directly or run another instance of MongoDB on their server but just somehow run our own MongoDB instance on our machine in our office using their MongoDB database directory with read only access remotely.
We've suggested deploying a REST application, getting a copy of their database dump but they did not want that. They just want us to run our own MongoDB intance which is hooked up with the MongoDB instance directory. Is this even possible ?
I've been searching for a solution for the past two days and we have to submit a solution by Monday. I really need some help.

I think this is normal request because analytical queries could cause too much load on the production server. It is pretty normal to separate production and analytical databases.
The easiest option is to use MongoDB replication. Set up MongoDB replica set with production database instance as primary and analytical database instance as secondary, also configure the analytical instance to never become primary.
If it is not possible to use replication - for example client doesn't want this, the servers could not connect directly to each other... - there is another option. You can read oplog from remote database and apply operations to your database instance. This is exactly the low level mechanism how replica set works, but you can do it manually too. For example MMS (Mongo Monitoring Sevice) Backup uses reading oplog for online backups of MongoDB.
Update: mongooplog could be the right tool for real-time application of replication oplog pulled from remote server on local server.

I don't think that running two databases that points to the same database files is possible or even recommended.
You could use mongorestore to restore from their data files directly, but this will only work if their mongod instance is not running (because mongorestore will need to lock the directory).
Another solution will be to do file system snapshots and then restore to your local database.
The downside to this backup/restore solutions is that your data will not be synced all the time.
Probably the best solution will be to use replica sets with hidden members.
You can create a replica set with just two members:
Primary - this will be the client server.
Secondary - hidden, with votes and priority set to 0. This will be your local instance.
Their server will always be primary (because hidden members cannot become primaries). Clients cannot see hidden members so for all intents and purposes your server will be read only.
Another upside to this is that the MongoDB replication will do all the "heavy" work of syncing the data between servers and your instance will always have the latest data.

MongoDB single server production setup

I am developing a server to a customer who has only one machine for his production deployment.
It's a CentOS 64bit with 8Gb of memory.
I am using Mongo and the question is, do I still need to deploy a replica set even though it's a single machine?
Will I get the advantages of a replica set or since it's a single machine it really does not matter and journaling is enough?

You definitely have to enable journaling (It will ensure consistent state even in cases of HW failure scenarios, you will not have to run costy repair command after a crash). You should enable RAID under the data directrory (Anyway this is in general recommended), while here will be crucial not to lose data due to a disk failure (You do not have copy on an other box or so). There is no option for HA within one box it is quite straightforward, however it is not harmful, and in some cases useful to configure 1 node (1 mongod) replicaset (Than you will have oplog). This will help for example when you likely to have MMS backup, or just for enable point in time backup feature of mongodump. Later if you will likely to scale out for HA this way you will only have to add the new nodes to your initially established replicaset.
Make no sense to run several replicas inside one box, while they will race on HW resources and will bring nothing as an advantage.

64-bit mongodb multiple-shards Issue

I am using 64-bit MongoDB, and i am undergoing test on multiple-shards. If i keep multiple shards in a single machine. Its working fine but if i keep shards in different machine, its failed in sharding to second shard. I have restricted the first-shard size to 10MB, once its reaches the limited size in first shard it should start sharding to second-shard but not happening so.Instead failed to store in second-shard updating to first shard. The following are my shard details. In my environment initially i have two shards. The first shard is on my first-machine running along with my application. The Second-shard is on my second machine.
Configuration as follows:-
*)On both of my shards, shard-server,configserver,mongos and i have connected mongo through mongos as follows ./mongo hostname:27017/admin and i have added both the shards in first & second shard and enabled sharding for database and collection level by using shard-key.
Please, let me know if i gone wrong anywhere in the configuration.
Advance Thanks,

Your post could use some editing, this is very difficult to read.
It looks like you have 2 machines. On each machine you have:
mongod process serving as one shard
mongod process serving as a config
mongos process
a copy of your application connecting to localhost:27017/admin
Please, let me know if i gone wrong anywhere in the configuration.
There are several possible problems here. Please check the following:
You can only have 1 or 3 config processes. It looks like you have 2, this will not work.
When you connect to localhost:27017/admin are you connecting to mongos or mongod? Either one could be running on those ports. Can you specify the ports for each process to help clarify? You must connect to mongos or the sharding will not happen.
Please look at the logs, they generally have output indicating what the server is doing. If there is no indication of "splits" or "chunks" happening, then your database may be configured incorrectly.
Your best bet is to start from top and test each piece one at a time.

In a sharded environment, what should happen if my application connect to mongod instead of mongos?

The data should go to one shard or it will be splitted between the shards?

Don't do that. :-)
There isn't explicit protection against that; in the future that would be a good improvement. Typically the mongod's in a sharded cluster are running on a different port number than the default so it is difficult to accidentally do that.
A sysadmin may wish to explicitly connect to such a server to inspect it -- that is fine as long as the operations are read only. You could also do administrative actions there such as a reindex and such.