I currently have a MongoDB setup with a mongos server, a config server, and 2 shards of 3 mongod (master/slave) servers each. I would like to ensure that when I shut them down, they are shut down cleanly so as not to lose any data that is queued or in flight while the server is determining which shard to write to, etc.
What is the current best practice for shutting down a cluster of MongoDB servers?
In which order should things be shut down? Should I issue fsync, write locks, etc.?
I'd like to write a script to automate this to facilitate backups, new code pushes, and anything else that otherwise requires the database to be in a consistent state.
These best practices are still being settled. With your setup, here's how I would approach server maintenance.
Backups
Find a non-primary in each replica set. Perform an fsync & lock. Copy, tar, back up. Unlock the DB.
You should be able to do this successfully on a replica set. If you're really worried, you can do fsync & lock and then a shutdown.
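For example, on the chosen secondary the lock/copy/unlock cycle looks roughly like this in the mongo shell (a sketch only; the copy step and the shutdown alternative are up to your own tooling):

// flush pending writes to disk and block further writes on this node
db.fsyncLock()
// ... copy or tar the dbpath from the operating system while the lock is held ...
// release the lock so the node resumes replicating
db.fsyncUnlock()
// alternative: shut the node down entirely instead of locking it
// db.adminCommand({ shutdown: 1 })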
Compaction
You probably want to compact the data at some point. The easiest way to do this is again to do an fsync & lock and then do a db.repairDatabase(). The repair command will basically do a "defrag / compression" for you. As above, this can also be done with a shutdown.
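A rough sketch of that step in the mongo shell, run against the node you've taken out of rotation ("mycollection" below is a placeholder; note that repairDatabase needs free disk space roughly equal to your data size):

// rewrite and defragment every collection in the current database
db.repairDatabase()
// or compact a single collection instead of the whole database
db.runCommand({ compact: "mycollection" })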
Code pushes
Ideally, there is very little that needs to be consistent with regards to a code push. At the worst, you'll need to manage index creation / deletion. But this really needs to be managed separately as you don't want devs just randomly adding indexes on a production DB.
Monitoring
This is a way more complex topic, but you'll probably want to watch for things like "who's the primary", "what's the write throughput on each node", "how much RAM am I using", "how much data is shifting between nodes". There are limited tools for doing this right now, so expect to roll your own.
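If you do roll your own, most of this is exposed by rs.status() and db.serverStatus(); a minimal polling sketch (field names as reported by the 2.x-era shell, so treat them as assumptions):

// who is primary right now?
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr);
});
// resident memory (MB) and cumulative operation counters on this node
var s = db.serverStatus();
printjson({ residentMB: s.mem.resident, opcounters: s.opcounters });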
Related
I have a problem in production on my cluster.
Our monitoring failed to catch the disk space running out, and the disk filled up.
I needed to remove some of the data directly on a shard's primary node.
I did this on the mongod itself with the command:
db.collection.remove({query})
I know this is dangerous, but it was my only option at the time because I couldn't open a mongo shell against the mongos.
Now the cluster works fine, but I need to know the real impact of my action, and how to fix it.
The real impact is that you lose the data you deleted. There should be no other operational impact on the database itself. It should just return nothing when the affected documents are requested.
I'm sure you understand that this deletion directly on a shard (bypassing mongos) is not a recommended action by any means. In general, bypassing mongos can result in undefined behavior of the cluster, and the resulting issue could stay dormant for a long time. In the worst case, this would lead to a corrupt backup.
Having said that, deletion using the mongo shell (or a driver) is much preferred compared to going into the dbPath directory and deleting files. That action could leave you with an unrecoverable database.
The more immediate impact may be felt by the application, e.g. if your application expects a result and it receives none. I would encourage you to test all workflows of your application and confirm that everything is working as expected.
How do I rescue a sharded MongoDB cluster when one shard is permanently damaged?
I have a MongoDB cluster with 48 shards. Each shard is a replica set with a primary and one secondary. Due to Bad Planning (tm), one of the boxes ran out of disk space and died. The other one, already close to full, then ran out of space too. Due to bad circumstances (probably a compact() or repairDatabase() going on at the time), the entire shard was corrupted.
I stopped daemons, tried to repair, but it would not succeed.
So, the question is, how do I accept the loss of one shard but keep the other good shards? With 48 shards, the loss of one is only about 2% of my data. I'm okay with losing that data, but I have to get back to a normal, healthy state.
What do I do?
Revised answer (the earlier one is obsolete):
Stop all daemons on all boxes.
Change the config files for the primaries so they come back up as standalone instances.
Use mongoexport or mongodump to dump each shard's data into a file. Ensure that the file contains the collections you want, and try to exclude the _id field if you can.
Once your backups are complete and moved off the boxes to suitably safe locations, clean up: delete all data files, etc., and essentially re-create your cluster.
Re-load your data from your data backups.
Note that when you re-create the cluster, you should probably pre-populate it with a fairly large number of chunks so the chunk-splitting process doesn't take forever (see the sketch below).
If you end up with unbalanced shards (lots of chunks in one, not another), pause the reload, turn off the balancer's throttle so it goes Real Fast, and once things are balanced again, resume reloading.
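A hedged sketch of what that pre-splitting might look like from a mongos shell (assuming a recent enough shell for these helpers); the database name, collection name, shard key, and split points below are all placeholders:

// shard the (empty) collection before the bulk reload
sh.enableSharding("mydb");
sh.shardCollection("mydb.mycoll", { userId: 1 });
// pre-split at a few hand-picked shard-key values so the initial load spreads out
var splitPoints = [1000, 2000, 3000, 4000];
splitPoints.forEach(function (v) {
    sh.splitAt("mydb.mycoll", { userId: v });
});
// the balancer can be paused or resumed around the reload (false pauses it)
sh.setBalancerState(true);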
I have decided to start developing a little web application in my spare time so I can learn about MongoDB. I was planning to get an Amazon AWS micro instance and start the development and the alpha stage there. However, I stumbled across a question here on Stack Overflow that concerned me:
But for durability, you need to use at least 2 mongodb server instances as master/slave. Otherwise you can lose the last minute of your data.
Is that true? Can't I just have my box with everything installed on it (Apache, PHP, MongoDB) and rely on the data being correctly stored? At least, there must be a config option in MongoDB to make it behave reliably even if installed on a single box - isn't there?
The information you have on master/slave setups is outdated. Running single-server MongoDB with journaling is a durable data store, so for use cases where you don't need replica sets, or while you're still in the development stage, journaling will work well.
However, if you're in production, we recommend using replica sets. For the bare-minimum setup, you would ideally run three (or more) instances of mongod: a 'primary', which receives reads and writes; a 'secondary', to which the writes from the primary are replicated; and an arbiter, a single instance of mongod that allows a vote to take place should the primary become unavailable. This 'automatic failover' means that, should your primary be unable to receive writes from your application at a given time, the secondary will become the primary and take over receiving data from your app.
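As a minimal sketch of that bare-minimum setup, assuming three mongod processes have already been started with --replSet rs0 on the hypothetical hosts below:

// run once, connected to the member you want to start out as primary
rs.initiate({
    _id: "rs0",
    members: [
        { _id: 0, host: "db1.example.com:27017" },                    // primary candidate
        { _id: 1, host: "db2.example.com:27017" },                    // secondary
        { _id: 2, host: "db3.example.com:27017", arbiterOnly: true }  // arbiter, holds no data
    ]
});
rs.status();  // members should settle into PRIMARY / SECONDARY / ARBITER states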
You can read more about journaling here and replication here, and you should definitely familiarize yourself with the documentation in general in order to get a better sense of what MongoDB is all about.
Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
In some cases, you can use replication to increase read capacity. Clients have the ability to send read and write operations to different servers. You can also maintain copies in different data centers to increase the locality and availability of data for distributed applications.
Replication in MongoDB
A replica set is a group of mongod instances that host the same data set. One mongod, the primary, receives all write operations. All other instances, secondaries, apply operations from the primary so that they have the same data set.
The primary accepts all write operations from clients. A replica set can have only one primary. Because only one member can accept write operations, replica sets provide strict consistency. To support replication, the primary logs all changes to its data sets in its oplog. See primary for more information.
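If you want to see that mechanism at work, the oplog is just a capped collection in the local database; a quick, non-essential peek from the shell:

// the most recent operation the primary has recorded for replication
db.getSiblingDB("local").oplog.rs.find().sort({ $natural: -1 }).limit(1).pretty();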
I read about the different MongoDB setups for doing backup without downtime. Which strategy is best or can they even be compared?
Enable journaling and simply copy the /data/db directory. It is unclear to me whether this is enough; the MongoDB home page states that you have to "snapshot it", and gives SAN and LVM as examples.
Questions:
What does snapshot mean in this context? Will a copy command count as a snapshot? Is it safe to copy a journaled MongoDB (2.0+) data directory on a Windows server with NTFS? How do you ensure that it is safe to do on your own filesystem and setup?
Establish a replica set with 2 servers and an arbiter. Then use rs.status() and fsyncLock/unlock to ensure data is read only on the secondary server while doing backup.
> db.fsyncLock
function () {
    return db.adminCommand({fsync:1, lock:true});
}
> db.fsyncUnlock
function () {
    return db.getSiblingDB("admin").$cmd.sys.unlock.findOne();
}
Questions:
If you use locks in a replica set, it seems that writes and reads can end up blocked for the whole replica set, and this bug has not been fixed?
What if the secondary is voted in as primary while the backup is in progress? Will the backup process stop or will the replica set stop responding to write requests until it is unlocked?
Considerations:
For now I would like the simple solution: just copy the data/db directory including the journal files, and hold off on the replica set. MongoDB runs on a 64-bit Windows server (Rackspace Cloud).
The best bet is to do fsync + lock on a secondary, then snapshot the volume at the disk or volume level (e.g. using lvm2, hyper-v, btrfs), unlock the database, then copy the snapshotted data files. This minimizes downtime of the secondary and is easy to restore.
"Snapshotting" in this context refers to the snapshot features offered by some volume managers, file systems and hypervisors. Essentially, this is a 'copy-on-write' feature for block devices: instead of overwriting data when the OS demands it, it will write the new data elsewhere and keep both the old version and the new version readable. Snapshotting usually takes almost no time, but on some systems, it's a bad idea to keep many snapshots of the same files, because it may dramatically slow future writes.
Why I believe this is the best strategy for full backups:
Using mongodump won't store the index data. The indexes will be rebuilt on restore, but rebuilding indexes during recovery can take hours - the last thing you need when everybody is yelling at you is an operation that takes hours and can't be accelerated.
Fsync + lock will block writers and might block readers; hence, it's best to do that on a (passive) secondary, not on the primary.
Halting a secondary makes it fall behind on the oplog, which is why you should keep the lock time as short as possible (a quick way to check your oplog headroom is sketched after this list). Instead of copying all data files during the lock (which could take hours), merely taking a snapshot should take only a couple of seconds. Hence, oplog limits are not a concern.
Everything is 'back to normal' while the actual copy is running, which gives you peace of mind. The only difference will be higher load on a secondary during the backup, which shouldn't be a major concern.
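To gauge how much lock time you can actually afford, the built-in shell helpers report the oplog window and replication lag (run the first on the primary):

// size of the oplog and the time span it currently covers
db.printReplicationInfo();
// how far each secondary currently lags behind the primary
db.printSlaveReplicationInfo();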
Addressing your questions:
regarding locks in replica sets: Keep the lock time short, and use a passive secondary (which can't be elected master) so the writer queue can't stall.
"What if the secondary is voted in as primary while the backup is in progress" can't happen if your backup system is passive
For now I would like the simple solution: just copy the data/db directory including the journal files, and hold off on the replica set. MongoDB runs on a 64-bit Windows server (Rackspace Cloud).
You can do that. Volume snapshotting is probably still the best way to go, giving you only seconds of downtime. If your data is small, a simple mongodump might be even easier, but make sure recovery times are acceptable (depends on your indexes).
I am working on a project which has some important data in it. This means we cannot afford to lose any of it if the power or the server goes down. We are using MongoDB for the database. I'd like to be sure that my data is in the database after the insert, and to roll back the whole batch if one element was not inserted. I know the philosophy behind Mongo is that we do not need transactions, but how can I make sure that my data is really safely stored after an insert rather than sent to some "black hole"?
Should I do a search afterwards to check that the data was written?
Should I use some specific mongoDB commands?
Should I use sharding, even though one server is enough to satisfy the speed requirements and, in any case, it doesn't guarantee anything if the power goes down?
What is the best solution?
Your best bet is to use Write Concerns - these allow you to tell MongoDB how important a piece of data is. The quickest Write Concern is also the least safe - the data is not flushed to disk until the next scheduled flush. The safest will confirm that the data has been written to disk on a number of machines before returning.
The write concern you are looking for is FSYNC_SAFE (at least that is what it is called from the point of view of the Java driver) or REPLICAS_SAFE which confirms that your data has been replicated.
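In the mongo shell, the rough equivalent uses the getLastError command of that era; "orders" and the w/wtimeout values below are just illustrative:

// insert, then ask the server how durable that write is
db.orders.insert({ item: "widget", qty: 1 });
// j: true waits for the write to hit the journal on this node (FSYNC_SAFE-style)
db.runCommand({ getLastError: 1, j: true });
// w: 2 additionally waits until at least two replica set members have the write (REPLICAS_SAFE-style)
db.runCommand({ getLastError: 1, w: 2, wtimeout: 5000 });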
Bear in mind that MongoDB does not have transactions in the traditional sense - your rollback will have to be rolled by hand as you can't tell the Mongo database to do this for you.
The other thing you need to do is either use the relatively new --journal option (which uses a Write Ahead Log), or use replica sets to share your data across many machines in order to maximise data integrity in the event of a crash/power loss.
Sharding is not so much a protection against hardware failure as a method for sharing the load when dealing with particularly large datasets - sharding shouldn't be confused with replica sets which is a way of writing data to more than one disk on more than one machine.
Therefore, if your data is valuable enough, you should definitely be using replica sets, perhaps even siting slaves in other data centres/availability zones/racks/etc in order to provide the resilience you require.
There is (or will be; I can't remember offhand whether this has been implemented yet) a way to specify the priority of individual nodes in a replica set, such that if the master goes down, the new master elected is one in the same data centre, if such a machine is available (i.e. to stop a slave on the other side of the country from becoming master unless it really is the only other option).
I received a really nice answer from a person called GVP on Google Groups. I will quote it (basically it adds to Rich's answer):
I'd like to be sure that my data is in the database after the insert, and to roll back the whole batch if one element was not inserted.
This is a complex topic and there are several trade-offs you have to consider here.
Should I use sharding?
Sharding is for scaling writes. For data safety, you want to look at replica sets.
Should I use some specific mongoDB commands?
First thing to consider is "safe" mode or "getLastError()" as indicated by Andreas. If you issue a "safe" write, you know that the database has received the insert and applied the write. However, MongoDB only flushes to disk every 60 seconds, so the server can fail without the data on disk.
Second thing to consider is "journaling" (v1.8+). With journaling turned on, data is flushed to the journal every 100ms, so you have a smaller window of time before failure. The drivers have an "fsync" option (check that name) that goes one step further than "safe": it waits for acknowledgement that the data has been flushed to the disk (i.e. the journal file). However, this only covers one server. What happens if the hard drive on the server just dies? Well, you need a second copy.
Third thing to consider is replication. The drivers support a "W" parameter that says "replicate this data to N nodes" before returning. If the write does not reach "N" nodes before a certain timeout, then the write fails (an exception is thrown). However, you have to configure "W" correctly based on the number of nodes in your replica set. Again, because a hard drive could fail, even with journaling, you'll want to look at replication. Then there's replication across data centers, which is too long to get into here. The last thing to consider is your requirement to "roll back". From my understanding, MongoDB does not have this "roll back" capacity. If you're doing a batch insert, the best you'll get is an indication of which elements failed.
Here's a link to the PHP driver on this one: http://it.php.net/manual/en/mongocollection.batchinsert.php You'll have to check the details on replication and the W parameter. I believe the same limitations apply here.