How to make MongoDB `repairDatabase` and `compact` commands work with a replica set? Downtime is OK

We need to free some of our MongoDB space, and we identified 100 GB+ worth of documents that can be safely removed from a collection.
So we removed them from our test environment, which has this setup:
mongodb version 3.0.1
no sharding
1 node, no replica
wiredtiger engine
When done, we found out that the space on disk was still used and needed to be reclaimed. We found this post and it helped us: after running both
db.runCommand({repairDatabase: 1})
and
db.runCommand({ compact: "collection-name" })
We freed 100 GB+.
We then proceeded in production, forgetting that the setup was different, since we had a replica node:
mongodb version 3.0.1
no sharding
1 primary node, 1 replica node
wiredtiger engine
After removing the documents, we ran
db.runCommand({repairDatabase: 1})
and got the OK message (after a while, 10+ minutes). We tried running
db.runCommand({ compact: "collection-name" })
and got this error:
will not run compact on an active replica set primary as this is a
slow blocking operation. use force:true to force
So we ran
db.runCommand({ compact: "collection-name", force: true })
and got the OK message (almost instantly), but the space on disk is still used; it wasn't freed.
We searched for solutions for running the repairDatabase and compact commands with a replica set, but the advice was focused on avoiding downtime, as if that were the only issue. However, we can schedule downtime, and our problem is rather that the commands don't work as expected, since the space is not actually reclaimed.
What did we do wrong?

For replica set configurations, the best and safest method to recover space is to perform an initial sync. If you need to recover space from all nodes in the set, you can perform a rolling initial sync. That is, perform an initial sync on each of the secondaries, before finally stepping down the primary and performing an initial sync on it.
Note that a rolling initial sync is only possible if your replica set contains at least three nodes (for reasons I will describe below).
The rolling initial sync method is the safest way to perform replica set maintenance, and as a bonus it involves no downtime.
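As a rough illustration of one round of that procedure (assuming a dbPath of /data/db; adapt the paths and the way you stop/start mongod to your own deployment):

// 1. On the secondary being resynced, shut it down cleanly from the mongo shell:
use admin
db.shutdownServer()
// 2. On that host's OS shell (not mongo), empty the data directory, e.g. rm -rf /data/db/*,
//    then start mongod again with its usual replica set options; it will initial-sync automatically.
// 3. From any other member, watch the node return to SECONDARY:
rs.status().members.forEach(function (m) { print(m.name, m.stateStr); })
// 4. Repeat for each secondary, then (on the primary) step it down and resync it last:
rs.stepDown()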
Having said that, there are some things that are worth mentioning:
Regarding compact:
The compact command on WiredTiger on MongoDB 3.0.x was affected by this bug: SERVER-21833 which was fixed in MongoDB 3.2.3. Prior to this version, compact on WiredTiger could silently fail.
Regarding repairDatabase:
Please don't run repairDatabase on replica set nodes. This is strongly discouraged, as mentioned in the repairDatabase page. The name repairDatabase is a bit misleading, since the command doesn't attempt to repair anything. It was intended to be used when there's disk corruption, which could lead to corrupt documents.
The repairDatabase command could be more accurately described as "salvage database". That is, it recreates the databases by discarding corrupt documents, in an attempt to get the database into a state where you can start it and salvage intact documents from it.
In a replica set, MongoDB expects all nodes in the set to contain identical data. If you run repairDatabase on a replica set node, there is a chance that the node contains undetected corruption, and repairDatabase will dutifully remove the corrupt documents. Predictably, this leaves that node with a dataset different from the rest of the set. If an update then happens to hit such a document, the whole set could crash. To make matters worse, it is entirely possible that this situation stays dormant for a long time, only to strike suddenly with no apparent reason.
Regarding your setup:
I noticed that in your production environment, you created a replica set with two nodes. This setup is not recommended, since the failure of a single node will cause the remaining node to step down to a secondary, thus disallowing writes to the set.
Due to the way MongoDB high availability works (see Replica Set Election), it's strongly recommended to deploy three data-bearing nodes at a minimum, or at least add an arbiter node (see Replica Set Members) so that your replica set contains an odd number of members.
Having only two members in a replica set also makes rolling upgrades/initial sync/maintenance much harder or even impossible in some cases.
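For example, if you go the arbiter route, adding one from the mongo shell is a one-liner (the host name below is a placeholder):

rs.addArb("arbiter.example.net:27017")
rs.status()   // the new member should be listed with stateStr "ARBITER"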
MongoDB 3.0.1 was released on March 17, 2015, which is more than 2 years ago as of this writing. If you're forced to use the MongoDB 3.0 series, please consider moving to 3.0.15. Or better yet, to 3.4.7 (the latest as of Aug 10, 2017), which contains massive improvements over 3.0.1.

Related

In MongoDB, can I run the compact command without shutting down each instance?

In our server structure, the primary, secondary, and arbiter each run on separate physical machines.
The MongoDB version is 4.2.3.
Some of the oldest documents were deleted because too many documents had accumulated in a specific collection.
However, even after deleting the documents, the storage space was not released.
Upon checking, I found that MongoDB's mechanism retains the space as reusable bytes even when documents are deleted.
I also found that this unnecessary disk space can be freed with the compact command on the WiredTiger engine.
Currently, all clients connected to the DB are querying using the arbiter's IP and port.
Since the DB uses only replication, not sharding, I expect that if I run the compact command on each instance independently, the arbiter will distribute queries to the currently available instances even while an instance is locked.
Is this possible?
Or should I shut down each instance, run it standalone, run the compact command, and then reconfigure the PSA set?
You may upgrade your MongoDB to the latest version, 4.4. From the documentation of compact:
Blocking
Changed in version 4.4.
Starting in v4.4, on WiredTiger, compact only blocks the following
metadata operations:
db.collection.drop
db.collection.createIndex and db.collection.createIndexes
db.collection.dropIndex and db.collection.dropIndexes
compact does not block MongoDB CRUD Operations for the database it is
currently operating on.
Before v4.4, compact blocked all operations for the database it was
compacting, including MongoDB CRUD Operations, and was therefore
recommended for use only during scheduled maintenance periods.
Starting in v4.4, the compact command is appropriate for use at any
time.
To anyone looking for the answer with 4.4: please see this bug and the documentation entry, as the compact routine still forces the node into the RECOVERING state if you are running a replica set (and I assume this is the default use case for most projects).
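For reference, here is a minimal sketch of compacting members one at a time, connecting to each member directly rather than through the replica set connection string ("mydb" and "mycollection" are placeholder names). Expect the member to report RECOVERING while compact runs on the versions affected by the behaviour above:

use mydb
db.runCommand({ compact: "mycollection" })
rs.status()   // confirm the member has returned to SECONDARY before moving to the next one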

MongoDb preparing for Sharded Clusters

We are currently setting up our mongodb environment for production. At the moment we only have one dedicated mongodb database server. We will expand this in the near future with a 2nd server and I already indicated to the management that for the ideal situation we should get a 3rd server as well.
Since I already know we're going to use sharding and replication in the near future I want to be prepared for it.
The idea I have now is to start now with the Development Configuration (as mongo's documentation names it).
Whenever our second server becomes available, I would like to expand this setup to a configuration with 2 configuration servers and 2 shards (replica sets).
And of course, when our third server becomes available, we'll have the fully functional sharded cluster configuration.
While reading Mongo's documentation I was struck by the note that the Development setup should not be used in production.
MongoDb Development Configuration
Keeping in mind that we will add more servers soon, would it be a bad idea to configure the Development Configuration now so we can easily add the 2nd server to the cluster when it becomes available?
After setting up the 'development sharded setup' I found my answer. Of course I'm happy to share it, in case anybody runs into the same questions I did when starting out with this.
In my case, it was OK to start with the development setup until my new servers arrived. It was a temporary situation, and when the new servers arrived I was able to easily expand my replica sets. There are a number of reasons why this isn't advised for production:
To state the obvious: there is no replication yet. Since I was running all shards on one machine, there is a single point of failure. If the machine or one node goes down, the cluster won't work anymore.
Now this part is interesting. After I added a second server, I did have primary and secondary nodes. Primary nodes were used for writing and secondaries for reading. I had eliminated the issue that there was no replication, AND my data had higher availability. However, I noticed that with 2-member replica sets, if one member of the replica set went down (even if this was a secondary), the primary stepped down to a secondary node as well. This had to do with the voting mechanism that MongoDB uses; see Markus' more detailed answer on this. Since there were no more primaries in the replica set, my cluster wouldn't function anymore. Now, if I were to use an arbiter, I could eliminate this problem as well.
When you have a 3-member replica set, automatic failover kicks in. Whenever a node goes down, a new primary is elected automatically and the cluster continues performing as before.
During my tests I also got to a point where one of my mongod.exe instances stopped working due to an "Out of memory" exception. I was running a cluster with 3 replica-set shards, meaning every machine had at least 4 mongod.exe processes running (3 for the replica-set shards and one for the configuration server replica set). Besides having a query which wasn't optimized yet, I also noticed that the WiredTiger storage engine by default can use up to 50% of RAM minus one gigabyte. Perhaps it wasn't the best choice to have multiple replica-set shards on one machine, but I was able to eliminate the problem by capping the WiredTiger memory usage.
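For what it's worth, the cap is just a startup option; an illustrative example with an arbitrary 1 GB value (add it to whatever other options each mongod already uses):

mongod --wiredTigerCacheSizeGB 1 <your other options>

or, in the YAML configuration file:

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1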
I hope this answer helps anybody who's starting to set up replication and sharding for MongoDB.

Migrating MongoDB instances with no down-time

We are using MongoDB in a production environment and now, due to some issues with our current servers, I'm going to change servers and start a new MongoDB instance.
We have a replica set and a single mongod instance (two different MongoDB networks for different purposes). Now, first I should migrate the single mongod instance and then the whole replica set to the new server.
What I want to know is, how can I migrate both instances with no down-time? I don't want to shutdown the server or stop write operations.
Thanks in advance.
So, first of all, you should never run MongoDB as a single instance in production. At a minimum you should have 1 primary, 1 secondary and 1 arbiter.
Second, even with a replica set you will always have a bit of write downtime when you switch primaries, as writes are not possible during the election process. From the docs:
IMPORTANT Elections are essential for independent operation of a
replica set; however, elections take time to complete. While an
election is in process, the replica set has no primary and cannot
accept writes. MongoDB avoids elections unless necessary.
Elections are going to occur when for example you bring down the primary to move it to a new server or virtual instance, or upgrade the database version (like going from 2.4 to 2.6).
You can keep downtime to a minimum with an existing replica set by setting the appropriate options to allow queries to run against secondaries. Again from the docs:
Maintaining availability during a failover. Use primaryPreferred if
you want an application to read from the primary under normal
circumstances, but to allow stale reads from secondaries in an
emergency. This provides a “read-only mode” for your application
during a failover.
This takes care of reads at least. Writes are best dealt with by having your application retry failed writes, or queue them up.
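For illustration, the read preference can be set in the mongo shell per connection or per query, or in the driver connection string (host, database and collection names below are placeholders):

db.getMongo().setReadPref("primaryPreferred")         // for the current shell connection
db.mycollection.find().readPref("primaryPreferred")   // or per query
// in a driver connection string:
//   mongodb://host1:27017,host2:27017/mydb?replicaSet=rs0&readPreference=primaryPreferred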
Regarding your standalone, the documented procedure for converting to a replica set is well tested and can be completed very quickly with minimal downtime:
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
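In outline, the conversion amounts to restarting the standalone with a replica set name (e.g. --replSet rs0) and then running, in the mongo shell (the host name below is a placeholder):

rs.initiate()                          // turns the former standalone into the primary of a one-member set
rs.add("newhost.example.net:27017")    // add the new member(s); they will initial-sync from the primary
rs.status()                            // wait until the new members reach SECONDARY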
You cannot have zero downtime (the new mongod will run on a new IP, so you will at least need to reconnect to it). But you can minimize downtime by building a geographically distributed replica set.
Please read:
http://docs.mongodb.org/manual/tutorial/deploy-geographically-distributed-replica-set/
Use the given process but please note:
Do not set priority 0 on the instances at the New Location, so that they can become primary when the old ones at the Old Location step down.
You still need to restart mongod in replica set mode at the Old Location.
You need 3 instances, including an arbiter, at the New Location if you want it to be a replica set.
When the data is completely in sync with the instances at the New Location, step down the instances at the Old Location (one by one). Now everything will go to the New Location, but the problem is that traffic is still directed through a distant mongod.
So stop the mongod at the Old Location and start a new one at the New Location. Connect your applications to the New Location mongod.
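A hedged sketch of that cut-over from the mongo shell, with placeholder host names (run rs.add/rs.addArb on the current primary):

rs.add("new-location-1.example.net:27017")      // string form defaults to priority 1, per the note above
rs.add("new-location-2.example.net:27017")
rs.addArb("new-location-arb.example.net:27017")
rs.status()      // wait until the new members show SECONDARY and are caught up
rs.stepDown()    // then step down the Old Location primary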
Note: I have not done this myself so far. I had planned it once, but then I ran into a problem that was not the hosting provider's fault. In practice you may hit some issues.
A replica set is the feature provided by MongoDB to achieve high availability and automatic failover.
It is somewhat like a traditional master-slave configuration but has the capability of automatic failover.
It is basically a group/cluster of mongod instances which communicate and replicate with each other to provide high availability and automatic failover.
Basically, in a replica set a minimum of 2 and a maximum of 12 mongod instances can exist.
In a replica set the following types of server exist; out of all of them, one server is always the primary.
http://blog.ajduke.in/2013/05/31/setup-mongodb-replica-set-in-4-steps/
John's answer is right. By the way, in your case you have no way to avoid downtime; you can just try to make it as short as possible.
You can prepare the new replica set and save its configuration.
Do the same for the single mongod instance: prepare a JS file with its specific configuration (i.e. the stuff that goes in the admin database).
Disable client connections on the production servers.
Copy the datafiles from the old servers to the new ones (http://docs.mongodb.org/manual/core/backups/#backup-with-file-copies).
Apply your previously saved replica set config and configuration.
Done.
You can use different approaches, such as adding a hidden secondary member to the replica set (see the sketch below) if you have a lot of data, so you can wait until it is up to date before stopping the production server. Basically, for the replica set you have many ways to handle a migration; with the single instance, by contrast, you don't have such options.
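A brief sketch of the hidden-member approach, with a placeholder host name (older shells may require an explicit _id in the member document, or you can add the member by host string and adjust it afterwards with rs.reconfig()):

rs.add({ host: "new-server.example.net:27017", priority: 0, hidden: true })
rs.status()   // wait for the hidden member to finish its initial sync before cutting over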

MongoDB Replica-Set Disk Cleanup

I am trying to shrink the size of my MongoDB replica set (the collections are the same size but disk space keeps growing). According to the MongoDB website, I should just run mongod --repair on the master node to compact all collections. The problem would be downtime for the website. So, I have two options (that I know about):
Take secondary node off of replica-set and run mongod --repair on it and restart back on replica-set. I tried this and couldn't get past permission errors on 'local' collection.
Shut down secondary node and delete all files in the data directory. Restart mongo and let it recover from master. This actually worked for me but my only concern is, what if your journal collection is full and since it's a capped collection, will you only receive the data that is in the journal or will you actually copy over all of master's data?
Has anyone else run into this scenario? I'm surprised by the lack of information when trying to search for this.
Take secondary node off of replica-set and run mongod --repair on it and restart back on replica-set.
This is a common practice which is usually referred to as a "rolling repair". You take each secondary out of the replica set and repair it, and eventually step down the primary for repair as a last step. As long as you always have a majority of your replica set nodes available this approach will minimize potential downtime.
If you are frequently deleting data you should consider using the new PowerOf2Sizes collection option in MongoDB 2.2. This changes the allocation method to allocate document space in powers of two (e.g. a 500-byte document would be allocated 512 bytes), which allows for more effective reuse of the space from deleted documents (at the slight expense of a few more bytes per document).
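As an example, the option can be enabled on an existing collection with collMod ("mycollection" is a placeholder name); note that it only affects records allocated after the change:

db.runCommand({ collMod: "mycollection", usePowerOf2Sizes: true })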
I tried this and couldn't get past permission errors on 'local' collection.
Permission errors on the 'local' collection sound like file system permissions (i.e. based on the user you were running your mongod as). You should run the repair process with the same user.
Shut down secondary node and delete all files in the data directory. Restart mongo and let it recover from master. This actually worked for me but my only concern is, what if your journal collection is full and since it's a capped collection, will you only receive the data that is in the journal or will you actually copy over all of master's data?
It sounds like you are conflating the journal, which is used for durability and crash recovery, with the oplog, which is used for replication.
If you resync a node from the primary, all data will be copied over. During this initial period the node will be in the RECOVERING state and is not considered a "healthy" node (i.e. it is not available for queries).
Once the node is caught up it will change to the normal SECONDARY state, at which point the oplog will be used for ongoing sync.
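If it helps, both can be checked from the mongo shell: rs.status() shows each member's state, and rs.printReplicationInfo() (run on the primary) shows how large a window the oplog provides for the ongoing sync:

rs.status().members.forEach(function (m) { print(m.name, m.stateStr); })
rs.printReplicationInfo()   // oplog size and the time range it currently covers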
Some further reading:
Replication fundamentals
Replica set status reference

Replica set never finishes cloning primary node

We're working with an average sized (50GB) data set in MongoDB and are attempting to add a third node to our replica set (making it primary-secondary-secondary). Unfortunately, when we bring the nodes up (with the appropriate command line arguments associating them with our replica set), the nodes never exit the RECOVERING stage.
Looking at the logs, it seems as though the nodes ditch all of their data as soon as the recovery completes and start syncing again.
We're using version 2.0.3 on all of the nodes and have tried adding the third node from both a "clean" (empty db) state as well as a bootstrapped state (using mongodump to take a snapshot of the primary database and mongorestore'ing that snapshot into the new node), each failing.
We've observed this recurring phenomenon over the past 24 hours and any input/guidance would be appreciated!
It's hard to be certain without looking at the logs, but it sounds like you're hitting a known issue in MongoDB 2.0.3. Check out http://jira.mongodb.org/browse/SERVER-5177. The problem is fixed in 2.0.4, which has an available release candidate.
I don't know if it helps, but when I got that problem, I erased the replica's DB and re-initiated it. It started from scratch and replicated OK. Worth a try, I guess.