I have a replica set with smallfiles enabled, and now I'm suffering from the huge time one instance takes to start or restart. The db file count is around 2,500 and it takes almost an hour to load them and start up. Any suggestions on how I can speed this process up?
Performance should improve if you run your Mongo instance with smallfiles disabled. As this is a replica set, you can just shut down your instance, delete all your data files and journals, and then restart your service. After restarting, the data will be synced again from the primary instance. This initial sync may take some time; however, any subsequent sync should be a lot faster.
We are using MongoDB in a production environment and now, due to some issues with the current servers, I'm going to change the server and start a new MongoDB instance.
We have a replica set and a single mongod instance (two different MongoDB networks for different purposes). Now, first I should migrate the single mongod instance and then the whole replica set to the new server.
What I want to know is, how can I migrate both instances with no down-time? I don't want to shutdown the server or stop write operations.
Thanks in advance.
So first of all you should never run mongodb as a single instance for production. At a minimum you should have 1 primary, 1 secondary and 1 arbiter.
Second, even with a replica set you will always have a bit of write downtime when you switch primaries, as writes are not possible during the election process. From the docs:
IMPORTANT Elections are essential for independent operation of a
replica set; however, elections take time to complete. While an
election is in process, the replica set has no primary and cannot
accept writes. MongoDB avoids elections unless necessary.
Elections are going to occur when, for example, you bring down the primary to move it to a new server or virtual instance, or upgrade the database version (like going from 2.4 to 2.6).
You can keep downtime to a minimum with an existing replica set by setting the appropriate options to allow queries to run against secondaries. Again from the docs:
Maintaining availability during a failover. Use primaryPreferred if
you want an application to read from the primary under normal
circumstances, but to allow stale reads from secondaries in an
emergency. This provides a “read-only mode” for your application
during a failover.
This takes care of reads at least. Writes are best dealt with by having your application retry failed writes, or queue them up.
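To illustrate the "retry failed writes" idea, here is a minimal sketch in Python. It is not tied to any real driver: `TransientWriteError` and `retry_write` are hypothetical names, and in a real application you would catch whatever exception your driver raises while the set has no primary.

```python
import time


class TransientWriteError(Exception):
    """Hypothetical stand-in for the driver error raised while no primary exists."""


def retry_write(write_fn, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call write_fn until it succeeds or the attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries so the
    election has time to finish. The last failure is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return write_fn()
        except TransientWriteError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter is only there so the delay can be swapped out in tests. Retrying like this is only safe if the write is idempotent or you can tell whether the earlier attempt was applied.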
Regarding your standalone, the documented procedures for converting to a replica set are well tested and can be completed very quickly with minimal downtime:
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
You cannot avoid downtime entirely (the new mongod will run on a new IP, so clients need to at least reconnect to it). But you can minimize downtime by setting up a geographically distributed replica set.
Please Read
http://docs.mongodb.org/manual/tutorial/deploy-geographically-distributed-replica-set/
Use the given process but please note:
Do not set priority 0 on the instances at New Location, so that one of them can become primary when the old ones at Old Location step down.
You still need to restart mongod in replica set mode at Old Location.
You need 3 instances including an arbiter at New Location, if you want it to be a replica set.
When complete data is in sync with instances at New Location, step down instances at Old Location (one by one). Now everything will go to New Location but the problem is that it is directed through a distant mongod.
So stop mongod at Old Location and start a new one at new Location. Connect your applications to New Location Mongod.
Note: I have not done this myself so far. I had planned it once, but the problem turned out not to be with the hosting provider. In practice you may run into some issues.
A replica set is the feature provided by MongoDB to achieve high availability and automatic failover.
It is similar to a traditional master-slave configuration, but with the capability of automatic failover.
It is basically a group/cluster of mongod instances that communicate with and replicate to each other to provide high availability and automatic failover.
A replica set can contain a minimum of 2 and a maximum of 12 mongod instances.
A replica set contains the following types of server. Of these, one server is always the primary.
http://blog.ajduke.in/2013/05/31/setup-mongodb-replica-set-in-4-steps/
John's answer is right. By the way, in your case you have no way to avoid downtime; you can only try to make it as short as possible.
You can prepare the new replica set and save its configuration.
Same for the single mongod instance: prepare a js file with the specific configuration (i.e. the stuff going into the admin database).
Disable client connections on the production servers.
Copy the data files from the old servers to the new ones (http://docs.mongodb.org/manual/core/backups/#backup-with-file-copies).
Apply your previously saved replica set config and configuration.
Done.
You can use different approaches, such as adding a hidden secondary member to the replica set if you have a lot of data, so you can wait until it is up to date before stopping the production server. Basically, for the replica set you have many ways to handle a migration, whereas with the single instance you don't have such features.
Can I deploy large database by copying its files (eg. testing database with files: testing.0,testing.1,testing.ns found on mongodb dbpath) from another server to the target servers (replica set) to avoid usage of communication bandwidth for replication (in case it is only deployed to the primary)? So basically I want to avoid the slow process of replication.
If journaling is enabled, what is the effect on the process?
Yes, you can. This is a perfectly valid way to avoid tedious and time-consuming replication between members on a distant or high-latency network.
If journaling is enabled, nothing really changes: copying via the file system bypasses MongoDB.
I'm setting up a replica set on my localhost just to get practice working with the necessary commands. I'm doing it both at home and at work. At home, the rs.initiate() command takes about three seconds to run, and it takes about another two or three minutes for rs.status() to give me all states that are either PRIMARY or SECONDARY. This is about what I expected.
But when I do it at work, rs.initiate() takes almost 7 minutes just to give me back my prompt in the mongo shell, then another 10 minutes before the states are all PRIMARY or SECONDARY. In the meantime, the one from which I initiated the connection is in SECONDARY or PRIMARY and the other two are RECOVERING. They just sit there, RECOVERING, for ten minutes.
While the rs.initiate() command is running, the mongod to which I've connected (port 10001) keeps spitting out stuff about "allocating datafile", then "done allocating datafile", then "allocating new datafile", and so on until it's got about a dozen of those files. The other two just sit there with "replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)" the whole time.
While I'm checking rs.status(), the two secondaries keep accepting and dropping connections, and allocating datafiles much like the primary does during the first step.
So is it supposed to take this long? And, if so, why does my machine at home do it in seconds? The only difference is that the one I'm running at home is 32-bit instead of 64-bit.
How much available disk space is there on your work environment's disk? MongoDB by default allocates for its oplog (the "local" database) 5% of available disk space. My guess based on the information you provided is that your available disk space is much larger on your work machine than your home machine.
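To make the 5% figure concrete, here is a rough sketch in Python. It only models the "5% of available disk space" rule mentioned above; the real default depends on platform and version and has lower and upper bounds that are not modelled here. `default_oplog_bytes` is a hypothetical helper name.

```python
GIB = 1024 ** 3


def default_oplog_bytes(free_disk_bytes, percent=5):
    """Approximate oplog size as a percentage of free disk space.

    Integer arithmetic is used so the result is exact for whole byte counts.
    """
    return free_disk_bytes * percent // 100


# A machine with lots of free disk preallocates a much larger oplog,
# which is why rs.initiate() can take so much longer there.
small_disk = default_oplog_bytes(20 * GIB)   # 1 GiB of oplog to allocate
large_disk = default_oplog_bytes(500 * GIB)  # 25 GiB of oplog to allocate
```

If the long preallocation is a problem on a practice setup, starting mongod with a small explicit `--oplogSize` avoids it.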
If you wait until rs.initiate() is complete (i.e. you see PRIMARY) before you try starting the other nodes, I think you'll see the results that you expect.
If you're facing this issue, try restarting MongoDB. I faced this problem with a fresh db and resolved it by restarting MongoDB.
I am trying to shrink the size of my MongoDB replica set (the collections are the same size but disk space keeps growing). According to the MongoDB website, I should just run mongod --repair on the master node to compact all collections. The problem would be downtime for the website. So, I have two options (that I know about):
Take secondary node off of replica-set and run mongod --repair on it and restart back on replica-set. I tried this and couldn't get past permission errors on 'local' collection.
Shut down secondary node and delete all files in the data directory. Restart mongo and let it recover from master. This actually worked for me but my only concern is, what if your journal collection is full and since it's a capped collection, will you only receive the data that is in the journal or will you actually copy over all of master's data?
Has anyone else run into this scenario? I'm surprised by the lack of information when trying to search for this.
Take secondary node off of replica-set and run mongod --repair on it and restart back on replica-set.
This is a common practice which is usually referred to as a "rolling repair". You take each secondary out of the replica set and repair it, and eventually step down the primary for repair as a last step. As long as you always have a majority of your replica set nodes available this approach will minimize potential downtime.
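The "always keep a majority available" condition can be checked with a tiny sketch like the one below. `has_majority` is a hypothetical helper, and it assumes every member counted is a voting member.

```python
def has_majority(total_voting_members, members_offline):
    """Return True if the remaining members can still elect or keep a primary."""
    available = total_voting_members - members_offline
    return available > total_voting_members // 2


# In a 3-member set you can repair one node at a time, but not two at once.
one_down = has_majority(3, 1)   # True
two_down = has_majority(3, 2)   # False
```

Note that a 2-member set loses its majority as soon as one node is taken out, which is one reason to add an arbiter before starting a rolling repair.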
If you are frequently deleting data, you should consider using the new usePowerOf2Sizes collection option in MongoDB 2.2. This changes the allocation method to allocate document space in powers of two (e.g. a 500 byte document would be allocated 512 bytes), which allows for more effective reuse of the space from deleted documents (at the slight expense of a few more bytes per document).
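The rounding itself is easy to picture. The sketch below only models the "round up to the next power of two" step from the example above; it ignores record headers and any minimum or maximum sizes the server applies, and `power_of_2_allocation` is a hypothetical name.

```python
def power_of_2_allocation(document_bytes):
    """Round a document size up to the next power of two."""
    if document_bytes < 1:
        raise ValueError("document_bytes must be positive")
    size = 1
    while size < document_bytes:
        size *= 2
    return size


allocated = power_of_2_allocation(500)  # 512, as in the example above
```

Because every slot is one of a small number of sizes, a slot freed by a deleted document is much more likely to fit a later insert.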
I tried this and couldn't get past permission errors on 'local' collection.
Permission errors on the 'local' collection sound like file system permissions (i.e. based on the user you were running your mongod as). You should run the repair process with the same user.
Shut down secondary node and delete all files in the data directory. Restart mongo and let it recover from master. This actually worked for me but my only concern is, what if your journal collection is full and since it's a capped collection, will you only receive the data that is in the journal or will you actually copy over all of master's data?
It sounds like you are conflating the Journal which is used for durability and crash recovery with the Oplog used for replication.
If you resync a node from the primary, all data will be copied over. During this initial period the node will be in RECOVERING state and is not considered a "healthy" node (i.e. available for queries). Once the node is caught up it will change to a normal SECONDARY state, at which point the oplog will be used for ongoing sync.
Some further reading:
Replication fundamentals
Replica set status reference
Is it possible to set up a MongoDB replica set with the following scenario (and if it is, how):
2 servers always online running mongodb, one of them holds the main node, the other one a rescue copy;
n computers each of them running mongodb, occasionally connected to internet, holding nodes which need synchronizing with main node, when they go online.
Backup only. In order to do this, you'll have to set the priority of this node to 0. If your node is never going to be used as master nor queried, you can also set buildIndexes to false.
More information here.
Intermittent slave. Due to limitations (mainly on the oplog size), you can't have a slave halted for a very long time if you have many writes on your MongoDB, see here. However, you can use the mongodump and mongorestore tools directly over the network, or by script plus syncing the backup file. More information here. Note that a restore will bring a db or collection onto a server and recreate the indexes completely (if you restore the system.indexes collection too), which can take some time.
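As a rough sketch of both points, the Python below builds a member document for the backup-only node and estimates how long an occasionally connected node can stay offline before the oplog has rolled over. The host name and the helper names `backup_member` and `oplog_window_hours` are made up for illustration, and the window estimate assumes a steady write rate, which real workloads rarely have.

```python
def backup_member(member_id, host):
    """Replica set member document for a copy that never becomes primary."""
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,          # never eligible to become primary
        "buildIndexes": False,  # only sensible if the node is never queried
    }


def oplog_window_hours(oplog_bytes, write_bytes_per_hour):
    """Estimate how many hours of writes fit in the oplog."""
    if write_bytes_per_hour <= 0:
        raise ValueError("write_bytes_per_hour must be positive")
    return oplog_bytes / write_bytes_per_hour


member = backup_member(1, "backup.example.net:27017")  # hypothetical host
window = oplog_window_hours(1024 ** 3, 256 * 1024 ** 2)  # 1 GiB oplog, 256 MiB/hour
```

If a node stays offline longer than that window, it can no longer catch up from the oplog and needs a full resync or a dump and restore.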