MongoDB replSet takes forever to initiate

I'm setting up a replica set on my localhost just to get practice working with the necessary commands. I'm doing it both at home and at work. At home, the rs.initiate() command takes about three seconds to run, and it takes about another two or three minutes for rs.status() to give me all states that are either PRIMARY or SECONDARY. This is about what I expected.
But when I do it at work, rs.initiate() takes almost seven minutes just to give me back my prompt in the mongo shell, then another ten minutes before the states are all PRIMARY or SECONDARY. In the meantime, the node from which I ran rs.initiate() is PRIMARY or SECONDARY and the other two are RECOVERING. They just sit there, RECOVERING, for ten minutes.
While the rs.initiate() command is running, the mongod to which I've connected (port 10001) keeps spitting out stuff about "allocating datafile", then "done allocating datafile", then "allocating new datafile", and so on until it's got about a dozen of those files. The other two just sit there with "replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)" the whole time.
While I'm checking rs.status(), the two secondaries keep accepting and dropping connections, and allocating datafiles much like the primary does during the first step.
So is it supposed to take this long? And, if so, why does my machine at home do it in seconds? The only difference is that the one I'm running at home is 32-bit instead of 64-bit.

How much available disk space is there on your work environment's disk? By default, MongoDB allocates 5% of available disk space for its oplog (the "local" database). My guess, based on the information you provided, is that available disk space is much larger on your work machine than on your home machine.
If you wait until rs.initiate() is complete (i.e. you see PRIMARY) before you try starting the other nodes, I think you'll see the results you expect.
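To make the preallocation concrete, here's a small shell sketch that estimates what the default 5% oplog allocation would come to on the current volume, and shows how capping the oplog avoids the long wait. The port, dbpath, replica set name and the 512 MB figure are example values, not recommendations:

```shell
# Estimate the default oplog preallocation (~5% of free space on this volume).
free_kb=$(df -kP . | awk 'NR==2 {print $4}')
oplog_mb=$(( free_kb / 20 / 1024 ))   # 5% of free space, converted to MB
echo "default oplog on this volume would be roughly ${oplog_mb} MB"

# Capping the oplog keeps rs.initiate() from preallocating huge datafiles
# (example invocation; adjust paths/ports to your setup):
# mongod --port 10001 --dbpath /data/rs1 --replSet test --oplogSize 512
```

On a work machine with hundreds of gigabytes free, 5% is tens of gigabytes of preallocation, which would explain the "allocating new datafile" messages going on for minutes.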

If you're facing this issue, try restarting MongoDB. I hit this problem with a fresh db, and restarting mongod resolved it.

Related

pg_create_logical_replication_slot hanging indefinitely due to old walsender process

I am testing logical replication between 2 PostgreSQL 11 databases for use in our production environment (I was able to set it up thanks to this answer - PostgreSQL logical replication - create subscription hangs) and it worked well.
Now I am testing scripts and a procedure to set it up automatically on production databases, but I am facing a strange problem with logical replication slots.
I had to restart the logical replica due to some changes in settings requiring a restart - which of course could also happen on replicas in the future. But the logical replication slot on the master did not disconnect, and it is still active for a certain PID.
I dropped the subscription on the master (I am still only testing) and tried to repeat the whole process with a new logical replication slot, but I am facing a strange situation.
I cannot create a new logical replication slot with the new name. The process running on the old logical replication slot is still active, showing wait_event_type=Lock and wait_event=transaction.
When I try to use pg_create_logical_replication_slot to create a new logical replication slot, I get a similar situation. The new slot is created - I see it in pg_catalog - but it is marked as active for the PID of the session which issued the command, and the command hangs indefinitely. When I check processes, I can see this command active with the same wait values, Lock/transaction.
I tried activating the "lock_timeout" parameter in postgresql.conf and reloading the configuration, but it did not help.
Killing that old hanging process would most likely bring down the whole postgres, because it is a "walsender" process. It is still visible in the process list, with the IP of the replica and status "idle waiting".
I tried to find some parameter(s) which could force postgres to stop this walsender, but setting wal_keep_segments or wal_sender_timeout did not change anything. I even tried stopping the replica for a longer time - no effect.
Is there some way to deal with this situation without restarting the whole postgres? Like forcing a timeout for the walsender, or for the transaction lock, etc.
Because if something like this happened in production, I would not be able to use a restart or any other "brute force". Thanks...
UPDATE:
The "walsender" process "died out" after some time, but the log does not show anything about it, so I do not know exactly when it happened. I can only guess that it depends on the tcp_keepalives_* parameters. The default on Debian 9 is 2 hours to keep an idle connection. So I tried setting these parameters in postgresql.conf and will see in the following tests.
Strangely enough, today everything works without any problems, and no matter how I try to reproduce yesterday's problems, I cannot. Maybe some network communication problems in the cloud datacenter were involved - we experienced occasional timeouts in connections to other databases too.
So I really do not know the answer except "wait until the walsender process on the master dies" - which can most likely be influenced by the tcp_keepalives_* settings. I therefore recommend setting them to reasonable values in postgresql.conf, because the OS defaults are usually too long.
We actually use this on our big analytical databases (set both in PostgreSQL and in the OS) because of similar problems. Golang and nodejs programs calculating statistics occasionally failed to recognize that a database session had ended or died, and hung until the OS closed the connection after 2 hours (the Debian default). All of it always seemed to be connected with network communication problems. With proper tcp_keepalives_* settings, the reaction in case of problems is much quicker.
After the old walsender process dies on the master, you can repeat all the steps and it should work. So it looks like I just had bad luck yesterday...
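For reference, the diagnosis and the keepalive tuning described above might look roughly like this. This is an untested ops sketch: it assumes a running PostgreSQL instance, the walsender PID is a placeholder, and the keepalive values are examples rather than recommendations:

```shell
# See which walsenders are connected and which slots they hold.
psql -c "SELECT pid, state, client_addr FROM pg_stat_replication;"
psql -c "SELECT slot_name, active, active_pid FROM pg_replication_slots;"

# pg_terminate_backend() is the supported way to ask one backend to exit
# (whether a wedged walsender honors it promptly is another matter):
# psql -c "SELECT pg_terminate_backend(12345);"   -- 12345 = walsender PID

# postgresql.conf fragment: shorten the OS keepalive defaults so dead
# connections are noticed in minutes rather than hours (example values):
#   tcp_keepalives_idle = 60
#   tcp_keepalives_interval = 10
#   tcp_keepalives_count = 6
# then reload: psql -c "SELECT pg_reload_conf();"
```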

MongoDB replicaset slow startup

I have a replica set with smallfiles enabled, and now I'm suffering from the huge time one instance takes to start/restart. The db file count is something like 2,500 files, and it takes almost an hour to load them and start up. Any suggestions on how I can speed this process up?
Performance should improve if you run your Mongo instance with smallfiles disabled. As this is a replica set, you can just shut down the instance, delete all of its data files and journal, and then restart the service. After restarting, the data will be re-synced from the primary instance. This initial sync may take some time; however, any subsequent sync should be a lot faster.
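A rough sketch of that resync procedure, with example paths and ports - adjust for your deployment, and only wipe a node while the rest of the replica set is healthy, since the initial sync pulls everything from the primary:

```shell
# 1. Cleanly stop the slow secondary (example dbpath).
mongod --dbpath /data/rs2 --shutdown

# 2. Remove its data files and journal. This is only safe because the
#    replica set still has a healthy primary to sync from!
rm -rf /data/rs2/*

# 3. Restart WITHOUT --smallfiles; an initial sync from the primary begins.
mongod --dbpath /data/rs2 --port 10002 --replSet myset \
       --fork --logpath /data/rs2/mongod.log
```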

Mongos takes hours to 'show collections'

I have a mongo 2.4.8 cluster. My software dynamically partitions data, and I now have about 30,000 sharded collections. The cluster currently contains only one shard (which is a replica set); it is a cluster to allow easy future expansion.
When I start a new mongos process and run show collections, it takes several hours to complete. During this time the mongos is unresponsive to all clients (but the cluster is fine). If I never run show collections, all other operations through the mongos work normally.
Eventually show collections completes, and after that the mongos works fine; running show collections again on the same mongos returns right away. I only found out there was a problem when I needed to restart a mongos for the first time in many months, during which time the collection count had grown greatly.
Logically, it would seem that data transfer (about collection chunks) from the config servers to the new mongos is the bottleneck. But neither side shows high CPU or network activity while this is going on.
Is this known behavior? How can I further investigate the problem?
I traced the problem to a faulty config server. After replacing it, everything is working fine again.
Details: the bad server didn't respond to queries, after which they were re-sent to other servers. This created an effective latency for each request to the config servers, which was most pronounced in the 'show collections' operation that does at least one roundtrip per collection between mongos and the config servers, and does them all serially.
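One way to spot a faulty config server like this is to time a trivial command against each one individually; the sick server shows up as the outlier. The hostnames and port below are placeholders:

```shell
# Ping each config server separately and compare the wall-clock times.
for cfg in cfg1.example.com:27019 cfg2.example.com:27019 cfg3.example.com:27019; do
  echo "--- $cfg"
  time mongo --host "$cfg" --quiet --eval 'db.adminCommand({ping: 1})'
done
```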

Migrating MongoDB instances with no down-time

We are using MongoDB in a production environment and now, due to some issues with the current servers, I'm going to change servers and start a new MongoDB instance.
We have a replica set and a single mongod instance (two separate MongoDB deployments for different purposes). First I should migrate the single mongod instance, and then the whole replica set, to the new server.
What I want to know is, how can I migrate both instances with no down-time? I don't want to shutdown the server or stop write operations.
Thanks in advance.
So first of all, you should never run MongoDB as a single instance in production. At a minimum you should have 1 primary, 1 secondary and 1 arbiter.
Second, even with a replica set you will always have a bit of write downtime when you switch primaries, as writes are not possible during the election process. From the docs:
IMPORTANT: Elections are essential for independent operation of a replica set; however, elections take time to complete. While an election is in process, the replica set has no primary and cannot accept writes. MongoDB avoids elections unless necessary.
Elections are going to occur when for example you bring down the primary to move it to a new server or virtual instance, or upgrade the database version (like going from 2.4 to 2.6).
You can keep downtime to a minimum with an existing replica set by setting the appropriate options to allow queries to run against secondaries. Again from the docs:
Maintaining availability during a failover. Use primaryPreferred if you want an application to read from the primary under normal circumstances, but to allow stale reads from secondaries in an emergency. This provides a "read-only mode" for your application during a failover.
This takes care of reads at least. Writes are best dealt with by having your application retry failed writes, or queue them up.
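For example, a client could opt into this behavior with the readPreference option in the connection string (host names, set name and database here are placeholders):

```shell
# Reads go to the primary when one exists, and fall back to (possibly
# stale) secondaries while an election is in progress.
mongo "mongodb://host1:27017,host2:27017,host3:27017/mydb?replicaSet=myset&readPreference=primaryPreferred"
```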
Regarding your standalone the documented procedures for converting to a replica set are well tested and can be completed very quickly with minimal downtime:
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
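In outline, that documented conversion looks roughly like this (port, dbpath, set name and hostnames are examples):

```shell
# 1. Restart the standalone with a replica set name.
mongod --port 27017 --dbpath /data/db --replSet rs0 \
       --fork --logpath /var/log/mongod.log

# 2. In the mongo shell, initiate the set and add the new members:
#    > rs.initiate()
#    > rs.add("newhost1:27017")
#    > rs.add("newhost2:27017")
```

The standalone keeps serving reads and writes throughout, apart from the brief restart in step 1, which is why the downtime is minimal.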
You cannot have zero downtime (the new mongod will run on a new IP, so clients at least need to reconnect to it), but you can minimize downtime by building a geographically distributed replica set.
Please read:
http://docs.mongodb.org/manual/tutorial/deploy-geographically-distributed-replica-set/
Use the given process, but please note:
Do not set priority 0 on the instances at the New Location, so that they can become primary when the old ones at the Old Location step down.
You still need to restart mongod in replica set mode at the Old Location.
You need 3 instances, including an arbiter, at the New Location if you want it to be a replica set.
When the complete data is in sync with the instances at the New Location, step down the instances at the Old Location (one by one). Now everything will go to the New Location, but the problem is that traffic is being directed through a distant mongod.
So stop the mongod at the Old Location and start a new one at the New Location. Connect your applications to the New Location mongod.
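The add-then-step-down sequence above might be sketched like this. Hostnames are placeholders, and this is an untested outline against a live set, not a verified runbook:

```shell
# Add the new-location members (and an arbiter) to the existing set.
mongo --eval 'rs.add("new-dc-1:27017"); rs.add("new-dc-2:27017"); rs.addArb("new-dc-arb:27017")'

# Wait until rs.status() shows the new members in SECONDARY state
# (i.e. fully synced), then make the old primary relinquish its role:
mongo --eval 'rs.stepDown()'

# Finally, remove the old-location members one by one:
mongo --eval 'rs.remove("old-dc-1:27017")'
```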
Note: I have not done this myself so far. I planned it once, but then I ran into the problem, and it was not the hosting provider's fault. In practice you may run into some issues.
A replica set is the feature provided by MongoDB to achieve high availability and automatic failover.
It is kind of a traditional master-slave configuration, but with the capability of automatic failover.
It is basically a group/cluster of mongod instances which communicate and replicate with each other to provide high availability and perform automatic failover.
Basically, in a replica set a minimum of 2 and a maximum of 12 mongod instances can exist.
In a replica set the following types of server exist; of these, one server is always the primary.
http://blog.ajduke.in/2013/05/31/setup-mongodb-replica-set-in-4-steps/
John's answer is right. By the way, in your case there is no way to avoid downtime; you can only try to make it as short as possible.
You can prepare the new replica set and save its configuration.
Do the same for the single mongod instance: prepare a js file with its specific configuration (i.e. stuff living in the admin database).
Disable client connections on the production servers.
Copy the datafiles from the old servers to the new ones (http://docs.mongodb.org/manual/core/backups/#backup-with-file-copies).
Apply your previously saved replica set config and configuration.
Done.
You can use different approaches, such as adding a hidden secondary member to the replica set if you have a lot of data, so you can wait until it is up to date before stopping the production server. Basically, for the replica set you have many ways to handle a migration; with the single instance you don't have such features.
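For the single instance, the file-copy steps above might look roughly like this in practice. Paths and hostnames are placeholders; db.fsyncLock() blocks writes while the copy runs, which is the downtime window:

```shell
# 1. Block writes and flush data to disk on the old server.
mongo --eval 'db.fsyncLock()'

# 2. Copy the datafiles to the new server (placeholder host/paths).
rsync -av /data/db/ newserver:/data/db/

# 3. Unlock the old server (or simply shut it down for good):
mongo --eval 'db.fsyncUnlock()'

# 4. Start mongod on the new server against the copied files:
# ssh newserver 'mongod --dbpath /data/db --fork --logpath /var/log/mongod.log'
```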

Replica set never finishes cloning primary node

We're working with an average sized (50GB) data set in MongoDB and are attempting to add a third node to our replica set (making it primary-secondary-secondary). Unfortunately, when we bring the nodes up (with the appropriate command line arguments associating them with our replica set), the nodes never exit the RECOVERING stage.
Looking at the logs, it seems as though the nodes ditch all of their data as soon as the recovery completes and start syncing again.
We're using version 2.0.3 on all of the nodes and have tried adding the third node from both a "clean" (empty db) state as well as a bootstrapped state (using mongodump to take a snapshot of the primary database and mongorestore'ing that snapshot into the new node), each failing.
We've observed this recurring phenomenon over the past 24 hours and any input/guidance would be appreciated!
It's hard to be certain without looking at the logs, but it sounds like you're hitting a known issue in MongoDB 2.0.3. Check out http://jira.mongodb.org/browse/SERVER-5177. The problem is fixed in 2.0.4, which has an available release candidate.
I don't know if it helps, but when I had that problem, I erased the replica's DB and re-initiated it. It started from scratch and replicated OK. Worth a try, I guess.