Replica set never finishes cloning primary node - mongodb

We're working with an average sized (50GB) data set in MongoDB and are attempting to add a third node to our replica set (making it primary-secondary-secondary). Unfortunately, when we bring the nodes up (with the appropriate command line arguments associating them with our replica set), the nodes never exit the RECOVERING stage.
Looking at the logs, it seems as though the nodes ditch all of their data as soon as the recovery completes and start syncing again.
We're using version 2.0.3 on all of the nodes and have tried adding the third node from both a "clean" (empty db) state as well as a bootstrapped state (using mongodump to take a snapshot of the primary database and mongorestore'ing that snapshot into the new node), each failing.
We've observed this recurring phenomenon over the past 24 hours and any input/guidance would be appreciated!

It's hard to be certain without looking at the logs, but it sounds like you're hitting a known issue in MongoDB 2.0.3. Check out http://jira.mongodb.org/browse/SERVER-5177. The problem is fixed in 2.0.4, for which a release candidate is available.
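To confirm the nodes really are looping through RECOVERING, the member states can be watched from the mongo shell connected to any member (a sketch; requires a running replica set):

```javascript
// Print each member's name and current state
// (e.g. PRIMARY, SECONDARY, RECOVERING).
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr);
});

// Also worth confirming the server version on every node:
db.version();
```

If a member keeps cycling back into RECOVERING after the clone finishes, that matches the symptom described in the linked ticket.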

I don't know if it helps, but when I hit this problem, I erased the replica's data and re-initiated it. It started from scratch and replicated OK. Worth a try, I guess.

Related

How to set featureCompatibilityVersion on secondary replica set member which does not startup due to this error?

I'm in a situation where I am trying to get a broken member of a replica set back online. I have followed the normal procedure of deleting the data in /data/db and letting it sync. I also tried moving all the data from one secondary to the other and restarting mongod. Each time I get the standard error above complaining about compatibility issues. The servers are all version 3.6, so why is this an issue? It must be that featureCompatibilityVersion=3.6 on this member while the other servers are set to featureCompatibilityVersion=3.4, which they are. However, to see this the server needs to start up, but since it kept restarting I could not check it. So I removed it from the replica set, started it standalone, and saw it was set to featureCompatibilityVersion=3.6. I changed this to 3.4 and started it again as part of the replica set, but still had the same issue. How do I get around this?
The other servers have featureCompatibilityVersion=3.4, so if it were syncing correctly wouldn't this server pull in the admin collection, which has featureCompatibilityVersion=3.4? So why does it keep complaining?
If not, how do you change this on a server which does not start up? Is there something in the mongo config I can set?
Thanks for the help in advance.
This is probably not a helpful answer, just a sanity check: remove the replSet parameter and spin up the mongod totally solo. Does that work OK?
I only suggest it because I can't see what your issue is and maybe this would turn something up. It wouldn't take long and may not help; the versioning I don't quite get.
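For reference, the featureCompatibilityVersion can be inspected and changed from the mongo shell once the member is running standalone (a sketch; assumes a mongod started without the replSet option):

```javascript
// Check the current value (MongoDB 3.6):
db.adminCommand({getParameter: 1, featureCompatibilityVersion: 1});

// Set it to match the rest of the replica set:
db.adminCommand({setFeatureCompatibilityVersion: "3.4"});
```

Note that the fCV document lives in the local admin database and is not overwritten by a normal sync, which may be why wiping /data/db alone doesn't change the observed behavior.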

pg_create_logical_replication_slot hanging indefinitely due to old walsender process

I am testing logical replication between 2 PostgreSQL 11 databases for use on our production (I was able to set it thanks to this answer - PostgreSQL logical replication - create subscription hangs) and it worked well.
Now I am testing the scripts and procedure which would set it up automatically on production databases, but I am facing a strange problem with logical replication slots.
I had to restart the logical replica due to some changes in settings requiring a restart - which of course could also happen on replicas in the future. But the logical replication slot on the master did not disconnect and is still active for a certain PID.
I dropped the subscription on the master (I am still only testing) and tried to repeat the whole process with a new logical replication slot, but I ran into a strange situation.
I cannot create a new logical replication slot with the new name. The process holding the old logical replication slot is still active, showing wait_event_type=Lock and wait_event=transaction.
When I try to use pg_create_logical_replication_slot to create the new logical replication slot I get a similar situation: the new slot is created - I see it in pg_catalog - but it is marked as active for the PID of the session which issued the command, and the command hangs indefinitely. When I check the processes I can see this command active with the same wait values, Lock/transaction.
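For reference, this is roughly how the slots and the walsender can be inspected on the master (a sketch using standard catalog views; `backend_type` requires PostgreSQL 10 or later):

```sql
-- Which replication slots exist, whether they are active, and for which PID:
SELECT slot_name, slot_type, active, active_pid
FROM pg_replication_slots;

-- What the walsender processes are doing:
SELECT pid, state, wait_event_type, wait_event, client_addr
FROM pg_stat_activity
WHERE backend_type = 'walsender';
```

The `active_pid` of the old slot is the process the new slot creation ends up waiting on.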
I tried activating the "lock_timeout" parameter in postgresql.conf and reloading the configuration, but it did not help.
Killing that old hanging process will most likely bring down the whole Postgres instance, because it is a walsender process. It is still visible in the process list with the IP of the replica and status "idle waiting".
I tried to find some parameter(s) which could force Postgres to stop this walsender, but setting wal_keep_segments or wal_sender_timeout did not change anything. I even tried stopping the replica for a longer time - no effect.
Is there some way to do something about this situation without restarting the whole Postgres instance? Like forcing a timeout for the walsender, or for the transaction lock, etc.
Because if something like this happens in production I would not be able to use a restart or any other "brute force". Thanks...
UPDATE:
The walsender process "died out" after some time, but the log does not show anything about it, so I do not know exactly when it happened. I can only guess it depends on the tcp_keepalives_* parameters - the default on Debian 9 is 2 hours before an idle connection is dropped. So I set these parameters in postgresql.conf and will watch them in the following tests.
Strangely enough, today everything works without any problems, and no matter how I try to simulate yesterday's problems I cannot. Maybe some network communication problems in the cloud datacenter were involved - we experienced occasional timeouts in connections to other databases too.
So I really do not know the answer except "wait until the walsender process on the master dies" - which can most likely be influenced by the tcp_keepalives_* settings. Therefore I recommend setting them to reasonable values in postgresql.conf, because the OS defaults are usually too large.
Actually we use these settings on our big analytical databases (set both in PostgreSQL and the OS) because of similar problems. Golang and Node.js programs calculating statistics would occasionally fail to recognize that a database session had ended or died, and would hang until the OS closed the connection after 2 hours (the default on Debian). All of it seemed to be connected with network communication problems. With proper tcp_keepalives_* settings, the reaction in case of problems is much quicker.
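A sketch of the kind of values meant here (illustrative numbers, not tuned recommendations; these go in postgresql.conf and take effect on reload):

```
# Send a TCP keepalive after 60 s of idle, probe every 10 s,
# and give up after 6 failed probes (~2 minutes total instead of ~2 hours).
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 6
```

These map directly onto the OS-level TCP keepalive knobs, so a dead peer is detected in minutes rather than hours.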
After the old walsender process dies on the master you can repeat all the steps and it should work. So it looks like I just had bad luck yesterday...

How to make MongoDB `repairDatabase` and `compact` commands work with replica-set? Downtime is ok

We need to free some of our MongoDB space, and we identified 100Gb + worth of documents that can be safely removed from a collection.
So we removed them from our test environment which has this setting:
mongodb version 3.0.1
no sharding
1 node, no replica
wiredtiger engine
When done, we found out that the space on disk was still used and needed to be reclaimed. We found this post and it helped us: after running both
db.runCommand({repairDatabase: 1})
and
db.runCommand({compact: collection-name })
We freed 100Gb +.
We then proceeded in production, forgetting that the setting was different since we had 1 replica node:
mongodb version 3.0.1
no sharding
1 primary node, 1 replica node
wiredtiger engine
After removing the documents, we ran
db.runCommand({repairDatabase: 1})
and got the OK message (after a while, 10 min +). We tried running
db.runCommand({compact: collection-name })
and got this error:
will not run compact on an active replica set primary as this is a
slow blocking operation. use force:true to force
So we ran
db.runCommand({compact: collection-name, force: true })
and got the OK message (almost instantly), but the space on disk is still used; it wasn't freed.
We searched for solutions for running the repairDatabase and compact commands with a replica set, but the advice was focused on avoiding downtime, as if that were the only issue. However, we can schedule downtime, and our problem is rather that the commands don't work as expected, since the space is not actually reclaimed.
What did we do wrong?
For replica set configurations, the best and safest method to recover space is to perform an initial sync. If you need to recover space from all nodes in the set, you can perform a rolling initial sync. That is, perform an initial sync on each of the secondaries, before finally stepping down the primary and performing an initial sync on it.
Note that a rolling initial sync is only possible if your deployment contains at least three replica set nodes (for reasons I will describe below).
The rolling initial sync method is the safest way to perform replica set maintenance, and it also involves no downtime as a bonus.
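On each secondary in turn, an initial sync can be forced roughly like this (a sketch; /var/lib/mongodb and the systemd service name are placeholders for your actual dbPath and init system, and the primary's oplog must be large enough to cover the sync window):

```shell
# Stop the secondary, wipe its data directory, and start it again.
# On startup it has no data, so it performs a full initial sync from
# another member, rewriting all data files compactly.
sudo systemctl stop mongod
sudo rm -rf /var/lib/mongodb/*   # dbPath contents only -- double-check the path!
sudo systemctl start mongod
```

Watch rs.status() until the member returns to SECONDARY before moving on to the next node; do the primary last, after stepping it down.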
Having said that, there are some things that are worth mentioning:
Regarding compact:
The compact command on WiredTiger on MongoDB 3.0.x was affected by this bug: SERVER-21833 which was fixed in MongoDB 3.2.3. Prior to this version, compact on WiredTiger could silently fail.
Regarding repairDatabase:
Please don't run repairDatabase on replica set nodes. This is strongly discouraged, as mentioned in the repairDatabase page. The name repairDatabase is a bit misleading, since the command doesn't attempt to repair anything. The command was intended to be used when there's disk corruption, which could lead to corrupt documents.
The repairDatabase command could be more accurately described as "salvage database". That is, it recreates the databases by discarding corrupt documents, in an attempt to get the database into a state where you can start it and salvage intact documents from it.
In a replica set, MongoDB expects all nodes in the set to contain identical data. If you run repairDatabase on a replica set node, there is a chance that the node contains undetected corruption, and repairDatabase will dutifully remove the corrupt documents. Predictably, this leaves that node with a different dataset from the rest of the set. If an update then happens to hit one of those discarded documents, the whole set could crash. To make matters worse, it is entirely possible that this situation stays dormant for a long time, only to strike suddenly with no apparent reason.
Regarding your setup:
I noticed that in your production environment you created a replica set with two nodes. This setup is not recommended, since the failure of a single node will cause the remaining node to step down to secondary, thus disallowing writes to the set.
Due to the way MongoDB high availability works (see Replica Set Election), it's strongly recommended to deploy three data-bearing nodes at a minimum, or at least add an arbiter node (see Replica Set Members) so that your replica set contains an odd number of members.
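For example, an arbiter can be added from the mongo shell connected to the primary (the hostname and port here are placeholders):

```javascript
// An arbiter votes in elections but holds no data,
// giving the set an odd number of voting members.
rs.addArb("arbiter.example.net:27017");

// Verify the new member list and states:
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr);
});
```

The arbiter can run on modest hardware since it stores no user data.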
Having only two members in a replica set also makes rolling upgrades/initial sync/maintenance much harder or even impossible in some cases.
MongoDB 3.0.1 was released on March 17, 2015, more than 2 years ago as of this writing. If you're forced to stay on the MongoDB 3.0 series, please consider moving to 3.0.15, or better yet to 3.4.7 (the latest as of Aug 10, 2017), which contains massive improvements over 3.0.1.

MongoDb preparing for Sharded Clusters

We are currently setting up our MongoDB environment for production. At the moment we only have one dedicated MongoDB database server. We will expand this in the near future with a 2nd server, and I have already indicated to management that ideally we should get a 3rd server as well.
Since I already know we're going to use sharding and replication in the near future I want to be prepared for it.
The idea I have now is to start now with the Development Configuration (as mongo's documentation names it).
Whenever our second server becomes available, I would like to expand this setup to a configuration with 2 configuration servers and 2 shards (replica sets).
And of course, when our third server becomes available, move to the fully functional sharded cluster configuration.
While reading Mongo's documentation I was struck by the note that the Development setup should not be used in production.
MongoDb Development Configuration
Keeping in mind that we will add more servers soon, would it be a bad idea to configure the Development Configuration now, so we can easily add the 2nd server to the cluster when it becomes available?
After setting up the 'development sharded setup' I've found my answer. Of course I'm happy to share, in case anybody runs into the same questions as I did when starting with this.
In my case it was OK to start with the development setup until my new servers arrived. It was a temporary situation, and when my new servers arrived I was able to easily expand my replica sets. There are a number of reasons why this isn't advised for production:
To state the obvious, there is no replication yet. Since I was running all shards on one machine, there is a single point of failure. If the machine, or one node, goes down, the cluster won't work anymore.
Now this part is interesting. After I added a second server, I did have primary and secondary nodes. Primary nodes were used for writing and secondaries for reading. I had eliminated the lack of replication AND my data had higher availability. However, I noticed with the 2-member replica sets that if one member of the replica set went down (even if this was a secondary), the primary stepped down to a secondary as well. This has to do with the voting mechanism that MongoDB uses; see Markus' more detailed answer on this. Since there are no more primaries in the replica set, the cluster won't function anymore. If I were to use an arbiter, I could eliminate this problem as well.
When you have a 3-member replica set, automatic failover kicks in. Whenever a node goes down, a new primary is elected automatically and the cluster continues performing as before.
During my tests I also got to a point where one of my mongod.exe instances stopped working due to an "out of memory" exception. I was running a cluster with 3 replica sets, meaning every machine had at least 4 mongod.exe processes running (3 for the replica set shards and one for the configuration server replica set). Besides having a query which wasn't optimized yet, I also noticed that the WiredTiger storage engine by default can use up to 50% of RAM minus one gigabyte. Perhaps it wasn't the best choice to have multiple replica set shards on one machine, but I was able to eliminate the problem by capping the WiredTiger memory usage.
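Capping the cache is done per mongod in the YAML config file (a sketch; the 0.25 GB value is just an illustration, pick a size that fits the number of instances on the machine):

```
# mongod.conf -- limit the WiredTiger internal cache for this instance
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 0.25
```

The same limit can be passed on the command line as --wiredTigerCacheSizeGB. With several mongod processes per machine, the sum of their caches should stay well under the machine's RAM.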
I hope this answer helps anybody who's starting to set up replication and sharding for MongoDb.

MongoDb Ops Manager can't start mongos and shard

I came across a problem where I have an Ops Manager that is supposed to run a MongoDB cluster as an automated cluster.
Suddenly the servers started going down unexpectedly, while there are no errors in any of the log files indicating what the problem is.
The Ops Manager gets stuck on the blue label
We are deploying your changes. This might take a few minutes
And it just never goes away.
Because this environment is based on the automation feature, the MMS manages the users on the servers and runs all of the processes as "mongod", which I can't access even as root (administrator).
As far as the Ops Manager is concerned, it shows that a shard in a replica set is down although it's live, and thinks that a mongos that is dead is alive.
Has someone run into this situation before who may be able to help?
Thanks,
Eliran.
Problem found: there was an NTP mismatch between the servers in the cluster somehow. What happened was that the servers' clocks were not synced, so every time the Ops Manager did something it got responses with wrong timestamps and could not apply its time limits.
After re-configuring all the NTP clients back to the same source, everything went back to how it should have been :)
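For anyone hitting the same symptoms, clock skew between the servers can be checked with standard tools (a sketch; assumes ntpd and its ntpq utility are installed on each host):

```shell
# Show each host's NTP peers and offsets -- all servers in one cluster
# should report small offsets against the same source.
ntpq -p

# A quick cross-host comparison of the clocks themselves:
date -u
```

Running these on every member before blaming Ops Manager can save a lot of log-reading.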