MongoDB replica set failed - mongodb

I have a MongoDB replica set consisting of three nodes: one primary, one secondary, and one arbiter.
While I was performing the initial resync of the secondary node from the primary, the primary node got terminated. When I checked the primary node's logs, the exception shown was
SEVERE: Invalid access at address: 0x7fcde1e00ff0
SEVERE: Got signal: 7 (Bus error)
Since then, the primary node fails to start due to this exception, and the secondary node is stuck in the STARTUP2 state.
I am able to start the primary node on a different port as a standalone node (or in maintenance mode) and read its data. But whenever I run it as part of the replica set, it gets terminated with the above exception.
The primary and secondary both use RAID0 for storage. The data size is around 550 GB.
I copied the whole data set of the primary node (currently down) to the secondary node (in the STARTUP2 state) and then restarted the secondary node. But that didn't work either. The secondary node gets elected primary on restart, but is also terminated within a second of the election with the exception below:
SEVERE: Fatal DBException in logOp(): 10334 BSONObj size: 50359410 (0x3006C72) is invalid. Size must be between 0 and 16793600(16MB) First element: 2: ?type=111
SEVERE: terminate() called, printing stack (if implemented for platform):
0x11fd1b1 0x11fc438 0x7ff56dc01846 0x7ff56dc01873 0xe54c9e 0xc4de1b 0xc58f46 0xa0bac1 0xa0c250 0xa0f1bf 0xa0fcc1 0xa1323e 0xa2949a 0xa2af32 0xa2cd36 0xd61654 0xba21a2 0xba3780 0x7724a9 0x11b2fde
How can I recover and restore the replica set in this case?
I also have a backup of this data. Can I drop this replica set and recreate it with the backup data?
There is another replica set in this MongoDB cluster which is working fine.

Your secondary server cannot become eligible because of replication lag.
Can you post the output of rs.status()?
Your secondary server probably shows a "could not find member to sync from" infoMessage.
I've run into something similar before, caused by bad RAM, but the root cause can vary.
Fix it by copying the primary server's data into another folder on the secondary, starting a new instance on some other port there, and then adding it to the replica set (with the { force: true } option) so the secondary server has somewhere to sync from; see the sketch below.
You can also destroy the replica set and create it again, but be careful not to lose your replica set's oplog.
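
A rough sketch of that recovery path, assuming hypothetical paths, ports, and hostnames (/data/seed, port 27020, the replica set name rs0, secondary-host, and the _id value are all placeholders). First, on the secondary host:

cp -a /data/primary /data/seed
mongod --replSet rs0 --dbpath /data/seed --port 27020 --logpath /data/seed/mongod.log --fork

Then, from a mongo shell on the surviving member:

cfg = rs.conf()
cfg.members.push({ _id: 3, host: "secondary-host:27020" })  // hypothetical _id and host
rs.reconfig(cfg, { force: true })

The { force: true } option allows the reconfig to be applied on a member that is not primary, which is necessary here since no primary is available.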


mongodb failure to resync a stale member of a replica set

I have a MongoDB (version 4.2) replica set with 3 nodes: primary, secondary, and arbiter.
The primary occupies close to 250 GB of disk space; the oplog size is 15 GB.
The secondary was down for a few hours. I tried recovering it by restarting, and it went into RECOVERING forever.
I tried an initial sync by deleting the files on the data path; it took 15 hours, the data path grew to 140 GB, and then it failed.
I tried to copy the files from the primary and seed them to recover the secondary node,
following https://www.mongodb.com/docs/v4.2/tutorial/resync-replica-set-member/
This did not work (the member was stale again).
In the latest docs (5.0) they mention using a new member ID; does that apply to 4.2 as well?
Changing the member ID throws an error, because the IP and port are the same for the node I am trying to recover.
This method was also unsuccessful. I am now planning to recover the node using a different data path and port, so that the primary considers it a new node; then, once the secondary is up, I will change the port back to the one I want and restart. Will that work?
Please provide any other suggestions for recovering a replica node with large data, like 250 GB.
Shut down the primary.
Copy the data files from the primary node and place them in a new db path (other than the recovering node's db path).
Change the log path.
Start the mongod service on a different port (other than the one used by the recovering node).
Start the primary.
Add the new instance to the replica set by running rs.add("IP:new port") on the primary.
This worked; I could see the secondary node coming up successfully. A sketch of the commands follows.
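
A minimal sketch of those steps, assuming hypothetical paths, ports, and addresses (/data/seed, port 27021, the replica set name rs0, and 10.0.0.5 are placeholders):

# with the primary shut down, copy its data files to a fresh db path
cp -a /data/primary /data/seed
# start a mongod on the copy, with its own log path and a new port
mongod --replSet rs0 --dbpath /data/seed --port 27021 --logpath /data/seed/mongod.log --fork
# restart the primary, then from a mongo shell on it:
rs.add("10.0.0.5:27021")

Because the seed already holds a full copy of the primary's data files, the new member only needs to replay recent oplog entries instead of performing another 15-hour initial sync.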

PRIMARY is transitioning to RECOVERING after restart

I have a three-node replica set (1 Primary, 1 Secondary, 1 Arbiter) on three different Amazon server instances. The servers where they are hosted required a memory upgrade, so I needed to shut down the MongoDB instances as well.
I shut down the MongoDB instances in this order:
Secondary
Arbiter
Primary
I used the process below to shut down each server:
use admin
db.shutdownServer()
All MongoDB instances did shut down properly without any problems. So far everything is fine.
After the Amazon server upgrade, I started the MongoDB instances in the following order:
Arbiter
Secondary
Primary
The arbiter is in arbiter mode and the secondary is in secondary mode, but to my surprise the primary machine went into the "RECOVERING" state.
I don't know why the primary machine went into "RECOVERING".
I have examined the logs; they show something like "no member to sync from".
My understanding was that the PRIMARY should stay PRIMARY until a reconfiguration happens in the replica set.
Am I missing a step during the shutdown of the servers? Or am I missing a step during the restart?
Please shed some light on how I can overcome this problem. I need to shut down the MongoDB servers frequently, since there are a lot of upgrades happening on the Amazon servers.
After you started your replica set, your "SECONDARY" became "PRIMARY", and your "PRIMARY" probably settled into secondary state after a short while. To keep primary status on your "PRIMARY", you must give it a higher priority than your "SECONDARY" has.
Check the current configuration with the rs.conf() command.
See the tutorial on how to force a member to become primary; a sketch follows.
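
A minimal sketch of that reconfiguration, assuming the preferred primary is members[0] in rs.conf() (the index and the priority value are placeholders):

cfg = rs.conf()
cfg.members[0].priority = 2   // higher than the default priority of 1 on the other members
rs.reconfig(cfg)

Run this from a mongo shell connected to the current primary. Once the reconfig propagates, the higher-priority member calls an election and takes over as primary as soon as it is caught up.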

MongoDB & ElasticSearch configuration

MongoDB always shows me this error message when I insert any data into my collection.
I am trying to configure Elasticsearch with MongoDB, which is when I created my replica set. I try to add something, but get no results.
The mongo shell always shows me the same message:
WriteResult({"writeError":{"code":undefined,"errmsg":"not master"}})
This happens when you do not have a 3-node replica set: you start the replica set in a master-slave configuration, and then your master or your secondary goes down. Since there is no third node or arbiter to elect a new primary, the primary steps down from being master and goes into pure read-only mode. The only way to bring the replica set back up is to create a new server with the same repl-set name and add the current stepped-down master to it as a secondary; see the sketch below.
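
A rough sketch of that suggestion, assuming hypothetical names (the replica set name rs0, /data/new, and old-master-host are placeholders). Start a fresh mongod with the same repl-set name, initiate it, and re-add the stepped-down master:

mongod --replSet rs0 --dbpath /data/new --port 27017 --logpath /data/new/mongod.log --fork

Then, from a mongo shell on the new server:

rs.initiate()
rs.add("old-master-host:27017")

A third voting member (even a lightweight arbiter) would have avoided the situation in the first place, since two surviving voters out of three can still elect a primary.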

Do you lose records when you reconfigure mongodb replicaset?

I have a 3-member replica set in MongoDB which fell apart when I reconfigured the host names of the server instances. I had to reconfigure the replica set; however, I am curious how MongoDB handles records that are not synced across all the members.
Case 1) There is a new record on the MongoDB server that I access to reconfigure the set.
Case 2) There is a new record on another MongoDB server that is added later to the replica set.
Each replica-set has one primary node and one or more secondary nodes.
All writes happen on the primary. The primary then sends these changes to the secondaries (the list of changes is referred to as "the oplog"). That means the primary is always the member with the most up-to-date data.
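The oplog itself is an ordinary capped collection in the local database, so you can inspect what the primary has been sending. A quick, purely illustrative peek at the most recent entries from a mongo shell:

use local
db.oplog.rs.find().sort({ $natural: -1 }).limit(5)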
When the primary is suddenly unreachable, the replica-set is put into read-only mode and an election takes place to find a new primary. Usually the secondary which is most up-to-date is selected (more details on replica-set election). Any writes which were not propagated to that secondary yet are lost.
When the old primary goes back online, it re-joins the replica-set as a secondary. Its data gets synchronized to the state of the new primary. Any writes which only happened on the old primary which weren't propagated to the new primary before the crash are rolled back.
The rolled-back writes are backed up as .bson files in the rollback directory under the data path, and can be re-added to the replica set using bsondump and mongorestore. Details about this procedure can be found in the article Rollbacks During Replica Set Failover.
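
A sketch of that recovery, assuming a hypothetical rollback file name (the real name encodes the database, collection, and a timestamp):

bsondump /data/db/rollback/mydb.mycoll.2024-01-01T00-00-00.0.bson
mongorestore --host "rs0/primary-host:27017" --db mydb --collection mycoll /data/db/rollback/mydb.mycoll.2024-01-01T00-00-00.0.bson

The bsondump step lets you review the rolled-back documents first; some of them may conflict with writes that happened on the new primary after the failover.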

Why shutting down the primary doesn't make the replica set vote for a new primary in MongoDB

I am experimenting with a MongoDB replica set, in which I started localhost:27017 as a secondary and localhost:27018 as the primary.
Then I disconnected localhost:27018, and I expected localhost:27017 to automatically become primary, but it doesn't work as expected. In the shell, commands like rs.add() or rs.remove() give an error like '"errmsg" : "replSetReconfig command must be sent to the current replica set primary."'
I know this error occurs because the command is running on the secondary, but the primary is already shut down. What steps should I take now?
Also, why doesn't shutting down the primary make the replica set vote for a new primary? What is the right way to make it vote for a new primary automatically?
To elect a new primary in a MongoDB replica set, a majority of the members must vote for the new primary. That means a majority of the original (or configured) members.
You configured two members, and one is not a majority. You need to add a third member, either a regular node or an arbiter, if you want your failover to happen automatically; see the sketch below.
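
A minimal sketch, assuming a third mongod is started on localhost:27019 with the same replica set name (the replica set name rs0, the db path, and the port are placeholders):

mongod --replSet rs0 --dbpath /data/arb --port 27019 --logpath /data/arb/mongod.log --fork

Then, from a mongo shell on the current primary:

rs.addArb("localhost:27019")

With three voting members, the two survivors still form a majority when any one node goes down, so localhost:27017 can be elected automatically when localhost:27018 disappears.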