MongoDB WiredTigerLAS.wt file growing unexpectedly (version 3.6.11) in a cluster

We have a MongoDB cluster with 1 primary server and 2 secondary servers.
All three servers are running MongoDB version 3.6.11.
Recently we found both secondary servers down because their disks were full: the WiredTigerLAS.wt file had grown to over 20 GB, while the whole MongoDB data folder is supposed to stay below 4 GB. We removed WiredTigerLAS.wt and restarted the secondary servers, but the file was recreated and again grew until it filled the disk. The primary server has been fine; no impact was found there.
Can someone please advise what we should do now? If you know the reason behind the unexpected file growth, please let us know.
rs.conf:
rs0:PRIMARY> rs.conf()
{
"_id" : "rs0",
"version" : 7,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "primary:50001",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "replica1:50001",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 0,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 0
},
{
"_id" : 2,
"host" : "replica2:50001",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 0,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 0
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"catchUpTimeoutMillis" : -1,
"catchUpTakeoverDelayMillis" : 30000,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("5b52ac682b4bd7ae7913b1cf")
}
}
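For background, WiredTigerLAS.wt is WiredTiger's lookaside (cache overflow) file: the storage engine spills updates into it when they cannot be evicted from cache, which on 3.6 is typically a sign of sustained cache pressure (often seen on secondaries that fall behind). One place to look is the wiredTiger.cache section of db.serverStatus(). The sketch below is plain JavaScript over a serverStatus-shaped document; the counter names match 3.6-era output, but treat them, and the sample numbers, as illustrative assumptions:

```javascript
// Sketch: summarize WiredTiger cache pressure from a serverStatus() document.
// The counter names follow 3.6-era serverStatus output (assumptions here).
function cachePressure(serverStatus) {
  var cache = serverStatus.wiredTiger.cache;
  var max = cache["maximum bytes configured"];
  var used = cache["bytes currently in the cache"];
  var dirty = cache["tracked dirty bytes in the cache"];
  return {
    fillRatio: used / max,    // eviction normally starts around 80% fill
    dirtyRatio: dirty / max   // eviction tries to hold dirty data near 5%
  };
}

// Example with made-up numbers (a 1 GiB cache, 90% full, 6% dirty):
var sample = { wiredTiger: { cache: {
  "maximum bytes configured": 1073741824,
  "bytes currently in the cache": 966367642,
  "tracked dirty bytes in the cache": 64424509
}}};
var p = cachePressure(sample);
```

Sustained ratios above those default thresholds line up with lookaside growth; deleting the file only removes the symptom, not the pressure that produced it.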

Related

mongodb: restore replicaset after kubernetes scaling down

I configured a replica set correctly.
After the MongoDB Kubernetes pods were scaled down, the replica set ended up in an invalid state:
> rs.status();
{
"ok" : 0,
"errmsg" : "Our replica set config is invalid or we are not a member of it",
"code" : 93,
"codeName" : "InvalidReplicaSetConfig"
}
My configuration is:
> rs.config();
{
"_id" : "rs0",
"version" : 3,
"term" : 2,
"protocolVersion" : NumberLong(1),
"writeConcernMajorityJournalDefault" : true,
"members" : [
{
"_id" : 0,
"host" : "mongors-0.mongors-service.hes-all.svc:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "mongors-1.mongors-service.hes-all.svc:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"catchUpTimeoutMillis" : -1,
"catchUpTakeoverDelayMillis" : 30000,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("626fb63f211511c4dcf938ac")
}
}
The configuration details seem right, but when I run rs.initiate() or rs.reconfig(cfg) I get:
> rs.reconfig(config);
{
"topologyVersion" : {
"processId" : ObjectId("6347bdffe3c3303e6f325b9a"),
"counter" : NumberLong(1)
},
"ok" : 0,
"errmsg" : "New config is rejected :: caused by :: replSetReconfig should only be run on a writable PRIMARY. Current state REMOVED;",
"code" : 10107,
"codeName" : "NotWritablePrimary"
}
> rs.initiate();
{
"ok" : 0,
"errmsg" : "already initialized",
"code" : 23,
"codeName" : "AlreadyInitialized"
}
Any ideas?
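One common recovery path for this state (hedged, since it depends on your data still being present on a surviving pod) is a force reconfig: build a config containing only the members that still exist and apply it with rs.reconfig(cfg, {force: true}), which is permitted on a node that is not a writable primary. Below is a plain-JavaScript sketch of trimming a saved config; the hostnames are the ones from the question:

```javascript
// Sketch: trim a saved replica set config to the members that still exist
// after a scale-down, so it can be applied with rs.reconfig(cfg, {force: true}).
function trimConfig(cfg, liveHosts) {
  var live = {};
  liveHosts.forEach(function (h) { live[h] = true; });
  var out = JSON.parse(JSON.stringify(cfg));      // deep copy, leave cfg intact
  out.members = out.members.filter(function (m) { return live[m.host]; });
  out.version = cfg.version + 1;                  // reconfig needs a higher version
  return out;
}

var saved = {
  _id: "rs0",
  version: 3,
  members: [
    { _id: 0, host: "mongors-0.mongors-service.hes-all.svc:27017" },
    { _id: 1, host: "mongors-1.mongors-service.hes-all.svc:27017" }
  ]
};
var cfg = trimConfig(saved, ["mongors-0.mongors-service.hes-all.svc:27017"]);
```

From the mongo shell on the surviving pod you would then run rs.reconfig(cfg, {force: true}). Force reconfigs bypass the usual safety checks, so use them only when the removed members are really gone for good.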

MongoDB Replicaset - How to solve "Could not find self in current config" on Secondary node

I am running MongoDB v4.4.14 installed on VMs configured as follows:
1 Primary
2 Secondary
1 Arbiter
The nodes are not on the same local network, and public hostnames configured in DNS are used.
I am now trying to add another (delayed) secondary, but I get the following error:
ReplicaSet Status:
rs.status().members
{
"_id" : 5,
"name" : "xxxx-4.xxx.xx:27123",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
...
"pingMs" : NumberLong(31),
"lastHeartbeatMessage" : "Our replica set configuration is invalid or does not include us",
"syncSourceHost" : "",
"syncSourceId" : -1,
"infoMessage" : "",
"configVersion" : -1,
"configTerm" : -1
}
MongoDB Logs on the Primary
{"t":{"$date":"2022-05-14T23:20:34.328+02:00"},"s":"I", "c":"REPL_HB", "id":23974, "ctx":"ReplCoord-16984","msg":"Heartbeat failed after max retries","attr":{"target":"xxxx-4.xxx.xx:27123","maxHeartbeatRetries":2,"error":{"code":93,"codeName":"InvalidReplicaSetConfig","errmsg":"Our replica set configuration is invalid or does not include us"}}}
MongoDB Logs on the Secondary
{"t":{"$date":"2022-05-14T21:21:58.661+00:00"},"s":"I", "c":"REPL", "id":3564900, "ctx":"ReplCoord-94","msg":"Could not find self in current config, retrying DNS resolution of members","attr":{"target":"xxxx-2.xxx.xx:27123","currentConfig":{"_id":"rs0","version":133097,"protocolVersion":1,"writeConcernMajorityJournalDefault":true,"members":[{"_id":1,"host":"xxxx-1.xxx.xx:27123","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":0.6,"tags":{},"slaveDelay":0,"votes":1},{"_id":2,"host":"xxxx-2.xxx.xx:27123","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":0.1,"tags":{},"slaveDelay":0,"votes":1},{"_id":3,"host":"xxxx-3.xxx.xx:27123","arbiterOnly":true,"buildIndexes":true,"hidden":false,"priority":0.0,"tags":{},"slaveDelay":0,"votes":1},{"_id":4,"host":"xxxx-6.xxx.xx:27123","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":0.3,"tags":{},"slaveDelay":0,"votes":1},{"_id":5,"host":"xxxx-4.xxx.xx:27123","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":0.0,"tags":{},"slaveDelay":3600,"votes":0}],"settings":{"chainingAllowed":false,"heartbeatIntervalMillis":3000,"heartbeatTimeoutSecs":15,"electionTimeoutMillis":10000,"catchUpTimeoutMillis":-1,"catchUpTakeoverDelayMillis":30000,"getLastErrorModes":{},"getLastErrorDefaults":{"w":1,"wtimeout":0},"replicaSetId":{"$oid":"5df4f4f01223ca52c6ab5ebe"}}}}}
MongoDB Configuration
rs0:PRIMARY> rs.conf()
{
"_id" : "rs0",
"version" : 133097,
"protocolVersion" : NumberLong(1),
"writeConcernMajorityJournalDefault" : true,
"members" : [
{
"_id" : 1,
"host" : "xxxx-1.xxx.xx:27123",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 0.6,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "xxxx-2.xxx.xx:27123",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 0.1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 3,
"host" : "xxxx-3.xxx.xx:27123",
"arbiterOnly" : true,
"buildIndexes" : true,
"hidden" : false,
"priority" : 0,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 4,
"host" : "xxxx-6.xxx.xx:27123",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 0.3,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 5,
"host" : "xxxx-4.xxx.xx:27123",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 0,
"tags" : {
},
"slaveDelay" : NumberLong(3600),
"votes" : 0
}
],
"settings" : {
"chainingAllowed" : false,
"heartbeatIntervalMillis" : 3000,
"heartbeatTimeoutSecs" : 15,
"electionTimeoutMillis" : 10000,
"catchUpTimeoutMillis" : -1,
"catchUpTakeoverDelayMillis" : 30000,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("5df4f4f01223ca52c6ab5ebe")
}
}
I can see the error is handled in this file:
https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/topology_coordinator.cpp
and it probably means that the secondary doesn't recognise its own hostname as 'self', hence it reports that it is not part of the config. However:
I can traceroute from the secondary and the route comes back to self
I can connect via mongo shell from the secondary to all other nodes and from all other nodes into the secondary
I can ssh from outside into the secondary node
I CANNOT mongo shell or ssh using the public DNS hostname from within the secondary node
This node used to be part of the configuration without issues; it was cleanly removed, and now this problem has appeared.
Any pointers to what can be the issue?
This had me going around in circles for about a month. However, as I was writing the question up for this forum, that last point got me thinking.
I am still not sure why I couldn't connect via mongo shell or ssh using the public hostname, but the issue was exactly what it said on the tin: the replica couldn't reach itself through that name.
The dirty solution: I added the public hostname to the /etc/hosts file on the secondary, pointing to the local IP (overriding the DNS response). This solved the issue.
Working out why I cannot connect via the public hostname remains a separate problem at this point.
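To illustrate why the /etc/hosts override works: the server only considers itself part of the set if one of the addresses it can resolve for itself string-matches a member's host:port. The snippet below is a toy re-implementation of that check, not the actual topology_coordinator.cpp logic; the addresses are made up:

```javascript
// Toy "find self" check: a node is in the config only when one of the names
// it resolves for itself matches a member's host:port exactly.
function findSelfInConfig(cfg, myAddresses) {
  return cfg.members.filter(function (m) {
    return myAddresses.indexOf(m.host) !== -1;
  });
}

var cfg = { members: [
  { _id: 2, host: "xxxx-2.xxx.xx:27123" },
  { _id: 5, host: "xxxx-4.xxx.xx:27123" }
]};

// If DNS for the public hostname fails locally, the node's own address list
// will not contain "xxxx-4.xxx.xx:27123" and the match comes back empty:
var found = findSelfInConfig(cfg, ["10.0.0.4:27123", "localhost:27123"]);

// The /etc/hosts entry makes the public name resolve locally, so the
// node's address list now includes it and the check succeeds:
var foundAfterHostsFix = findSelfInConfig(cfg, ["xxxx-4.xxx.xx:27123"]);
```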

Is calling rs.slaveOk() safe to do on a secondary that is hidden and has a priority of 0?

I have a replica set with two secondaries. These secondaries are not meant for failover; they are backups of the master, one of them delayed by 1 day. I've set them both to hidden=true and priority=0.
rs.conf().members on the master yields
[
{
"_id" : 0,
"host" : "localhost:4000",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "localhost:4001",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : true,
"priority" : 0,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "localhost:4002",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : true,
"priority" : 0,
"tags" : {
},
"slaveDelay" : NumberLong(86400),
"votes" : 1
}
]
I want to check their content, mainly to issue .count() queries.
Is it safe to call rs.slaveOk() on those secondaries?
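For reference, rs.slaveOk() (a wrapper around db.getMongo().setSlaveOk()) only marks the current shell connection as willing to read from a non-primary; it writes nothing and changes nothing in the replica set configuration, so the member's hidden/priority-0 settings are not affected. A typical session looks like this (database and collection names are placeholders):

```
rs0:SECONDARY> rs.slaveOk()
rs0:SECONDARY> use mydb
switched to db mydb
rs0:SECONDARY> db.mycollection.count()
```

Note that on the delayed member the counts will reflect data as of the end of the delay window, not the current state of the primary.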

mongodb who is primary in replication

How can I find out, from a secondary, which machine is the primary, without having to log in to every machine and check?
Running the isMaster command only tells me that the current machine is a secondary:
rs0:SECONDARY> db.runCommand("ismaster")
{
"hosts" : [
"dbRby1:27017",
"dbRby2:27017",
"dbKrstd1:27017"
],
"setName" : "rs0",
"setVersion" : 5,
"ismaster" : false,
"secondary" : true,
"me" : "dbRby1:27017",
"maxBsonObjectSize" : 16777216,
"maxMessageSizeBytes" : 48000000,
"maxWriteBatchSize" : 1000,
"localTime" : ISODate("2016-11-24T07:36:09.855Z"),
"maxWireVersion" : 4,
"minWireVersion" : 0,
"ok" : 1
}
and rs.conf() doesn't show it either:
rs0:SECONDARY> rs.conf()
{
"_id" : "rs0",
"version" : 5,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "dbRby1:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 2,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "dbRby2:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "dbKrstd1:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : "majority",
"wtimeout" : 5000
},
"replicaSetId" : ObjectId("5811ec4c70c224f06fba884b")
}
}
rs.status() will give the wanted information, as @Xenwar pointed out.
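When a primary is currently known, the isMaster response also carries a primary field with its host:port; the output above lacks it, which usually means no primary was elected at that moment. Otherwise, rs.status() works from any member. A small helper for picking the primary out of rs.status()-shaped output (pure JavaScript; the sample members array is made up):

```javascript
// Sketch: find the primary in the members array of an rs.status() document.
// stateStr "PRIMARY" (state 1) marks the primary.
function findPrimary(status) {
  var primaries = status.members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  });
  return primaries.length === 1 ? primaries[0].name : null;
}

var sample = { members: [
  { name: "dbRby1:27017",   stateStr: "SECONDARY" },
  { name: "dbRby2:27017",   stateStr: "PRIMARY" },
  { name: "dbKrstd1:27017", stateStr: "SECONDARY" }
]};
var primary = findPrimary(sample);
```

In the shell, the equivalent one-liner is rs.status().members.filter(function (m) { return m.stateStr === "PRIMARY"; }).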

mongodb replica set is ok but does not replicate

I've created my MongoDB replica set and everything looks correct, except that it does not replicate to the remote host, even though I can reach the remote PC on port 27017 without problems.
I created the database on the remote PC to see if that would fix it, and I have also inserted new records, but nothing replicates. Any ideas?
rs.status()
{
"set" : "meteor",
"date" : ISODate("2016-03-08T16:14:24.181Z"),
"myState" : 1,
"term" : NumberLong(3),
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "172.27.10.13:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 6920,
"optime" : {
"ts" : Timestamp(1457453535, 2),
"t" : NumberLong(3)
},
"optimeDate" : ISODate("2016-03-08T16:12:15Z"),
"electionTime" : Timestamp(1457446744, 1),
"electionDate" : ISODate("2016-03-08T14:19:04Z"),
"configVersion" : 1,
"self" : true
}
],
"ok" : 1
}
rs.config():
meteor:PRIMARY> rs.config()
{
"_id" : "meteor",
"version" : 1,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "172.27.10.13:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
}
}
}
According to your configuration settings you didn't add any other members (secondaries, arbiters, etc.) to your replica set configuration. Because of this, MongoDB has no way of knowing where to replicate to.
Try adding your remote host to the replica-set configuration like this:
rs.add("your-remote-host:port")
See: https://docs.mongodb.org/manual/tutorial/expand-replica-set/
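After adding the member, the primary should list it in rs.status(), progressing through STARTUP2 to SECONDARY once the initial sync finishes. A sketch of the session (the remote address is a placeholder, not from the question):

```
meteor:PRIMARY> rs.add("your-remote-host:27017")
{ "ok" : 1 }
meteor:PRIMARY> rs.status().members.map(function (m) { return m.stateStr; })
```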