MongoDB hidden secondary stuck in startup? - mongodb

I am creating a hidden secondary MongoDB instance that will eventually be used for reporting. So far I have taken these steps:
Started up my primary instance (local machine) with replSet = mySet and called rs.initiate()
Started up my secondary instance with replSet = mySet
Called rs.add("my.secondary.com") from my primary instance
Set priority = 0 and hidden = true for the secondary member using rs.reconfig(cfg)
When I do this and call rs.status() I get the following output:
{
"set": "mySet",
"date": ISODate("2016-03-22T16:40:39.515Z"),
"myState": 1,
"members": [
{
"_id": 0,
"name": "my-machine.local:27017",
"health": 1,
"state": 1,
"stateStr": "PRIMARY",
"uptime": 607,
"optime": Timestamp(1458664559, 1),
"optimeDate": ISODate("2016-03-22T16:35:59Z"),
"electionTime": Timestamp(1458664264, 2),
"electionDate": ISODate("2016-03-22T16:31:04Z"),
"configVersion": 3,
"self": true
},
{
"_id": 1,
"name": "my.secondary.com:27017",
"health": 1,
"state": 0,
"stateStr": "STARTUP",
"uptime": 384,
"optime": Timestamp(0, 0),
"optimeDate": ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat": ISODate("2016-03-22T16:40:38.332Z"),
"lastHeartbeatRecv": ISODate("1970-01-01T00:00:00Z"),
"pingMs": 106,
"configVersion": -2
}
],
"ok": 1
}
Notice that stateStr for my secondary is STARTUP - this never changes and the data never replicates. In a previous attempt I also called rs.initiate() on my secondary, but that made what was intended to be the secondary become the primary. I had to blow everything away and start again.
Why is my secondary stuck in STARTUP and how can I get my data to begin replicating from my primary to my secondary?

Here is a checklist from my black book :) Compare it against your steps; it should go without a glitch.
(assuming you started your mongod instances with the --replSet flag)
rs.initiate()
rs.add("host-1:29001")
rs.add("host-2:30001")
rs.add("host-n:40001")
var cfg = rs.config()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
rs.reconfig(cfg)
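If the secondary still reports STARTUP after these steps, the usual suspect is that the two hosts cannot reach each other by the names in the config - note the stuck member's configVersion: -2, which means it has never managed to fetch the replica set config at all. As a rough illustration (this helper is hypothetical, not part of the mongo shell), the tell-tale fields in the rs.status() output are each member's state and configVersion:

```javascript
// Illustrative helper: given the members array from rs.status(), report
// members that never left STARTUP (state 0) or never fetched the replica
// set config (a negative configVersion means "no config received yet").
function findStuckMembers(members) {
  return members
    .filter(function (m) { return m.state === 0 || m.configVersion < 0; })
    .map(function (m) { return m.name; });
}

// Shaped like the rs.status() output in the question:
const members = [
  { name: "my-machine.local:27017", state: 1, configVersion: 3 },
  { name: "my.secondary.com:27017", state: 0, configVersion: -2 }
];
console.log(findStuckMembers(members)); // [ 'my.secondary.com:27017' ]
```

If a member shows up here, verify DNS resolution and bidirectional connectivity on the mongod port between the hosts before suspecting anything else.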

Related

MongoDB ReplicaSet is in state RS_DOWN and InterruptedDueToReplStateChange Exception

We set up a MongoDB replica set with 3 nodes (version 3.6). Now we're seeing this exception thrown by the MongoDB client application:
<< ErrorHandlerProcessor >> Query failed with error code 96 and error message 'Executor error during find command: InterruptedDueToReplStateChange: operation was interrupted' on server mongodb-0.mongodb-internal.dp-common-database.svc.cluster.local:27017; nested exception is com.mongodb.MongoQueryException: Query failed with error code 96 and error message 'Executor error during find command: InterruptedDueToReplStateChange: operation was interrupted' on server mongodb-0.mongodb-internal.dp-common-database.svc.cluster.local:27017
After checking the MongoDB server logs, we noticed that a new primary was elected at that time for some reason, but we cannot find any errors. Can anyone help point out the cause or how to troubleshoot this issue?
Thanks.
And below are the MongoDB logs from the 2 nodes for that particular period:
mongodb-0
2020-09-30T06:19:32.238+0000 I NETWORK [conn38321] end connection 172.28.42.10:58362 (115 connections now open)
2020-09-30T06:40:15.730+0000 I COMMAND [PeriodicTaskRunner] task: UnusedLockCleaner took: 259ms
2020-09-30T06:40:15.757+0000 I COMMAND [conn38197] command admin.$cmd command: isMaster { ismaster: 1, $db: "admin" } numYields:0 reslen:793 locks:{} protocol:op_msg 107ms
2020-09-30T06:40:15.849+0000 I REPL [replexec-2645] Member mongodb-1.mongodb-internal.dp-common-database.svc.cluster.local:27017 is now in state RS_DOWN
2020-09-30T06:40:15.854+0000 I REPL [replexec-2645] Member mongodb-2.mongodb-internal.dp-common-database.svc.cluster.local:27017 is now in state RS_DOWN
2020-09-30T06:40:15.854+0000 I REPL [replexec-2645] can't see a majority of the set, relinquishing primary
2020-09-30T06:40:15.854+0000 I REPL [replexec-2645] Stepping down from primary in response to heartbeat
2020-09-30T06:40:15.865+0000 I REPL [replexec-2648] Member mongodb-2.mongodb-internal.dp-common-database.svc.cluster.local:27017 is now in state SECONDARY
2020-09-30T06:40:15.873+0000 I REPL [replexec-2649] Member mongodb-1.mongodb-internal.dp-common-database.svc.cluster.local:27017 is now in state SECONDARY
2020-09-30T06:40:15.885+0000 E QUERY [conn38282] Plan executor error during find command: DEAD, stats: { stage: "LIMIT", nReturned: 1, executionTimeMillisEstimate: 20, works: 1, advanced: 1, needTime: 0, needYield: 0, saveState: 0, restoreState: 0, isEOF: 1, invalidates: 0, limitAmount: 1, inputStage: { stage: "FETCH", nReturned: 1, executionTimeMillisEstimate: 20, works: 1, advanced: 1, needTime: 0, needYield: 0, saveState: 0, restoreState: 0, isEOF: 0, invalidates: 0, docsExamined: 1, alreadyHasObj: 0, inputStage: { stage: "IXSCAN", nReturned: 1, executionTimeMillisEstimate: 20, works: 1, advanced: 1, needTime: 0, needYield: 0, saveState: 0, restoreState: 0, isEOF: 0, invalidates: 0, keyPattern: { consumer: 1.0, channel: 1.0, externalTransactionId: -1.0 }, indexName: "lastsequence", isMultiKey: false, multiKeyPaths: { consumer: [], channel: [], externalTransactionId: [] }, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: "forward", indexBounds: { consumer: [ "["som", "som"]" ], channel: [ "["normalChannelMobile", "normalChannelMobile"]" ], externalTransactionId: [ "[MaxKey, MinKey]" ] }, keysExamined: 1, seeks: 1, dupsTested: 0, dupsDropped: 0, seenInvalidated: 0 } } }
2020-09-30T06:40:15.887+0000 I REPL [replexec-2645] transition to SECONDARY from PRIMARY
mongodb-1
September 30th 2020, 14:38:40.871 2020-09-30T06:38:40.871+0000 I REPL [replication-343] Canceling oplog query due to OplogQueryMetadata. We have to choose a new sync source. Current source: mongodb-0.mongodb-internal.dp-common-database.svc.cluster.local:27017, OpTime { ts: Timestamp(1601448015, 5), t: 19 }, its sync source index:-1
2020-09-30T06:38:40.871+0000 I REPL [replication-343] Choosing new sync source because our current sync source, mongodb-0.mongodb-internal.dp-common-database.svc.cluster.local:27017, has an OpTime ({ ts: Timestamp(1601448015, 5), t: 19 }) which is not ahead of ours ({ ts: Timestamp(1601448015, 5), t: 19 }), it does not have a sync source, and it's not the primary (sync source does not know the primary)
[replexec-7147] Starting an election, since we've seen no PRIMARY in the past 10000ms
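The sequence in the mongodb-0 log follows mechanically from the majority rule: a primary that can no longer see heartbeats from a majority of the set must relinquish ("can't see a majority of the set, relinquishing primary"), and a secondary that sees no primary for the election timeout (the 10000ms above) calls an election. A minimal sketch of the step-down condition (illustrative only, not MongoDB's actual implementation):

```javascript
// Illustrative only: a primary steps down when the members it can still
// reach (counting itself) no longer form a strict majority of the set.
function shouldStepDown(reachableCount, totalMembers) {
  const majority = Math.floor(totalMembers / 2) + 1;
  return reachableCount < majority;
}

// 3-node set, both other members RS_DOWN -> the primary is alone: step down.
console.log(shouldStepDown(1, 3)); // true
// Primary plus one secondary still form a majority of 3 -> stay primary.
console.log(shouldStepDown(2, 3)); // false
```

Since both heartbeats went missing at once and came back seconds later, a likely direction to investigate is transient network trouble between the pods (or node pressure in the cluster) rather than an error in mongod itself.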

Getting zero results in search using elastic4s

This is a small code I am using to do a simple search:
import com.sksamuel.elastic4s.{ElasticsearchClientUri, ElasticClient}
import com.sksamuel.elastic4s.ElasticDsl._
import org.elasticsearch.common.settings.ImmutableSettings
object Main3 extends App {
val uri = ElasticsearchClientUri("elasticsearch://localhost:9300")
val settings = ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch").build()
val client = ElasticClient.remote(settings, uri)
if (client.exists("bands").await.isExists()) {
println("Index already exists!")
val num = readLine("Want to delete the index? ")
if (num == "y") {
client.execute {deleteIndex("bands")}.await
} else {
println("Leaving this here ...")
}
} else {
println("Creating the index!")
client.execute(create index "bands").await
client.execute(index into "bands/artists" fields "name"->"coldplay").await
val resp = client.execute(search in "bands/artists" query "coldplay").await
println(resp)
}
client.close()
}
This is the result that I get:
Connected to the target VM, address: '127.0.0.1:51872', transport: 'socket'
log4j:WARN No appenders could be found for logger (org.elasticsearch.plugins).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Creating the index!
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Disconnected from the target VM, address: '127.0.0.1:51872', transport: 'socket'
Process finished with exit code 0
Creating the index and adding a document to it run fine, but a simple search query gives no results. I even checked this in Sense.
GET bands/artists/_search
{
"query": {
"match": {
"name": "coldplay"
}
}
}
gives
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.30685282,
"hits": [
{
"_index": "bands",
"_type": "artists",
"_id": "AU21OYO9w-qZq8hmdTOl",
"_score": 0.30685282,
"_source": {
"name": "coldplay"
}
}
]
}
}
How to solve this issue?
I suspect what is happening is that you are doing the search straight after the index operation in your code. However, in Elasticsearch documents are not ready for search immediately; see the refresh interval setting here. (So when you use the REST client, you are waiting a few seconds by virtue of the fact that you have to manually flick between tabs, etc.)
You could test this quickly by putting a Thread.sleep(3000) after the index operation. If that confirms it works, then you need to think about how you want to write your program.
Normally you just index, and when the data is available, then it's available. This is called eventual consistency. In the meantime (seconds) users might not have it available to search. That's usually not a problem.
If it IS a problem, then you will have to do some tricks like we do in the unit tests of elastic4s, where you keep 'count'ing until you get back the right number of documents.
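The "keep counting until you get the right number back" trick can be sketched as follows (in JavaScript for illustration; countDocs is a hypothetical stand-in for the elastic4s count call):

```javascript
// Illustrative sketch of "poll until the index catches up": retry a count
// until it reaches the expected number of documents, or give up.
async function waitForCount(countDocs, expected, attempts = 10, delayMs = 200) {
  for (let i = 0; i < attempts; i++) {
    if (await countDocs() >= expected) return true;
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  return false;
}

// Fake "index" that only becomes visible on the third poll:
let polls = 0;
const fakeCount = async () => (++polls >= 3 ? 1 : 0);
waitForCount(fakeCount, 1).then(ok => console.log(ok)); // true
```

This is only appropriate in tests or the rare flows that really need read-your-own-writes; for normal traffic, letting the refresh interval do its thing is the idiomatic choice.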
Finally, you can also manually 'refresh' the index to speed things up, by calling
client.execute {
refresh index "indexname"
}
But that's usually only used when you turn off the automatic refreshing for bulk inserts.

Single Instance Mongodb Replica Set - cannot perform query/insert operations

After installing mongodb, I ran mongod with
mongod --dbpath <pathtodb> --logpath <pathtolog> --replSet rs0
I then connected with the mongo shell and ran
rs.initiate()
I then tried to insert a document into a collection, but received an error:
> db.blah.insert({a:1})
WriteResult({ "writeError" : { "code" : undefined, "errmsg" : "not master" } })
Looking at rs.status(), I see the status is REMOVED:
> rs.status()
{
"state" : 10,
"stateStr" : "REMOVED",
"uptime" : 1041,
"optime" : Timestamp(1429037007, 1),
"optimeDate" : ISODate("2015-04-14T18:43:27Z"),
"ok" : 0,
"errmsg" : "Our replica set config is invalid or we are not a member of it",
"code" : 93
}
I have no idea what I could have done to mess this up. This should have worked I think. How do I get past this?
As the answers above said, the config is not set correctly.
I tried to re-initiate the replica set, but got this error message:
singleNodeRepl:OTHER> rs.initiate({ _id: "rs0", members: [ { _id: 0, host : "localhost:27017" } ] } )
{
"info" : "try querying local.system.replset to see current configuration",
"ok" : 0,
"errmsg" : "already initialized",
"code" : 23,
"codeName" : "AlreadyInitialized"
}
The solution is to reconfigure the replica set:
singleNodeRepl:OTHER> rsconf = rs.conf()
singleNodeRepl:OTHER> rsconf.members = [{_id: 0, host: "localhost:27017"}]
[ { "_id" : 0, "host" : "localhost:27017" } ]
singleNodeRepl:OTHER> rs.reconfig(rsconf, {force: true})
{ "ok" : 1 }
singleNodeRepl:OTHER>
singleNodeRepl:SECONDARY>
singleNodeRepl:PRIMARY>
The problem here is that you ran rs.initiate() empty! You didn't tell it which machines belong to that replica set.
So..
rs.initiate({
_id: "rs0",
version: 1,
members: [
{ _id: 0, host : "address.to.this.machine:27017" }
]
}
)
Short Answer:
I needed to do:
rs.initiate({_id:'rs0', version: 1, members: [{_id: 0, host:'localhost:27017'}]})
rather than rs.initiate().
Long Answer:
My case is almost the same as #Madbreaks' and #Yihe's comments, but the background was different, so I'm adding my comment here.
Background
I used a Docker container of MongoDB and initiated the replica set with rs.initiate(). (The data volume is mounted on the host, but that is off-topic here.)
What Happened
When I restarted the MongoDB container, the error "MongoError: not master and slaveOk=false" occurred. Yes, the error message was different from #d0c_s4vage's, but the workaround is the same as #Yihe's.
Root Cause
The root cause was dynamically assigned hostname as follows:
rs0:PRIMARY> rs.conf()
{
"_id" : "rs0",
...
"members" : [
{
...
"host" : "ee4ed99b555e:27017", # <----- !!!!
The host "ee4..." above comes from the Docker container's internal hostname, which was set by my first rs.initiate(). It changes whenever the container is recreated. In my case localhost is fine, because it is a single server with a single replica set for evaluating the 'rocketchat' app.
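The fix is the same rs.reconfig shown above; as a sketch, pinning the members' hosts to a stable address is just a map over conf.members (pinHosts is a hypothetical helper, shown in plain JavaScript):

```javascript
// Illustrative: pin every member's host to a stable address so the config
// survives container recreation (mirrors the rs.conf()/rs.reconfig fix).
function pinHosts(conf, stableHost) {
  conf.members = conf.members.map(function (m) {
    return Object.assign({}, m, { host: stableHost + ":27017" });
  });
  return conf;
}

const conf = { _id: "rs0", members: [{ _id: 0, host: "ee4ed99b555e:27017" }] };
console.log(pinHosts(conf, "localhost").members[0].host); // localhost:27017
```

In the shell you would then apply the rewritten config with rs.reconfig(conf, {force: true}), as in the answer above.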
I was also facing the same issue; I tried the steps below.
NOTE: if you already have a cluster set up, follow my steps
Stop the particular server (host: "address.to.this.machine:27017")
Remove the mongod.lock file
Create one more data directory (default: /data/db, new data directory: /data/db_rs0)
Update the configuration file: change dbpath to "/data/db_rs0" and check bindIp (default 127.0.0.1; set to 0.0.0.0 to listen on all interfaces)
Check the hostname and hosts file:
hostname
sudo vi /etc/hosts
Add to /etc/hosts:
127.0.0.1 hostname
127.0.1.1 hostname
(add your public/private IP) hostname
Start the MongoDB server:
sudo /usr/bin/mongod -f /etc/mongod.conf &
rs.initiate({ _id: "rs0", members: [ { _id: 0, host : "hostname:27017" } ] })
rs.status()
{
....
"members": [
{
"_id": 0,
"name": "hostname:27017",
"health": 1,
"state": 1,
"stateStr": "PRIMARY",
"uptime": 98231,
"optime": {
"ts": Timestamp(1474963880, 46),
"t": NumberLong(339)
},
"optimeDate": ISODate("2016-09-27T08:11:20Z"),
"electionTime": Timestamp(1474956813, 1),
"electionDate": ISODate("2016-09-27T06:13:33Z"),
"configVersion": 12,
"self": true
},
...........
]
"ok": 1
}

mongodb replica set changes take two attempts to work

I'm using MongoDB 2.6.1.
I have a 4-node solution across two data centres (2 DBs, 2 arbiters, but one arbiter is always out of the replica set):
{
"_id": "prelRS",
"members": [
{
"_id": 1,
"host": "serverInDataCenter1:27011",
"priority": 6
},
{
"_id": 3,
"host": "serverInDataCenter2:27013",
"priority": 0
},
{
"_id": 5,
"host": "serverInDataCenter1:27015",
"arbiterOnly": true
}
]
}
When we have a DR situation and need to use DataCenter2 only, I try to take the primary out of the replica set and make the secondary the primary, but it takes two attempts to force the configuration to apply, for what seem like transient-state issues. Below was applied to the 27013 node, all done in the space of a few seconds.
prelRS:SECONDARY> cfg={
... "_id": "prelRS",
... "members": [
... {
... "_id": 3,
... "host": "serverInDataCenter2:27013",
... "priority": 4
... },
... {
... "_id": 6,
... "host": "serverInDataCenter2:27016",
... "arbiterOnly": true
... }
... ]
... }
{
"_id" : "prelRS",
"members" : [
{
"_id" : 3,
"host" : "serverInDataCenter2:27013",
"priority" : 4
},
{
"_id" : 6,
"host" : "serverInDataCenter2:27016",
"arbiterOnly" : true
}
]
}
prelRS:SECONDARY> rs.reconfig(cfg, {force : true})
{
"errmsg" : "exception: need most members up to reconfigure, not ok : serverInDataCenter2:27016",
"code" : 13144,
"ok" : 0
}
prelRS:SECONDARY> rs.reconfig(cfg, {force : true})
{ "ok" : 1 }
prelRS:SECONDARY>
This also seems to be the case when I am adding the 27011 node back in (as a lower-priority replica) from the 27013 node:
cfg={
"_id": "prelRS",
"members": [
{
"_id": 1,
"host": "serverInDataCenter1:27011",
"priority": 2
},
{
"_id": 3,
"host": "serverInDataCenter2:27013",
"priority": 4
},
{
"_id": 5,
"host": "serverInDataCenter1:27015",
"arbiterOnly": true
}
]
}
prelRS:PRIMARY> rs.reconfig(cfg)
{
"errmsg" : "exception: need most members up to reconfigure, not ok : serverInDataCenter1:27015",
"code" : 13144,
"ok" : 0
}
prelRS:PRIMARY> rs.reconfig(cfg)
2014-08-08T20:53:03.192+0100 DBClientCursor::init call() failed
2014-08-08T20:53:03.193+0100 trying reconnect to 127.0.0.1:27013 (127.0.0.1) failed
2014-08-08T20:53:03.193+0100 reconnect 127.0.0.1:27013 (127.0.0.1) ok
reconnected to server after rs command (which is normal)
but it doesn't seem to happen when I make this node primary again with the first config I mentioned (applied on the 27011 node).
That action isn't adding arbiters to the set, though, so maybe that's a clue as to what is going on?
I realize now that I probably need to leave 27011 in the replica set as priority 0 during DR, even though it is not available; but if all of DataCenter1 were unavailable, I would still have to add 27016 to the set and take 27015 out, and would face the error above when invoking DR.
Any suggestions why this takes two attempts to work?
thanks
The problem is that your arbiters aren't "real" replica set members, so for some operations you're not going to be able to reach a quorum on the set.
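To expand on the quorum point: a reconfig needs a majority of the members in the new config to be reachable, and with a two-member target config (one data node plus one arbiter) that majority is both of them, so a single slow-starting arbiter is enough to fail the first attempt. A sketch of the arithmetic (illustrative only):

```javascript
// Illustrative: how many members of the NEW config must be reachable for a
// reconfig to go through - a strict majority of the member count.
function membersNeededUp(totalMembers) {
  return Math.floor(totalMembers / 2) + 1;
}

console.log(membersNeededUp(2)); // 2 -> both 27013 and the 27016 arbiter
console.log(membersNeededUp(3)); // 2 -> one of three members may be down
```

That would also explain why the second rs.reconfig succeeds moments later: by then the freshly added arbiter has become reachable.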

MongoDB replica set preventing queries to secondary

To set up the replica set, I've run in 3 separate terminal tabs:
$ sudo mongod --replSet rs0 --dbpath /data/mining --port 27017
$ sudo mongod --replSet rs0 --dbpath /data/mining2 --port 27018
$ sudo mongod --replSet rs0 --dbpath /data/mining3 --port 27019
Then, I configured replication in the Mongo shell and verified that it worked:
> var rsconf = {
_id: "rs0",
members: [
{
_id: 0,
host: 'localhost:27017'
},
{
_id: 1,
host: 'localhost:27018'
},
{
_id: 2,
host: 'localhost:27019'
}
]
};
> rs.initiate(rsconf);
{
"info": "Config now saved locally. Should come online in about a minute.",
"ok": 1
}
// Some time later...
> rs.status()
{
"set": "rs0",
"date": ISODate("2013-06-17T13:23:45-0400"),
"myState": 2,
"syncingTo": "localhost:27017",
"members": [
{
"_id": 0,
"name": "localhost:27017",
"health": 1,
"state": 1,
"stateStr": "PRIMARY",
"uptime": 4582,
"optime": {
"t": 1371489546,
"i": 1
},
"optimeDate": ISODate("2013-06-17T13:19:06-0400"),
"lastHeartbeat": ISODate("2013-06-17T13:23:44-0400"),
"lastHeartbeatRecv": ISODate("2013-06-17T13:23:44-0400"),
"pingMs": 0
},
{
"_id": 1,
"name": "localhost:27018",
"health": 1,
"state": 2,
"stateStr": "SECONDARY",
"uptime": 5034,
"optime": {
"t": 1371489546,
"i": 1
},
"optimeDate": ISODate("2013-06-17T13:19:06-0400"),
"self": true
},
{
"_id": 2,
"name": "localhost:27019",
"health": 1,
"state": 2,
"stateStr": "SECONDARY",
"uptime": 4582,
"optime": {
"t": 1371489546,
"i": 1
},
"optimeDate": ISODate("2013-06-17T13:19:06-0400"),
"lastHeartbeat": ISODate("2013-06-17T13:23:44-0400"),
"lastHeartbeatRecv": ISODate("2013-06-17T13:23:45-0400"),
"pingMs": 0,
"syncingTo": "localhost:27017"
}
],
"ok": 1
}
My script runs fine against the primary:
$ ./runScripts.sh -h localhost -p 27017
MongoDB shell version: 2.4.3
connecting to: localhost:27017/test
Successful completion
However, against either secondary:
$ ./runScripts.sh -h localhost -p 27018
MongoDB shell version: 2.4.3
connecting to: localhost:27017/test
Mon Jun 17 13:30:22.989 JavaScript execution failed: count failed:
{ "note" : "from execCommand", "ok" : 0, "errmsg" : "not master" }
at src/mongo/shell/query.js:L180
failed to load: /.../.../myAggregateScript.js
I've read in multiple places to use rs.slaveOk() or db.getMongo().setSlaveOk(), but neither of these had any effect, whether entered from the shell or called in my script. These statements did not throw errors when called, but they didn't fix the problem, either.
Does anyone know why I can't configure my replset to allow querying of the secondary?
rs.slaveOk() run in the mongo shell will allow you to read from secondaries. Here is a demonstration using the mongo shell under MongoDB 2.4.3:
$ mongo --port 27017
MongoDB shell version: 2.4.3
connecting to: 127.0.0.1:27017/test
replset:PRIMARY> db.foo.save({})
replset:PRIMARY> db.foo.find()
{ "_id" : ObjectId("51bf5dbd473d5e80fc095b17") }
replset:PRIMARY> exit
$ mongo --port 27018
MongoDB shell version: 2.4.3
connecting to: 127.0.0.1:27018/test
replset:SECONDARY> db.foo.find()
error: { "$err" : "not master and slaveOk=false", "code" : 13435 }
replset:SECONDARY> rs.slaveOk()
replset:SECONDARY> db.foo.find()
{ "_id" : ObjectId("51bf5dbd473d5e80fc095b17") }
replset:SECONDARY> db.foo.count()
1
You have to run the command rs.slaveOk() in the secondary server's shell.
Use the following to run queries on your MongoDB secondary:
db.getMongo().setReadPref('secondary')
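Outside the shell, drivers express the same choice as a read preference, typically via the connection string. A sketch (buildUri is a hypothetical helper; the readPreference=secondary query option itself is standard):

```javascript
// Illustrative: build a replica-set connection string that routes reads to
// secondaries via the standard readPreference option.
function buildUri(hosts, replicaSet, readPreference) {
  return "mongodb://" + hosts.join(",") +
    "/test?replicaSet=" + replicaSet +
    "&readPreference=" + readPreference;
}

console.log(buildUri(
  ["localhost:27017", "localhost:27018", "localhost:27019"],
  "rs0",
  "secondary"
));
// mongodb://localhost:27017,localhost:27018,localhost:27019/test?replicaSet=rs0&readPreference=secondary
```

Note that rs.slaveOk() only affects the shell session you run it in, which is why it must be executed on each secondary's shell; a driver-side read preference is the durable equivalent for applications.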