Mongo secondary automatically goes into Recovering state and gets stuck - mongodb

I am using MongoDB 2.6.5 and have a 3-node replica set. Quite often I see a secondary node go into the Recovering state, even though I have not initiated any sync myself; I am not sure whether mongod is doing it in the background. The member never comes out of Recovering. I have seen many threads explaining how to bring a member back from Recovering to Secondary, but what I am looking for is the root cause: why does it go into Recovering in the first place? Is there any configuration that could lead to this?
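For reference, the replication headroom can be checked from the mongo shell with the standard helpers; a minimal sketch (the first two commands are meant to be run on the primary):

// Size of the oplog and the time window it currently covers.
rs.printReplicationInfo()
// How many seconds each secondary is behind the primary.
rs.printSlaveReplicationInfo()
// Per-member state (SECONDARY, RECOVERING, ...) and last applied optime.
rs.status()

If the window reported by rs.printReplicationInfo() is shorter than the longest time a secondary can lag behind (heavy write bursts, maintenance), that secondary can fall off the end of the oplog and drop into Recovering.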
A few lines of logs (that I feel could be useful) from my secondary server that went into Recovering. 192.168.12.155:5000 is the primary node and 192.168.12.154:5000 is the secondary node. The log below was taken from 192.168.12.154.
2015-03-08T20:02:20.963+0530 [conn223] end connection 192.168.31.152:43503 (4 connections now open)
2015-03-08T20:02:20.965+0530 [initandlisten] connection accepted from 192.168.31.152:43505 #225 (5 connections now open)
2015-03-08T20:02:21.065+0530 [conn224] end connection 192.168.31.152:43504 (4 connections now open)
2015-03-08T20:02:21.076+0530 [initandlisten] connection accepted from 192.168.31.152:43506 #226 (5 connections now open)
2015-03-08T20:02:21.207+0530 [conn225] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|445, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 112ms
2015-03-08T20:02:21.209+0530 [conn225] end connection 192.168.31.152:43505 (4 connections now open)
2015-03-08T20:02:21.211+0530 [initandlisten] connection accepted from 192.168.31.152:43507 #227 (5 connections now open)
2015-03-08T20:02:21.345+0530 [conn227] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 133ms
2015-03-08T20:02:21.441+0530 [conn226] end connection 192.168.31.152:43506 (4 connections now open)
2015-03-08T20:02:21.453+0530 [initandlisten] connection accepted from 192.168.31.152:43508 #228 (5 connections now open)
2015-03-08T20:02:21.586+0530 [conn227] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|448, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 113ms
2015-03-08T20:02:21.588+0530 [conn227] end connection 192.168.31.152:43507 (4 connections now open)
2015-03-08T20:02:21.590+0530 [initandlisten] connection accepted from 192.168.31.152:43509 #229 (5 connections now open)
2015-03-08T20:02:21.707+0530 [conn229] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 116ms
2015-03-08T20:02:21.808+0530 [conn228] end connection 192.168.31.152:43508 (4 connections now open)
2015-03-08T20:02:21.821+0530 [initandlisten] connection accepted from 192.168.31.152:43510 #230 (5 connections now open)
2015-03-08T20:02:21.833+0530 [conn229] end connection 192.168.31.152:43509 (4 connections now open)
2015-03-08T20:02:21.834+0530 [initandlisten] connection accepted from 192.168.31.152:43511 #231 (5 connections now open)
2015-03-08T20:02:22.069+0530 [conn62] end connection 192.168.12.155:42354 (4 connections now open)
2015-03-08T20:02:22.069+0530 [initandlisten] connection accepted from 192.168.12.155:42811 #232 (6 connections now open)
2015-03-08T20:02:22.069+0530 [conn231] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 234ms
2015-03-08T20:02:22.167+0530 [conn230] end connection 192.168.31.152:43510 (4 connections now open)
2015-03-08T20:02:22.177+0530 [initandlisten] connection accepted from 192.168.31.152:43512 #233 (5 connections now open)
2015-03-08T20:02:22.315+0530 [conn231] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|454, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 116ms
2015-03-08T20:02:22.317+0530 [conn231] end connection 192.168.31.152:43511 (4 connections now open)
2015-03-08T20:02:22.319+0530 [initandlisten] connection accepted from 192.168.31.152:43513 #234 (5 connections now open)
2015-03-08T20:02:22.432+0530 [conn234] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 112ms
2015-03-08T20:02:22.529+0530 [conn233] end connection 192.168.31.152:43512 (4 connections now open)
2015-03-08T20:02:22.540+0530 [initandlisten] connection accepted from 192.168.31.152:43514 #235 (5 connections now open)
2015-03-08T20:02:22.681+0530 [conn234] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|457, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 118ms
2015-03-08T20:02:22.682+0530 [conn234] end connection 192.168.31.152:43513 (4 connections now open)
2015-03-08T20:02:22.684+0530 [initandlisten] connection accepted from 192.168.31.152:43515 #236 (5 connections now open)
2015-03-08T20:02:22.794+0530 [conn236] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 108ms
2015-03-08T20:02:22.891+0530 [conn235] end connection 192.168.31.152:43514 (4 connections now open)
2015-03-08T20:02:22.902+0530 [initandlisten] connection accepted from 192.168.31.152:43516 #237 (5 connections now open)
2015-03-08T20:02:22.927+0530 [conn236] end connection 192.168.31.152:43515 (4 connections now open)
2015-03-08T20:02:22.929+0530 [initandlisten] connection accepted from 192.168.31.152:43517 #238 (5 connections now open)
2015-03-08T20:02:23.156+0530 [conn238] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 226ms
2015-03-08T20:02:23.252+0530 [conn237] end connection 192.168.31.152:43516 (4 connections now open)
2015-03-08T20:02:23.263+0530 [initandlisten] connection accepted from 192.168.31.152:43518 #239 (5 connections now open)
2015-03-08T20:02:23.414+0530 [conn238] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|463, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 122ms
2015-03-08T20:02:23.415+0530 [conn238] end connection 192.168.31.152:43517 (4 connections now open)
2015-03-08T20:02:23.417+0530 [initandlisten] connection accepted from 192.168.31.152:43519 #240 (5 connections now open)
2015-03-08T20:02:23.619+0530 [conn239] end connection 192.168.31.152:43518 (4 connections now open)
2015-03-08T20:02:23.628+0530 [initandlisten] connection accepted from 192.168.31.152:43520 #241 (5 connections now open)
2015-03-08T20:02:23.781+0530 [conn240] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|466, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 124ms
2015-03-08T20:02:23.782+0530 [conn240] end connection 192.168.31.152:43519 (4 connections now open)
2015-03-08T20:02:23.784+0530 [initandlisten] connection accepted from 192.168.31.152:43521 #242 (5 connections now open)
2015-03-08T20:02:23.979+0530 [conn241] end connection 192.168.31.152:43520 (4 connections now open)
2015-03-08T20:02:23.986+0530 [initandlisten] connection accepted from 192.168.31.152:43522 #243 (5 connections now open)
2015-03-08T20:02:24.148+0530 [conn242] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|469, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 127ms
2015-03-08T20:02:24.149+0530 [conn242] end connection 192.168.31.152:43521 (4 connections now open)
2015-03-08T20:02:24.152+0530 [initandlisten] connection accepted from 192.168.31.152:43523 #244 (5 connections now open)
2015-03-08T20:02:24.341+0530 [conn243] end connection 192.168.31.152:43522 (4 connections now open)
2015-03-08T20:02:24.350+0530 [initandlisten] connection accepted from 192.168.31.152:43524 #245 (5 connections now open)
2015-03-08T20:02:24.504+0530 [conn244] end connection 192.168.31.152:43523 (4 connections now open)
2015-03-08T20:02:24.506+0530 [initandlisten] connection accepted from 192.168.31.152:43525 #246 (5 connections now open)
2015-03-08T20:02:24.707+0530 [conn245] end connection 192.168.31.152:43524 (4 connections now open)
2015-03-08T20:02:24.714+0530 [initandlisten] connection accepted from 192.168.31.152:43526 #247 (5 connections now open)
2015-03-08T20:02:24.899+0530 [conn246] end connection 192.168.31.152:43525 (4 connections now open)
2015-03-08T20:02:24.901+0530 [initandlisten] connection accepted from 192.168.31.152:43527 #248 (5 connections now open)
2015-03-08T20:02:24.969+0530 [rsBackgroundSync] repl: old cursor isDead, will initiate a new one
2015-03-08T20:02:25.051+0530 [rsBackgroundSync] replSet syncing to: 192.168.12.155:5000
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet not trying to sync from 192.168.12.155:5000, it is vetoed for 600 more seconds
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet not trying to sync from 192.168.12.155:5000, it is vetoed for 600 more seconds
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet error RS102 too stale to catch up, at least from 192.168.12.155:5000
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet our last optime : Mar 8 20:00:53 54fc5d1d:3c
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet oldest at 192.168.12.155:5000 : Mar 8 20:00:54 54fc5d1e:5d
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet RECOVERING
2015-03-08T20:02:25.137+0530 [conn247] end connection 192.168.31.152:43526 (4 connections now open)
2015-03-08T20:02:25.146+0530 [initandlisten] connection accepted from 192.168.31.152:43528 #249 (5 connections now open)
2015-03-08T20:02:25.147+0530 [conn248] end connection 192.168.31.152:43527 (4 connections now open)
2015-03-08T20:02:25.148+0530 [initandlisten] connection accepted from 192.168.31.152:43529 #250 (5 connections now open)
2015-03-08T20:02:25.245+0530 [conn249] end connection 192.168.31.152:43528 (4 connections now open)
2015-03-08T20:02:25.253+0530 [initandlisten] connection accepted from 192.168.31.152:43530 #251 (5 connections now open)
2015-03-08T20:02:25.254+0530 [conn250] end connection 192.168.31.152:43529 (4 connections now open)
2015-03-08T20:02:25.254+0530 [initandlisten] connection accepted from 192.168.31.152:43531 #252 (5 connections now open)
2015-03-08T20:02:25.349+0530 [conn251] end connection 192.168.31.152:43530 (4 connections now open)
2015-03-08T20:02:25.356+0530 [initandlisten] connection accepted from 192.168.31.152:43532 #253 (5 connections now open)
2015-03-08T20:02:25.357+0530 [conn252] end connection 192.168.31.152:43531 (4 connections now open)
2015-03-08T20:02:25.357+0530 [initandlisten] connection accepted from 192.168.31.152:43533 #254 (5 connections now open)

The MongoDB documentation has a section on this:
https://docs.mongodb.com/manual/tutorial/resync-replica-set-member/#replica-set-auto-resync-stale-member
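In short, that page resyncs the stale member from scratch. A rough sketch, assuming the data path and service name that appear elsewhere in this thread (adjust for the actual deployment):

# 1. Stop the stale secondary.
sudo service mongod stop
# 2. Move the member's data directory aside (or empty it) so the member performs an initial sync.
mv /var/lib/mongo /var/lib/mongo.stale && mkdir /var/lib/mongo
# 3. Start it again; it rejoins the set and copies all data from another member.
sudo service mongod start

If members keep hitting RS102, the usual longer-term fix is a bigger oplog on the primary (the oplogSize option only applies when the oplog is first created; resizing an existing oplog has its own documented procedure).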

Related

Converting a standalone MongoDB instance to a single-node replica set

I am trying to convert my standalone MongoDB instance to a single-node replica set, for the purpose of live migrating to Atlas.
I followed this procedure: https://docs.mongodb.com/manual/tutorial/convert-standalone-to-replica-set/
The steps I took were:
$sudo service mongodb stop
$sudo service mongod start
$mongo
>rs.initiate()
{
"info2" : "no configuration explicitly specified -- making one",
"me" : "staging3.domain.io:27017",
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
singleNodeRepl:PRIMARY> rs.status()
{
"set" : "singleNodeRepl",
"date" : ISODate("2020-11-26T00:46:25Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "staging4.domain.io:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1197,
"optime" : Timestamp(1606350415, 1),
"optimeDate" : ISODate("2020-11-26T00:26:55Z"),
"electionTime" : Timestamp(1606350415, 2),
"electionDate" : ISODate("2020-11-26T00:26:55Z"),
"self" : true
}
],
"ok" : 1
}
singleNodeRepl:PRIMARY> db.oplog.rs.find()
{ "ts" : Timestamp(1606350415, 1), "h" : NumberLong(0), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "initiating set" } }
At this point, it seems to have no issues.
However, my application is not able to work as it did before.
Would really appreciate any help in troubleshooting the issue.
Thank you.
EDIT:
As suggested, I included replSet in the config file instead of passing it as an argument.
This is my config file:
# mongod.conf
#where to log
logpath=/var/log/mongodb/mongod.log
logappend=true
# fork and run in background
fork=true
#port=27017
dbpath=/var/lib/mongo
# location of pidfile
pidfilepath=/var/run/mongodb/mongod.pid
# Listen to local interface only. Comment out to listen on all interfaces.
#bind_ip=127.0.0.1
# Disables write-ahead journaling
# nojournal=true
# Enables periodic logging of CPU utilization and I/O wait
#cpu=true
# Turn on/off security. Off is currently the default
#noauth=true
#auth=true
# Verbose logging output.
verbose=true
# Inspect all client data for validity on receipt (useful for
# developing drivers)
#objcheck=true
# Enable db quota management
#quota=true
# Set oplogging level where n is
# 0=off (default)
# 1=W
# 2=R
# 3=both
# 7=W+some reads
#diaglog=0
# Ignore query hints
#nohints=true
# Enable the HTTP interface (Defaults to port 28017).
#httpinterface=true
# Turns off server-side scripting. This will result in greatly limited
# functionality
#noscripting=true
# Turns off table scans. Any query that would do a table scan fails.
#notablescan=true
# Disable data file preallocation.
#noprealloc=true
# Specify .ns file size for new databases.
# nssize=<size>
# Replication Options
# in replicated mongo databases, specify the replica set name here
replSet=singleNodeRepl
# maximum size in megabytes for replication operation log
#oplogSize=1024
# path to a key file storing authentication info for connections
# between replica set members
#keyFile=/path/to/keyfile
And the verbose log file. It does look like everything is working fine; however, my application is not able to connect to the DB as it did before.
2020-11-26T00:26:55.852+0000 [conn1] replSet replSetInitiate admin command received from client
2020-11-26T00:26:55.853+0000 [conn1] replSet info initiate : no configuration specified. Using a default configuration for the set
2020-11-26T00:26:55.853+0000 [conn1] replSet created this configuration for initiation : { _id: "singleNodeRepl", members: [ { _id: 0, host: "staging4.domain.io:27017" } ] }
2020-11-26T00:26:55.853+0000 [conn1] replSet replSetInitiate config object parses ok, 1 members specified
2020-11-26T00:26:55.853+0000 [conn1] getMyAddrs(): [127.0.0.1] [10.20.26.228] [::1] [fe80::8ed:65ff:fe9e:15ab%eth0]
2020-11-26T00:26:55.853+0000 [conn1] getallIPs("staging4.domain.io"): [127.0.0.1]
2020-11-26T00:26:55.853+0000 [conn1] replSet replSetInitiate all members seem up
2020-11-26T00:26:55.853+0000 [conn1] ******
2020-11-26T00:26:55.853+0000 [conn1] creating replication oplog of size: 2570MB...
2020-11-26T00:26:55.853+0000 [conn1] create collection local.oplog.rs { size: 2695574937.6, capped: true, autoIndexId: false }
2020-11-26T00:26:55.853+0000 [conn1] Database::_addNamespaceToCatalog ns: local.oplog.rs
2020-11-26T00:26:55.866+0000 [conn1] ExtentManager::increaseStorageSize ns:local.oplog.rs desiredSize:2146426624 fromFreeList: 0 eloc: 1:2000
2020-11-26T00:26:55.876+0000 [conn1] ExtentManager::increaseStorageSize ns:local.oplog.rs desiredSize:549148160 fromFreeList: 0 eloc: 2:2000
2020-11-26T00:26:55.878+0000 [conn1] ******
2020-11-26T00:26:55.878+0000 [conn1] replSet info saving a newer config version to local.system.replset: { _id: "singleNodeRepl", version: 1, members: [ { _id: 0, host: "staging4.domain.io:27017" } ] }
2020-11-26T00:26:55.878+0000 [conn1] Database::_addNamespaceToCatalog ns: local.system.replset
2020-11-26T00:26:55.878+0000 [conn1] ExtentManager::increaseStorageSize ns:local.system.replset desiredSize:8192 fromFreeList: 0 eloc: 2:20bb8000
2020-11-26T00:26:55.878+0000 [conn1] Database::_addNamespaceToCatalog ns: local.system.replset.$_id_
2020-11-26T00:26:55.878+0000 [conn1] build index on: local.system.replset properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.system.replset" }
2020-11-26T00:26:55.878+0000 [conn1] local.system.replset: clearing plan cache - collection info cache reset
2020-11-26T00:26:55.878+0000 [conn1] allocating new extent
2020-11-26T00:26:55.878+0000 [conn1] ExtentManager::increaseStorageSize ns:local.system.replset.$_id_ desiredSize:131072 fromFreeList: 0 eloc: 2:20bba000
2020-11-26T00:26:55.878+0000 [conn1] added index to empty collection
2020-11-26T00:26:55.878+0000 [conn1] local.system.replset: clearing plan cache - collection info cache reset
2020-11-26T00:26:55.878+0000 [conn1] replSet saveConfigLocally done
2020-11-26T00:26:55.878+0000 [conn1] replSet replSetInitiate config now saved locally. Should come online in about a minute.
2020-11-26T00:26:55.878+0000 [conn1] command admin.$cmd command: replSetInitiate { replSetInitiate: undefined } keyUpdates:0 numYields:0 locks(micros) W:25362 reslen:206 25ms
2020-11-26T00:26:55.879+0000 [conn1] command test.$cmd command: isMaster { isMaster: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:270 0ms
2020-11-26T00:27:01.256+0000 [conn1] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:27:01.257+0000 [conn1] command test.$cmd command: isMaster { isMaster: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:367 0ms
2020-11-26T00:27:10.688+0000 [conn1] query local.system.replset planSummary: COLLSCAN ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:97 nreturned:1 reslen:126 0ms
2020-11-26T00:27:10.689+0000 [conn1] command test.$cmd command: isMaster { isMaster: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:367 0ms
2020-11-26T00:27:28.889+0000 [clientcursormon] connections:1
2020-11-26T00:27:33.333+0000 [conn1] end connection 127.0.0.1:50580 (0 connections now open)
2020-11-26T00:27:57.230+0000 [initandlisten] connection accepted from 127.0.0.1:50582 #2 (1 connection now open)
2020-11-26T00:27:57.230+0000 [conn2] command admin.$cmd command: whatsmyuri { whatsmyuri: 1 } ntoreturn:1 keyUpdates:0 numYields:0 reslen:62 0ms
2020-11-26T00:27:57.232+0000 [conn2] command admin.$cmd command: getLog { getLog: "startupWarnings" } keyUpdates:0 numYields:0 reslen:70 0ms
2020-11-26T00:27:57.233+0000 [conn2] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:28:00.237+0000 [conn2] command admin.$cmd command: serverStatus { serverStatus: 1.0 } keyUpdates:0 numYields:0 locks(micros) r:13 reslen:3402 0ms
2020-11-26T00:28:00.242+0000 [conn2] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:28:16.560+0000 [conn2] end connection 127.0.0.1:50582 (0 connections now open)
2020-11-26T00:32:28.904+0000 [clientcursormon] connections:0
2020-11-26T00:36:32.398+0000 [initandlisten] connection accepted from 127.0.0.1:50588 #3 (1 connection now open)
2020-11-26T00:36:32.398+0000 [conn3] command admin.$cmd command: whatsmyuri { whatsmyuri: 1 } ntoreturn:1 keyUpdates:0 numYields:0 reslen:62 0ms
2020-11-26T00:36:32.399+0000 [conn3] command admin.$cmd command: getLog { getLog: "startupWarnings" } keyUpdates:0 numYields:0 reslen:70 0ms
2020-11-26T00:36:32.400+0000 [conn3] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:36:34.603+0000 [conn3] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:36:37.326+0000 [conn3] query local.oplog.rs planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:66 nreturned:1 reslen:106 0ms
2020-11-26T00:36:37.328+0000 [conn3] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:37:28.832+0000 [initandlisten] connection accepted from 10.20.37.160:54484 #4 (2 connections now open)
2020-11-26T00:37:28.832+0000 [conn4] command admin.$cmd command: isMaster { isMaster: 1, compression: [], client: { driver: { name: "mongo-ruby-driver", version: "2.13.1" }, os: { type: "linux", name: "linux-gnu", architecture: "x86_64" }, platform: "mongoid-6.4.1, Ruby 2.6.5, x86_64-linux, x86_64-pc-linux-gnu" } } keyUpdates:0 numYields:0 reslen:367 0ms
2020-11-26T00:37:28.919+0000 [clientcursormon] connections:2
2020-11-26T00:37:33.568+0000 [initandlisten] connection accepted from 10.20.37.160:54492 #5 (3 connections now open)
2020-11-26T00:37:33.569+0000 [conn5] command admin.$cmd command: isMaster { isMaster: 1, compression: [], client: { driver: { name: "mongo-ruby-driver", version: "2.13.1" }, os: { type: "linux", name: "linux-gnu", architecture: "x86_64" }, platform: "mongoid-6.4.1, Ruby 2.6.5, x86_64-linux, x86_64-pc-linux-gnu" } } keyUpdates:0 numYields:0 reslen:367 0ms
2020-11-26T00:37:36.586+0000 [conn3] end connection 127.0.0.1:50588 (2 connections now open)
2020-11-26T00:39:35.621+0000 [initandlisten] connection accepted from 127.0.0.1:50592 #6 (3 connections now open)
2020-11-26T00:39:35.621+0000 [conn6] command admin.$cmd command: whatsmyuri { whatsmyuri: 1 } ntoreturn:1 keyUpdates:0 numYields:0 reslen:62 0ms
2020-11-26T00:39:35.622+0000 [conn6] command admin.$cmd command: getLog { getLog: "startupWarnings" } keyUpdates:0 numYields:0 reslen:70 0ms
2020-11-26T00:39:35.623+0000 [conn6] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:39:37.589+0000 [conn6] opening db: test
2020-11-26T00:39:37.589+0000 [conn6] query test.oplog.rs planSummary: EOF ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:0 keyUpdates:0 numYields:0 locks(micros) W:186 r:19 nreturned:0 reslen:20 0ms
2020-11-26T00:39:37.590+0000 [conn6] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:39:41.891+0000 [conn6] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:39:43.266+0000 [conn6] query local.oplog.rs planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:62 nreturned:1 reslen:106 0ms
2020-11-26T00:39:43.268+0000 [conn6] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:39:52.681+0000 [conn6] end connection 127.0.0.1:50592 (2 connections now open)
2020-11-26T00:42:28.934+0000 [clientcursormon] connections:2
You should not mix using a config file, i.e.
mongod --config /etc/mongod.conf
with command line options, e.g.
mongod --replSet rs0 --bind_ip localhost
Most likely you did not set the replica set name in /etc/mongod.conf:
replication:
  replSetName: <string>
So when you start MongoDB with service mongodb start, you may be running with a different configuration.
Note: check the service file (on my Red Hat system it is at /etc/systemd/system/mongod.service), which may even point to a different .conf file.
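With the old ini-style format shown in the question, the equivalent setting is replSet=..., and once the node is a replica set member the application's connection string usually needs the set name as well. A sketch, with a hypothetical database name:

# /etc/mongod.conf, old ini-style format (what the question's file uses):
replSet=singleNodeRepl

# YAML format (newer config files):
# replication:
#   replSetName: singleNodeRepl

# Client side (e.g. the mongo-ruby-driver/Mongoid stack seen in the logs above):
# mongodb://staging4.domain.io:27017/<database>?replicaSet=singleNodeRepl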

Cannot initiate mongodb (code 8)

Tried to initiate a replica set in MongoDB but failed.
My mongod configuration file is as follows:
dbpath=C:\data\db
logpath=C:\data\log\mongo.log
storageEngine=mmapv1
After starting mongod with the command:
mongod --config "C:\data\mongo.conf" --replSet "rs0"
I went to mongo and typed
rs.initiate()
and got the error "no configuration file specified" (code 8). I also tried to instruct MongoDB explicitly using
cfg = {"_id": "rs0", "version":1, "members":[{"_id":0,"host":"127.0.0.1:27017"}]}
rs.initiate(cfg)
However, the result is still the same (code 8).
Digging deeper into the log file, I found this:
replSetInitiate failed to store config document or create the oplog; UnknownError: assertion C:\data\mci\7751c6064ad5f370b9aea0db0164a05e\src\src\mongo/util/concurrency/rwlock.h:204
2017-08-26T18:36:41.760+0700 I COMMAND [conn1] command local.oplog.rs command: replSetInitiate { replSetInitiate: { _id: "rs0", version: 1.0, members: [ { _id: 0.0, host: "127.0.0.1:27017" } ] } } keyUpdates:0 writeConflicts:0 numYields:0 reslen:143 locks:{ Global: { acquireCount: { r: 1, W: 1 } }, MMAPV1Journal: { acquireCount: { w: 2 } }, Metadata: { acquireCount: { W: 6 } } } protocol:op_command 4782ms
Any hint for me please? Thank you a ton.

Mongo Auto Balancing Not Working

I'm running into an issue where one of my shards is constantly at 100% CPU usage while I'm storing files into my Mongo DB (using GridFS). I have shut down writing to the DB and the usage does drop to nearly 0%. However, the auto balancer is on and does not appear to be balancing anything. I have roughly 50% of my data on that one shard with nearly 100% CPU usage, and virtually all the others are at 7-8%.
Any ideas?
mongos> version()
3.0.6
Auto Balancing Enabled
Storage Engine: WiredTiger
I have this general architecture:
2 - routers
3 - config servers
8 - shards (2 shards per server - 4 servers)
No replica sets!
https://docs.mongodb.org/v3.0/core/sharded-cluster-architectures-production/
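To confirm the balancer is actually enabled and running, the usual checks from a mongos shell are (a quick sketch):

// Whether the balancer is enabled at all.
sh.getBalancerState()
// Whether a balancing round is in progress right now.
sh.isBalancerRunning()
// Chunk counts per shard for every sharded collection.
sh.status()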
Log Details
Router 1 Log:
2016-01-15T16:15:21.714-0700 I NETWORK [conn3925104] end connection [IP]:[port] (63 connections now open)
2016-01-15T16:15:23.256-0700 I NETWORK [LockPinger] Socket recv() timeout [IP]:[port]
2016-01-15T16:15:23.256-0700 I NETWORK [LockPinger] SocketException: remote: [IP]:[port] error: 9001 socket exception [RECV_TIMEOUT] server [IP]:[port]
2016-01-15T16:15:23.256-0700 I NETWORK [LockPinger] DBClientCursor::init call() failed
2016-01-15T16:15:23.256-0700 I NETWORK [LockPinger] scoped connection to [IP]:[port],[IP]:[port],[IP]:[port] not being returned to the pool
2016-01-15T16:15:23.256-0700 W SHARDING [LockPinger] distributed lock pinger '[IP]:[port],[IP]:[port],[IP]:[port]/[IP]:[port]:1442579303:1804289383' detected an exception while pinging. :: caused by :: SyncClusterConnection::update prepare failed: [IP]:[port] (IP) failed:10276 DBClientBase::findN: transport error: [IP]:[port] ns: admin.$cmd query: { getlasterror: 1, fsync: 1 }
2016-01-15T16:15:24.715-0700 I NETWORK [mongosMain] connection accepted from [IP]:[port] #3925105 (64 connections now open)
2016-01-15T16:15:24.715-0700 I NETWORK [conn3925105] end connection [IP]:[port] (63 connections now open)
2016-01-15T16:15:27.717-0700 I NETWORK [mongosMain] connection accepted from [IP]:[port] #3925106 (64 connections now open)
2016-01-15T16:15:27.718-0700 I NETWORK [conn3925106] end connection [IP]:[port](63 connections now open)
Router 2 Log:
2016-01-15T16:18:21.762-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' acquired, ts : 56997e3d110ccb8e38549a9d
2016-01-15T16:18:24.316-0700 I SHARDING [LockPinger] cluster [IP]:[port],[IP]:[port],[IP]:[port] pinged successfully at Fri Jan 15 16:18:24 2016 by distributed lock pinger '[IP]:[port],[IP]:[port],[IP]:[port]/[IP]:[port]:1442579454:1804289383', sleeping for 30000ms
2016-01-15T16:18:24.978-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' unlocked.
2016-01-15T16:18:35.295-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' acquired, ts : 56997e4a110ccb8e38549a9f
2016-01-15T16:18:38.507-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' unlocked.
2016-01-15T16:18:48.838-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' acquired, ts : 56997e58110ccb8e38549aa1
2016-01-15T16:18:52.038-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' unlocked.
2016-01-15T16:18:54.660-0700 I SHARDING [LockPinger] cluster [IP]:[port],[IP]:[port],[IP]:[port] pinged successfully at Fri Jan 15 16:18:54 2016 by distributed lock pinger '[IP]:[port],[IP]:[port],[IP]:[port]/[IP]:[port]:1442579454:1804289383', sleeping for 30000ms
2016-01-15T16:19:02.323-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' acquired, ts : 56997e66110ccb8e38549aa3
2016-01-15T16:19:05.513-0700 I SHARDING [Balancer] distributed lock 'balancer/[IP]:[port]:1442579454:1804289383' unlocked.
Problematic Shard Log:
2016-01-15T16:21:03.426-0700 W SHARDING [conn40] Finding the split vector for Files.fs.chunks over { files_id: 1.0, n: 1.0 } keyCount: 137 numSplits: 200715 lookedAt: 46 took 17364ms
2016-01-15T16:21:03.484-0700 I COMMAND [conn40] command admin.$cmd command: splitVector { splitVector: "Files.fs.chunks", keyPattern: { files_id: 1.0, n: 1.0 }, min: { files_id: ObjectId('5650816c827928d710ef5ef9'), n: 1 }, max: { files_id: MaxKey, n: MaxKey }, maxChunkSizeBytes: 67108864, maxSplitPoints: 0, maxChunkObjects: 250000 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:216396 reslen:8318989 locks:{ Global: { acquireCount: { r: 432794 } }, Database: { acquireCount: { r: 216397 } }, Collection: { acquireCount: { r: 216397 } } } 17421ms
2016-01-15T16:21:03.775-0700 I SHARDING [LockPinger] cluster [IP]:[port],[IP]:[port],[IP]:[port] pinged successfully at Fri Jan 15 16:21:03 2016 by distributed lock pinger '[IP]:[port],[IP]:[port],[IP]:[port]/[IP]:[port]:1441718306:765353801', sleeping for 30000ms
2016-01-15T16:21:04.321-0700 I SHARDING [conn40] request split points lookup for chunk Files.fs.chunks { : ObjectId('5650816c827928d710ef5ef9'), : 1 } -->> { : MaxKey, : MaxKey }
2016-01-15T16:21:08.243-0700 I SHARDING [conn46] request split points lookup for chunk Files.fs.chunks { : ObjectId('5650816c827928d710ef5ef9'), : 1 } -->> { : MaxKey, : MaxKey }
2016-01-15T16:21:10.174-0700 W SHARDING [conn37] Finding the split vector for Files.fs.chunks over { files_id: 1.0, n: 1.0 } keyCount: 137 numSplits: 200715 lookedAt: 60 took 18516ms
2016-01-15T16:21:10.232-0700 I COMMAND [conn37] command admin.$cmd command: splitVector { splitVector: "Files.fs.chunks", keyPattern: { files_id: 1.0, n: 1.0 }, min: { files_id: ObjectId('5650816c827928d710ef5ef9'), n: 1 }, max: { files_id: MaxKey, n: MaxKey }, maxChunkSizeBytes: 67108864, maxSplitPoints: 0, maxChunkObjects: 250000 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:216396 reslen:8318989 locks:{ Global: { acquireCount: { r: 432794 } }, Database: { acquireCount: { r: 216397 } }, Collection: { acquireCount: { r: 216397 } } } 18574ms
2016-01-15T16:21:10.989-0700 W SHARDING [conn25] Finding the split vector for Files.fs.chunks over { files_id: 1.0, n: 1.0 } keyCount: 137 numSplits: 200715 lookedAt: 62 took 18187ms
2016-01-15T16:21:11.047-0700 I COMMAND [conn25] command admin.$cmd command: splitVector { splitVector: "Files.fs.chunks", keyPattern: { files_id: 1.0, n: 1.0 }, min: { files_id: ObjectId('5650816c827928d710ef5ef9'), n: 1 }, max: { files_id: MaxKey, n: MaxKey }, maxChunkSizeBytes: 67108864, maxSplitPoints: 0, maxChunkObjects: 250000 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:216396 reslen:8318989 locks:{ Global: { acquireCount: { r: 432794 } }, Database: { acquireCount: { r: 216397 } }, Collection: { acquireCount: { r: 216397 } } } 18246ms
2016-01-15T16:21:11.365-0700 I SHARDING [conn37] request split points lookup for chunk Files.fs.chunks { : ObjectId('5650816c827928d710ef5ef9'), : 1 } -->> { : MaxKey, : MaxKey }
For the splitting error: upgrading to Mongo v3.0.8+ resolved it.
Still having an issue with the balancing itself. The shard key is an MD5 checksum, so unless they all have very similar MD5s (not very likely) there is still investigating to do. We are using range-based partitioning.
There are multiple ways to check (a shell sketch of these follows below):
db.printShardingStatus() - this lists all sharded collections, whether the auto balancer is on, and which collection is currently being balanced.
sh.status(true) - this gives chunk-level details. Check whether any of your chunks are marked jumbo: true; a jumbo chunk will not be split properly.
db.collection.stats() - this gives collection stats, including the distribution of data across each shard.
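For example, from a mongos shell (the database and collection names are taken from the Files.fs.chunks namespace in the logs; getShardDistribution() is an extra helper that summarizes the per-shard split):

use Files
db.printShardingStatus()             // sharded collections, balancer flag, active migrations
sh.status(true)                      // chunk-level detail; look for chunks marked jumbo: true
db.fs.chunks.stats()                 // per-shard size and document counts for the collection
db.fs.chunks.getShardDistribution()  // convenient per-shard data distribution summary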

How to enable sharding in a test environment

How do I enable sharding in a test environment? Here is what I have done so far. I have one config server:
Config server1: Host-a:27019
One mongos instance on the same machine on port 27017,
and two mongod shard instances:
Host-a:27020
host-b:27021
When I enable sharding on a collection, it gives me this error:
2016-01-12T10:31:07.522Z I SHARDING [Balancer] ns: feedproductsdata.merchantproducts going to move { _id: "feedproductsdata.merchantproducts-product_id_MinKey", ns: "feedproductsdata.merchantproducts", min: { product_id: MinKey }, max: { product_id: 0 }, version: Timestamp 1000|0, versionEpoch: ObjectId('5694d57ebe78315b68519c38'), lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('5694d57ebe78315b68519c38'), shard: "shard0001" } from: shard0001 to: shard0000 tag []
2016-01-12T10:31:07.523Z I SHARDING [Balancer] moving chunk ns: feedproductsdata.merchantproducts moving ( ns: feedproductsdata.merchantproducts, shard: shard0001:192.168.1.12:27021, lastmod: 1|0||000000000000000000000000, min: { product_id: MinKey }, max: { product_id: 0 }) shard0001:192.168.1.12:27021 -> shard0000:192.168.1.8:27020
2016-01-12T10:31:08.530Z I SHARDING [Balancer] moveChunk result: { errmsg: "exception: socket exception [CONNECT_ERROR] for cfg1.server.com:27019", code: 11002, ok: 0.0 }
2016-01-12T10:31:08.531Z I SHARDING [Balancer] balancer move failed: { errmsg: "exception: socket exception [CONNECT_ERROR] for cfg1.server.com:27019", code: 11002, ok: 0.0 } from: shard0001 to: shard0000 chunk: min: { product_id: MinKey } max: { product_id: 0 }
2016-01-12T10:31:08.604Z I SHARDING [Balancer] distributed lock 'balancer/Knowledgeops-PC:27017:1452594322:41' unlocked.
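The CONNECT_ERROR in the moveChunk result suggests the shard hosts cannot reach the config server under the name the cluster uses for it (cfg1.server.com:27019). A quick check from each shard host, as a sketch:

# Does the config server hostname resolve from this shard host?
ping -c 1 cfg1.server.com
# Can the config server's mongod actually be reached on its port?
mongo --host cfg1.server.com --port 27019 --eval "db.runCommand({ ping: 1 })"

If the name does not resolve on the shard hosts (for example, it is only in the mongos host's /etc/hosts), moveChunk will keep failing even though mongos itself can talk to the config server.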

Mongodb crashed with Got signal: 11 (Segmentation fault)

My mongo server crashed with the following log. The server is version 2.4.2 and my Mongo Java client is 2.11.2. My environment is RHEL.
Please let me know what could be the problem. I see from other threads that versions older than 2.2.x had this problem, but mine is 2.4.2. Any help would be appreciated.
...
Thu Feb 20 13:45:56.924 [conn78956] run command admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.924 [conn78956] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 keyUpdates:0 reslen:263 0ms
Thu Feb 20 13:45:56.938 [conn78962] runQuery called admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78962] run command admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78962] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 keyUpdates:0 reslen:263 0ms
Thu Feb 20 13:45:56.938 [conn78965] runQuery called admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78965] run command admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78965] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 keyUpdates:0 reslen:263 0ms
Thu Feb 20 13:45:56.938 [conn78964] runQuery called admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78964] run command admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78964] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 keyUpdates:0 reslen:263 0ms
Thu Feb 20 13:45:56.941 [rsHealthPoll] replSet member 204.27.36.236:5000 is up
Thu Feb 20 13:45:56.941 [rsHealthPoll] replSet member 204.27.36.236:5000 is now in state SECONDARY
Thu Feb 20 13:45:56.941 [rsMgr] replSet warning caught unexpected exception in electSelf()
Thu Feb 20 13:45:56.941 Invalid access at address: 0 from thread:
Thu Feb 20 13:45:56.941 Got signal: 11 (Segmentation fault).
Thu Feb 20 13:45:56.941 [conn78959] runQuery called admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.941 [conn78959] run command admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.941 [conn78959] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 keyUpdates:0 reslen:263 0ms
Thu Feb 20 13:45:56.943 Backtrace:
0xdced21 0x6cf749 0x6cfcd2 0x30e160f4c0
/home/myserver/mySer/db/mongodb/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdced21]
/home/myserver/mySer/db/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6cf749]
/home/myserver/mySer/db/mongodb/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x6cfcd2]
/lib64/libpthread.so.0() [0x30e160f4c0]