Mongodb crashed with Got signal: 11 (Segmentation fault)

My mongo server crashed with the following log. My mongo server is version 2.4.2 and my Mongo Java client is 2.11.2. My environment is RHEL.
Please let me know what could be the problem. I see from other threads that versions older than 2.2.x had this problem, but mine is 2.4.2. Any help would be appreciated.
...
Thu Feb 20 13:45:56.924 [conn78956] run command admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.924 [conn78956] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 keyUpdates:0 reslen:263 0ms
Thu Feb 20 13:45:56.938 [conn78962] runQuery called admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78962] run command admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78962] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 keyUpdates:0 reslen:263 0ms
Thu Feb 20 13:45:56.938 [conn78965] runQuery called admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78965] run command admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78965] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 keyUpdates:0 reslen:263 0ms
Thu Feb 20 13:45:56.938 [conn78964] runQuery called admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78964] run command admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.938 [conn78964] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 keyUpdates:0 reslen:263 0ms
Thu Feb 20 13:45:56.941 [rsHealthPoll] replSet member 204.27.36.236:5000 is up
Thu Feb 20 13:45:56.941 [rsHealthPoll] replSet member 204.27.36.236:5000 is now in state SECONDARY
Thu Feb 20 13:45:56.941 [rsMgr] replSet warning caught unexpected exception in electSelf()
Thu Feb 20 13:45:56.941 Invalid access at address: 0 from thread:
Thu Feb 20 13:45:56.941 Got signal: 11 (Segmentation fault).
Thu Feb 20 13:45:56.941 [conn78959] runQuery called admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.941 [conn78959] run command admin.$cmd { ismaster: 1 }
Thu Feb 20 13:45:56.941 [conn78959] command admin.$cmd command: { ismaster: 1 } ntoreturn:1 keyUpdates:0 reslen:263 0ms
Thu Feb 20 13:45:56.943 Backtrace:
0xdced21 0x6cf749 0x6cfcd2 0x30e160f4c0
/home/myserver/mySer/db/mongodb/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdced21]
/home/myserver/mySer/db/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6cf749]
/home/myserver/mySer/db/mongodb/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x6cfcd2]
/lib64/libpthread.so.0() [0x30e160f4c0]

Related

Converting a standalone MongoDB instance to a single-node replica set

I am trying to convert my standalone MongoDB instance to a single-node replica set, for the purpose of live migrating to Atlas.
I followed this procedure: https://docs.mongodb.com/manual/tutorial/convert-standalone-to-replica-set/
The steps I took were:
$sudo service mongodb stop
$sudo service mongod start
$mongo
>rs.initiate()
{
    "info2" : "no configuration explicitly specified -- making one",
    "me" : "staging3.domain.io:27017",
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}
singleNodeRepl:PRIMARY> rs.status()
{
    "set" : "singleNodeRepl",
    "date" : ISODate("2020-11-26T00:46:25Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "staging4.domain.io:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1197,
            "optime" : Timestamp(1606350415, 1),
            "optimeDate" : ISODate("2020-11-26T00:26:55Z"),
            "electionTime" : Timestamp(1606350415, 2),
            "electionDate" : ISODate("2020-11-26T00:26:55Z"),
            "self" : true
        }
    ],
    "ok" : 1
}
singleNodeRepl:PRIMARY> db.oplog.rs.find()
{ "ts" : Timestamp(1606350415, 1), "h" : NumberLong(0), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "initiating set" } }
At this point, there seem to be no issues.
However, my application is not able to work as it did before.
I would really appreciate any help in troubleshooting the issue.
Thank you.
EDIT:
As suggested, I included replSet in the config file instead of passing it as an argument.
This is my config file:
# mongod.conf
#where to log
logpath=/var/log/mongodb/mongod.log
logappend=true
# fork and run in background
fork=true
#port=27017
dbpath=/var/lib/mongo
# location of pidfile
pidfilepath=/var/run/mongodb/mongod.pid
# Listen to local interface only. Comment out to listen on all interfaces.
#bind_ip=127.0.0.1
# Disables write-ahead journaling
# nojournal=true
# Enables periodic logging of CPU utilization and I/O wait
#cpu=true
# Turn on/off security. Off is currently the default
#noauth=true
#auth=true
# Verbose logging output.
verbose=true
# Inspect all client data for validity on receipt (useful for
# developing drivers)
#objcheck=true
# Enable db quota management
#quota=true
# Set oplogging level where n is
# 0=off (default)
# 1=W
# 2=R
# 3=both
# 7=W+some reads
#diaglog=0
# Ignore query hints
#nohints=true
# Enable the HTTP interface (Defaults to port 28017).
#httpinterface=true
# Turns off server-side scripting. This will result in greatly limited
# functionality
#noscripting=true
# Turns off table scans. Any query that would do a table scan fails.
#notablescan=true
# Disable data file preallocation.
#noprealloc=true
# Specify .ns file size for new databases.
# nssize=<size>
# Replication Options
# in replicated mongo databases, specify the replica set name here
replSet=singleNodeRepl
# maximum size in megabytes for replication operation log
#oplogSize=1024
# path to a key file storing authentication info for connections
# between replica set members
#keyFile=/path/to/keyfile
And the verbose log file. It does look like everything is working fine; however, my application is not able to connect to the DB as it did:
2020-11-26T00:26:55.852+0000 [conn1] replSet replSetInitiate admin command received from client
2020-11-26T00:26:55.853+0000 [conn1] replSet info initiate : no configuration specified. Using a default configuration for the set
2020-11-26T00:26:55.853+0000 [conn1] replSet created this configuration for initiation : { _id: "singleNodeRepl", members: [ { _id: 0, host: "staging4.domain.io:27017" } ] }
2020-11-26T00:26:55.853+0000 [conn1] replSet replSetInitiate config object parses ok, 1 members specified
2020-11-26T00:26:55.853+0000 [conn1] getMyAddrs(): [127.0.0.1] [10.20.26.228] [::1] [fe80::8ed:65ff:fe9e:15ab%eth0]
2020-11-26T00:26:55.853+0000 [conn1] getallIPs("staging4.domain.io"): [127.0.0.1]
2020-11-26T00:26:55.853+0000 [conn1] replSet replSetInitiate all members seem up
2020-11-26T00:26:55.853+0000 [conn1] ******
2020-11-26T00:26:55.853+0000 [conn1] creating replication oplog of size: 2570MB...
2020-11-26T00:26:55.853+0000 [conn1] create collection local.oplog.rs { size: 2695574937.6, capped: true, autoIndexId: false }
2020-11-26T00:26:55.853+0000 [conn1] Database::_addNamespaceToCatalog ns: local.oplog.rs
2020-11-26T00:26:55.866+0000 [conn1] ExtentManager::increaseStorageSize ns:local.oplog.rs desiredSize:2146426624 fromFreeList: 0 eloc: 1:2000
2020-11-26T00:26:55.876+0000 [conn1] ExtentManager::increaseStorageSize ns:local.oplog.rs desiredSize:549148160 fromFreeList: 0 eloc: 2:2000
2020-11-26T00:26:55.878+0000 [conn1] ******
2020-11-26T00:26:55.878+0000 [conn1] replSet info saving a newer config version to local.system.replset: { _id: "singleNodeRepl", version: 1, members: [ { _id: 0, host: "staging4.domain.io:27017" } ] }
2020-11-26T00:26:55.878+0000 [conn1] Database::_addNamespaceToCatalog ns: local.system.replset
2020-11-26T00:26:55.878+0000 [conn1] ExtentManager::increaseStorageSize ns:local.system.replset desiredSize:8192 fromFreeList: 0 eloc: 2:20bb8000
2020-11-26T00:26:55.878+0000 [conn1] Database::_addNamespaceToCatalog ns: local.system.replset.$_id_
2020-11-26T00:26:55.878+0000 [conn1] build index on: local.system.replset properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.system.replset" }
2020-11-26T00:26:55.878+0000 [conn1] local.system.replset: clearing plan cache - collection info cache reset
2020-11-26T00:26:55.878+0000 [conn1] allocating new extent
2020-11-26T00:26:55.878+0000 [conn1] ExtentManager::increaseStorageSize ns:local.system.replset.$_id_ desiredSize:131072 fromFreeList: 0 eloc: 2:20bba000
2020-11-26T00:26:55.878+0000 [conn1] added index to empty collection
2020-11-26T00:26:55.878+0000 [conn1] local.system.replset: clearing plan cache - collection info cache reset
2020-11-26T00:26:55.878+0000 [conn1] replSet saveConfigLocally done
2020-11-26T00:26:55.878+0000 [conn1] replSet replSetInitiate config now saved locally. Should come online in about a minute.
2020-11-26T00:26:55.878+0000 [conn1] command admin.$cmd command: replSetInitiate { replSetInitiate: undefined } keyUpdates:0 numYields:0 locks(micros) W:25362 reslen:206 25ms
2020-11-26T00:26:55.879+0000 [conn1] command test.$cmd command: isMaster { isMaster: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:270 0ms
2020-11-26T00:27:01.256+0000 [conn1] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:27:01.257+0000 [conn1] command test.$cmd command: isMaster { isMaster: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:367 0ms
2020-11-26T00:27:10.688+0000 [conn1] query local.system.replset planSummary: COLLSCAN ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:97 nreturned:1 reslen:126 0ms
2020-11-26T00:27:10.689+0000 [conn1] command test.$cmd command: isMaster { isMaster: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:367 0ms
2020-11-26T00:27:28.889+0000 [clientcursormon] connections:1
2020-11-26T00:27:33.333+0000 [conn1] end connection 127.0.0.1:50580 (0 connections now open)
2020-11-26T00:27:57.230+0000 [initandlisten] connection accepted from 127.0.0.1:50582 #2 (1 connection now open)
2020-11-26T00:27:57.230+0000 [conn2] command admin.$cmd command: whatsmyuri { whatsmyuri: 1 } ntoreturn:1 keyUpdates:0 numYields:0 reslen:62 0ms
2020-11-26T00:27:57.232+0000 [conn2] command admin.$cmd command: getLog { getLog: "startupWarnings" } keyUpdates:0 numYields:0 reslen:70 0ms
2020-11-26T00:27:57.233+0000 [conn2] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:28:00.237+0000 [conn2] command admin.$cmd command: serverStatus { serverStatus: 1.0 } keyUpdates:0 numYields:0 locks(micros) r:13 reslen:3402 0ms
2020-11-26T00:28:00.242+0000 [conn2] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:28:16.560+0000 [conn2] end connection 127.0.0.1:50582 (0 connections now open)
2020-11-26T00:32:28.904+0000 [clientcursormon] connections:0
2020-11-26T00:36:32.398+0000 [initandlisten] connection accepted from 127.0.0.1:50588 #3 (1 connection now open)
2020-11-26T00:36:32.398+0000 [conn3] command admin.$cmd command: whatsmyuri { whatsmyuri: 1 } ntoreturn:1 keyUpdates:0 numYields:0 reslen:62 0ms
2020-11-26T00:36:32.399+0000 [conn3] command admin.$cmd command: getLog { getLog: "startupWarnings" } keyUpdates:0 numYields:0 reslen:70 0ms
2020-11-26T00:36:32.400+0000 [conn3] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:36:34.603+0000 [conn3] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:36:37.326+0000 [conn3] query local.oplog.rs planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:66 nreturned:1 reslen:106 0ms
2020-11-26T00:36:37.328+0000 [conn3] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:37:28.832+0000 [initandlisten] connection accepted from 10.20.37.160:54484 #4 (2 connections now open)
2020-11-26T00:37:28.832+0000 [conn4] command admin.$cmd command: isMaster { isMaster: 1, compression: [], client: { driver: { name: "mongo-ruby-driver", version: "2.13.1" }, os: { type: "linux", name: "linux-gnu", architecture: "x86_64" }, platform: "mongoid-6.4.1, Ruby 2.6.5, x86_64-linux, x86_64-pc-linux-gnu" } } keyUpdates:0 numYields:0 reslen:367 0ms
2020-11-26T00:37:28.919+0000 [clientcursormon] connections:2
2020-11-26T00:37:33.568+0000 [initandlisten] connection accepted from 10.20.37.160:54492 #5 (3 connections now open)
2020-11-26T00:37:33.569+0000 [conn5] command admin.$cmd command: isMaster { isMaster: 1, compression: [], client: { driver: { name: "mongo-ruby-driver", version: "2.13.1" }, os: { type: "linux", name: "linux-gnu", architecture: "x86_64" }, platform: "mongoid-6.4.1, Ruby 2.6.5, x86_64-linux, x86_64-pc-linux-gnu" } } keyUpdates:0 numYields:0 reslen:367 0ms
2020-11-26T00:37:36.586+0000 [conn3] end connection 127.0.0.1:50588 (2 connections now open)
2020-11-26T00:39:35.621+0000 [initandlisten] connection accepted from 127.0.0.1:50592 #6 (3 connections now open)
2020-11-26T00:39:35.621+0000 [conn6] command admin.$cmd command: whatsmyuri { whatsmyuri: 1 } ntoreturn:1 keyUpdates:0 numYields:0 reslen:62 0ms
2020-11-26T00:39:35.622+0000 [conn6] command admin.$cmd command: getLog { getLog: "startupWarnings" } keyUpdates:0 numYields:0 reslen:70 0ms
2020-11-26T00:39:35.623+0000 [conn6] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:39:37.589+0000 [conn6] opening db: test
2020-11-26T00:39:37.589+0000 [conn6] query test.oplog.rs planSummary: EOF ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:0 keyUpdates:0 numYields:0 locks(micros) W:186 r:19 nreturned:0 reslen:20 0ms
2020-11-26T00:39:37.590+0000 [conn6] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:39:41.891+0000 [conn6] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:39:43.266+0000 [conn6] query local.oplog.rs planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:62 nreturned:1 reslen:106 0ms
2020-11-26T00:39:43.268+0000 [conn6] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 numYields:0 reslen:300 0ms
2020-11-26T00:39:52.681+0000 [conn6] end connection 127.0.0.1:50592 (2 connections now open)
2020-11-26T00:42:28.934+0000 [clientcursormon] connections:2
You should not mix using a config file, i.e.
mongod --config /etc/mongod.conf
with command-line options, i.e.
mongod --replSet rs0 --bind_ip localhost
Most likely you did not set the replica set name in /etc/mongod.conf:
replication:
  replSetName: <string>
So, when you start your MongoDB with service mongodb start, you may be running with a different configuration.
Note: check the service file (on my Red Hat it is at /etc/systemd/system/mongod.service), which may even point to a different .conf file.
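For reference, a minimal sketch of the YAML-style config that the replication/replSetName keys above belong to (paths and set name taken from the question; note that the question's file uses the older INI-style options such as replSet=singleNodeRepl, and a single config file must stick to one of the two formats):
# /etc/mongod.conf -- YAML format, minimal sketch
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
storage:
  dbPath: /var/lib/mongo
processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongod.pid
replication:
  replSetName: singleNodeRepl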

SUM the results of GROUP BY whilst using the IN operator with PostgreSQL

My query:
SELECT "Tracks"."PageId", date_trunc('month', "Tracks"."createdAt") AS month
FROM "Tracks"
WHERE "Tracks"."PageId" IN (1,2,3)
GROUP BY month, "Tracks"."PageId"
However, this yields:
[ { PageId: 30, month: Thu Sep 01 2016 01:00:00 GMT+0100 (BST) },
{ PageId: 29, month: Thu Sep 01 2016 01:00:00 GMT+0100 (BST) },
{ PageId: 31, month: Thu Sep 01 2016 01:00:00 GMT+0100 (BST) },
{ PageId: 29, month: Sat Oct 01 2016 01:00:00 GMT+0100 (BST) } ]
What's the best approach to also get the SUM for each PageId, i.e. that there were 3 of PageId 29 in October 2016, and 2 in September?
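A minimal sketch of one way to do this, assuming a per-group row count is what is wanted (table and column names are taken from the question; total is an illustrative alias):
SELECT "Tracks"."PageId",
       date_trunc('month', "Tracks"."createdAt") AS month,
       COUNT(*) AS total
FROM "Tracks"
WHERE "Tracks"."PageId" IN (1,2,3)
GROUP BY month, "Tracks"."PageId";
COUNT(*) here counts the rows in each (month, PageId) group, which matches the "3 of PageId 29 in October" reading; SUM(expr) would instead add up a numeric column.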

Sudden Mongodb high connections/queues, db completely freezes

The issue
We have a strange issue in our MongoDB setup. Sometimes we get peaks of high connections and high queues, and the mongod process stops responding if we let the queues and connections keep increasing. We then need to restart the instance by sending SIGKILL from htop.
It seems that a system limit or MongoDB configuration is blocking MongoDB from operating, because hardware resources are OK. Versions of this issue have happened both on a standalone instance and later on a replica set on production servers. Details ahead.
About the software environment
This is a standalone mongodb instance (not sharded, no replica set); it operates on a dedicated machine and is queried by other machines. I'm using mongodb-linux-x86_64-2.6.11 under Debian 7.7.
The machines querying mongo use Django==1.7.4 and Mongoengine==0.10.1 with pymongo==2.8.
On the Django settings.py file I'm connecting to the database using the following lines:
from mongoengine import connect

connect(
    MONGO_DB,
    username=MONGO_USER,
    password=MONGO_PWD,
    host=MONGO_HOST,
    port=MONGO_PORT
)
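(An aside, not from the original settings: mongoengine forwards extra keyword arguments to pymongo's MongoClient, so the driver-side connection pool can be capped here. A hedged sketch, assuming pymongo 2.x where the parameter is named max_pool_size; the value 20 is illustrative, not tested advice:)
from mongoengine import connect

connect(
    MONGO_DB,
    username=MONGO_USER,
    password=MONGO_PWD,
    host=MONGO_HOST,
    port=MONGO_PORT,
    max_pool_size=20  # cap sockets per process instead of the driver default
)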
MMS Stats
As you can see in the following image from the MMS service, we have peaks in connections and queues:
When this happens, our mongodb process completely freezes. We must use SIGKILL to restart mongodb, which is really bad.
In the image there are 3 freeze events.
As the image shows, when this happens we also have a peak in Non-Mapped Virtual Memory.
We also spotted an increase on the Btree chart around the 2nd and 3rd freeze.
We have checked the logs, but there is no suspicious query, and the Opcounters don't skyrocket either; it seems there are no more queries than usual.
Here is another screenshot of the same bug, but on another day/time:
In all cases, the lock on the DB does not increase significantly; it has a peak, but does not reach even 4%:
The OpCounter drops to zero; it seems that every op goes into the MongoDB queue, so the database creates new connections to try to execute new requests, all of which end up in the queue as well.
Machine Resources
Regarding hardware, the machine is a Google Cloud Compute instance with 4 Intel Xeon cores, 16 GB RAM and a 100 GB SSD disk.
No noticeable network/IO/CPU/RAM issues detected, and no peaks in resources, even when the mongod process is frozen.
MySQL on another machine also gets affected
We also detect that at the same time as this mongod peak in queues and connections, we get a spike in MySQL connections, on a MySQL server running on another machine. When I kill the mongodb process, all the MySQL connections are released too (without restarting MySQL).
ulimit
I increased the system limits to see if that was the cause of the issue, but it seems that this did not fix the problem.
I set up everything as recommended in this MongoDB article. The spike in connections continues. I'm trying to find a way to debug where these connections are coming from (see the sketch after the ulimit output below).
$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 60240
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 409600
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 60240
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
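(A sketch of one way to see where the connections come from, counting established connections per remote IP; it assumes the mongod port 27000 that appears in the shell error later in this question, and standard netstat/awk:)
$ netstat -tn | awk '$4 ~ /:27000$/ { split($5, a, ":"); print a[1] }' | sort | uniq -c | sort -rn
Each output line is a count followed by a client IP, so a runaway app server would show up at the top.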
db.currentOp
I just added a shell script that runs every second with the following:
var ops = db.currentOp().inprog;
if (ops !== undefined && ops.length > 0){
    ops.forEach(function(op){
        if (op.secs_running > 0) printjson(op);
    });
}
The log does not report any operation that takes more than 1 second to execute. I was thinking some process might be spending a long time on something, but it seems that is not the case.
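(A possible extension of the same loop, sketched here: since ops pile up in a queue rather than run long, printing operations that are waiting for a lock may be more telling; waitingForLock is part of the db.currentOp() output in this MongoDB generation.)
var ops = db.currentOp().inprog;
if (ops !== undefined && ops.length > 0){
    ops.forEach(function(op){
        // Queued ops never accumulate secs_running, so look for
        // operations blocked on a lock instead.
        if (op.waitingForLock) printjson(op);
    });
}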
MongoDB Logs
Regarding the mongodb.log, here is the full mongodb log around the problem.
It just happens at log line 361. There the connections start to go up, and no more queries get executed. Also, I can't open the mongo shell; it says:
[Wed Feb 10 15:46:01 UTC 2016] 2016-02-10T15:48:31.940+0000 DBClientCursor::init call() failed
2016-02-10T15:48:31.941+0000 Error: DBClientBase::findN: transport error: 127.0.0.1:27000 ns: admin.$cmd query: { whatsmyuri: 1 } at src/mongo/shell/mongo.js:148
Log extract
2016-02-10T15:41:39.930+0000 [initandlisten] connection accepted from 10.240.0.3:56611 #3665 (79 connections now open)
2016-02-10T15:41:39.930+0000 [conn3665] command admin.$cmd command: getnonce { getnonce: 1 } keyUpdates:0 numYields:0 reslen:65 0ms
2016-02-10T15:41:39.930+0000 [conn3665] command admin.$cmd command: ping { ping: 1 } keyUpdates:0 numYields:0 reslen:37 0ms
2016-02-10T15:41:39.992+0000 [conn3529] command db.$cmd command: count { count: "notification", fields: null, query: { read: false, recipient: 310 } } planSummary: IXSCAN { recipient: 1 } keyUpdates:0 numYields:0 locks(micros) r:215 reslen:48 0ms
2016-02-10T15:41:40.038+0000 [conn2303] query db.column query: { _id: ObjectId('56b395dfbe66324cbee550b8'), client_id: 20 } planSummary: IXSCAN { _id: 1 } ntoreturn:2 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:116 nreturned:1 reslen:470 0ms
2016-02-10T15:41:40.044+0000 [conn1871] update db.column query: { _id: ObjectId('56b395dfbe66324cbee550b8') } update: { $set: { last_request: new Date(1455118900040) } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 fastmod:1 keyUpdates:0 numYields:0 locks(micros) w:126 0ms
2016-02-10T15:41:40.044+0000 [conn1871] command db.$cmd command: update { update: "column", writeConcern: { w: 1 }, updates: [ { q: { _id: ObjectId('56b395dfbe66324cbee550b8') }, u: { $set: { last_request: new Date(1455118900040) } }, multi: false, upsert: true } ] } keyUpdates:0 numYields:0 reslen:55 0ms
2016-02-10T15:41:40.048+0000 [conn1875] query db.user query: { sn: "mobile", client_id: 20, uid: "56990023700" } planSummary: IXSCAN { client_id: 1, uid: 1, sn: 1 } ntoreturn:2 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:197 nreturned:1 reslen:303 0ms
2016-02-10T15:41:40.056+0000 [conn2303] Winning plan had zero results. Not caching. ns: db.case query: { sn: "mobile", client_id: 20, created: { $gt: new Date(1454295600000), $lt: new Date(1456800900000) }, deleted: false, establishment_users: { $all: [ ObjectId('5637640afefa2654b5d863e3') ] }, is_closed: true, updated_time: { $gt: new Date(1455045840000) } } sort: { updated_time: 1 } projection: {} skip: 0 limit: 15 winner score: 1.0003 winner summary: IXSCAN { client_id: 1, is_closed: 1, deleted: 1, updated_time: 1 }
2016-02-10T15:41:40.057+0000 [conn2303] query db.case query: { $query: { sn: "mobile", client_id: 20, created: { $gt: new Date(1454295600000), $lt: new Date(1456800900000) }, deleted: false, establishment_users: { $all: [ ObjectId('5637640afefa2654b5d863e3') ] }, is_closed: true, updated_time: { $gt: new Date(1455045840000) } }, $orderby: { updated_time: 1 } } planSummary: IXSCAN { client_id: 1, is_closed: 1, deleted: 1, updated_time: 1 } ntoreturn:15 ntoskip:0 nscanned:26 nscannedObjects:26 keyUpdates:0 numYields:0 locks(micros) r:5092 nreturned:0 reslen:20 5ms
2016-02-10T15:41:40.060+0000 [conn300] command db.$cmd command: count { count: "notification", fields: null, query: { read: false, recipient: 309 } } planSummary: IXSCAN { recipient: 1 } keyUpdates:0 numYields:0 locks(micros) r:63 reslen:48 0ms
2016-02-10T15:41:40.133+0000 [initandlisten] connection accepted from 127.0.0.1:43266 #3666 (80 connections now open)
2016-02-10T15:41:40.133+0000 [conn3666] command admin.$cmd command: whatsmyuri { whatsmyuri: 1 } ntoreturn:1 keyUpdates:0 numYields:0 reslen:62 0ms
2016-02-10T15:41:40.134+0000 [conn3666] command db.$cmd command: getnonce { getnonce: 1 } ntoreturn:1 keyUpdates:0 numYields:0 reslen:65 0ms
2016-02-10T15:41:40.134+0000 [conn3666] authenticate db: db { authenticate: 1, nonce: "xxx", user: "xxx", key: "xxx" }
2016-02-10T15:41:40.134+0000 [conn3666] command db.$cmd command: authenticate { authenticate: 1, nonce: "xxx", user: "xxx", key: "xxx" } ntoreturn:1 keyUpdates:0 numYields:0 reslen:82 0ms
2016-02-10T15:41:40.136+0000 [conn3666] end connection 127.0.0.1:43266 (79 connections now open)
2016-02-10T15:41:40.146+0000 [conn3051] command db.$cmd command: count { count: "notification", fields: null, query: { read: false, recipient: 301 } } planSummary: IXSCAN { recipient: 1 } keyUpdates:0 numYields:0 locks(micros) r:284 reslen:48 0ms
2016-02-10T15:41:40.526+0000 [conn3529] query db.column query: { _id: ObjectId('56a8d864be6632718f9fb087'), client_id: 1 } planSummary: IXSCAN { _id: 1 } ntoreturn:2 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:176 nreturned:1 reslen:440 0ms
2016-02-10T15:41:40.529+0000 [conn3529] update db.column query: { _id: ObjectId('56a8d864be6632718f9fb087') } update: { $set: { last_request: new Date(1455118900527) } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 fastmod:1 keyUpdates:0 numYields:0 locks(micros) w:61 0ms
2016-02-10T15:41:40.529+0000 [conn3529] command db.$cmd command: update { update: "column", writeConcern: { w: 1 }, updates: [ { q: { _id: ObjectId('56a8d864be6632718f9fb087') }, u: { $set: { last_request: new Date(1455118900527) } }, multi: false, upsert: true } ] } keyUpdates:0 numYields:0 reslen:55 0ms
2016-02-10T15:41:40.531+0000 [conn3529] query db.user query: { sn: "email", client_id: 1, uid: "asdasdasdasdas" } planSummary: IXSCAN { client_id: 1, uid: 1, sn: 1 } ntoreturn:2 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:278 nreturned:1 reslen:285 0ms
2016-02-10T15:41:40.546+0000 [conn3529] Winning plan had zero results. Not caching. ns: db.case query: { answered: true, sn: "email", client_id: 1, establishment_users: { $all: [ ObjectId('5669b930fefa2626db389c0e') ] }, deleted: false, is_closed: { $ne: true } } sort: { updated_time: -1 } projection: {} skip: 0 limit: 1 winner score: 1.0003 winner summary: IXSCAN { client_id: 1, establishment_users: 1, updated_time: 1 }
2016-02-10T15:41:40.547+0000 [conn3529] query db.case query: { $query: { answered: true, sn: "email", client_id: 1, establishment_users: { $all: [ ObjectId('5669b930fefa2626db389c0e') ] }, deleted: false, is_closed: { $ne: true } }, $orderby: { updated_time: -1 } } planSummary: IXSCAN { client_id: 1, establishment_users: 1, updated_time: 1 } ntoskip:0 nscanned:103 nscannedObjects:103 keyUpdates:0 numYields:0 locks(micros) r:9410 nreturned:0 reslen:20 9ms
2016-02-10T15:41:40.557+0000 [conn3529] Winning plan had zero results. Not caching. ns: db.case query: { answered: true, sn: "email", client_id: 1, establishment_users: { $all: [ ObjectId('5669b930fefa2626db389c0e') ] }, deleted: false, is_closed: { $ne: true } } sort: { updated_time: -1 } projection: {} skip: 0 limit: 15 winner score: 1.0003 winner summary: IXSCAN { client_id: 1, establishment_users: 1, updated_time: 1 }
2016-02-10T15:41:40.558+0000 [conn3529] query db.case query: { $query: { answered: true, sn: "email", client_id: 1, establishment_users: { $all: [ ObjectId('5669b930fefa2626db389c0e') ] }, deleted: false, is_closed: { $ne: true } }, $orderby: { updated_time: -1 } } planSummary: IXSCAN { client_id: 1, establishment_users: 1, updated_time: 1 } ntoreturn:15 ntoskip:0 nscanned:103 nscannedObjects:103 keyUpdates:0 numYields:0 locks(micros) r:7572 nreturned:0 reslen:20 7ms
2016-02-10T15:41:40.569+0000 [conn3028] command db.$cmd command: count { count: "notification", fields: null, query: { read: false, recipient: 145 } } planSummary: IXSCAN { recipient: 1 } keyUpdates:0 numYields:0 locks(micros) r:237 reslen:48 0ms
2016-02-10T15:41:40.774+0000 [conn3053] command db.$cmd command: count { count: "notification", fields: null, query: { read: false, recipient: 143 } } planSummary: IXSCAN { recipient: 1 } keyUpdates:0 numYields:0 locks(micros) r:372 reslen:48 0ms
2016-02-10T15:41:41.056+0000 [conn22] command admin.$cmd command: ping { ping: 1 } keyUpdates:0 numYields:0 reslen:37 0ms
#########################
HERE THE PROBLEM STARTS
#########################
2016-02-10T15:41:41.175+0000 [initandlisten] connection accepted from 127.0.0.1:43268 #3667 (80 connections now open)
2016-02-10T15:41:41.212+0000 [initandlisten] connection accepted from 10.240.0.6:46021 #3668 (81 connections now open)
2016-02-10T15:41:41.213+0000 [conn3668] command db.$cmd command: getnonce { getnonce: 1 } keyUpdates:0 numYields:0 reslen:65 0ms
2016-02-10T15:41:41.213+0000 [conn3668] authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
2016-02-10T15:41:41.213+0000 [conn3668] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyUpdates:0 numYields:0 reslen:82 0ms
2016-02-10T15:41:41.348+0000 [initandlisten] connection accepted from 10.240.0.6:46024 #3669 (82 connections now open)
2016-02-10T15:41:41.349+0000 [conn3669] command db.$cmd command: getnonce { getnonce: 1 } keyUpdates:0 numYields:0 reslen:65 0ms
2016-02-10T15:41:41.349+0000 [conn3669] authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
2016-02-10T15:41:41.349+0000 [conn3669] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyUpdates:0 numYields:0 reslen:82 0ms
2016-02-10T15:41:43.620+0000 [initandlisten] connection accepted from 10.240.0.6:46055 #3670 (83 connections now open)
2016-02-10T15:41:43.621+0000 [conn3670] command db.$cmd command: getnonce { getnonce: 1 } keyUpdates:0 numYields:0 reslen:65 0ms
2016-02-10T15:41:43.621+0000 [conn3670] authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
2016-02-10T15:41:43.621+0000 [conn3670] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyUpdates:0 numYields:0 reslen:82 0ms
2016-02-10T15:41:43.655+0000 [initandlisten] connection accepted from 10.240.0.6:46058 #3671 (84 connections now open)
2016-02-10T15:41:43.656+0000 [conn3671] command db.$cmd command: getnonce { getnonce: 1 } keyUpdates:0 numYields:0 reslen:65 0ms
2016-02-10T15:41:43.656+0000 [conn3671] authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
2016-02-10T15:41:43.656+0000 [conn3671] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyUpdates:0 numYields:0 reslen:82 0ms
2016-02-10T15:41:44.045+0000 [initandlisten] connection accepted from 10.240.0.6:46071 #3672 (85 connections now open)
2016-02-10T15:41:44.045+0000 [conn3672] command db.$cmd command: getnonce { getnonce: 1 } keyUpdates:0 numYields:0 reslen:65 0ms
2016-02-10T15:41:44.046+0000 [conn3672] authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
2016-02-10T15:41:44.046+0000 [conn3672] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyUpdates:0 numYields:0 reslen:82 0ms
2016-02-10T15:41:44.083+0000 [initandlisten] connection accepted from 10.240.0.6:46073 #3673 (86 connections now open)
2016-02-10T15:41:44.084+0000 [conn3673] command db.$cmd command: getnonce { getnonce: 1 } keyUpdates:0 numYields:0 reslen:65 0ms
2016-02-10T15:41:44.084+0000 [conn3673] authenticate db: db { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" }
2016-02-10T15:41:44.084+0000 [conn3673] command db.$cmd command: authenticate { authenticate: 1, user: "xxx", nonce: "xxx", key: "xxx" } keyUpdates:0 numYields:0 reslen:82 0ms
2016-02-10T15:41:44.182+0000 [initandlisten] connection accepted from 10.240.0.6:46076 #3674 (87 connections now open)
2016-02-10T15:41:44.182+0000 [conn3674] command db.$cmd command: getnonce { getnonce: 1 } keyUpdates:0 numYields:0 reslen:65 0ms
Collection Information
Currently our database contains 163 collections. The important ones are messages, column and cases; these are the ones that get heavy inserts, updates and queries. The rest is for analytics: many collections of about 100 records each:
{
    "ns" : "db.message",
    "count" : 2.96615e+06,
    "size" : 3906258304.0000000000000000,
    "avgObjSize" : 1316,
    "storageSize" : 9305935856.0000000000000000,
    "numExtents" : 25,
    "nindexes" : 21,
    "lastExtentSize" : 2.14643e+09,
    "paddingFactor" : 1.0530000000000086,
    "systemFlags" : 0,
    "userFlags" : 1,
    "totalIndexSize" : 7952525392.0000000000000000,
    "indexSizes" : {
        "_id_" : 1.63953e+08,
        "client_id_1_sn_1_mid_1" : 3.16975e+08,
        "client_id_1_created_1" : 1.89086e+08,
        "client_id_1_recipients_1_created_1" : 4.3861e+08,
        "client_id_1_author_1_created_1" : 2.29713e+08,
        "client_id_1_kind_1_created_1" : 2.37088e+08,
        "client_id_1_answered_1_created_1" : 1.90934e+08,
        "client_id_1_is_mention_1_created_1" : 1.8674e+08,
        "client_id_1_has_custom_data_1_created_1" : 1.9566e+08,
        "client_id_1_assigned_1_created_1" : 1.86838e+08,
        "client_id_1_published_1_created_1" : 1.94352e+08,
        "client_id_1_sn_1_created_1" : 2.3681e+08,
        "client_id_1_thread_root_1" : 1.88089e+08,
        "client_id_1_case_id_1" : 1.89266e+08,
        "client_id_1_sender_id_1" : 1.5182e+08,
        "client_id_1_recipient_id_1" : 1.49711e+08,
        "client_id_1_mid_1_sn_1" : 3.17662e+08,
        "text_text_created_1" : 3320641520.0000000000000000,
        "client_id_1_sn_1_kind_1_recipient_id_1_created_1" : 3.15226e+08,
        "client_id_1_sn_1_thread_root_1_created_1" : 3.06526e+08,
        "client_id_1_case_id_1_created_1" : 2.46825e+08
    },
    "ok" : 1.0000000000000000
}
{
    "ns" : "db.case",
    "count" : 497661,
    "size" : 5.33111e+08,
    "avgObjSize" : 1071,
    "storageSize" : 6.29637e+08,
    "numExtents" : 16,
    "nindexes" : 34,
    "lastExtentSize" : 1.68743e+08,
    "paddingFactor" : 1.0000000000000000,
    "systemFlags" : 0,
    "userFlags" : 1,
    "totalIndexSize" : 8.46012e+08,
    "indexSizes" : {
        "_id_" : 2.30073e+07,
        "client_id_1" : 1.99985e+07,
        "is_closed, deleted_1" : 1.31061e+07,
        "is_closed_1" : 1.36948e+07,
        "sn_1" : 2.1274e+07,
        "deleted_1" : 1.39728e+07,
        "created_1" : 1.97777e+07,
        "current_assignment_1" : 4.20819e+07,
        "assigned_1" : 1.33678e+07,
        "commented_1" : 1.36049e+07,
        "has_custom_data_1" : 1.42426e+07,
        "sentiment_start_1" : 1.36049e+07,
        "sentiment_finish_1" : 1.37275e+07,
        "updated_time_1" : 2.02192e+07,
        "identifier_1" : 1.73822e+07,
        "important_1" : 1.38256e+07,
        "answered_1" : 1.41772e+07,
        "client_id_1_is_closed_1_deleted_1_updated_time_1" : 2.90248e+07,
        "client_id_1_is_closed_1_updated_time_1" : 2.86569e+07,
        "client_id_1_sn_1_updated_time_1" : 3.58436e+07,
        "client_id_1_deleted_1_updated_time_1" : 2.8477e+07,
        "client_id_1_updated_time_1" : 2.79619e+07,
        "client_id_1_current_assignment_1_updated_time_1" : 5.6071e+07,
        "client_id_1_assigned_1_updated_time_1" : 2.87713e+07,
        "client_id_1_commented_1_updated_time_1" : 2.86896e+07,
        "client_id_1_has_custom_data_1_updated_time_1" : 2.88286e+07,
        "client_id_1_sentiment_start_1_updated_time_1" : 2.87223e+07,
        "client_id_1_sentiment_finish_1_updated_time_1" : 2.88776e+07,
        "client_id_1_identifier_1_updated_time_1" : 3.48216e+07,
        "client_id_1_important_1_updated_time_1" : 2.88776e+07,
        "client_id_1_answered_1_updated_time_1" : 2.85669e+07,
        "client_id_1_establishment_users_1_updated_time_1" : 3.93838e+07,
        "client_id_1_identifier_1" : 1.86413e+07,
        "client_id_1_sn_1_users_1_updated_time_1" : 4.47309e+07
    },
    "ok" : 1.0000000000000000
}
{
    "ns" : "db.column",
    "count" : 438,
    "size" : 218672,
    "avgObjSize" : 499,
    "storageSize" : 696320,
    "numExtents" : 4,
    "nindexes" : 2,
    "lastExtentSize" : 524288,
    "paddingFactor" : 1.0000000000000000,
    "systemFlags" : 0,
    "userFlags" : 1,
    "totalIndexSize" : 65408,
    "indexSizes" : {
        "_id_" : 32704,
        "client_id_1_owner_1" : 32704
    },
    "ok" : 1.0000000000000000
}
Mongostat
Here are some of the lines we get when running mongostat during normal operation:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
*0 34 2 *0 0 10|0 0 32.6g 65.5g 1.18g 0 db:0.1% 0 0|0 0|0 4k 39k 87 20:44:44
2 31 13 *0 0 7|0 0 32.6g 65.5g 1.17g 3 db:0.8% 0 0|0 0|0 9k 36k 87 20:44:45
1 18 2 *0 0 5|0 0 32.6g 65.5g 1.12g 0 db:0.4% 0 0|0 0|0 3k 18k 87 20:44:46
5 200 57 *0 0 43|0 0 32.6g 65.5g 1.13g 12 db:2.3% 0 0|0 0|0 46k 225k 86 20:44:47
1 78 23 *0 0 5|0 0 32.6g 65.5g 1.01g 1 db:1.6% 0 0|0 0|0 18k 313k 86 20:44:48
*0 10 1 *0 0 5|0 0 32.6g 65.5g 1004m 0 db:0.2% 0 0|0 1|0 1k 8k 86 20:44:49
3 48 23 *0 0 11|0 0 32.6g 65.5g 1.05g 4 db:1.1% 0 0|0 0|0 16k 48k 86 20:44:50
2 38 13 *0 0 8|0 0 32.6g 65.5g 1.01g 8 db:0.9% 0 0|0 0|0 10k 76k 86 20:44:51
3 28 16 *0 0 9|0 0 32.6g 65.5g 1.01g 7 db:1.1% 0 0|0 1|0 11k 62k 86 20:44:52
*0 9 4 *0 0 8|0 0 32.6g 65.5g 1022m 1 db:0.4% 0 0|0 0|0 3k 6k 87 20:44:53
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
3 107 34 *0 0 6|0 0 32.6g 65.5g 1.02g 1 db:1.1% 0 0|0 0|0 23k 107k 87 20:44:54
4 65 37 *0 0 8|0 0 32.6g 65.5g 2.69g 57 db:6.2% 0 0|0 0|0 24k 126k 87 20:44:55
9 84 45 *0 0 8|0 0 32.6g 65.5g 2.63g 17 db:5.3% 0 0|0 1|0 32k 109k 87 20:44:56
4 84 47 *0 0 44|0 0 32.6g 65.5g 1.89g 10 db:5.9% 0 0|0 1|0 30k 146k 86 20:44:57
3 73 32 *0 0 9|0 0 32.6g 65.5g 2.58g 12 db:4.7% 0 0|0 0|0 20k 112k 86 20:44:58
2 165 48 *0 0 7|0 0 32.6g 65.5g 2.62g 7 db:1.3% 0 0|0 0|0 34k 147k 86 20:44:59
3 61 26 *0 0 12|0 0 32.6g 65.5g 2.2g 6 db:4.7% 0 0|0 1|0 19k 73k 86 20:45:00
3 252 64 *0 0 12|0 0 32.6g 65.5g 1.87g 85 db:3.2% 0 0|0 0|0 52k 328k 86 20:45:01
*0 189 40 *0 0 6|0 0 32.6g 65.5g 1.65g 0 db:1.6% 0 0|0 0|0 33k 145k 87 20:45:02
1 18 10 *0 0 5|0 0 32.6g 65.5g 1.55g 3 db:0.9% 0 0|0 0|0 6k 15k 87 20:45:03
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
1 50 11 *0 0 6|0 0 32.6g 65.5g 1.57g 6 db:0.8% 0 0|0 0|0 9k 63k 87 20:45:04
2 49 16 *0 0 6|0 0 32.6g 65.5g 1.56g 1 db:1.1% 0 0|0 0|0 12k 50k 87 20:45:05
1 35 11 *0 0 7|0 0 32.6g 65.5g 1.58g 1 db:0.9% 0 0|0 0|0 8k 41k 87 20:45:06
*0 18 2 *0 0 42|0 0 32.6g 65.5g 1.55g 0 db:0.4% 0 0|0 0|0 5k 19k 86 20:45:07
6 75 40 *0 0 11|0 0 32.6g 65.5g 1.56g 10 db:1.9% 0 0|0 0|0 27k 89k 86 20:45:08
6 60 35 *0 0 7|0 0 32.6g 65.5g 1.89g 5 db:1.5% 0 0|0 1|0 23k 101k 86 20:45:09
2 17 14 *0 0 7|0 0 32.6g 65.5g 1.9g 0 db:1.3% 0 0|0 1|0 8k 29k 86 20:45:10
2 35 7 *0 0 4|0 0 32.6g 65.5g 1.77g 1 db:1.3% 0 0|0 0|0 7k 60k 86 20:45:12
4 50 28 *0 0 10|0 0 32.6g 65.5g 1.75g 10 db:2.0% 0 0|0 0|0 19k 79k 87 20:45:13
*0 3 1 *0 0 5|0 0 32.6g 65.5g 1.63g 0 .:0.7% 0 0|0 0|0 1k 4k 87 20:45:14
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
5 77 35 *0 0 8|0 0 32.6g 65.5g 1.7g 13 db:3.0% 0 0|0 0|0 23k 124k 88 20:45:15
3 35 18 *0 0 7|0 0 32.6g 65.5g 1.7g 5 db:0.8% 0 0|0 0|0 12k 43k 87 20:45:16
1 18 5 *0 0 11|0 0 32.6g 65.5g 1.63g 2 db:0.9% 0 0|0 0|0 5k 35k 87 20:45:17
3 33 21 *0 0 5|0 0 32.6g 65.5g 1.64g 3 db:0.8% 0 0|0 0|0 13k 32k 87 20:45:18
*0 25 4 *0 0 42|0 0 32.6g 65.5g 1.64g 0 db:0.3% 0 0|0 0|0 5k 34k 86 20:45:19
1 25 5 *0 0 5|0 0 32.6g 65.5g 1.65g 3 db:0.2% 0 0|0 0|0 5k 24k 86 20:45:20
12 88 65 *0 0 7|0 0 32.6g 65.5g 1.7g 25 db:4.2% 0 0|0 0|0 42k 121k 86 20:45:21
2 53 17 *0 0 4|0 0 32.6g 65.5g 1.65g 2 db:1.5% 0 0|0 0|0 12k 82k 86 20:45:22
1 9 6 *0 0 7|0 0 32.6g 65.5g 1.64g 1 db:1.0% 0 0|0 0|0 4k 13k 86 20:45:23
*0 6 2 *0 0 7|0 0 32.6g 65.5g 1.63g 0 db:0.1% 0 0|0 0|0 1k 5k 87 20:45:24
Replica Set: Updated on May 15th 2016
We migrated our standalone instance to a replica set: 2 secondaries serving the reads and 1 primary doing the writes. All the machines in the replica set are snapshots of the original machine. With this new configuration the issue changed, and it's harder to detect.
It happens less frequently, but instead of skyrocketing connections and queues, the whole replica set stops reading/writing, with no high connections, no queues, and no expensive operations at all. All requests to the DB just time out. To fix the issue, a SIGKILL must be sent to the mongodb process on all 3 machines.
Hi, this is the exact problem we faced too. It is very difficult to tell the exact root cause, and it took us a lot of back and forth with official MongoDB support to understand some common problems.
Most MongoDB setups run on Unix, and Unix limits the number of connections from the same user, even though the server stats show plenty of connections available. You should raise this setting to the maximum possible value: https://www.mongodb.com/docs/manual/reference/ulimit/
Most of the time we connect using the default insert or insert many, which has a write concern of 1. In a sharded cluster, saving to the primary node is fast, but it takes time to replicate to the other nodes, and many connections are left open in the meantime. If your cluster has 2 members in one region and 1 secondary in a DR region, then network latency can come into play. It is better to go with a majority write concern to avoid issues:
https://www.mongodb.com/docs/manual/reference/write-concern/
There is a max connection pool property which, if not set, defaults to a pool of 100 connections. So your application will try to create up to 100 connections if it needs to store data fast. Limit the connection pool based on your application's needs. We have a very high volume, around 1 lakh (100,000) writes per minute, and still, across multiple services, a pool of 20-30 connections is sufficient to store that volume:
https://www.mongodb.com/docs/manual/reference/connection-string/
We are still trying to make our sharded Mongo infrastructure stable, but it is not MongoDB itself; it is the overall infrastructure that is causing the problem.
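(A sketch of how the last two suggestions look in a driver connection string; the hosts, database name and the pool size of 30 are placeholders, while maxPoolSize, w and replicaSet are standard options documented at the link above:)
mongodb://user:pass@host1:27017,host2:27017/mydb?replicaSet=rs0&maxPoolSize=30&w=majority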

Mongo secondary automatically going to Recovering state and gets stuck

I am using mongodb 2.6.5 and have a 3-node replica set. Many times I see that a secondary node goes off into the RECOVERING state, though I do not try to do any sync or anything; I am not sure if mongo is doing it in the background. The status never comes out of RECOVERING. I saw many threads that tell how I can bring a node back from RECOVERING to SECONDARY, but what I am looking for is to find out why it is going into the RECOVERING state in the first place. Is there any configuration that could lead to this?
A few lines of logs (that I feel could be useful) from my secondary server that went into RECOVERING. 192.168.12.155:5000 is the primary node and 192.168.12.154:5000 is the secondary node. The log below was taken from 192.168.12.154.
2015-03-08T20:02:20.963+0530 [conn223] end connection 192.168.31.152:43503 (4 connections now open)
2015-03-08T20:02:20.965+0530 [initandlisten] connection accepted from 192.168.31.152:43505 #225 (5 connections now open)
2015-03-08T20:02:21.065+0530 [conn224] end connection 192.168.31.152:43504 (4 connections now open)
2015-03-08T20:02:21.076+0530 [initandlisten] connection accepted from 192.168.31.152:43506 #226 (5 connections now open)
2015-03-08T20:02:21.207+0530 [conn225] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|445, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 112ms
2015-03-08T20:02:21.209+0530 [conn225] end connection 192.168.31.152:43505 (4 connections now open)
2015-03-08T20:02:21.211+0530 [initandlisten] connection accepted from 192.168.31.152:43507 #227 (5 connections now open)
2015-03-08T20:02:21.345+0530 [conn227] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 133ms
2015-03-08T20:02:21.441+0530 [conn226] end connection 192.168.31.152:43506 (4 connections now open)
2015-03-08T20:02:21.453+0530 [initandlisten] connection accepted from 192.168.31.152:43508 #228 (5 connections now open)
2015-03-08T20:02:21.586+0530 [conn227] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|448, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 113ms
2015-03-08T20:02:21.588+0530 [conn227] end connection 192.168.31.152:43507 (4 connections now open)
2015-03-08T20:02:21.590+0530 [initandlisten] connection accepted from 192.168.31.152:43509 #229 (5 connections now open)
2015-03-08T20:02:21.707+0530 [conn229] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 116ms
2015-03-08T20:02:21.808+0530 [conn228] end connection 192.168.31.152:43508 (4 connections now open)
2015-03-08T20:02:21.821+0530 [initandlisten] connection accepted from 192.168.31.152:43510 #230 (5 connections now open)
2015-03-08T20:02:21.833+0530 [conn229] end connection 192.168.31.152:43509 (4 connections now open)
2015-03-08T20:02:21.834+0530 [initandlisten] connection accepted from 192.168.31.152:43511 #231 (5 connections now open)
2015-03-08T20:02:22.069+0530 [conn62] end connection 192.168.12.155:42354 (4 connections now open)
2015-03-08T20:02:22.069+0530 [initandlisten] connection accepted from 192.168.12.155:42811 #232 (6 connections now open)
2015-03-08T20:02:22.069+0530 [conn231] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 234ms
2015-03-08T20:02:22.167+0530 [conn230] end connection 192.168.31.152:43510 (4 connections now open)
2015-03-08T20:02:22.177+0530 [initandlisten] connection accepted from 192.168.31.152:43512 #233 (5 connections now open)
2015-03-08T20:02:22.315+0530 [conn231] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|454, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 116ms
2015-03-08T20:02:22.317+0530 [conn231] end connection 192.168.31.152:43511 (4 connections now open)
2015-03-08T20:02:22.319+0530 [initandlisten] connection accepted from 192.168.31.152:43513 #234 (5 connections now open)
2015-03-08T20:02:22.432+0530 [conn234] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 112ms
2015-03-08T20:02:22.529+0530 [conn233] end connection 192.168.31.152:43512 (4 connections now open)
2015-03-08T20:02:22.540+0530 [initandlisten] connection accepted from 192.168.31.152:43514 #235 (5 connections now open)
2015-03-08T20:02:22.681+0530 [conn234] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|457, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 118ms
2015-03-08T20:02:22.682+0530 [conn234] end connection 192.168.31.152:43513 (4 connections now open)
2015-03-08T20:02:22.684+0530 [initandlisten] connection accepted from 192.168.31.152:43515 #236 (5 connections now open)
2015-03-08T20:02:22.794+0530 [conn236] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 108ms
2015-03-08T20:02:22.891+0530 [conn235] end connection 192.168.31.152:43514 (4 connections now open)
2015-03-08T20:02:22.902+0530 [initandlisten] connection accepted from 192.168.31.152:43516 #237 (5 connections now open)
2015-03-08T20:02:22.927+0530 [conn236] end connection 192.168.31.152:43515 (4 connections now open)
2015-03-08T20:02:22.929+0530 [initandlisten] connection accepted from 192.168.31.152:43517 #238 (5 connections now open)
2015-03-08T20:02:23.156+0530 [conn238] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, handshake: { handshake: ObjectId('54fc5b764db3d7d780142e5b'), member: 3, config: { _id: 3, host: "192.168.31.152:5000" } } } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 226ms
2015-03-08T20:02:23.252+0530 [conn237] end connection 192.168.31.152:43516 (4 connections now open)
2015-03-08T20:02:23.263+0530 [initandlisten] connection accepted from 192.168.31.152:43518 #239 (5 connections now open)
2015-03-08T20:02:23.414+0530 [conn238] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|463, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 122ms
2015-03-08T20:02:23.415+0530 [conn238] end connection 192.168.31.152:43517 (4 connections now open)
2015-03-08T20:02:23.417+0530 [initandlisten] connection accepted from 192.168.31.152:43519 #240 (5 connections now open)
2015-03-08T20:02:23.619+0530 [conn239] end connection 192.168.31.152:43518 (4 connections now open)
2015-03-08T20:02:23.628+0530 [initandlisten] connection accepted from 192.168.31.152:43520 #241 (5 connections now open)
2015-03-08T20:02:23.781+0530 [conn240] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|466, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 124ms
2015-03-08T20:02:23.782+0530 [conn240] end connection 192.168.31.152:43519 (4 connections now open)
2015-03-08T20:02:23.784+0530 [initandlisten] connection accepted from 192.168.31.152:43521 #242 (5 connections now open)
2015-03-08T20:02:23.979+0530 [conn241] end connection 192.168.31.152:43520 (4 connections now open)
2015-03-08T20:02:23.986+0530 [initandlisten] connection accepted from 192.168.31.152:43522 #243 (5 connections now open)
2015-03-08T20:02:24.148+0530 [conn242] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('54fc5b764db3d7d780142e5b'), optime: Timestamp 1425825044000|469, config: { _id: 3, host: "192.168.31.152:5000" } } ] } ntoreturn:1 keyUpdates:0 numYields:0 reslen:37 127ms
2015-03-08T20:02:24.149+0530 [conn242] end connection 192.168.31.152:43521 (4 connections now open)
2015-03-08T20:02:24.152+0530 [initandlisten] connection accepted from 192.168.31.152:43523 #244 (5 connections now open)
2015-03-08T20:02:24.341+0530 [conn243] end connection 192.168.31.152:43522 (4 connections now open)
2015-03-08T20:02:24.350+0530 [initandlisten] connection accepted from 192.168.31.152:43524 #245 (5 connections now open)
2015-03-08T20:02:24.504+0530 [conn244] end connection 192.168.31.152:43523 (4 connections now open)
2015-03-08T20:02:24.506+0530 [initandlisten] connection accepted from 192.168.31.152:43525 #246 (5 connections now open)
2015-03-08T20:02:24.707+0530 [conn245] end connection 192.168.31.152:43524 (4 connections now open)
2015-03-08T20:02:24.714+0530 [initandlisten] connection accepted from 192.168.31.152:43526 #247 (5 connections now open)
2015-03-08T20:02:24.899+0530 [conn246] end connection 192.168.31.152:43525 (4 connections now open)
2015-03-08T20:02:24.901+0530 [initandlisten] connection accepted from 192.168.31.152:43527 #248 (5 connections now open)
2015-03-08T20:02:24.969+0530 [rsBackgroundSync] repl: old cursor isDead, will initiate a new one
2015-03-08T20:02:25.051+0530 [rsBackgroundSync] replSet syncing to: 192.168.12.155:5000
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet not trying to sync from 192.168.12.155:5000, it is vetoed for 600 more seconds
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet not trying to sync from 192.168.12.155:5000, it is vetoed for 600 more seconds
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet error RS102 too stale to catch up, at least from 192.168.12.155:5000
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet our last optime : Mar 8 20:00:53 54fc5d1d:3c
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet oldest at 192.168.12.155:5000 : Mar 8 20:00:54 54fc5d1e:5d
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-03-08T20:02:25.081+0530 [rsBackgroundSync] replSet RECOVERING
2015-03-08T20:02:25.137+0530 [conn247] end connection 192.168.31.152:43526 (4 connections now open)
2015-03-08T20:02:25.146+0530 [initandlisten] connection accepted from 192.168.31.152:43528 #249 (5 connections now open)
2015-03-08T20:02:25.147+0530 [conn248] end connection 192.168.31.152:43527 (4 connections now open)
2015-03-08T20:02:25.148+0530 [initandlisten] connection accepted from 192.168.31.152:43529 #250 (5 connections now open)
2015-03-08T20:02:25.245+0530 [conn249] end connection 192.168.31.152:43528 (4 connections now open)
2015-03-08T20:02:25.253+0530 [initandlisten] connection accepted from 192.168.31.152:43530 #251 (5 connections now open)
2015-03-08T20:02:25.254+0530 [conn250] end connection 192.168.31.152:43529 (4 connections now open)
2015-03-08T20:02:25.254+0530 [initandlisten] connection accepted from 192.168.31.152:43531 #252 (5 connections now open)
2015-03-08T20:02:25.349+0530 [conn251] end connection 192.168.31.152:43530 (4 connections now open)
2015-03-08T20:02:25.356+0530 [initandlisten] connection accepted from 192.168.31.152:43532 #253 (5 connections now open)
2015-03-08T20:02:25.357+0530 [conn252] end connection 192.168.31.152:43531 (4 connections now open)
2015-03-08T20:02:25.357+0530 [initandlisten] connection accepted from 192.168.31.152:43533 #254 (5 connections now open)
The MongoDB documentation has a section on this. Your log shows replSet error RS102 too stale to catch up: the secondary fell so far behind that the primary's oplog no longer contains the point it needs to resume from, so the member drops to RECOVERING and must be resynced (and a larger oplog helps prevent a recurrence):
https://docs.mongodb.com/manual/tutorial/resync-replica-set-member/#replica-set-auto-resync-stale-member
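(A sketch of the manual resync that page describes, assuming the member can be taken down and that dbpath is /var/lib/mongo; an empty data directory makes the member run a full initial sync from another member on restart:)
# on the stale secondary -- paths and service name are assumptions
sudo service mongod stop
rm -rf /var/lib/mongo/*   # empty dbpath => full initial sync on restart
sudo service mongod start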

Datediff query in mongodb

I am not an expert in MongoDB, so I am stuck at one point.
I have a DTEnd column in my collection where I store the end date of an event. I want to fetch closing-soon events with the criteria sysdate - DTEnd < 2; could anybody let me know how to fetch the same in MongoDB? A sample schema is below:
{ CorporateId: null, CorporateName: 'Nuclues Software', Description: 'Basic AngularJS Training First Semeseter Guys', EndDateTime: Thu Sep 05 2013 00:00:00 GMT+0530 (India Standard Time), NominationCount: 0, StartDateTime: Thu Sep 05 2013 00:00:00 GMT+0530 (India Standard Time), Status: null, ViewCount: 22, _id: 52298b8c60df891c30e553ca, Address: { ZipCode: '1212', City: '1212', State: '1212', StreetAddress: '1212', NearestLandMark: '1212' }, Nominations: [], Technologies: [ { Name: 'AngularJS' }, { Name: 'Asp.net' } ] }
Solved using $lt and $gte:
> db.Training.find({EndDateTime: { $gte: ISODate("2013-09-10T00:00:00.000Z"), $lt: ISODate("2013-09-12T00:00:00.000Z") }}).count()
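(The same query with the boundaries computed from the current date rather than hard-coded, as a sketch matching the original "within 2 days" criterion; collection and field names are taken from the solved query above:)
> var now = new Date()
> var twoDaysAhead = new Date(now.getTime() + 2 * 24 * 60 * 60 * 1000)  // now + 2 days
> db.Training.find({EndDateTime: { $gte: now, $lt: twoDaysAhead }}).count()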