I'm running a sharded MongoDB instance and as per the instructions, the config servers are a replica set. I'm unable to upgrade from v4.2.9 to 4.4.0. Per the upgrade instructions, I need to upgrade the config servers first, starting with a secondary. It already failed there. I shut down the secondary's instance, replaced the binaries, and restarted it. But it didn't start up again. The logs say the following (I removed the timestamps for clarity):
"msg":"The size storer reports that the oplog contains","attr":{"numRecords":53890848,"dataSize":13618131721}}
"msg":"Sampling the oplog to determine where to place markers for truncation"}
"msg":"Sampling from the oplog to determine where to place markers for truncation","attr":{"from":{"$timestamp":{"t":1494750837,"i":1}},"to":{"$timestamp":{"t":1598687615,"i":1}}}}
"msg":"Taking samples and assuming each oplog section contains","attr":{"numSamples":253,"containsNumRecords":2124552,"containsNumBytes":536870917}}
"msg":"User assertion","attr":{"error":"Location13111: field not found, expected type date","file":"src/mongo/bson/bsonelement.h","line":810}}
"msg":"WiredTiger record store oplog processing finished","attr":{"durationMillis":21}}
"msg":"~WiredTigerRecordStore for: {ns}","attr":{"ns":"local.oplog.rs"}}
"msg":"Invariant failure","attr":{"expr":"_oplogManagerCount > 0","file":"src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp","line":2467}}
"msg":"\n\n***aborting after invariant() failure\n\n"}
"msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
"msg":"BACKTRACE: {bt}","attr":{"bt":{"backtrace":[{"a":"55C91A79E621","b":"55C917AE3000","o":"2CBB621","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"},{"a":"55C91A79FCC9","b":"55C917AE3000","o":"2CBCCC9","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"55C91A79D4B6","b":"55C917AE3000","o":"2CBA4B6","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"},{"a":"7FAF200070E0","b":"7FAF1FFF6000","o":"110E0","s":"funlockfile","s+":"50"},{"a":"7FAF1FC89FFF","b":"7FAF1FC57000","o":"32FFF","s":"gsignal","s+":"CF"},{"a":"7FAF1FC8B42A","b":"7FAF1FC57000","o":"3442A","s":"abort","s+":"16A"},{"a":"55C9189E6C5F","b":"55C917AE3000","o":"F03C5F","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"},{"a":"55C9186CE4B6","b":"55C917AE3000","o":"BEB4B6","s":"_ZN5mongo18WiredTigerKVEngine16haltOplogManagerEv.cold.1904","s+":"18"},{"a":"55C918B0711C","b":"55C917AE3000","o":"102411C","s":"_ZN5mongo21WiredTigerRecordStoreD1Ev","s+":"2FC"},{"a":"55C918B0D68B","b":"55C917AE3000","o":"102A68B","s":"_ZN5mongo29StandardWiredTigerRecordStoreD0Ev","s+":"1B"},{"a":"55C9186CEC5B","b":"55C917AE3000","o":"BEBC5B","s":"_ZN5mongo18WiredTigerKVEngine21getGroupedRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsENS_8KVPrefixE.cold.1921","s+":"57"},{"a":"55C919378A76","b":"55C917AE3000","o":"1895A76","s":"_ZN5mongo17StorageEngineImpl15_initCollectionEPNS_16OperationContextENS_8RecordIdERKNS_15NamespaceStringEb","s+":"316"},{"a":"55C91937A7BD","b":"55C917AE3000","o":"18977BD","s":"_ZN5mongo17StorageEngineImpl11loadCatalogEPNS_16OperationContextE","s+":"90D"},{"a":"55C91937E3D0","b":"55C917AE3000","o":"189B3D0","s":"_ZN5mongo17StorageEngineImplC1EPNS_8KVEngineENS_20StorageEngineOptionsE","s+":"270"},{"a":"55C918AC8005","b":"55C917AE3000","o":"FE5005","s":"_ZNK5mongo12_GLOBAL__N_117WiredTigerFactory6createERKNS_19StorageGlobalParamsEPKNS_21StorageEngineLockFileE","s+":"1A5"}
,{"a":"55C9193889EE","b":"55C917AE3000","o":"18A59EE","s":"_ZN5mongo23initializeStorageEngineEPNS_14ServiceContextENS_22StorageEngineInitFlagsE","s+":"4CE"},{"a":"55C918A84587","b":"55C917AE3000","o":"FA1587","s":"_ZN5mongo12_GLOBAL__N_114_initAndListenEPNS_14ServiceContextEi.isra.1409","s+":"3F7"},{"a":"55C918A88610","b":"55C917AE3000","o":"FA5610","s":"_ZN5mongo12_GLOBAL__N_111mongoDbMainEiPPcS2_","s+":"650"},{"a":"55C9189F7849","b":"55C917AE3000","o":"F14849","s":"main","s+":"9"},{"a":"7FAF1FC772E1","b":"7FAF1FC57000","o":"202E1","s":"__libc_start_main","s+":"F1"},{"a":"55C918A83A3A","b":"55C917AE3000","o":"FA0A3A","s":"_start","s+":"2A"}],"processInfo":{"mongodbVersion":"4.4.0","gitVersion":"563487e100c4215e2dce98d0af2a6a5a2d67c5cf","compiledModules":[],"uname":{"sysname":"Linux","release":"4.9.0-7-amd64","version":"#1 SMP Debian 4.9.110-3+deb9u2 (2018-08-13)","machine":"x86_64"},"somap":[{"b":"55C917AE3000","elfType":3,"buildId":"D7866CAA7FFAC402345915854064CD98A5B60C27"},{"b":"7FAF1FFF6000","path":"/lib/x86_64-linux-gnu/libpthread.so.0","elfType":3,"buildId":"16D609487BCC4ACBAC29A4EAA2DDA0D2F56211EC"},{"b":"7FAF1FC57000","path":"/lib/x86_64-linux-gnu/libc.so.6","elfType":3,"buildId":"775143E680FF0CD4CD51CCE1CE8CA216E635A1D6"}]}}}}
It appears to boil down to the following error message:
Location13111: field not found, expected type date.
src/mongo/bson/bsonelement.h:810
Googling didn't turn up anything useful. I didn't proceed any further and reverted to v4.2.9. (I wanted to limit the damage to the config secondary and not hit the same issue on the shards.)
I'm on Debian 9.13 and I tried both apt to install MongoDB 4.4.0 and directly installing the Debian 9.2 binaries. The error was the same both times.
Any ideas what to do about this one?
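As a side note on reading the log above: the "t" values in the sampling line are seconds since the Unix epoch, so the time span covered by the oplog can be read off directly. A minimal sketch, assuming GNU date (as shipped with Debian); the helper name to_utc is made up for illustration, and the output is only a decoding of the two logged values, not a diagnosis:

```shell
# Convert the "from"/"to" $timestamp seconds from the sampling log line
# into human-readable UTC times. Requires GNU date (-d @SECONDS).
to_utc() {
    date -u -d "@$1" '+%Y-%m-%dT%H:%M:%SZ'
}

echo "oldest oplog entry: $(to_utc 1494750837)"
echo "newest oplog entry: $(to_utc 1598687615)"
```

The oldest entry dates from May 2017 and the newest from August 2020, so this oplog spans more than three years and presumably still holds entries written by much older server versions.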
We have an application on Solaris. During a specific test case it generates a heap dump, which is written to a specific path on the server. In this case we get the following error in the trace file:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /ossrc/upgrade/JREheapdumps/java_pid16092.hprof ...
Dump file is incomplete: I/O error
and in /var/adm/messages we could see
Oct 28 13:00:10 ossuas2 nfs: [ID 733954 kern.info] NOTICE: [NFS4][Server: mashost][Mntpt: /ossrc/upgrade]NFS server mashost not responding; still trying
Oct 28 13:02:53 ossuas2 nfs: [ID 733954 kern.info] NOTICE: [NFS4][Server: mashost][Mntpt: /usr/local]NFS server mashost not responding; still trying
Oct 28 13:04:53 ossuas2 nfs: [ID 733954 kern.info] NOTICE: [NFS4][Server: mashost][Mntpt: /etc/opt/ericsson]NFS server mashost not responding; still trying
Can anyone please help explain why we are getting this problem, and tell us whether an application can have this kind of impact on mashost?
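To see which mounts were affected and how often, the notices in /var/adm/messages can be tallied per mount point. A minimal sketch, assuming the notice format shown in the excerpt above; the function name nfs_timeouts_by_mount is made up for illustration:

```shell
# Count "NFS server ... not responding" notices per mount point.
# Reads syslog-style text on stdin. Only the "[Mntpt: ...]" part of each
# notice is used, so notices wrapped after "not" are still counted.
nfs_timeouts_by_mount() {
    grep 'NFS server .* not' |
        sed -n 's/.*\[Mntpt: \([^]]*\)\].*/\1/p' |
        sort | uniq -c | sort -rn
}

# Example: nfs_timeouts_by_mount < /var/adm/messages
```

If every mount served by mashost shows up within the same few minutes, as in the excerpt, that suggests the server or the network path to it was unreachable as a whole, rather than a problem with a single export.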
First things first, check the NFS services with SMF (svcs). When it fails, run:
# svcs -x nfs/client
on the client, and
# svcs -x nfs/server
on the server. I would expect one or both to be in a "maintenance" state (you may also find that the service fails to start at all). If it is in maintenance, you should see a row marked "Reason:" that says why.
You might see "offline" instead. In that case, svc.startd attempts to restart the service several times and, if it still fails after five attempts or hangs indefinitely, places it into the "maintenance" state and stops restarting it.
Check the logs in
/var/svc/log/<service-name FMRI>.log
There will be one on your client machine under "network-nfs-client:default" (probably, may have a name other than 'default' if it's been changed manually), and one on the server under "network-nfs-server:default"
See what you can glean from those.
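The log file name follows from the service's full FMRI. A small sketch of that mapping, based only on the two names given above (drop the "svc:/" prefix and turn the remaining slashes into dashes); the helper name smf_log_path is hypothetical:

```shell
# Derive the SMF log file path from a full FMRI, following the naming
# shown above: strip "svc:/" and replace the remaining "/" with "-".
smf_log_path() {
    name=$(printf '%s\n' "$1" | sed -e 's|^svc:/||' -e 's|/|-|g')
    printf '/var/svc/log/%s.log\n' "$name"
}

smf_log_path svc:/network/nfs/client:default
# prints /var/svc/log/network-nfs-client:default.log
```

On the server, the same call with svc:/network/nfs/server:default gives the path of the server-side log.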
SMF automatically takes snapshots of service configurations, so you can try reverting to one of those:
# svccfg -s nfs/server:default
svc:/network/nfs/server:default> listsnap
svc:/network/nfs/server:default> revert [name_of_snapshot]
svc:/network/nfs/server:default> quit
# svcadm refresh nfs/server:default
# svcadm restart nfs/server:default
Make sure to include the ":default" tag, or whichever tag "svcs nfs/server" showed instead. That name identifies an instance of the service; every running service is an instance.
If the service is failing to start, you might have to look at the XML manifest under /lib/svc/manifest/network/nfs/ -- inside, you'll see dependencies (and services dependent on this one), then "exec_method"s, which define how the service starts, stops and restarts.
Instead of using snapshots, you can also restore it to default: use svccfg -s <FMRI> delete to clear it, then svcadm refresh <FMRI> and svcadm enable <FMRI>.
If the service is in maintenance state, once you've isolated and fixed the problem, you can manually clear that state by running svcadm clear <FMRI>.
MongoDB is crashing. When I open the mongodb.log file, I get:
$ tail /var/log/mongodb/mongodb.log
Sat Jan 25 03:06:56.153 [initandlisten] connection accepted from 127.0.0.1:58492 #63331 (263 connections now open)
Sat Jan 25 03:07:02.694 out of memory, printing stack and exiting:
0xde05e1 0x6cf37e 0x12129fd 0xc490c3 0xc4404e 0xc44196 0xda4913 0xda53e4 0xe28e69 0x7f5cbaa19e9a 0x7f5cb9d2c3fd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde05e1]
/usr/bin/mongod(_ZN5mongo14my_new_handlerEv+0x3e) [0x6cf37e]
/usr/bin/mongod(_Znam+0x6d) [0x12129fd]
/usr/bin/mongod(_ZNK5mongo3Top8cloneMapERNS_9StringMapINS0_14CollectionDataEEE+0x83) [0xc490c3]
/usr/bin/mongod(_ZN5mongo9Snapshots12takeSnapshotEv+0x4e) [0xc4404e]
/usr/bin/mongod(_ZN5mongo14SnapshotThread3runEv+0x66) [0xc44196]
/usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xc3) [0xda4913]
/usr/bin/mongod(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0xda53e4]
/usr/bin/mongod() [0xe28e69]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f5cbaa19e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5cb9d2c3fd]
This question sounds similar: MongoDB: out of memory
But his problem was a ulimit issue. My memory settings are already unlimited.
Others had particular issues with .skip() or .limit() given unreasonably large values, but that's not happening here.
Anyone know what might be wrong?
The MongoDB docs recommend having enough swap space for MongoDB, despite it not being a requirement: http://docs.mongodb.org/manual/administration/production-notes/#ProductionNotes-Swap
I'm using Windows Azure hosting, and I discovered that their virtual servers don't have swap space by default:
$ sudo swapon -s
Filename Type Size Used Priority
(Azure defaults to no swap space: Part 1 & Part 2)
So I found a guide to creating a swap file: https://www.digitalocean.com/community/articles/how-to-add-swap-on-ubuntu-12-04
And it solved my problem!
Notes:
The guide says Ubuntu 12.04, but the same steps worked for me on 13.10.
You should use a swap file around half the size of your RAM, not the 512MB used in the guide.
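For the sizing note above, here is a rough sketch that works out half of the installed RAM and prints the usual swap file commands without running them. The path /swapfile and the helper names are assumptions for illustration; review the printed commands before running them yourself:

```shell
# Print (not run) the commands to create a swap file half the size of RAM.
# /swapfile is an assumed path; adjust it to your setup.
half_ram_mb() {
    # $1 = total RAM in kB, as reported by MemTotal in /proc/meminfo
    echo $(( $1 / 1024 / 2 ))
}

swap_commands() {
    # $1 = swap size in MB
    cat <<EOF
sudo dd if=/dev/zero of=/swapfile bs=1M count=$1
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
EOF
}

if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    swap_commands "$(half_ram_mb "$mem_kb")"
fi
```

Afterwards, sudo swapon -s should list the new file. Keeping the swap file across reboots additionally needs an /etc/fstab entry.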
I hope this helps others solve this problem.
I have just updated our Sphinx server from 1.10-beta to 2.0.6-release, and now I have run into some issues with searchd. Previously we were able to run two instances of searchd next to each other by specifying two different config-files, i.e:
searchd --config /etc/sphinx/sphinx.conf
searchd --config /etc/sphinx/sphinx.staging.conf
sphinx.conf listens to 9306:mysql41, and 9312, while sphinx.staging.conf listens to 9307:mysql41 and 9313.
After we updated to 2.0.6, however, the second instance never starts. Or rather, the output makes it look like it starts, and a pid file is created, but only the first searchd instance keeps running; the second seems to shut down right away. So running searchd --config /etc/sphinx/sphinx.conf a second time (if that was the first instance started) complains that the pid file is in use, while running searchd --config /etc/sphinx/sphinx.staging.conf (if that was the second instance started) "starts" the daemon again and again, yet no new process is created.
Note that if I start the instances in the opposite order, then sphinx.conf is the one that does not really start.
I have checked, and rechecked, that these ports are only used by searchd.
Does anyone have any idea what I can try next? I've installed it from source on Ubuntu 10.04 LTS with:
./configure --prefix /etc/sphinx --with-mysql --enable-id64 --with-libstemmer
make -j4 install
Note to self: Check the logs!
RT-indices use binary logs to enable crash recovery. Since my old config files did not specify a path for where these should be stored, both instances of searchd tried to write to the same binary logs. The instance started last was of course not permitted to manipulate these files, and thus exited with a fatal error:
[Fri Nov 2 17:13:32.262 2012] [ 5346] FATAL: failed to lock '/etc/sphinx/var/data/binlog.lock': 11 'Resource temporarily unavailable'
[Fri Nov 2 17:13:32.264 2012] [ 5345] Child process 5346 has been finished, exit code 1. Watchdog finishes also. Good bye!
The solution was simple: specify a binlog_path inside the searchd section of each configuration file, and make sure each instance gets its own directory:
searchd
{
[...]
binlog_path = /path/to/writable/directory
[...]
}
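To catch this before starting the second instance, the two config files can be compared up front. A minimal sketch, assuming each file sets binlog_path on a line of its own as in the fragment above; the function names are made up for illustration:

```shell
# Print the binlog_path set in a Sphinx config file (empty if none is set).
binlog_path_of() {
    sed -n 's/^[[:space:]]*binlog_path[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Succeeds only if both configs set binlog_path and the values differ.
distinct_binlog_paths() {
    a=$(binlog_path_of "$1")
    b=$(binlog_path_of "$2")
    [ -n "$a" ] && [ -n "$b" ] && [ "$a" != "$b" ]
}

# Example:
# distinct_binlog_paths /etc/sphinx/sphinx.conf /etc/sphinx/sphinx.staging.conf \
#     || echo "binlog_path missing or shared between instances"
```

A missing binlog_path is treated as a failure on purpose, since in the case described above both instances then fell back to the same default location.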