I have a MongoDB 4.4.1 replica set running on Ubuntu 20.04, with the default (proper) ulimits set and proper permissions on the data directory. It crashes about once a day under load.
The error message in the log doesn't immediately explain the issue.
{"t":{"$date":"2020-10-05T22:25:46.854+00:00"},"s":"F", "c":"-", "id":23083, "ctx":"conn503635","msg":"Invariant failure","attr":{"expr":"ret","error":"UnknownError: -31803: WT_NOTFOUND: item not found","file":"src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp","line":1600}}
{"t":{"$date":"2020-10-05T22:25:46.856+00:00"},"s":"F", "c":"-", "id":23084, "ctx":"conn503635","msg":"\n\n***aborting after invariant() failure\n\n"}
Does anyone have an idea what's going wrong or how to fix it?
Related
I recently had a problem with one collection among my databases. I tried to fix it myself by running mongod with --repair, but it never worked with the most recent mongo (version 6.0). One of the errors I get is the following:
"t":{"$date":"2022-10-23T10:34:56.814+03:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1666510496:814423][245805:0x7f99fc399c80], file:collection-591-2337122025107551252.wt, WT_CURSOR.next: __wt_block_read_off, 296: collection-591-2337122025107551252.wt: potential hardware corruption, read checksum error for 16384B block at offset 4537556992: calculated block checksum doesn't match expected checksum"}}
I tried to fix this error by removing the collection.
Then it produces another error "aborting after invariant() failure":
{"t":{"$date":"2022-10-23T11:13:48.644+03:00"},"s":"F", "c":"-", "id":23081, "ctx":"initandlisten","msg":"Invariant failure","attr":{"expr":"buildUUID","msg":"collection: mydatabase.birds_collection:_id_","file":"src/mongo/db/catalog/index_catalog_impl.cpp","line":169}}
{"t":{"$date":"2022-10-23T11:13:48.644+03:00"},"s":"F", "c":"-", "id":23082, "ctx":"initandlisten","msg":"\n\n***aborting after invariant() failure\n\n"}
{"t":{"$date":"2022-10-23T11:13:48.646+03:00"},"s":"F", "c":"CONTROL", "id":6384300, "ctx":"initandlisten","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
After that, no matter how many times I run it, I always get this error.
Everything I read online amounts to running the --repair command.
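For reference, the repair invocation I ran looked roughly like this (the dbpath here is an example, not my real path):
mongod --dbpath /var/lib/mongodb --repair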
I would be glad if there is any way to make it work again, even with some broken collections. If there isn't, how can I back up my collections?
I tried mongodump, but I cannot use it since mongod will not start because of these issues. Is there another manual way to back up my collection files?
I tried copy-pasting them into a new folder, but they don't appear in mongo. I just want to save as many collections as possible; I don't have a good backup to fall back on.
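For what it's worth, if mongod could start, the per-collection dump I would try looks something like this (the output path is an example; the database and collection names are taken from my log above):
mongodump --db mydatabase --collection birds_collection --out /backup/dump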
I have a simple query loop that gets a MongoCursorNotFoundException after processing about 44,000 of 96,945 documents in around 93 minutes.
MongoIterable<MasterDocument> query = masterCollection.find().noCursorTimeout(true);
for (MasterDocument masterDocument : query) { ... do some stuff ... }
The "do some stuff" part takes a while, which is why the entire loop takes so long.
My problem is that I get this exception after handling maybe half of the documents in the collection.
I am running both the client application and the mongod server locally on my Windows 10 laptop, accessing the server via localhost.
The server log shows lots of messages like this:
{"t":{"$date":"2021-01-04T20:21:35.510-08:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn27","msg":"Slow query","attr":{"type":"command","ns":"master_database.MasterCollection","command":{"find":"MasterCollection","filter":{"hashCode":1753339282},"$db":"master_database","lsid":{"id":{"$uuid":"6a252f51-2c6e-4c01-ae03-1a80aab109e0"}}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":96944,"cursorExhausted":true,"numYields":96,"nreturned":0,"queryHash":"DBC59907","planCacheKey":"DBC59907","reslen":121,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":97}},"Global":{"acquireCount":{"r":97}},"Database":{"acquireCount":{"r":97}},"Collection":{"acquireCount":{"r":97}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":147}}
The last of these messages is followed by:
{"t":{"$date":"2021-01-04T20:21:35.521-08:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn27","msg":"Connection ended","attr":{"remote":"127.0.0.1:58990","connectionId":27,"connectionCount":14}}
{"t":{"$date":"2021-01-04T20:21:35.522-08:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn26","msg":"Connection ended","attr":{"remote":"127.0.0.1:58989","connectionId":26,"connectionCount":13}}
{"t":{"$date":"2021-01-04T20:21:35.922-08:00"},"s":"I", "c":"-", "id":20883, "ctx":"conn25","msg":"Interrupted operation as its client disconnected","attr":{"opId":310196}}
I have tried:
Using "noCursorTimeout(true)" on the query cursor (as shown above)
Starting the server with "mongod --setParameter localLogicalSessionTimeoutMinutes=240". This last step seems to have caused additional log messages that say "error":"Location13111: wrong type for field (expireAfterSeconds) long != int"
I am using mongod 4.4 and the latest MongoDB Java driver.
You may need to increase the default cursor idle timeout to a bigger value on all shards and mongos instances.
Check the parameter (the default is 10 min = 600000 ms):
use admin
db.runCommand({getParameter:1, cursorTimeoutMillis: 1})
and update it to a bigger value (the value is in milliseconds, so 600000000 ms is roughly a week):
use admin
db.runCommand({setParameter:1, cursorTimeoutMillis: 600000000 })
Also, the COLLSCAN in your logs indicates that your query does not use an index; you may need to create one on "hashCode".
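For example, from the mongo shell (database and collection names taken from your log):
use master_database
db.MasterCollection.createIndex({ hashCode: 1 })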
Thanks for the response.
It turned out that my application ran to completion once I started mongod with "--setParameter localLogicalSessionTimeoutMinutes=240", despite the error message that I saw in the console log.
You are absolutely right that I should have an index on "hashCode". (I had one before but forgot to recreate it after recreating the collection.)
I'm a newbie to MongoDB. My database is sometimes dropped automatically, and I cannot find the cause. We have not issued any drop command, but the database is missing and the log file contains:
MongoDB - Version 2.4.6
dropDatabase test starting
removeJournalFiles
dropDatabase test finished
Assertion: 13347:local.oplog.rs missing. did you drop it? if so restart server
0x9877f6 0x94bfaa 0x7998e8 0x794d8c 0x6b909e 0x6b95bc 0x7a1549 0x7a4b03 0x75a530 0x5a08eb 0x973c82 0x33b8e07f33 0x33b86f4ded
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0x9877f6]
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9a) [0x94bfaa]
/usr/bin/mongod() [0x7998e8]
/usr/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pbb+0x3c) [0x794d8c]
/usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xb6e) [0x6b909e]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x22c) [0x6b95bc]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x29) [0x7a1549]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x6c3) [0x7a4b03]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x6f0) [0x75a530]
/usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xbb) [0x5a08eb]
/usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x432) [0x973c82]
/lib64/libpthread.so.0() [0x33b8e07f33]
/lib64/libc.so.6(clone+0x6d) [0x33b86f4ded]
Can anyone please explain the cause of this automatic database drop?
I am upgrading my cluster to WiredTiger following this guide: https://docs.mongodb.org/manual/tutorial/change-replica-set-wiredtiger/
I have been having the following issue:
Environment details:
MongoDB 3.0.9 in a sharded cluster on Red Hat Enterprise Linux Server release 6.2 (Santiago). I have 4 shards; each one is a replica set with 3 members. I recently upgraded all binaries from 2.4 to 3.0.9. With every server on the updated binaries, I tried converting each replica set to the WiredTiger storage engine, but I got the following error when upgrading the secondary on one member server (shard 1):
2016-02-09T12:36:39.366-0500 F REPL [rsSync] replication oplog stream went back in time. previous timestamp: 56b9c217:ab newest timestamp: 56b9b429:60. Op being applied: { ts: Timestamp 1455010857000|96, h: 2267356763748731326, v: 2, op: "d", ns: "General.Tickets", fromMigrate: true, b: true, o: { _id: ObjectId('566aec7bdfd4b700e73d64db') }
2016-02-09T12:36:39.366-0500 I - [rsSync] Fatal Assertion 18905
2016-02-09T12:36:39.366-0500 I - [rsSync]
***aborting after fassert() failure
This is an open bug with replication: https://jira.mongodb.org/browse/SERVER-17081
Everywhere else in the cluster the upgrade went flawlessly; however, I am now stuck with only the primary and one secondary on shard 1. I've attempted resyncing the broken member using both MMAPv1 and WiredTiger, but I keep getting the error above. Because of this, one shard is stuck on MMAPv1, and that shard happens to hold most of the data (700 GB).
I have also tried rebooting and re-installing the binaries, to no avail.
Any help is appreciated.
I solved this by dropping our giant collection. The rsSync must have been hitting some limit, since the giant collection had 4.5 billion documents.
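For reference, dropping a collection from the mongo shell looks like this (I'm using the namespace from the log above as a stand-in; substitute your own database and collection names):
use General
db.Tickets.drop()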
My mongo shell starts without any error.
>use mydb also works properly (the db name here is mydb)
but when I run the show collections command, it shows the following error:
>show collections
Wed Oct 15 17:38:30 uncaught exception: error: {
"$err" : "file /var/lib/mongodb/mydb.6 open/create failed in createPrivateMap (look in log for more information)",
"code" : 13636
}
Here is the error log:
17:38:22 [initandlisten] connection accepted from 127.0.0.1:53178 #1
17:38:30 [conn1] ERROR: mmap private failed with out of memory. You are using a 32-bit build and probably need to upgrade to 64
17:38:30 [conn1] Assertion: 13636:file /var/lib/mongodb/mydb.6 open/create failed in createPrivateMap (look in log for more information)
17:38:30 [conn1] assertion 13636 file /var/lib/mongodb/mydb.6 open/create failed in createPrivateMap (look in log for more information) ns:mydb.system.namespaces query:{}
17:39:01 [clientcursormon] mem (MB) res:2 virt:90 mapped:0
Based on a solution given for another Stack Overflow question, couldn't connect to server 127.0.0.1 shell/mongo.js, I tried the same steps in my case and the problem was solved for the time being. But the main issue is that whenever I shut down my machine and restart it, I get the same error again and have to repeat the same steps (as given in the link above) to get the mongo shell working, and it ultimately leads to data loss within collections. Can anyone suggest what the reason could be? Is there some problem with my MongoDB installation? Please let me know if anyone has had a similar issue and successfully resolved it. Thanks.
I think there are two possible sources of the problem:
your computer doesn't have enough RAM available
as the log message says, you are using a 32-bit build of MongoDB and should switch to a 64-bit build, because you have too much data to memory-map with a 32-bit build (more than about 2.5 GB)
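You can check which build you are running from the mongo shell; on a 64-bit build this reports 64:
db.serverBuildInfo().bits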