What can cause a segmentation fault in mongodb - mongodb

We have a MongoDB replica set, and one of the members crashed with a segmentation fault. What could be causing this? We are running version 2.2.2.
Thanks. Here is the log from the crash.
Mon Sep 2 03:37:26 Invalid access at address: 0xfffffd7d00680038 from thread: conn2014070
Mon Sep 2 03:37:26 Got signal: 11 (Segmentation Fault).
Mon Sep 2 03:37:26 Backtrace:
0xb331b8 0x7bd48b 0x7bd695 0xfffffd7fff1d7666 0xfffffd7fff1ca35c 0x9ff980 0x873f13 0x873fcb 0x981331 0x982af2 0x92d2da 0x93183b 0x7cead0 0xb2539a 0xfffffd7ff95f364c 0xfffffd7fff1d72d4 0xfffffd7fff1d75a0
/opt/local/bin/mongod'_ZN5mongo15printStackTraceERSo+0x28 [0xb331b8]
/opt/local/bin/mongod'_ZN5mongo10abruptQuitEi+0x11b [0x7bd48b]
/opt/local/bin/mongod'_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x125 [0x7bd695]
/lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffd7fff1d7666]
/lib/amd64/libc.so.1'call_user_handler+0x2a4 [0xfffffd7fff1ca35c]
/opt/local/bin/mongod'_ZNK5mongo6Record5touchEb+0x0 [0x9ff980]
/opt/local/bin/mongod'_ZN5mongo12ClientCursor5yieldEiPNS_6RecordE+0x63 [0x873f13]
/opt/local/bin/mongod'_ZN5mongo12ClientCursor14yieldSometimesENS0_11RecordNeedsEPb+0x6b [0x873fcb]
/opt/local/bin/mongod'_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEPNS_11RemoveSaverEbRKNS_24QueryPlanSelectionPolicyEb+0x9a1 [0x981331]
/opt/local/bin/mongod'_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEbRKNS_24QueryPlanSelectionPolicyE+0xa2 [0x982af2]
/opt/local/bin/mongod'_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x27a [0x92d2da]
/opt/local/bin/mongod'_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xe9b [0x93183b]
/opt/local/bin/mongod'_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x90 [0x7cead0]
/opt/local/bin/mongod'_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x32a [0xb2539a]
/opt/local/lib/libboost_thread.so.1.49.0'thread_proxy+0x6c [0xfffffd7ff95f364c]
/lib/amd64/libc.so.1'_thrp_setup+0xbc [0xfffffd7fff1d72d4]
/lib/amd64/libc.so.1'_lwp_start+0x0 [0xfffffd7fff1d75a0]
Additionally, I am seeing some assertion failures before the crash; I am not sure whether they are related. Otherwise I see nothing else out of the ordinary.
Wed Sep 4 02:19:04 [conn988803] cratefm Assertion failure !e.eoo() src/mongo/db/../bson/bsonobjbuilder.h 131
0xb331b8 0xb01e70 0x7cbe04 0x88e7ec 0x8b5f18 0x8b6b66 0x8b714a 0x978044 0x97ab32 0x931065 0x7cead0 0xb2539a 0xfffffd7fdd1a364c 0xfffffd7fff1d72d4 0xfffffd7fff1d75a0
/opt/local/bin/mongod'_ZN5mongo15printStackTraceERSo+0x28 [0xb331b8]
/opt/local/bin/mongod'_ZN5mongo12verifyFailedEPKcS1_j+0xc0 [0xb01e70]
/opt/local/bin/mongod'0x3cbe04 [0x7cbe04]
/opt/local/bin/mongod'_ZN5mongo16CmdFindAndModify3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x65c [0x88e7ec]
/opt/local/bin/mongod'_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRNS_14BSONObjBuilderEb+0x48 [0x8b5f18]
/opt/local/bin/mongod'_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xa26 [0x8b6b66]
/opt/local/bin/mongod'_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x37a [0x8b714a]
/opt/local/bin/mongod'_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x34 [0x978044]
/opt/local/bin/mongod'_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x6c2 [0x97ab32]
/opt/local/bin/mongod'_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x6c5 [0x931065]
/opt/local/bin/mongod'_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x90 [0x7cead0]
/opt/local/bin/mongod'_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x32a [0xb2539a]
/opt/local/lib/libboost_thread.so.1.49.0'thread_proxy+0x6c [0xfffffd7fdd1a364c]
/lib/amd64/libc.so.1'_thrp_setup+0xbc [0xfffffd7fff1d72d4]
/lib/amd64/libc.so.1'_lwp_start+0x0 [0xfffffd7fff1d75a0]
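The frame names in both traces are mangled C++ symbols (Itanium ABI style). As a rough sketch only, and not a full demangler (a tool such as c++filt handles the general case), the leading namespace/class/function part can be recovered by reading the length-prefixed components. The function name `nested_name` is made up for this illustration; argument types and template parameters are ignored.

```python
import re

def nested_name(mangled):
    """Decode only the leading nested name of an Itanium-mangled symbol.

    Handles the `_ZN[qualifiers]<len><id><len><id>...` prefix and stops at the
    first character that is not a length digit, so argument types and
    template parameters are deliberately left out.
    """
    match = re.match(r"_ZN[rVK]*", mangled)
    if not match:
        return None
    pos = match.end()
    parts = []
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])      # length prefix of the next identifier
        parts.append(mangled[end:end + length])
        pos = end + length
    return "::".join(parts) if parts else None

# The frame directly below the signal handler in the crash trace, and its caller.
print(nested_name("_ZNK5mongo6Record5touchEb"))
print(nested_name("_ZN5mongo12ClientCursor5yieldEiPNS_6RecordE"))
```

Applied to the crash trace, this shows the fault occurring in mongo::Record::touch, called from mongo::ClientCursor::yield during an update.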

Make sure your client version >= server version.
Check out https://jira.mongodb.org/browse/SERVER-8105

This was a bug in version 2.6 that occurred when updating arrays in documents with more than 128 BSON elements.
According to the issue report, it has been fixed since version 2.6.1, so I encourage you to upgrade as I did; everything works well for me now.

Related

kernel - postgres segfault error 15 in libc-2.19.so

Yesterday PostgreSQL 9.5.14 running on Debian 8 (Linux xxxxxx 3.16.0-7-amd64 #1 SMP Debian 3.16.59-1 (2018-10-03) x86_64 GNU/Linux) crashed with a segmentation fault. The database closed all connections and reinitialized itself, staying in recovery mode for about 1 minute.
PostgreSQL log:
2018-10-xx xx:xx:xx UTC [580-2] LOG: server process (PID 16461) was terminated by signal 11: Segmentation fault
kern.log:
Oct xx xx:xx:xx xxxxxxxx kernel: [117977.301353] postgres[16461]: segfault at 7efd3237db90 ip 00007efd3237db90 sp 00007ffd26826678 error 15 in libc-2.19.so[7efd322a2000+1a1000]
According to libc documentation (https://support.novell.com/docs/Tids/Solutions/10100304.html) error code 15 means:
NX_EDEADLK 15 resource deadlock would occur - which does not tell me much.
Could you please tell me whether we can do something to avoid this problem in the future? This is, of course, a production server.
All packages are currently up to date. Unfortunately, upgrading PG is not an option. The server runs on Google Compute Engine.
error code 15 means: NX_EDEADLK 15
No, it doesn't mean that. This answer explains how to interpret 15 here.
Bits 0, 1, 2 and 3 are set, which means: protection fault, write access, user mode, use of reserved bit. Most likely your postgres process attempted to write through some wild pointer.
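As an illustration of that bit-by-bit reading, here is a minimal sketch that decodes the error number from the kernel's segfault line. The function and table names are made up for this example; bit 4 (instruction fetch) is not mentioned above and is included from the x86 page-fault error-code layout.

```python
# x86 page-fault error-code bits, as printed in "segfault ... error N".
# Each entry: (bit, meaning when set, meaning when clear or None).
PAGE_FAULT_BITS = [
    (0, "protection fault", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "kernel mode"),
    (3, "use of reserved bit", None),
    (4, "instruction fetch", None),  # not discussed in the answer above
]

def decode_segfault_error(code):
    """Return a list of human-readable meanings for a page-fault error code."""
    meanings = []
    for bit, if_set, if_clear in PAGE_FAULT_BITS:
        if code & (1 << bit):
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings

print(decode_segfault_error(15))
```

For error 15 this lists the same four conditions given above; other common values such as 4 (user-mode read of an unmapped page) or 6 (user-mode write to an unmapped page) can be decoded the same way.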
if we can do something to avoid this problem in the future?
The only thing you can do is find the bug and fix it, or upgrade to a release of postgres where that bug is already fixed (and hope that no new ones were introduced).
To understand where the bug might be, you should check whether a core dump was produced (if not, enable core dumps). If you have the core, run gdb /path/to/postgres /path/to/core and then use the GDB command where. That will give you the crash stack trace, which may allow you to find similar bug reports.
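Even before a core dump is available, the kern.log line already carries the faulting instruction pointer and the load address of libc, so the offset inside the library can be computed. The following is a sketch under the assumption that the line has the usual "segfault at ... ip ... sp ... error N in module[base+size]" layout; `module_offset` is a made-up helper name.

```python
import re

# Matches the kernel's segfault report; all numbers except the error code are hex.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>\d+) in (?P<module>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def module_offset(line):
    """Return (module, offset of the instruction pointer inside it) or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return m.group("module"), hex(ip - base)

# The kern.log entry from the question, joined onto one line.
line = ("postgres[16461]: segfault at 7efd3237db90 ip 00007efd3237db90 "
        "sp 00007ffd26826678 error 15 in libc-2.19.so[7efd322a2000+1a1000]")
print(module_offset(line))
```

The resulting offset only makes sense against the exact libc build installed on that server, so it complements the GDB stack trace rather than replacing it.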

Context Broker crashing with certain update queries

We're running the Context Broker on a CentOS server, but it keeps crashing with certain update queries. We've tried version 0.26 and the latest 1.0.0-1, but the result is the same. We've also tried switching the MongoDB version between 3.0.6 and 3.0.7, with no luck. The logs don't give us much to go on, which is why we're asking here on SO.
We send an update of an entity of about 1MB in size, routed in from an HTTP call via nginx. The Context Broker crashes (see logs below), but MongoDB and other services continue to function normally.
Log file: /var/log/contextBroker/contextBroker.log
terminate called after throwing an instance of 'mongo::MsgAssertionException'
what(): EOO Before end of object
Log file: /var/log/messages
Apr 28 07:15:50 gl abrt[11457]: Saved core dump of pid 11426 (/usr/bin/contextBroker) to /var/spool/abrt/ccpp-2016-04-28-07:15:49-11426 (63606784 bytes)
Apr 28 07:15:50 gl abrtd: Directory 'ccpp-2016-04-28-07:15:49-11426' creation detected
Apr 28 07:15:50 gl abrtd: Package 'contextBroker' isn't signed with proper key
Apr 28 07:15:50 gl abrtd: 'post-create' on '/var/spool/abrt/ccpp-2016-04-28-07:15:49-11426' exited with 1
Apr 28 07:15:50 gl abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2016-04-28-07:15:49-11426'
Output from the contextBroker when it's run in verbose mode:
INFO#14:05:27 logMsg.h[1792]: Starting transaction from 127.0.0.1:51245/v1/updateContext
INFO#14:05:27 connectionOperations.cpp[78]: Database Operation Successful (query: { id.id: "8a55c32500dfad.....06be56709b75b31c1f9beb7d2", id.type: "House", _id.servicePath: /^\/$/ })
terminate called after throwing an instance of 'mongo::MsgAssertionException'
what(): BSONElement: bad type 100
Any ideas about what could be causing this, or where we should continue looking?
This crash is due to a bug detected in Orion. A fix is on the way, and we hope it gets merged in time to be included in the next Orion release (Orion 1.2.0).

Today widget doesn't work outside Xcode

I successfully got a Today widget working in my iOS app. It works great when running the widget on the simulator or running on a device from Xcode. But it doesn't update if I install the app on a device and then run the widget (not from Xcode).
Has anyone else faced this issue? Is there a fix for this, or is this a known bug? Are there any workarounds so that I can have a group of beta testers check the app using TestFlight and get the widget to work?
Using Xcode version 6.1.
Adding log statements tells me that all the correct methods are called. But after that a crash log is generated and the logs are very cryptic. This is what the console says
Dec 1 19:23:06 MyDevice ReportCrash[3592] <Error>: task_set_exception_ports(B07, 400, D03, 0, 0) failed with error (4: (os/kern) invalid argument)
Dec 1 19:23:06 MyDevice ReportCrash[3592] <Notice>: ReportCrash acting against PID 3591
Dec 1 19:23:06 MyDevice ReportCrash[3592] <Notice>: Formulating crash report for process MyTest[3591]
Dec 1 19:23:06 MyDevice SpringBoard[48] <Warning>: plugin com.testsaga.MyTest-Today interrupted
And a device log generated during this time lists this statement (specific to my widget - there are other processes in there as well)
Processes Name | <UUID> | CPU Time| rpages| purgeable| recent_max| lifetime_max| fds | [reason] | (state)
MyTest <84554d9818fe3e1fafa848c3fe6a34d5> 1.459 4132 0 - 8076 50 [per-process-limit] (frontmost)
Any insights? Thanks.
I tried the answer here but it doesn't work for me.

MongoDB Out of Memory

MongoDB is crashing. When I open the mongodb.log file, I get:
$ tail /var/log/mongodb/mongodb.log
Sat Jan 25 03:06:56.153 [initandlisten] connection accepted from 127.0.0.1:58492 #63331 (263 connections now open)
Sat Jan 25 03:07:02.694 out of memory, printing stack and exiting:
0xde05e1 0x6cf37e 0x12129fd 0xc490c3 0xc4404e 0xc44196 0xda4913 0xda53e4 0xe28e69 0x7f5cbaa19e9a 0x7f5cb9d2c3fd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde05e1]
/usr/bin/mongod(_ZN5mongo14my_new_handlerEv+0x3e) [0x6cf37e]
/usr/bin/mongod(_Znam+0x6d) [0x12129fd]
/usr/bin/mongod(_ZNK5mongo3Top8cloneMapERNS_9StringMapINS0_14CollectionDataEEE+0x83) [0xc490c3]
/usr/bin/mongod(_ZN5mongo9Snapshots12takeSnapshotEv+0x4e) [0xc4404e]
/usr/bin/mongod(_ZN5mongo14SnapshotThread3runEv+0x66) [0xc44196]
/usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xc3) [0xda4913]
/usr/bin/mongod(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0xda53e4]
/usr/bin/mongod() [0xe28e69]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f5cbaa19e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5cb9d2c3fd]
This question sounds similar: MongoDB: out of memory
But his problem was a ulimit issue. My memory settings are already unlimited.
Others had particular issues with .skip() or .limit() given unreasonably large values, but that's not happening here.
Anyone know what might be wrong?
The MongoDB docs recommend having enough swap space for MongoDB, despite it not being a requirement: http://docs.mongodb.org/manual/administration/production-notes/#ProductionNotes-Swap
I'm using Windows Azure hosting, and I discovered that their virtual servers don't have swap space by default:
$ sudo swapon -s
Filename Type Size Used Priority
(Azure defaults to no swap space: Part 1 & Part 2)
So I found a guide to creating a swap file: https://www.digitalocean.com/community/articles/how-to-add-swap-on-ubuntu-12-04
And it solved my problem!
Notes:
The guide says Ubuntu 12.04, but the same steps worked for me on 13.10.
You should use a swap file around half the size of your RAM, not the 512MB used in the guide.
I hope this helps others solve this problem.
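To check the same thing programmatically, here is a minimal sketch that reads text in /proc/meminfo format and reports the configured swap alongside the "half of RAM" size suggested in the notes above. The function name `swap_report` and the sample figures are made up for illustration.

```python
def swap_report(meminfo_text):
    """Return (swap_total_kb, suggested_swap_kb) from /proc/meminfo-style text.

    The suggestion is simply half of MemTotal, following the rule of thumb
    in the notes above.
    """
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])  # sizes are reported in kB
    return values.get("SwapTotal", 0), values.get("MemTotal", 0) // 2

# Hypothetical sample resembling a VM without swap; on a real Linux host,
# pass the contents of /proc/meminfo instead.
sample = "MemTotal:        3524304 kB\nMemFree:          201344 kB\nSwapTotal:             0 kB\n"
swap_total, suggested = swap_report(sample)
if swap_total == 0:
    print(f"No swap configured; consider a swap file of about {suggested} kB")
```

A SwapTotal of 0 corresponds to the empty `swapon -s` output shown above.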

GameKit on 4.0 not ready for primetime? Stopping advertising services

GameKit applications running under 4.0 do not properly handle removing GKSession objects. Under 3.1.3 or 3.2, if a peer disconnects and the session is cleaned up (as in the Apple demos):
[gkSession disconnectFromAllPeers];
[gkSession setAvailable:NO];
[gkSession setDelegate:nil];
[gkSession setDataReceiveHandler:nil withContext:nil];
then the other peers receive state changes and a table view of peers can be updated.
In my application, one peer starts up as a server and the other starts up as a client. The client requests to connect to the server, and the client's name appears in the server's list of players. If the server chooses to accept the request, the session connection is established and they can play the game. If, however, the client quits before the server accepts the request, the client cleans up the session (as above) and disappears from the server's peer list in response (when the server receives a state change). This works very well on 3.1 through 3.2.
When the same application runs under 4.0, the server and client throw an error, and it takes a very long time for peers to receive the state change; when they do, the application crashes without any errors (even with NSZombieEnabled=YES in the build arguments). The server never receives a "state change" message from the client. Instead, the following errors are thrown:
Thu Jul 8 23:27:26 unknown com.apple.mDNSResponder[18] <Notice>: BTLocalDeviceRemoveData: 60 byte key, 18 byte value
Thu Jul 8 23:27:26 unknown MobileBluetooth[29] <Notice>: BTLocalDeviceRemoveData - BT_ERROR_INVALID_HANDLE
Thu Jul 8 23:27:26 unknown com.apple.mDNSResponder[18] <Notice>: Call to BTLocalDeviceRemoveData failed with error 7
Thu Jul 8 23:13:39 unknown mDNSResponder[18] <Error>: external_stop_advertising_service: 18 00Z1Tud0A\\.\\.Tonberry\M-b\M^#\M^Ys\\032iPhone._1htnu3uko0uvsp._udp.local. TXT txtvers=1\M-B\M-&state=A
Thu Jul 8 23:13:39 unknown MobileBluetooth[29] <Notice>: BTLocalDeviceRemoveData - BT_ERROR_INVALID_HANDLE
With what I think is the key error:
Tue Jul 13 21:04:50 Tonberry com.apple.mDNSResponder[21] <Notice>: Call to BTDiscoveryAgentStopScan failed with error 400
Looks to me like the session is not being made unavailable (error in stopping advertising the service). The actual crash:
Thread 3 Crashed:
0 GameKitServices 0x06352f90 gckSessionChangeStateCList + 411
1 GameKitServices 0x0635b49c gckSessionRecvProc + 1474
2 libSystem.B.dylib 0x981c181d _pthread_start + 345
3 libSystem.B.dylib 0x981c16a2 thread_start + 34
I've filed a bug with my full application in progress. The app itself is done and was almost ready to be submitted, and it runs pretty well under 3.1.3/3.2, but with the current state of GameKit in 4.0 I can no longer submit it. I'm supremely disappointed and hope this bug report helps in the future. If anyone understands this error or what I might be doing wrong, I would be supremely grateful.
Please help if you can. I'm about to throw in the towel here on this application and it's so close.
My suggestion would be to use the latest pre-release builds of 4.1 and to report the "new" problem (either reopen the existing bug as not fixed or create a new one). That's IMHO your best bet for getting the problem entirely fixed before the final release, or for getting a decent workaround from Apple.
For anyone looking for help on this, I've found a workaround for the issues on 4.0.x (4.1 remedies the crashes but not the disconnect times). Just auto-accept everything. When someone requests a GameKit connection with connectToPeer:, just accept it. Don't give the user the option to select it. Disconnecting a peer from an established connection notifies the server immediately. If you leave peers in just the "available" state, the server will crash when they leave the connection. Connect early and accept often!