We're running the Context Broker on a CentOS server but it keeps crashing with certain update queries. We've tried version 0.26 and the latest 1.0.0-1 but the result is the same, we've also tried changing the mongoDB version between 3.0.6 and 3.0.7 but no luck. The logs doesn't give us much to go on so that's why we're asking here in SO.
What we're doing is to send an update of an entity of about 1MB in size routed in from a http call via nginx. The context broker crashes (see logs below) but mongodb and other services continue to function normally.
Log file: /var/log/contextBroker/contextBroker.log
terminate called after throwing an instance of 'mongo::MsgAssertionException'
what(): EOO Before end of object
Log file: /var/log/messages
Apr 28 07:15:50 gl abrt[11457]: Saved core dump of pid 11426 (/usr/bin/contextBroker) to /var/spool/abrt/ccpp-2016-04-28-07:15:49-11426 (63606784 bytes)
Apr 28 07:15:50 gl abrtd: Directory 'ccpp-2016-04-28-07:15:49-11426' creation detected
Apr 28 07:15:50 gl abrtd: Package 'contextBroker' isn't signed with proper key
Apr 28 07:15:50 gl abrtd: 'post-create' on '/var/spool/abrt/ccpp-2016-04-28-07:15:49-11426' exited with 1
Apr 28 07:15:50 gl abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2016-04-28-07:15:49-11426'
Output from the contextBroker when it's run in verbose mode:
INFO#14:05:27 logMsg.h[1792]: Starting transaction from 127.0.0.1:51245/v1/updateContext
INFO#14:05:27 connectionOperations.cpp[78]: Database Operation Successful (query: { id.id: "8a55c32500dfad.....06be56709b75b31c1f9beb7d2", id.type: "House", _id.servicePath: /^\/$/ })
terminate called after throwing an instance of 'mongo::MsgAssertionException'
what(): BSONElement: bad type 100
Any ideas about what could be causing this, or where we should continue looking?
This crash is due to a bug detected at Orion. A fix is on the way, so we hope it get merged and ready to be included in next Orion release (Orion 1.2.0).
Related
Yesterday we had crash of PostgreSQL 9.5.14 running on Debian 8 (Linux xxxxxx 3.16.0-7-amd64 #1 SMP Debian 3.16.59-1 (2018-10-03) x86_64 GNU/Linux) - Segmentation fault. Database closed all connections and reinitialized itself staying ~1 minute in recovery mode.
PostgreSQL log:
2018-10-xx xx:xx:xx UTC [580-2] LOG: server process (PID 16461) was
terminated by signal 11: Segmentation fault
kern.log:
Oct xx xx:xx:xx xxxxxxxx kernel: [117977.301353] postgres[16461]:
segfault at 7efd3237db90 ip 00007efd3237db90 sp 00007ffd26826678 error
15 in libc-2.19.so[7efd322a2000+1a1000]
According to libc documentation (https://support.novell.com/docs/Tids/Solutions/10100304.html) error code 15 means:
NX_EDEADLK 15 resource deadlock would occur - which does not tell me much.
Could you tell me please if we can do something to avoid this problem in the future? Because this server is of course production one.
All packages are up to date currently. Upgrade of PG is unfortunately not the option. Server runs on Google Compute Engine.
error code 15 means: NX_EDEADLK 15
No, it doesn't mean that. This answer explains how to interpret 15 here.
It's bits 0, 1, 2, 3 set => protection fault, write access, user mode, use of reserved bit. Most likely your postgress process attempted to write to some wild pointer.
if we can do something to avoid this problem in the future?
The only thing you can do is find the bug and fix it, or upgrade to a release of postgress where that bug is already fixed (and hope that no new ones were introduced).
To understand where the bug might be, you should check whether a core dump was produced (if not, do enable them). If you have the core, use gdb /path/to/postgress /path/to/core, and then where GDB command. That will give you crash stack trace, which may allow you to find similar bug reports.
Could anyone please temme how to know when my AEM Author/Publish instance was last restarted? I know it can be done from logs. But which log and is there any keyword with which I can grep the log?
Yes, you can rely on logs to find this out. error.log and stdout.log look promising. You can grep for any of the sling start events from stdout.log. Logs from stdout.log
during restart:
Loading quickstart properties: default
Loading quickstart properties: instance
Quickstart startup at Thu Sep 13 11:54:21 AEST 2018
UpgradeUtil.handleInstallAndUpgrade has mode RESTART
13.09.2018 11:54:21.762 *INFO * [main] Setting sling.home=D:\Tools\aem\author\crx-quickstart (command line)
13.09.2018 11:54:21.772 *INFO * [main] Starting Apache Sling in D:\Tools\aem\author\crx-quickstart
Quicker way:
There is an admin page which will give you last started time OOTB -> http://<<host>>:<<port>>/system/console/vmstat
We have a mongodb replica set, on of the member crashed with a segmentation fault. What could be causing this issue? We are running version 2.2.2.
Thanks. Here is the log from the crash.
Mon Sep 2 03:37:26 Invalid access at address: 0xfffffd7d00680038 from thread: conn2014070
Mon Sep 2 03:37:26 Got signal: 11 (Segmentation Fault).
Mon Sep 2 03:37:26 Backtrace:
0xb331b8 0x7bd48b 0x7bd695 0xfffffd7fff1d7666 0xfffffd7fff1ca35c 0x9ff980 0x873f13 0x873fcb 0x981331 0x982af2 0x92d2da 0x93183b 0x7cead0 0xb2539a 0xfffffd7ff95f364c 0xfffffd7fff1d72d4 0xfffffd7fff1d75a0
/opt/local/bin/mongod'_ZN5mongo15printStackTraceERSo+0x28 [0xb331b8]
/opt/local/bin/mongod'_ZN5mongo10abruptQuitEi+0x11b [0x7bd48b]
/opt/local/bin/mongod'_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x125 [0x7bd695]
/lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffd7fff1d7666]
/lib/amd64/libc.so.1'call_user_handler+0x2a4 [0xfffffd7fff1ca35c]
/opt/local/bin/mongod'_ZNK5mongo6Record5touchEb+0x0 [0x9ff980]
/opt/local/bin/mongod'_ZN5mongo12ClientCursor5yieldEiPNS_6RecordE+0x63 [0x873f13]
/opt/local/bin/mongod'_ZN5mongo12ClientCursor14yieldSometimesENS0_11RecordNeedsEPb+0x6b [0x873fcb]
/opt/local/bin/mongod'_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEPNS_11RemoveSaverEbRKNS_24QueryPlanSelectionPolicyEb+0x9a1 [0x981331]
/opt/local/bin/mongod'_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEbRKNS_24QueryPlanSelectionPolicyE+0xa2 [0x982af2]
/opt/local/bin/mongod'_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x27a [0x92d2da]
/opt/local/bin/mongod'_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xe9b [0x93183b]
/opt/local/bin/mongod'_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x90 [0x7cead0]
/opt/local/bin/mongod'_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x32a [0xb2539a]
/opt/local/lib/libboost_thread.so.1.49.0'thread_proxy+0x6c [0xfffffd7ff95f364c] /lib/amd64/libc.so.1'_thrp_setup+0xbc [0xfffffd7fff1d72d4]
/lib/amd64/libc.so.1'_lwp_start+0x0 [0xfffffd7fff1d75a0]
Additionally I am seeing some assertion failures before the crash, I am not sure whether they are related. Otherwise nothing else out of the ordinary as far as I can see.
Wed Sep 4 02:19:04 [conn988803] cratefm Assertion failure !e.eoo() src/mongo/db/../bson/bsonobjbuilder.h 131
0xb331b8 0xb01e70 0x7cbe04 0x88e7ec 0x8b5f18 0x8b6b66 0x8b714a 0x978044 0x97ab32 0x931065 0x7cead0 0xb2539a 0xfffffd7fdd1a364c 0xfffffd7fff1d72d4 0xfffffd7fff1d75a0
/opt/local/bin/mongod'_ZN5mongo15printStackTraceERSo+0x28 [0xb331b8]
/opt/local/bin/mongod'_ZN5mongo12verifyFailedEPKcS1_j+0xc0 [0xb01e70]
/opt/local/bin/mongod'0x3cbe04 [0x7cbe04]
/opt/local/bin/mongod'_ZN5mongo16CmdFindAndModify3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x65c [0x88e7ec]
/opt/local/bin/mongod'_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRNS_14BSONObjBuilderEb+0x48 [0x8b5f18]
/opt/local/bin/mongod'_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xa26 [0x8b6b66]
/opt/local/bin/mongod'_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x37a [0x8b714a]
/opt/local/bin/mongod'_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x34 [0x978044]
/opt/local/bin/mongod'_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x6c2 [0x97ab32]
/opt/local/bin/mongod'_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x6c5 [0x931065]
/opt/local/bin/mongod'_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x90 [0x7cead0]
/opt/local/bin/mongod'_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x32a [0xb2539a]
/opt/local/lib/libboost_thread.so.1.49.0'thread_proxy+0x6c [0xfffffd7fdd1a364c]
/lib/amd64/libc.so.1'_thrp_setup+0xbc [0xfffffd7fff1d72d4]
/lib/amd64/libc.so.1'_lwp_start+0x0 [0xfffffd7fff1d75a0]
Make sure your client version >= server version.
Check out https://jira.mongodb.org/browse/SERVER-8105
It was a bug from the 2.6 version when updating arrays on documents with more than 128 BSON elements:
Here is the issue
As they say, it's corrected since the 2.6.1 version, so i encourage you to upgrade as i did, everything works great for me now !
I'm new to web development and I wanted to get started with some RoR (using Locomotive CMS).
One of the things Locomotive asks for is to have Mongodb. I installed using homebrew by following this link http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/
It installs fine but then im not able to run it!
When I type 'mongo' on terminal I get the following output :
"MongoDB shell version: 2.4.3
connecting to: test
Mon May 6 11:12:28.927
JavaScript execution failed:
Error: couldn't connect to server
127.0.0.1:27017 at src/mongo/shell/mongo.js:L112
exception: connect failed"
BACKGROUND TO HELP DEBUGGING ( on Terminal) :
1.When I type in mongod I get the following :
"all output going to: /usr/local/var/log/mongodb/mongo.log"
Ownership of mongo.log :
-rw-r--r-- 1 username admin 22133 May 6 11:13 mongo.log
2.When I input mongod --fork I get the following :
about to fork child process, waiting until server is ready for connections.
forked process: 77566
all output going to: /usr/local/var/log/mongodb/mongo.log
ERROR: child process failed, exited with error number 100
3.Typing mongod --help gives the following warning:
* WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
4.I have a folder called data (which acts as amongodb database, is this where it should be?)in root (PATH : /data) Ownership of data folder :
"drwxr-xr-x 3 username wheel 102 Apr 23 21:38 data"
5.Checking if ports are free: lsof -i :27017. Ive also tried to check for a running mongo process using activity montior and found zilch!
No output
6.Ive also tried : mongo --repair. Dint help!
Ive been stuk on this for a while, I've looked at most responses on stackoverflow and searched around to find a solution to this but nothing has helped so far!
UPDATE:
When I tried to start the mongo shell, I was getting the following l
log message from mongo.log:
5/6/13 1:33:27.616 PM com.apple.launchd:
(org.mongodb.mongod[79133])
open("/private/var/log/mongodb/output.log", ...): Permission denied
So I did a chmod777 for the particular folder and the shell launches!
Although I still get a warning when it launches as:
Server has startup warnings:
Mon May 6 13:33:27.693 [initandlisten]
Mon May 6 13:33:27.693 [initandlisten]
** WARNING: soft rlimits too low.
Number of files is 256, should be at least 1000
Any idea how I can silence these warnings?
To get the information you need to determine the cause of failure you need to look in (and post for us) the output from /usr/local/var/log/mongodb/mongo.log when it is trying to start.
However, the most common reason for the failure is the lack of the default database path - at /data/db. Either create that folder (and don't forget to make sure your user has permission to read/write to it) or specify a different path with the --dbpath option.
UPDATE: as you have since found, bad permissions on the log file can cause the issue, in a similar way to bad permissions on the data path.
In terms of the warning, the information you need is here:
https://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1
It is just that though, a warning - you can run MongoDB without an issue with those limits as long as it is not under heavy load. So, if this is a development environment, unless you plan on load testing, you should be fine
Gamekit applications running under 4.0 do not properly handle removing GKSession objects. Running under 3.1.3 or 3.2, if a peer disconnects and the session is cleaned up (as in Apple demos):
[gkSession disconnectFromAllPeers];
[gkSession setAvailable:NO];
[gkSession setDelegate:nil];
[gkSession setDataReceiveHandler:nil withContext:nil];
then the other peers receive state changes and a table view of peers can be updated.
In my application, one peer starts up as a server and the other starts up as a client. The client requests to connect to the server and the client's name appears in the server's list of players. If the server chooses to accept the request, the session connection is established and they can play the game. If however the client quits before the server accepts the request, the client cleans up the session (as above) and the client disappears from the server's peer list in response (when it receives a state change). This works amazingly on 3.1–3.2
When you run the same application running under 4.0 the server and client throw an error and it takes a very long time for peers to receive the state change, and when they do, the application crashes without any errors (even with NSZombieEnabled=YES in build arguments). The server never receives a "state change" message from the client. Instead, the following errors are thrown:
Thu Jul 8 23:27:26 unknown com.apple.mDNSResponder[18] <Notice>: BTLocalDeviceRemoveData: 60 byte key, 18 byte value
Thu Jul 8 23:27:26 unknown MobileBluetooth[29] <Notice>: BTLocalDeviceRemoveData - BT_ERROR_INVALID_HANDLE
Thu Jul 8 23:27:26 unknown com.apple.mDNSResponder[18] <Notice>: Call to BTLocalDeviceRemoveData failed with error 7
Thu Jul 8 23:13:39 unknown mDNSResponder[18] <Error>: external_stop_advertising_service: 18 00Z1Tud0A\\.\\.Tonberry\M-b\M^#\M^Ys\\032iPhone._1htnu3uko0uvsp._udp.local. TXT txtvers=1\M-B\M-&state=A
Thu Jul 8 23:13:39 unknown MobileBluetooth[29] <Notice>: BTLocalDeviceRemoveData - BT_ERROR_INVALID_HANDLE
With what I think is the key error:
Tue Jul 13 21:04:50 Tonberry com.apple.mDNSResponder[21] <Notice>: Call to BTDiscoveryAgentStopScan failed with error 400
Looks to me like the session is not being made unavailable (error in stopping advertising the service). The actual crash:
Thread 3 Crashed:
0 GameKitServices 0x06352f90 gckSessionChangeStateCList + 411
1 GameKitServices 0x0635b49c gckSessionRecvProc + 1474
2 libSystem.B.dylib 0x981c181d _pthread_start + 345
3 libSystem.B.dylib 0x981c16a2 thread_start + 34
I've filed a bug with my full application in progress. The app itself is done and was almost ready to submitted and runs pretty well under 3.1.3/3.2 but with the current state of Gamekit in 4.0 I can no longer submit it. Supremely disappointed and so hoping this bug report helps in the future. If anyone understands this error or what I might be doing wrong I would be supremely grateful.
Please help if you can. I'm about to throw in the towel here on this application and it's so close.
My suggestion would be to use ultimate pre-release builds of 4.1 and to report the "new" problem (either reopen the existing bug as not fixed or create a new one). That's IMHO your best bet to get the problem entirely fixed before the final release or a decent work around from Apple.
For anyone looking for help on this, I've find a workaround for the issues on 4.0.x (4.1 remedies the crashes but not the disconnect times). Just auto-accept everything. When someone requests a GameKit connection with connectToPeer:, just accept it. Don't give the user the option to select it. Disconnecting a peer from an established connection notifies the server immediately. If you leave them in just the "available" state, when they leave the connection, it will crash the server. Connect early and accept often!