kernel - postgres segfault error 15 in libc-2.19.so - postgresql

Yesterday we had a crash of PostgreSQL 9.5.14 running on Debian 8 (Linux xxxxxx 3.16.0-7-amd64 #1 SMP Debian 3.16.59-1 (2018-10-03) x86_64 GNU/Linux): a segmentation fault. The database closed all connections and reinitialized itself, staying in recovery mode for about 1 minute.
PostgreSQL log:
2018-10-xx xx:xx:xx UTC [580-2] LOG: server process (PID 16461) was terminated by signal 11: Segmentation fault
kern.log:
Oct xx xx:xx:xx xxxxxxxx kernel: [117977.301353] postgres[16461]: segfault at 7efd3237db90 ip 00007efd3237db90 sp 00007ffd26826678 error 15 in libc-2.19.so[7efd322a2000+1a1000]
According to the libc documentation (https://support.novell.com/docs/Tids/Solutions/10100304.html), error code 15 means NX_EDEADLK 15, "resource deadlock would occur", which does not tell me much.
Is there anything we can do to avoid this problem in the future? This is, of course, a production server.
All packages are currently up to date. Upgrading PostgreSQL is unfortunately not an option. The server runs on Google Compute Engine.

error code 15 means: NX_EDEADLK 15
No, it doesn't mean that. This answer explains how to interpret 15 here.
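The kernel prints this field in hexadecimal, so it is 0x15 = 0b10101: bits 0, 2 and 4 are set => protection fault, read (not write) access, user mode, instruction fetch. Since ip is equal to the faulting address, your postgres process most likely jumped to a bad address through some wild (corrupted) pointer.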
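A minimal Python sketch (not part of the original answer) that parses the kern.log line from the question, decodes its error field using the x86 page-fault error-code bits, and computes where ip falls inside the library mapping. It assumes the exact log layout quoted above and treats the error field as hexadecimal, which is how the x86 kernel formats it; the function and constant names are hypothetical.

```python
from __future__ import annotations

import re

# x86 page-fault error-code bits: (bit, meaning if set, meaning if clear).
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "kernel mode"),
    (3, "reserved bit violation", None),
    (4, "instruction fetch", None),
]

# Assumed layout of the kernel's "segfault at ..." message, as quoted above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

SAMPLE = (
    "kernel: [117977.301353] postgres[16461]: segfault at 7efd3237db90 "
    "ip 00007efd3237db90 sp 00007ffd26826678 error 15 "
    "in libc-2.19.so[7efd322a2000+1a1000]"
)


def decode_pf_error(code: int) -> list[str]:
    """Describe each page-fault error-code bit that carries meaning."""
    parts = []
    for bit, if_set, if_clear in PF_BITS:
        if code & (1 << bit):
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts


def parse_segfault(line: str) -> dict:
    """Extract addresses from a kernel segfault line and decode the error field."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        "library": m["lib"],
        "offset_in_lib": ip - base,  # ip relative to the start of the mapping
        "ip_in_mapping": base <= ip < base + size,
        # The kernel prints this field in hex, so "15" means 0x15.
        "error_bits": decode_pf_error(int(m["err"], 16)),
    }


if __name__ == "__main__":
    info = parse_segfault(SAMPLE)
    print(info["library"], hex(info["offset_in_lib"]), info["error_bits"])
```

For this log the offset of ip inside libc-2.19.so works out to 0xdbb90. Assuming the printed base is libc's load address, that offset could be looked up against the same libc build with a symbolizer such as addr2line.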
if we can do something to avoid this problem in the future?
The only thing you can do is find the bug and fix it, or upgrade to a PostgreSQL release where that bug is already fixed (and hope that no new ones were introduced).
To understand where the bug might be, check whether a core dump was produced (if not, enable core dumps). If you have the core, run gdb /path/to/postgres /path/to/core and then the GDB command where. That gives you the crash stack trace, which may help you find similar bug reports.
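A hedged Python sketch of the "enable core dumps" step: core files are only written when the process's RLIMIT_CORE soft limit is non-zero, and the kernel's core_pattern setting decides where they go. The helper names are hypothetical. Because resource limits are inherited, the limit has to be raised in whatever launches the postmaster (for example with ulimit -c in its start script); pg_ctl's -c (--core-files) option is meant to do the same thing.

```python
from __future__ import annotations

import resource
from typing import Optional


def enable_core_dumps() -> tuple[int, int]:
    """Raise this process's soft RLIMIT_CORE to its hard limit.

    Children started afterwards (e.g. a server process) inherit the limit.
    If the hard limit is 0, raising it requires root (CAP_SYS_RESOURCE).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_pattern(path: str = "/proc/sys/kernel/core_pattern") -> Optional[str]:
    """Return the kernel's core-file naming pattern, or None if unavailable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("core limit:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
    print("core_pattern:", core_pattern())
```

Once a core file exists, loading it in gdb together with the postgres binary and running where prints the stack trace, as described in the answer.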

Related

OrientDB 2.1.9 crashes with OStorageException EOFException when running SQL script in console

I've been using my SQL database initialization script for a while, but recently the database started crashing in the middle of execution and I don't know why. Here are some details:
I am running OrientDB on Ubuntu 14 Trusty x64 (via Vagrant)
It always seems to crash while the script attempts to create a UNIQUE_HASH_INDEX, but doesn't always crash at the same UNIQUE_HASH_INDEX instruction
The script creates a lot of vertices and edges, but for example, it will crash here (see line with UNIQUE_HASH_INDEX):
CREATE CLASS Channel EXTENDS V;
CREATE PROPERTY Channel.version LONG;
CREATE PROPERTY Channel.channelId STRING;
CREATE INDEX Channel.uq_channelId ON Channel(channelId) UNIQUE_HASH_INDEX;
The database crashes entirely with the following error:
Creating index... Error:
com.orientechnologies.orient.core.exception.OStorageException: Error on executing command: sql.create INDEX Channel.uq_channelId ON Channel(channelId) UNIQUE_HASH_INDEX
Error: java.io.EOFException
Looking at the log files, the only hint I get are the last two lines:
2016-01-14 17:17:05:437 INFO Received signal: SIGTERM [OSignalHandler]
2016-01-14 17:17:05:454 INFO Received signal: SIGTERM [OSignalHandler]
How can I resolve this issue, or at least get better hints as to what is making the database crash?
I also tested with OrientDB 2.1.6, which I was running initially. Same problem.
Sorry, false alarm: this is a Vagrant issue, not an OrientDB issue. Running the exact same script on a 32-bit instance instead of a 64-bit one solved my problem, and running it on a real 64-bit server also works.

MongoDB Out of Memory

MongoDB is crashing. When I open the mongodb.log file, I get:
$ tail /var/log/mongodb/mongodb.log
Sat Jan 25 03:06:56.153 [initandlisten] connection accepted from 127.0.0.1:58492 #63331 (263 connections now open)
Sat Jan 25 03:07:02.694 out of memory, printing stack and exiting:
0xde05e1 0x6cf37e 0x12129fd 0xc490c3 0xc4404e 0xc44196 0xda4913 0xda53e4 0xe28e69 0x7f5cbaa19e9a 0x7f5cb9d2c3fd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde05e1]
/usr/bin/mongod(_ZN5mongo14my_new_handlerEv+0x3e) [0x6cf37e]
/usr/bin/mongod(_Znam+0x6d) [0x12129fd]
/usr/bin/mongod(_ZNK5mongo3Top8cloneMapERNS_9StringMapINS0_14CollectionDataEEE+0x83) [0xc490c3]
/usr/bin/mongod(_ZN5mongo9Snapshots12takeSnapshotEv+0x4e) [0xc4404e]
/usr/bin/mongod(_ZN5mongo14SnapshotThread3runEv+0x66) [0xc44196]
/usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xc3) [0xda4913]
/usr/bin/mongod(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0xda53e4]
/usr/bin/mongod() [0xe28e69]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f5cbaa19e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5cb9d2c3fd]
This question sounds similar: MongoDB: out of memory
But his problem was a ulimit issue. My memory settings are already unlimited.
Others had particular issues with .skip() or .limit() given unreasonably large values, but that's not happening here.
Does anyone know what might be wrong?
The MongoDB docs recommend having enough swap space for MongoDB, despite it not being a requirement: http://docs.mongodb.org/manual/administration/production-notes/#ProductionNotes-Swap
I'm using Windows Azure hosting, and I discovered that their virtual servers don't have swap space by default:
$ sudo swapon -s
Filename Type Size Used Priority
(Azure defaults to no swap space: Part 1 & Part 2)
So I found a guide to creating a swap file: https://www.digitalocean.com/community/articles/how-to-add-swap-on-ubuntu-12-04
And it solved my problem!
Notes:
The guide says Ubuntu 12.04, but the same steps worked for me on 13.10.
You should use a swap file around half the size of your RAM, not the 512MB used in the guide.
I hope this helps others solve this problem.
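The half-of-RAM rule from the notes above is simple arithmetic on the values Linux reports in /proc/meminfo. A minimal Python sketch, with hypothetical helper names, that reads MemTotal and SwapTotal (both in kB) and suggests a swap size:

```python
from __future__ import annotations


def meminfo_kib(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: first numeric value (kB)}."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def suggest_swap_mib(text: str) -> tuple[int, int]:
    """Return (current swap in MiB, suggested swap in MiB = half of RAM)."""
    info = meminfo_kib(text)
    current = info.get("SwapTotal", 0) // 1024
    suggested = info["MemTotal"] // 2 // 1024
    return current, suggested


# Illustrative input for a machine with 7 GiB of RAM and no swap.
SAMPLE = "MemTotal:        7340032 kB\nMemFree:          123456 kB\nSwapTotal:             0 kB\n"
```

On a real machine, pass the contents of /proc/meminfo to suggest_swap_mib; a current value of 0 corresponds to the empty swapon -s output shown above.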

What can cause a segmentation fault in mongodb

We have a MongoDB replica set, and one of the members crashed with a segmentation fault. What could be causing this issue? We are running version 2.2.2.
Thanks. Here is the log from the crash.
Mon Sep 2 03:37:26 Invalid access at address: 0xfffffd7d00680038 from thread: conn2014070
Mon Sep 2 03:37:26 Got signal: 11 (Segmentation Fault).
Mon Sep 2 03:37:26 Backtrace:
0xb331b8 0x7bd48b 0x7bd695 0xfffffd7fff1d7666 0xfffffd7fff1ca35c 0x9ff980 0x873f13 0x873fcb 0x981331 0x982af2 0x92d2da 0x93183b 0x7cead0 0xb2539a 0xfffffd7ff95f364c 0xfffffd7fff1d72d4 0xfffffd7fff1d75a0
/opt/local/bin/mongod'_ZN5mongo15printStackTraceERSo+0x28 [0xb331b8]
/opt/local/bin/mongod'_ZN5mongo10abruptQuitEi+0x11b [0x7bd48b]
/opt/local/bin/mongod'_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x125 [0x7bd695]
/lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffd7fff1d7666]
/lib/amd64/libc.so.1'call_user_handler+0x2a4 [0xfffffd7fff1ca35c]
/opt/local/bin/mongod'_ZNK5mongo6Record5touchEb+0x0 [0x9ff980]
/opt/local/bin/mongod'_ZN5mongo12ClientCursor5yieldEiPNS_6RecordE+0x63 [0x873f13]
/opt/local/bin/mongod'_ZN5mongo12ClientCursor14yieldSometimesENS0_11RecordNeedsEPb+0x6b [0x873fcb]
/opt/local/bin/mongod'_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEPNS_11RemoveSaverEbRKNS_24QueryPlanSelectionPolicyEb+0x9a1 [0x981331]
/opt/local/bin/mongod'_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEbRKNS_24QueryPlanSelectionPolicyE+0xa2 [0x982af2]
/opt/local/bin/mongod'_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x27a [0x92d2da]
/opt/local/bin/mongod'_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xe9b [0x93183b]
/opt/local/bin/mongod'_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x90 [0x7cead0]
/opt/local/bin/mongod'_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x32a [0xb2539a]
/opt/local/lib/libboost_thread.so.1.49.0'thread_proxy+0x6c [0xfffffd7ff95f364c]
/lib/amd64/libc.so.1'_thrp_setup+0xbc [0xfffffd7fff1d72d4]
/lib/amd64/libc.so.1'_lwp_start+0x0 [0xfffffd7fff1d75a0]
Additionally, I am seeing some assertion failures before the crash; I am not sure whether they are related. Otherwise, nothing looks out of the ordinary as far as I can see.
Wed Sep 4 02:19:04 [conn988803] cratefm Assertion failure !e.eoo() src/mongo/db/../bson/bsonobjbuilder.h 131
0xb331b8 0xb01e70 0x7cbe04 0x88e7ec 0x8b5f18 0x8b6b66 0x8b714a 0x978044 0x97ab32 0x931065 0x7cead0 0xb2539a 0xfffffd7fdd1a364c 0xfffffd7fff1d72d4 0xfffffd7fff1d75a0
/opt/local/bin/mongod'_ZN5mongo15printStackTraceERSo+0x28 [0xb331b8]
/opt/local/bin/mongod'_ZN5mongo12verifyFailedEPKcS1_j+0xc0 [0xb01e70]
/opt/local/bin/mongod'0x3cbe04 [0x7cbe04]
/opt/local/bin/mongod'_ZN5mongo16CmdFindAndModify3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x65c [0x88e7ec]
/opt/local/bin/mongod'_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRNS_14BSONObjBuilderEb+0x48 [0x8b5f18]
/opt/local/bin/mongod'_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xa26 [0x8b6b66]
/opt/local/bin/mongod'_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x37a [0x8b714a]
/opt/local/bin/mongod'_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x34 [0x978044]
/opt/local/bin/mongod'_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x6c2 [0x97ab32]
/opt/local/bin/mongod'_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x6c5 [0x931065]
/opt/local/bin/mongod'_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x90 [0x7cead0]
/opt/local/bin/mongod'_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x32a [0xb2539a]
/opt/local/lib/libboost_thread.so.1.49.0'thread_proxy+0x6c [0xfffffd7fdd1a364c]
/lib/amd64/libc.so.1'_thrp_setup+0xbc [0xfffffd7fff1d72d4]
/lib/amd64/libc.so.1'_lwp_start+0x0 [0xfffffd7fff1d75a0]
Make sure your client version >= server version.
Check out https://jira.mongodb.org/browse/SERVER-8105
It was a bug in version 2.6 when updating arrays in documents with more than 128 BSON elements:
Here is the issue
As they say, it has been fixed since version 2.6.1, so I encourage you to upgrade as I did. Everything works great for me now!

Unable to read crash dump in windbg

I have been getting a stack overflow exception in my program, which may originate from a third-party library, microsoft.sharepoint.client.runtime.dll.
I'm using adplus to create the crash dump, but I'm struggling to get any information from it when I open it in WinDbg. This is the response I get:
0:000> .restart /f
Loading Dump File [C:\symbols\FULLDUMP_FirstChance_epr_Process_Shut_Down_DocumentumMigrator.exe__0234_2011-11-17_15-19-59-426_0d80.dmp]
User Mini Dump File with Full Memory: Only application data is available
Comment: 'FirstChance_epr_Process_Shut_Down'
Symbol search path is: C:\symbols
Executable search path is:
Windows 7 Version 7601 (Service Pack 1) MP (8 procs) Free x64
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Machine Name:
Debug session time: Thu Nov 17 15:19:59.000 2011 (UTC + 2:00)
System Uptime: 2 days 2:44:48.177
Process Uptime: 0 days 0:13:05.000
.........................................WARNING: rsaenh overlaps cryptsp
.................WARNING: rasman overlaps apphelp
......
..WARNING: webio overlaps winhttp
.WARNING: credssp overlaps mswsock
.WARNING: IPHLPAPI overlaps mswsock
.WARNING: winnsi overlaps mswsock
............
wow64cpu!CpupSyscallStub+0x9:
00000000`74e42e09 c3 ret
Any ideas how I can get more information from the dump, or how to use it to find where my stack overflow is occurring?
The problem you are facing is that the process is 32-bit but runs on 64-bit Windows, so your dump is a 64-bit dump. To make use of it, run the following commands:
.load wow64exts
.effmach x86
!analyze -v
The last command should give you a meaningful stack trace.
This page provides lots of useful information and methods for analyzing the problem:
http://www.dumpanalysis.org/blog/index.php/2007/09/11/crash-dump-analysis-patterns-part-26/
You didn't mention whether your code is managed or unmanaged. Assuming it is unmanaged, run in the debugger:
.symfix
.reload
~*kb
Look through the call stacks of all threads and identify the thread that caused the stack overflow. It is easy to spot, because its call stack will be extra long. Switch to that thread with ~<N>s, where <N> is the thread number, and dump more of its call stack with k 200 (up to 200 frames). At the very bottom of the call stack you should see the code that started the runaway recursion.
If your code is managed, use the SOS extension to dump the call stacks.

GTK applications fail to start - xfs restart needed

Sorry, this is not really a programming question, but I am not sure where else I could find help.
After a recent update (Xorg was updated among other things), GTK apps stopped running in my KDE4 session. I run Debian unstable, updated around 22 April. When I try to run them, I get the following error:
ga#grzes:~$ iceweasel
The program 'firefox-bin' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadName (named color or font does not exist)'.
(Details: serial 888 error_code 15 request_code 45 minor_code 0)
(Note to programmers: normally, X errors are reported asynchronously;
that is, you will receive the error a while after causing it.
To debug your program, run it with the --sync command line
option to change this behavior. You can then get a meaningful
backtrace from your debugger if you break on the gdk_x_error() function.)
ga#grzes:~$ gimp
The program 'gimp' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadName (named color or font does not exist)'.
(Details: serial 6955 error_code 15 request_code 45 minor_code 0)
(Note to programmers: normally, X errors are reported asynchronously;
that is, you will receive the error a while after causing it.
To debug your program, run it with the --sync command line
option to change this behavior. You can then get a meaningful
backtrace from your debugger if you break on the gdk_x_error() function.)
(script-fu:4643): LibGimpBase-WARNING **: script-fu: gimp_wire_read(): error
I have to restart the font server manually to have it fixed:
ga#grzes:~$ su
Password:
grzes:/home/ga# /etc/init.d/xfs restart
Stopping X font server: xfs.
Setting up X font server socket directory /tmp/.font-unix...done.
Starting X font server: xfs.
Any ideas what could be wrong? Is it a configuration issue? This system has been upgraded continuously for the last 7 years, so it may still carry some old settings.
Debian unstable is very... unstable right now, since a release was made a short time ago. Major changes and package migrations are happening, and Xorg (with all X-related packages) is one of the critical packages in that process. My advice is to update/upgrade again in order to pick up a new version that may resolve this problem.
It is very common for something to break in inexplicable ways after an update, simply because the developers are uploading new, lightly tested versions of the applications.
I finally figured this out: xfs seems to be incompatible with the other components at the moment, and, luckily, removing it from the system completely solves the problem.
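The "(Details: ...)" line in both error messages carries raw X11 protocol numbers. A small Python sketch (hypothetical helper, not from the thread) translates them using the X11 core-protocol tables: error code 15 is BadName and request opcode 45 is OpenFont, which fits the font-server explanation in this thread. Only the font-related request opcodes are included; anything else is left numeric.

```python
from __future__ import annotations

import re

# X11 core-protocol error codes (as defined in X.h).
X_ERRORS = {
    1: "BadRequest", 2: "BadValue", 3: "BadWindow", 4: "BadPixmap",
    5: "BadAtom", 6: "BadCursor", 7: "BadFont", 8: "BadMatch",
    9: "BadDrawable", 10: "BadAccess", 11: "BadAlloc", 12: "BadColor",
    13: "BadGC", 14: "BadIDChoice", 15: "BadName", 16: "BadLength",
    17: "BadImplementation",
}

# Subset: only the font-related core request opcodes.
X_FONT_REQUESTS = {
    45: "OpenFont", 46: "CloseFont", 47: "QueryFont", 48: "QueryTextExtents",
    49: "ListFonts", 50: "ListFontsWithInfo", 51: "SetFontPath", 52: "GetFontPath",
}

DETAILS_RE = re.compile(r"error_code (\d+) request_code (\d+) minor_code (\d+)")


def decode_x_details(line: str) -> tuple[str, str]:
    """Turn a GTK '(Details: ...)' line into (error name, request name)."""
    m = DETAILS_RE.search(line)
    if m is None:
        raise ValueError("no X error details found")
    error_code, request_code = int(m.group(1)), int(m.group(2))
    error = X_ERRORS.get(error_code, f"error {error_code}")
    request = X_FONT_REQUESTS.get(request_code, f"request {request_code}")
    return error, request


if __name__ == "__main__":
    print(decode_x_details("(Details: serial 888 error_code 15 request_code 45 minor_code 0)"))
```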