Ceph MDS crashing constantly : ceph_assert fail ... prepare_new_inode - ceph

We are facing constant crashes of the Ceph MDS daemon. We have installed Mimic (v13.2.1).
mds: cephfs-1/1/1 up {0=node2=up:active(laggy or crashed)}
We have followed the DR steps listed at
http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
Please help in resolving the errors below.
mds crash stacktrace
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f984fc3ee1f]
2: (()+0x284fe7) [0x7f984fc3efe7]
3: (()+0x2087fe) [0x5563e88537fe]
4: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0xf37) [0x5563e87ce777]
5: (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xdb0) [0x5563e87d0bd0]
6: (Server::handle_client_request(MClientRequest*)+0x49e) [0x5563e87d3c0e]
7: (Server::dispatch(Message*)+0x2db) [0x5563e87d789b]
8: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x5563e87514b4]
9: (MDSRank::_dispatch(Message*, bool)+0x63b) [0x5563e875db5b]
10: (MDSRank::retry_dispatch(Message*)+0x12) [0x5563e875e302]
11: (MDSInternalContextBase::complete(int)+0x67) [0x5563e89afb57]
12: (MDSRank::_advance_queues()+0xd1) [0x5563e875cd51]
13: (MDSRank::ProgressThread::entry()+0x43) [0x5563e875d3e3]
14: (()+0x7e25) [0x7f984d869e25]
15: (clone()+0x6d) [0x7f984c949bad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
mds logs: https://pastebin.com/AWGMLRm0

The Ceph 13.2.2 release notes say the following...
The bluestore_cache_* options are no longer needed. They are replaced
by osd_memory_target, defaulting to 4GB. BlueStore will expand and
contract its cache to attempt to stay within this limit. Users
upgrading should note this is a higher default than the previous
bluestore_cache_size of 1GB, so OSDs using BlueStore will use more
memory by default. For more details, see the BlueStore docs.
This caught me by surprise. My osds were going absolutely wild with resident memory usage. The kernel was oom-killing osd processes.
Changing over to the new key and bouncing the osd processes has given me stable performance. mds seems to have stabilized as a result.
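Concretely, one way to apply the new key cluster-wide and bounce the OSDs (the 2 GiB value is only an example - size it for your hosts):
ceph config set osd osd_memory_target 2147483648   # example value: 2 GiB per OSD
systemctl restart ceph-osd.target                  # run on each OSD host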
Things I have tested successfully...
Writes are stable
Reads of new data are stable (files less than 1 year old)
Reads of old data are stable (files greater than 1 year old)
rm of new files is stable (files less than 1 year old)
I haven't tested rm of old files (files older than 1 year) yet. This is where I was seeing the worst of the mds instability.

Related

Trouble building u-boot for gumstix overo on yocto "thud" release

While attempting to build a yocto image using the "thud" release, bitbake fails to build the version of u-boot that comes with the meta-gumstix thud branch, which is 2016.03 (which seems antique?).
The error I'm seeing is regarding conflicting types, e.g.
ERROR: u-boot-v2016.03+gitAUTOINC+df61a74e68-r0 do_compile: oe_runmake failed
…
/home/kwisatz/yocto-new/build/tmp/work/overo-poky-linux-gnueabi/u-boot/v2016.03+gitAUTOINC+df61a74e68-r0/recipe-sysroot-native/usr/include/libfdt_env.h:71:30: error: conflicting types for 'fdt64_t'
typedef uint64_t FDT_BITWISE fdt64_t;
Searching the Internet for that, one quickly comes across a range of threads explaining that the problem is the libfdt-dev.h header that comes with the dtc package. Some recommend blacklisting or uninstalling the dtc package, but from what I see, it's explicitly required by the u-boot recipe in the gumstix layer for yocto:
DEPENDS += "dtc-native"
See also https://patchwork.openembedded.org/patch/147816/
However, the thread linked above is talking about versions 2018.01 and 2018.03, not 2016.03.
The poky layer for thud brings u-boot 2018.07 which builds fine, but with that one, my overo (Airstorm-Y) won't boot anymore:
Booting from nand with DTS...
UBI: attaching mtd1 to ubi0
UBI: scanning is finished
UBI: attached mtd1 (name "mtd=4", size 1013 MiB) to ubi0
UBI: PEB size: 131072 bytes (128 KiB), LEB size: 129024 bytes
UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 512
UBI: VID header offset: 512 (aligned 512), data offset: 2048
UBI: good PEBs: 8108, bad PEBs: 0, corrupted PEBs: 0
UBI: user volume: 1, internal volumes: 1, max. volumes count: 128
UBI: max/mean erase counter: 1/0, WL threshold: 4096, image sequence number: 1485359018
UBI: available PEBs: 0, total reserved PEBs: 8108, PEBs reserved for bad PEB handling: 160
** File not found /boot/omap3-overo-storm-tobi.dtb **
Loading file '/boot/zImage' to addr 0x82000000 with size 5097744 (0x004dc910)...
Done
Kernel image # 0x82000000 [ 0x000000 - 0x4dc910 ]
ERROR: Did not find a cmdline Flattened Device Tree
Could not find a valid device tree
I'm not entirely sure whether this boot problem is related to the u-boot build or to the kernel image that I've built (see my previous thread).
Any tips on how I could solve this issue? Is there a more recent version of u-boot in a gumstix layer for yocto that I haven't discovered just yet, or do you have any other tips on how I could get a working yocto image for my overo?
P.S. Note that during the build, I'm also seeing these warnings, but I don't think there's an actual problem here:
WARNING: u-boot-v2016.03+gitAUTOINC+df61a74e68-r0 do_patch:
Some of the context lines in patches were ignored. This can lead to incorrectly applied patches.
The context lines in the patches can be updated with devtool:
devtool modify <recipe>
devtool finish --force-patch-refresh <recipe> <layer_path>
Then the updated patches and the source tree (in devtool's workspace)
should be reviewed to make sure the patches apply in the correct place
and don't introduce duplicate lines (which can, and does happen
when some of the context is ignored). Further information:
http://lists.openembedded.org/pipermail/openembedded-core/2018-March/148675.html
https://bugzilla.yoctoproject.org/show_bug.cgi?id=10450
Details:
Applying patch 0006-duovero-Read-eeprom-over-i2c.patch
patching file board/gumstix/duovero/duovero.c
patching file include/configs/duovero.h
Hunk #2 succeeded at 50 with fuzz 2 (offset -4 lines).
Now at patch 0006-duovero-Read-eeprom-over-i2c.patch
[…]
If you're looking for a new (development, not stable) image for the Overo I would recommend the Warrior branch
https://github.com/gumstix/yocto-manifest/tree/warrior
It has been tested and confirmed working for the Overo. The Thud branch was added to our repo to add support for the Raspberry Pi CM3+. For an older (stable) image, I would recommend Morty
https://github.com/gumstix/yocto-manifest/tree/morty
Thanks.
At least for me, with the same issue, I just removed
DEPENDS += "dtc-native"
and the build completed.
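If you'd rather not edit the meta-gumstix recipe in place, the same removal can be done from your own layer with a bbappend - this is only a sketch, assuming thud-era override syntax and a wildcard match on the recipe name:
# hypothetical file: your-layer/recipes-bsp/u-boot/u-boot_%.bbappend
# thud still uses the old "_remove" override syntax (":remove" came later)
DEPENDS_remove = "dtc-native"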

kernel - postgres segfault error 15 in libc-2.19.so

Yesterday we had a crash of PostgreSQL 9.5.14 running on Debian 8 (Linux xxxxxx 3.16.0-7-amd64 #1 SMP Debian 3.16.59-1 (2018-10-03) x86_64 GNU/Linux) - a segmentation fault. The database closed all connections and reinitialized itself, staying ~1 minute in recovery mode.
PostgreSQL log:
2018-10-xx xx:xx:xx UTC [580-2] LOG: server process (PID 16461) was
terminated by signal 11: Segmentation fault
kern.log:
Oct xx xx:xx:xx xxxxxxxx kernel: [117977.301353] postgres[16461]:
segfault at 7efd3237db90 ip 00007efd3237db90 sp 00007ffd26826678 error
15 in libc-2.19.so[7efd322a2000+1a1000]
According to the libc documentation (https://support.novell.com/docs/Tids/Solutions/10100304.html), error code 15 means:
NX_EDEADLK 15 resource deadlock would occur - which does not tell me much.
Could you please tell me if we can do something to avoid this problem in the future? This server is, of course, a production one.
All packages are currently up to date. Upgrading PG is unfortunately not an option. The server runs on Google Compute Engine.
error code 15 means: NX_EDEADLK 15
No, it doesn't mean that. This answer explains how to interpret 15 here.
It's bits 0, 1, 2 and 3 set => protection fault, write access, user mode, use of a reserved bit. Most likely your postgres process attempted to write through some wild pointer.
if we can do something to avoid this problem in the future?
The only thing you can do is find the bug and fix it, or upgrade to a release of postgres where that bug is already fixed (and hope that no new ones were introduced).
To understand where the bug might be, you should check whether a core dump was produced (if not, do enable them). If you have the core, use gdb /path/to/postgres /path/to/core, and then the where GDB command. That will give you the crash stack trace, which may allow you to find similar bug reports.
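A minimal sketch of that workflow, assuming the Debian package layout for PostgreSQL 9.5 (adjust the paths to your setup):
ulimit -c unlimited                   # allow core dumps in the environment that starts postgres
cat /proc/sys/kernel/core_pattern     # check where the kernel writes core files
# after the next crash, load the core into gdb and print the stack trace
gdb /usr/lib/postgresql/9.5/bin/postgres /path/to/core
(gdb) where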

Error in performance test PostgreSQL and GlusterFS

I'm doing performance tests with pgbench to evaluate the impact of using GlusterFS with PostgreSQL. I've created a Gluster replicated volume with 3 bricks/servers:
Volume Name: gv0
Type: Replicate
Volume ID: a7e617ec-c564-4a01-aec9-807e87fcccb3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.112.76.37:/export/sdb1/brick
Brick2: 10.112.76.38:/export/sdb1/brick
Brick3: 10.112.76.39:/export/sdb1/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
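Roughly, a volume like this is created and mounted along the following lines (the mount point on the database host is illustrative, not part of the output above):
gluster volume create gv0 replica 3 \
  10.112.76.37:/export/sdb1/brick \
  10.112.76.38:/export/sdb1/brick \
  10.112.76.39:/export/sdb1/brick
gluster volume start gv0
# on the database host; PGDATA then lives under this mount point
mount -t glusterfs 10.112.76.37:/gv0 /var/lib/postgresql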
After that I've configured postgres to use the volume gv0. Everything works fine under low stress. However, when the load is increased, the following error occurs:
client 14 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 7 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 5 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 6 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 8 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 0 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 11 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
Any idea of what's causing this?
Gluster does not support "structured data", as stated in the GlusterFS Install Guide:
Gluster does not support so called “structured data”, meaning live, SQL databases. Of course, using Gluster to backup and restore the database would be fine - Gluster is traditionally better when using file sizes of at least 16KB (with a sweet spot around 128KB or so).
My guess would be that Gluster can just about keep up with the replication when the load is small, but struggles when the load is increased above a certain point, possibly leading to split-brain errors.
You can view files in split brain with the command gluster volume heal <volume_name> info split-brain, or gluster volume heal <volume_name> info for all the files that need healing.
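With the volume from the question, that would be:
gluster volume heal gv0 info split-brain   # files currently in split brain
gluster volume heal gv0 info               # all files that still need healing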
It may also be a bug in GlusterFS:
https://bugzilla.redhat.com/show_bug.cgi?id=1512691
https://bugzilla.redhat.com/show_bug.cgi?id=1518710
Also, kernels before version 4.13 have an issue with detecting errors during fsync(), and PostgreSQL applied corresponding changes in recent versions: https://www.percona.com/blog/2019/02/22/postgresql-fsync-failure-fixed-minor-versions-released-feb-14-2019/
So it makes sense to recheck this issue on recent versions of the kernel, GlusterFS, and PostgreSQL.

MongoDB Out of Memory

MongoDB is crashing. When I open the mongodb.log file, I get:
$ tail /var/log/mongodb/mongodb.log
Sat Jan 25 03:06:56.153 [initandlisten] connection accepted from 127.0.0.1:58492 #63331 (263 connections now open)
Sat Jan 25 03:07:02.694 out of memory, printing stack and exiting:
0xde05e1 0x6cf37e 0x12129fd 0xc490c3 0xc4404e 0xc44196 0xda4913 0xda53e4 0xe28e69 0x7f5cbaa19e9a 0x7f5cb9d2c3fd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde05e1]
/usr/bin/mongod(_ZN5mongo14my_new_handlerEv+0x3e) [0x6cf37e]
/usr/bin/mongod(_Znam+0x6d) [0x12129fd]
/usr/bin/mongod(_ZNK5mongo3Top8cloneMapERNS_9StringMapINS0_14CollectionDataEEE+0x83) [0xc490c3]
/usr/bin/mongod(_ZN5mongo9Snapshots12takeSnapshotEv+0x4e) [0xc4404e]
/usr/bin/mongod(_ZN5mongo14SnapshotThread3runEv+0x66) [0xc44196]
/usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xc3) [0xda4913]
/usr/bin/mongod(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0xda53e4]
/usr/bin/mongod() [0xe28e69]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f5cbaa19e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5cb9d2c3fd]
This question sounds similar: MongoDB: out of memory
But his problem was a ulimit issue. My memory settings are already unlimited.
Others had particular issues with .skip() or .limit() given unreasonably large values, but that's not happening here.
Anyone know what might be wrong?
The MongoDB docs recommend having enough swap space for MongoDB, despite it not being a requirement: http://docs.mongodb.org/manual/administration/production-notes/#ProductionNotes-Swap
I'm using Windows Azure hosting, and I discovered that their virtual servers don't have swap space by default:
$ sudo swapon -s
Filename Type Size Used Priority
(Azure defaults to no swap space: Part 1 & Part 2)
So I found a guide to creating a swap file: https://www.digitalocean.com/community/articles/how-to-add-swap-on-ubuntu-12-04
And it solved my problem!
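The steps in that guide boil down to roughly the following (the 2GB size is just an example - see the sizing note below):
sudo dd if=/dev/zero of=/swapfile bs=1M count=2048   # create a 2GB swap file
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# make it persistent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab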
Notes:
The guide says Ubuntu 12.04, but the same steps worked for me on 13.10.
You should use a swap file around half the size of your RAM, not the 512MB used in the guide.
I hope this helps others solve this problem.

Restore data from Postgres data files

Got a system with Postgres broken (a failed RAID is the reason), without any backups.
Trying to move the data to another computer with Postgres (and finally make a backup).
But whenever I set up the data directory and run Postgres, I get this message:
GET FATAL: database files are incompatible with server
2012-08-15 19:58:38 GET DETAIL: The database cluster was initialized with BLCKSZ 16777216, but the server was compiled with BLCKSZ 8192.
2012-08-15 19:58:38 GET HINT: It looks like you need to recompile or initdb.
16777216 (2 to the power of 24) is a very strange number - far too big.
However, I can't change the default value of 8192 when compiling (playing with --with-blocksize= has no effect, and I can't find BLCKSZ in the header files).
Any way to extract the data?
This is the environment and the circumstances:
harddrive: RAID 1 with 3 SAS disks in the array
OS: Ubuntu 10.04.04 amd64
Postgres: 9.1 (installed via apt-get; we changed the repository links to a higher version of Ubuntu)
the system became broken - after some time we got
AAC: Host Adapter BLINK LED 0x56
AACO: Adapter kernel panic'd 56
(filesystem or hardware error)
Somehow we recovered the data directory. pg_controldata showed:
pg_control version number: 903
Catalog version number: 201105231
Database system identifier: 5714530593695276911
Database cluster state: shut down
pg_control last modified: Tue 15 Aug 2012 11:50:50
Latest checkpoint location: 1B595668/2000020
Prior checkpoint location: 0/0
Latest checkpoint's REDO location: 1B595668/2000020
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/4057946
Latest checkpoint's NextOID: 40960
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 670
Latest checkpoint's oldestXID's DB: 1344846103
Latest checkpoint's oldestActiveXID: 0
Time of latest checkpoint: Tue 15 Aug 2012 11:50:50
Minimum recovery ending location: 0/0
Backup start location: 0/0
Current wal_level setting: minimal
Current max_connections setting: 100
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 16777216
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 2387576020
Maximum size of a TOAST chunk: 0
Date/time type storage: floating-point numbers
Float4 argument passing: by reference
Float8 argument passing: by reference
First I tried to bring the DB up on an Ubuntu server (harddisk - simple serial 2, Ubuntu 10.04 i386, Postgres 9.1) and got the same error as above (about BLCKSZ).
That's why I deployed Ubuntu 10.04 amd64 with an English-locale Postgres 9.1 in a virtual machine (because I got '?' instead of Russian characters in the error logs in the previous step).
Got the same error (about BLCKSZ).
After that I removed the apt-get Postgres version and compiled it from source as described in the docs: http://www.postgresql.org/docs/9.1/static/installation.html.
Playing with configure --with-blocksize=BLOCKSIZE had no effect - I got the same error.
Sorry for the post.
The pg_control file was broken by some manipulations with it.
So, the cluster was successfully restored by pg_resetxlog with the initial data.
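For anyone who finds this later, a typical invocation looks roughly like this (the data directory path is illustrative; -f forces the reset and can lose data, so only run it against a copy):
# pg_resetxlog ships with 9.1 (it was renamed pg_resetwal in PostgreSQL 10)
pg_resetxlog -f /var/lib/postgresql/9.1/main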
A block size of 16MB would be really weird, and since these two values also look completely bogus:
Maximum columns in an index: 2387576020
Maximum size of a TOAST chunk: 0
...you might want to question the integrity of this data before spending time on compiling postgres with a non-standard block size.
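Also worth noting: configure takes the block size in kilobytes and (at least in the 9.x series) only accepts powers of two up to 32, so a 16777216-byte block size could never have been produced by a build option in the first place:
# the largest block size configure will accept is 32kB; anything bigger is rejected
./configure --with-blocksize=32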
If you look at the sizes of the files corresponding to relations, are they multiples of 16MB or 8kB?
If the database has some multi-gigabyte tables, what appears to be the cut-off size on disk (the size above which postgres splits the data into several files)? This should be equal to the data block size multiplied by the blocks per segment of a large relation. On a default install, it's 1GB (8192 bytes * 131072 blocks).
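A quick way to eyeball both of those on the copied data directory (the path is illustrative) is to list the distinct sizes of the relation files:
# distinct file sizes (with counts) under base/; the largest sizes appear last (GNU find)
find /var/lib/postgresql/9.1/main/base -type f -name '[0-9]*' -printf '%s\n' | sort -n | uniq -c | tail
# on a default build, large tables are split into 1GB segments (8192 bytes * 131072 blocks)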
See here for details on configuring kernel resources. Perhaps the default/current settings for this new OS won't allow the postmaster to start.
Here are details on the meaning and context of the BLCKSZ parameter. Was the system that failed running a 64-bit build of PostgreSQL while the new system is a 32-bit build? If possible, attempting to obtain version information on the failed system's PostgreSQL could shed light on the problem. Let us know what version, build, and OS were used. Was it a custom build?
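One quick way to check what kind of build produced the old binaries, if they are still reachable (paths are illustrative):
file /usr/lib/postgresql/9.1/bin/postgres        # reports "ELF 32-bit" or "ELF 64-bit"
/usr/lib/postgresql/9.1/bin/postgres --version   # exact version and build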