Error in performance test PostgreSQL and GlusterFS - postgresql

I'm doing a performance test with pgbench to evaluate the impact of using GlusterFS with PostgreSQL. I've created a Gluster replicated volume with 3 bricks/servers:
Volume Name: gv0
Type: Replicate
Volume ID: a7e617ec-c564-4a01-aec9-807e87fcccb3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.112.76.37:/export/sdb1/brick
Brick2: 10.112.76.38:/export/sdb1/brick
Brick3: 10.112.76.39:/export/sdb1/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
After that I've configured PostgreSQL to use the volume gv0. Everything works fine under low stress. However, when the load is increased, the following error occurs:
client 14 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 7 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 5 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 6 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 8 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 0 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
client 11 aborted in state 9: ERROR: unexpected data beyond EOF in block 0 of relation base/16384/16503
HINT: This has been seen to occur with buggy kernels; consider updating your system.
Any idea what's causing this?

Gluster does not support "structured data", as stated in the GlusterFS Install Guide:
Gluster does not support so called “structured data”, meaning live, SQL databases. Of course, using Gluster to backup and restore the database would be fine - Gluster is traditionally better when using file sizes of at least 16KB (with a sweet spot around 128KB or so).
My guess would be that Gluster can just about keep up with the replication when the load is small, but struggles when the load is increased above a certain point, possibly leading to split-brain errors.
You can view files in split brain with the command gluster volume heal <volume_name> info split-brain, or gluster volume heal <volume_name> info for all the files that need healing.
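For reference, with the volume from the question (gv0), those checks would look like this:
gluster volume heal gv0 info split-brain   # files currently in split-brain
gluster volume heal gv0 info               # all files that still need healing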

It may be a bug in GlusterFS:
https://bugzilla.redhat.com/show_bug.cgi?id=1512691
https://bugzilla.redhat.com/show_bug.cgi?id=1518710
Also, kernels before version 4.13 have issues with detecting errors during fsync(), and PostgreSQL applied corresponding changes in recent minor releases: https://www.percona.com/blog/2019/02/22/postgresql-fsync-failure-fixed-minor-versions-released-feb-14-2019/
So it makes sense to recheck this issue on recent versions of the kernel, GlusterFS, and PostgreSQL.
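A minimal sketch for rechecking the versions in play (connection options for psql are omitted and depend on your setup):
uname -r                          # kernel version (fsync error reporting improved in 4.13+)
gluster --version | head -1       # GlusterFS version
psql -c 'SELECT version();'       # PostgreSQL server version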

Related

Kubernetes DSE Cassandra CommitLogReplayer$CommitLogReplayException

I have installed Cassandra on Kubernetes (9 pods). All the pods are up and running except
for one pod, which shows the error below.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogReplayer.shouldSkipSegmentOnError(CommitLogReplayer.java:438)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleUnrecoverableError(CommitLogReplayer.java:452)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:109)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:84)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:236)
at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:134)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:154)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:213)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:194)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:527)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:702)
at com.datastax.bdp.DseModule.main(DseModule.java:96)
Caused by: org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:111)
... 12 more
ERROR [main] 2021-09-06 06:19:08,990 JVMStabilityInspector.java:251 - JVM state determined to be unstable. Exiting forcefully due to:
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogReplayer.shouldSkipSegmentOnError(CommitLogReplayer.java:438)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleUnrecoverableError(CommitLogReplayer.java:452)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:109)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:84)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:236)
at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:134)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:154)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:213)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:194)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:527)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:702)
at com.datastax.bdp.DseModule.main(DseModule.java:96)
Can someone help me out, please?
For whatever reason, one of the commit log segments got corrupted on the node.
You can work around the issue by manually deleting this file on the pod:
/var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log
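A hedged sketch of that workaround, assuming the affected pod is reachable with kubectl (the pod and container names are placeholders):
kubectl exec <cassandra-pod> -c <cassandra-container> -- rm /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log
kubectl delete pod <cassandra-pod>   # restart; if managed by a StatefulSet it is recreated and Cassandra replays the remaining segments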
Interestingly, that commit log segment was created on September 2 (1630582314923) but the log entry you posted was from September 6. This indicates something happened to the pod which resulted in the corrupted file.
You'll need to review the Cassandra logs on the pod (not the pod logs themselves) to determine the root cause and address it. Cheers!

kernel - postgres segfault error 15 in libc-2.19.so

Yesterday we had a crash of PostgreSQL 9.5.14 running on Debian 8 (Linux xxxxxx 3.16.0-7-amd64 #1 SMP Debian 3.16.59-1 (2018-10-03) x86_64 GNU/Linux) - a segmentation fault. The database closed all connections and reinitialized itself, staying ~1 minute in recovery mode.
PostgreSQL log:
2018-10-xx xx:xx:xx UTC [580-2] LOG: server process (PID 16461) was
terminated by signal 11: Segmentation fault
kern.log:
Oct xx xx:xx:xx xxxxxxxx kernel: [117977.301353] postgres[16461]:
segfault at 7efd3237db90 ip 00007efd3237db90 sp 00007ffd26826678 error
15 in libc-2.19.so[7efd322a2000+1a1000]
According to the libc documentation (https://support.novell.com/docs/Tids/Solutions/10100304.html), error code 15 means:
NX_EDEADLK 15 resource deadlock would occur - which does not tell me much.
Could you please tell me if we can do something to avoid this problem in the future? Because this server is, of course, a production one.
All packages are currently up to date. Upgrading PostgreSQL is unfortunately not an option. The server runs on Google Compute Engine.
error code 15 means: NX_EDEADLK 15
No, it doesn't mean that. This answer explains how to interpret the 15 here.
It's bits 0, 1, 2, 3 set => protection fault, write access, user mode, use of a reserved bit. Most likely your postgres process attempted to write through some wild pointer.
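As a worked example of that decoding (a small shell sketch; 15 = 0b1111, so all four low bits are set):
code=15
(( code & 1 )) && echo 'protection fault (page was present)'
(( code & 2 )) && echo 'write access'
(( code & 4 )) && echo 'fault in user mode'
(( code & 8 )) && echo 'reserved bit used in a page-table entry'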
if we can do something to avoid this problem in the future?
The only thing you can do is find the bug and fix it, or upgrade to a release of postgres where that bug is already fixed (and hope that no new ones were introduced).
To understand where the bug might be, you should check whether a core dump was produced (if not, enable them). If you have the core, use gdb /path/to/postgres /path/to/core, and then the where GDB command. That will give you the crash stack trace, which may allow you to find similar bug reports.
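A hedged sketch of those steps (the core path is a placeholder; the Debian package usually installs the server binary under /usr/lib/postgresql/9.5/bin/postgres):
ulimit -c unlimited                      # allow core dumps for processes started from this shell
cat /proc/sys/kernel/core_pattern        # see where the kernel writes core files
gdb /usr/lib/postgresql/9.5/bin/postgres /path/to/core
(gdb) where                              # print the crash stack trace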

Ceph MDS crashing constantly : ceph_assert fail ... prepare_new_inode

We are facing constant crashes of the Ceph MDS daemon. We have installed Mimic (v13.2.1).
mds: cephfs-1/1/1 up {0=node2=up:active(laggy or crashed)}
We have followed the DR steps listed at
http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
Please help in resolving the errors below.
mds crash stacktrace
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f984fc3ee1f]
2: (()+0x284fe7) [0x7f984fc3efe7]
3: (()+0x2087fe) [0x5563e88537fe]
4: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0xf37) [0x5563e87ce777]
5: (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xdb0) [0x5563e87d0bd0]
6: (Server::handle_client_request(MClientRequest*)+0x49e) [0x5563e87d3c0e]
7: (Server::dispatch(Message*)+0x2db) [0x5563e87d789b]
8: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x5563e87514b4]
9: (MDSRank::_dispatch(Message*, bool)+0x63b) [0x5563e875db5b]
10: (MDSRank::retry_dispatch(Message*)+0x12) [0x5563e875e302]
11: (MDSInternalContextBase::complete(int)+0x67) [0x5563e89afb57]
12: (MDSRank::_advance_queues()+0xd1) [0x5563e875cd51]
13: (MDSRank::ProgressThread::entry()+0x43) [0x5563e875d3e3]
14: (()+0x7e25) [0x7f984d869e25]
15: (clone()+0x6d) [0x7f984c949bad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
mds logs: https://pastebin.com/AWGMLRm0
The Ceph 13.2.2 release notes say the following:
The bluestore_cache_* options are no longer needed. They are replaced
by osd_memory_target, defaulting to 4GB. BlueStore will expand and
contract its cache to attempt to stay within this limit. Users
upgrading should note this is a higher default than the previous
bluestore_cache_size of 1GB, so OSDs using BlueStore will use more
memory by default. For more details, see the BlueStore docs.
This caught me by surprise. My OSDs were going absolutely wild with resident memory usage; the kernel was OOM-killing OSD processes.
Changing over to the new osd_memory_target key and bouncing the OSD processes has given me stable performance, and the MDS seems to have stabilized as a result.
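A hedged sketch of that change (4294967296 is the 4 GB default mentioned in the release notes; tune it to your hosts, and the OSD id is a placeholder):
# ceph.conf, [osd] section:
#   osd_memory_target = 4294967296
systemctl restart ceph-osd@<osd-id>      # bounce each OSD after the change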
Things I have tested successfully:
Writes are stable
Reads of new data are stable (files less than 1 year old)
Reads of old data are stable (files greater than 1 year old)
rm of new files is stable (files less than 1 year old)
I haven't tested rm of old files (files older than 1 year) yet. This is where I was seeing the worst of the MDS instability.

How to fill data up to a size on multiple disks?

I am creating 4 mountpoint disks on Windows. I need to copy files up to a threshold value (say 50 GB).
I tried vdbench. It works fine, but it throws an exception at the end.
compratio=4
dedupratio=1
dedupunit=256k
* Host Definition section
hd=default,user=Administator,shell=vdbench,jvms=1
hd=localhost,system=localhost
********************************************************************************
* Storage Definition section
fsd=fsd1,anchor=C:\UnMapTest-Volume1\disk1\,depth=1,width=1,files=1,size=5g
fsd=fsd2,anchor=C:\UnMapTest-Volume2\disk2\,depth=1,width=1,files=1,size=5g
fwd=fwd1,fsd=fsd*,operation=write,xfersize=1m,fileio=sequential,fileselect=random,threads=10
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=1h,interval=1
Below is the exception from vdbench. Because of this, my calling script fails.
05:29:14.287 Message from slave localhost-0:
05:29:14.289 file=C:\UnMapTest-Volume1\disk1\\vdb.1_1.dir\vdb_f0001.file,busy=true
05:29:14.290 Thread: FwgThread write C:\UnMapTest-Volume1\disk1\ rd=rd1 For loops: None
05:29:14.291
05:29:14.292 last_ok_request: Thu Dec 28 05:28:57 PST 2017
05:29:14.292 Duration: 16.92 seconds
05:29:14.293 consecutive_blocks: 10001
05:29:14.294 last_block: FILE_BUSY File busy
05:29:14.294 operation: write
05:29:14.295
05:29:14.296 Do you maybe have more threads running than that you have
05:29:14.296 files and therefore some threads ultimately give up after 10000 tries?
05:29:14.300 *
05:29:14.301 ******************************************************
05:29:14.302 * Slave localhost-0 aborting: Too many thread blocks *
05:29:14.302 ******************************************************
05:29:14.303 *
05:29:21.235
05:29:21.235 Slave localhost-0 prematurely terminated.
05:29:21.235
05:29:21.235 Slave aborted. Abort message received:
05:29:21.235 Too many thread blocks
05:29:21.235
05:29:21.235 Look at file localhost-0.stdout.html for more information.
05:29:21.735
05:29:21.735 Slave localhost-0 prematurely terminated.
05:29:21.735
java.lang.RuntimeException: Slave localhost-0 prematurely terminated.
at Vdb.common.failure(common.java:335)
at Vdb.SlaveStarter.startSlave(SlaveStarter.java:198)
at Vdb.SlaveStarter.run(SlaveStarter.java:47)
I am using PowerShell on a Windows machine. If some other tool like DiskSpd has a way to fill data up to a threshold, please let me know.
I found the answer myself.
I have done this using Diskspd.exe as below.
The following command fills 50 GB of data in the mentioned disk folder:
.\diskspd.exe -c50G -b4K -t2 C:\UnMapTest-Volume1\disk1\testfile1.dat
It is much simpler than vdbench for my requirement.
Caution: it does not write real data, so on the array side the consumed disk size does not show up with the full file size.
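For the four mountpoints in the question, a hedged PowerShell sketch (it assumes the folders follow the same naming pattern as the vdbench config above and that diskspd.exe is in the current directory):
1..4 | ForEach-Object {
    .\diskspd.exe -c50G -b4K -t2 "C:\UnMapTest-Volume$_\disk$_\testfile$_.dat"
}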

Restore data from Postgres data files

Got a system with Postgres broken (a RAID failure is the reason), without any backups.
Trying to move the data to another computer with Postgres (and then make a backup).
But every time I set up the data directory and run postgres I get the message:
GET FATAL: database files are incompatible with server
2012-08-15 19:58:38 GET DETAIL: The database cluster was initialized with BLCKSZ 16777216, but the server was compiled with BLCKSZ 8192.
2012-08-15 19:58:38 GET HINT: It looks like you need to recompile or initdb.
16777216 (2 to the power of 24) is a very strange number - too big.
However, I can't get back to the default value of 8192 when compiling (playing with --with-blocksize= takes no effect; I can't find BLCKSZ in the header files).
Any way to extract the data?
This is the environment and the circumstances:
hard drive: RAID 1 with 3 SAS disks in the array
OS: Ubuntu 10.04.04 amd64
Postgres: 9.1 (via apt-get; we changed the repository links to a higher version of Ubuntu)
The system became broken - after some time we got
AAC: Host Adapter BLINK LED 0x56
AACO: Adapter kernel panic'd 56
(filesystem or hardware error)
Somehow we got the data directory. pg_controldata showed:
pg_control version number: 903
Catalog version number: 201105231
Database system identifier: 5714530593695276911
Database cluster state: shut down
pg_control last modified: Tue 15 Aug 2012 11:50:50
Latest checkpoint location: 1B595668/2000020
Prior checkpoint location: 0/0
Latest checkpoint's REDO location: 1B595668/2000020
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/4057946
Latest checkpoint's NextOID: 40960
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 670
Latest checkpoint's oldestXID's DB: 1344846103
Latest checkpoint's oldestActiveXID: 0
Time of latest checkpoint: Tue 15 Aug 2012 11:50:50
Minimum recovery ending location: 0/0
Backup start location: 0/0
Current wal_level setting: minimal
Current max_connections setting: 100
Current max_prepared_xacts setting:0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 16777216
Blocks per segment of large relation:131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 2387576020
Maximum size of a TOAST chunk: 0
Date/time type storage: floating-point numbers
Float4 argument passing: by reference
Float8 argument passing: by reference
First I tried to bring the DB up on an Ubuntu server (hard disk - simple serial 2, Ubuntu 10.04 i386, Postgres 9.1) and got the same exception as above (with BLCKSZ).
That's why I deployed Ubuntu 10.04 amd64 with an English-locale Postgres 9.1 (because I got '?' instead of Russian symbols in the error logs in the previous step) in a virtual machine.
Got the same exception (with BLCKSZ).
After that I removed the apt-get Postgres version and compiled it as described in the docs: http://www.postgresql.org/docs/9.1/static/installation.html.
Playing with configure --with-blocksize=BLOCKSIZE took no effect - I got the same error.
Sorry for the post.
The pg_control file was broken by some manipulations with it.
So, the cluster was successfully restored by pg_resetxlog with the initial data.
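A hedged sketch of that recovery step (the data directory path is a placeholder; stop the server and keep a copy of the directory first, since pg_resetxlog discards WAL):
pg_resetxlog -f /path/to/data/directory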
A blocksize of 16Mb would be really weird, and since these two values also look completely bogus:
Maximum columns in an index: 2387576020
Maximum size of a TOAST chunk: 0
...you might want to question the integrity of this data before spending time on compiling postgres with a non-standard block size.
If you look at the sizes of the files corresponding to relations, are they multiples of 16 MB or 8 KB?
If the database has some multi-gigabyte tables, what appears to be the cut-off size on disk (the size above which postgres splits the data into several files)? This should be equal to the database block size * blocks per segment of large relation. On a default install (8192 bytes * 131072 blocks), it's 1 GB.
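A hedged way to check both from the shell (the data directory path and database OID are placeholders):
ls -l /path/to/data/base/<database-oid> | sort -k5 -n | tail   # largest relation segment files
# default segment cut-off: 8192-byte blocks * 131072 blocks per segment = 1 GiB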
See here for details on configuring kernel resources. Perhaps the default/current settings for this new OS won't allow the postmaster to start.
Here are details on the meaning and context of the BLCKSZ parameter. Was the system that failed running a 64-bit build of PostgreSQL while the new system is a 32-bit build? If possible, obtaining version information on the failed system's PostgreSQL could shed light on the problem. Let us know what version, build, and OS were used. Was it a custom build?