After tuning PostgreSQL, pgbench results are worse

I am testing PostgreSQL on an 8 GB RAM / 4 CPU / 80 GB SSD cloud server from DigitalOcean. I originally ran pgbench with the default settings in postgresql.conf, and then altered some common settings--shared_buffers, work_mem, maintenance_work_mem, effective_cache_size--to reflect the 8 GB of RAM. After running the second set of tests, I noticed that some of my results were actually worse. Any suggestions on why this might be? I am rather new to pgbench and to tuning PostgreSQL in general.
Settings:
shared_buffers = 2048MB
work_mem = 68MB
maintenance_work_mem = 1024MB
effective_cache_size = 4096MB
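For reference, one way to confirm the new values are actually in effect (a sketch, assuming local psql access as the postgres user; shared_buffers only changes after a full restart):
psql -U postgres -c "SELECT name, setting, unit FROM pg_settings WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem', 'effective_cache_size');"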
Tests:
pgbench -i -s 100
pgbench -c 16 -j 2 -T 60 -U postgres postgres
pgbench -S -c 16 -j 2 -T 60 -U postgres postgres
pgbench -c 16 -j 4 -T 60 -U postgres postgres
pgbench -S -c 16 -j 4 -T 60 -U postgres postgres
pgbench -c 16 -j 8 -T 60 -U postgres postgres
pgbench -S -c 16 -j 8 -T 60 -U postgres postgres
How effective are these tests? Is this an effective way to employ PgBench? How should I customize tests to properly reflect my data and server instance?

What do you mean by "worse"? How long did you run pgbench? A test like this should run for at least about two hours to give realistic values. What version of PostgreSQL do you have?
Attention: you should be very careful when interpreting pgbench results. You should probably be optimizing the execution of your application, not pgbench. pgbench is good for checking hardware or software, but it is a poor tool for optimizing the PostgreSQL configuration.
The configuration variables you mention are the basic ones, and you probably cannot go far wrong with them (the server must never actively swap, and these variables ensure that).
The formula that I use:
-- Dedicated server with 8GB RAM
shared_buffers       = 1/3 .. 1/4 of dedicated RAM
effective_cache_size = 2/3 of dedicated RAM
maintenance_work_mem = more than the biggest table (if possible),
                       else 1/10 of RAM,
                       else max_connections * 1/4 * work_mem
work_mem             = precise setting is based on slow-query analysis
                       (start at about 100MB)
-- this must hold:
max_connections * work_mem * 2 + shared_buffers
  + 1GB (O.S.) + 1GB (filesystem cache) <= RAM size
The default values for the WAL buffer size and checkpoint segments are usually too low as well, and you can increase them.
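For the 8GB server in the question, that formula might translate into a postgresql.conf along these lines (an illustrative sketch only, assuming max_connections = 100 and PostgreSQL 9.5 or newer, where max_wal_size replaces checkpoint_segments):
max_connections = 100                  # assumption used in the arithmetic below
shared_buffers = 2GB                   # ~1/4 of 8GB RAM
effective_cache_size = 5GB             # ~2/3 of 8GB RAM
maintenance_work_mem = 800MB           # ~1/10 of RAM (or more, if the biggest table fits)
work_mem = 16MB                        # 100 * 16MB * 2 + 2GB shared_buffers + 2GB (OS + fs cache) <= 8GB
wal_buffers = 16MB                     # older releases default to much less
max_wal_size = 2GB                     # the post-9.4 replacement for checkpoint_segments
checkpoint_completion_target = 0.9     # spread checkpoint I/O over more of the interval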

Related

pg_dump with -j option and -Z

I am about to back up a 120 GB database. I kept failing when using the pgAdmin backup (because of a VPN disconnection after 7 hours of running) or SQLMaestro (an out-of-memory issue after 3 hours of running).
So I want to run it on the server using pg_dump. The command I want to use is: time pg_dump -j 5 -Fc -Z 1 db_profile_20210714 -f /var/lib/postgresql/backup2/ (I want to measure the time as well, so I put time in front). After that I will run pg_dumpall -g.
I have a 30-core server and the backup drive is mounted on NFS. Postgres 12 running on Ubuntu 12.
Questions:
If I use -Z 0, will it undo the default compression of -Fc? (-Fc is compressed by default.)
Are -j 5 and -Z 1 counterproductive to each other? I read in an article that to throttle the pg_dump process so it won't cause I/O spikes, one can use -Z between 3 and 5. But if someone wants to use the cores and compress at the same time, is that effective / efficient?
Thanks
Yes, if you use -Z 0, the custom-format dump will be uncompressed. -j and -Z are independent of each other, and you cannot use -j with the custom format at all. Whether compression speeds up the dump depends on your bottleneck: if that is the network, compression can help; otherwise, compression usually makes pg_dump slower.
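If parallelism is the goal, the directory format is what allows it. A sketch (the output directory name is illustrative; everything else is taken from the question):
# -j only works with the directory format (-Fd), which is compressed by default like -Fc
time pg_dump -Fd -j 5 -Z 1 -f /var/lib/postgresql/backup2/db_profile_20210714.dir db_profile_20210714
# pg_dump never includes roles or tablespaces, so dump the globals separately
pg_dumpall -g > /var/lib/postgresql/backup2/globals.sql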

How to improve pg_basebackup speed

How can I improve pg_basebackup speed? I cannot find a parallelism option for pg_basebackup. Is there really no parallel option? I just want to create the slave database quickly: the database is 5 TB and creating the slave takes a long time. If there is no parallel option, how can I avoid this time problem? Thank you.
Command for creating the slave:
pg_basebackup -Xs -h 172.31.34.215 -U repuser --checkpoint=fast -D /var/lib/postgresql/14/ter -R --slot=replication_slot -C
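For reference, the same command spelled out with long option names (a sketch only; no change in behavior is intended):
# -X s / --wal-method=stream : stream WAL alongside the base backup
# --checkpoint=fast          : start from an immediate checkpoint instead of waiting
# -R / --write-recovery-conf : write the standby connection settings automatically
# -C / --create-slot         : create the replication slot named by --slot before copying
pg_basebackup --wal-method=stream --host=172.31.34.215 --username=repuser \
  --checkpoint=fast --pgdata=/var/lib/postgresql/14/ter \
  --write-recovery-conf --slot=replication_slot --create-slot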

How to limit postgresql to use only n CPU cores?

How can we limit PostgreSQL to use only n CPU cores and leave the rest for other processes? For example, I have 16 cores and I want to dedicate 8 cores to PostgreSQL and reserve the other 8 for other services.
PostgreSQL itself has no configuration parameter to confine the server to a subset of the available CPU cores. However, it is possible to use the operating system's cpuset/NUMA facilities to start Postgres with only a limited number of CPU cores available.
In short, if your operating system allows for it, you can do something like:
# Inspect the NUMA topology to see which CPUs belong to node 0
numactl --hardware
NUMA0_CPUS="0-7"
# Find where the cpuset filesystem is mounted (keep only the mount point)
export CPUSET_HOME=$(mount | grep cpuset | awk '{print $3}')
# Create a cpuset for Postgres, restricted to the chosen CPUs and to memory node 0
sudo mkdir $CPUSET_HOME/postgres
sudo /bin/bash -c "echo $NUMA0_CPUS >$CPUSET_HOME/postgres/cpus"
sudo /bin/bash -c "echo '0' >$CPUSET_HOME/postgres/mems"
sudo chown postgres $CPUSET_HOME/postgres/tasks
# Switch to the server user, add that shell to the cpuset, then start the server from it
sudo su - $PGSERVERUSER
export CPUSET_HOME="<path where cpusets are mounted>"
echo $$ >$CPUSET_HOME/postgres/tasks
pg_ctl start <usual start parameters>
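To check that the postmaster really is confined to those cores, something like this should work (a sketch; the data directory path is a placeholder):
# The first line of postmaster.pid is the postmaster PID; taskset prints its CPU affinity list
taskset -pc $(head -1 /var/lib/postgresql/data/postmaster.pid)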

mongodump for collection larger than ram

I am using a command like this to dump data from a remote machine:
mongodump --verbose \
--uri="mongodb://mongousr:somepassword#host.domain.com:27017/somedb?authSource=admin" \
--out="$BACKUP_PATH"
This fails like so:
Failed: error writing data for collection `somedb.someCollection` to disk: error reading collection: EOF
somedb.someCollection is about 40GB. I don't have the ability to increase RAM to this size.
I have seen two explanations. One is that the console output is too verbose and fills the RAM. This seems absurd; it's only a few kilobytes, and it's on the client machine anyway. Rejected (but I am trying it again now with --quiet just to be sure).
The more plausible explanation is that the host fills its RAM with somedb.someCollection data and then fails. The problem is that the 'solution' that I've seen proposed is to increase the RAM to be bigger than the size of the collection.
Really? That can't be right. What's the point of mongodump with that limitation?
The question: is it possible to mongodump a database with a collection that is larger than my RAM size? How?
mongodump Client:
macOS
mongodump --version
mongodump version: 4.0.3
git version: homebrew
Go version: go1.11.4
os: darwin
arch: amd64
compiler: gc
OpenSSL version: OpenSSL 1.0.2r 26 Feb 2019
Server:
built with docker FROM mongo:
Reports: MongoDB server version: 4.0.8
Simply dump your collection slice by slice:
mongodump --verbose \
--uri="mongodb://mongousr:somepassword@host.domain.com:27017/somedb?authSource=admin" \
--out="$BACKUP_PATH" -q '{_id: {$gte: ObjectId("40ad7bce1a3e827d690385ec")}}'
mongodump --verbose \
--uri="mongodb://mongousr:somepassword@host.domain.com:27017/somedb?authSource=admin" \
--out="$BACKUP_PATH" -q '{_id: {$lt: ObjectId("40ad7bce1a3e827d690385ec")}}'
or partition your dump with a different set of queries on _id or some other field. The _id shown above is just an example.
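To find a reasonable boundary value, one option is to ask the server for a rough midpoint _id first (a sketch using the mongo shell; the connection string and collection name are the ones from the question):
# Print the _id roughly halfway through the collection, to use as the slice boundary
mongo "mongodb://mongousr:somepassword@host.domain.com:27017/somedb?authSource=admin" --quiet --eval '
  var mid = db.someCollection.find({}, {_id: 1}).sort({_id: 1})
              .skip(Math.floor(db.someCollection.count() / 2)).limit(1).next();
  printjson(mid._id);
'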
Stennie's answer really works.
The default value of storage.wiredTiger.engineConfig.cacheSizeGB is max((RAM - 1 GB) / 2, 256 MB). If your MongoDB server is running in a Docker container with the default configuration, and there are other apps running on the host machine, memory can fill up while you are dumping a large collection. The same thing can happen if the container's RAM is limited by your configuration.
You can use docker run --name some-mongo -d mongo --wiredTigerCacheSizeGB 1.5 (adjust the number to your situation).
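The equivalent setting in a config file, if you prefer that over the command-line flag (a sketch; the file path depends on your image and setup):
# /etc/mongod.conf (YAML) -- same limit as --wiredTigerCacheSizeGB 1.5
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1.5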
Another possibility is to add the --gzip flag to the output of mongodump. It helped me back up a database that hung at 48% without compression. So the syntax would be:
mongodump --uri="mongodb://mongousr:somepassword@host.domain.com:27017/somedb?authSource=admin" --gzip --out="$BACKUP_PATH"

Slow query time with Postgres 10 inside Docker vs bare-metal for AWS Linux 2

I've been trying to deploy Postgres within Docker for portability reasons, and noticed that query performance as measured by EXPLAIN ANALYZE is painfully slow compared to bare metal.
For a table with 1.7 million rows, a query on bare-metal Postgres takes about 1.2 s vs. 4.8 s on Dockerized Postgres, a 4x slowdown. The comparison uses the same mounted volume for both bare metal and Docker (for Docker, I'm using the -v option). The volume is a 60 GB gp2 volume, mounted through the AWS console.
A couple of things I tried:
Increased the shared memory buffers setting (shared_buffers) in postgresql.conf, which had a negligible effect
Tried several volume mapping options (delegated, cached, consistent)
Upgraded Docker from 17.06-ce to 17.12-ce
This is all done on an AWS Linux 2 instance. At this point I'm hoping to get more suggestions on what to do to improve performance.
The docker run command I use:
docker run -p 5432:5432 --name postgres -v /vol/pgsql/10.0/data:/var/lib/postgresql/data postgres:latest
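In case it helps with reproducing the comparison, the same EXPLAIN ANALYZE can be run inside the container started above (a sketch; my_table stands in for whichever table the 1.7-million-row query touches):
# "postgres" is the container name from the run command above; my_table is a placeholder
docker exec -it postgres psql -U postgres -c "EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM my_table;"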