I have a PostgreSQL-based web application that ran into a "no connections left" problem, so I put PgBouncer in front of it. It works much better with PgBouncer, but I'm still running into the connection limit...
Right now my postgresql.conf file contains:
max_connections = 100
shared_buffers = 128MB
#temp_buffers = 8MB
#max_prepared_transactions = 0
#work_mem = 1MB
#maintenance_work_mem = 16MB
#max_stack_depth = 2MB
#temp_file_limit = -1
#max_files_per_process = 1000
#shared_preload_libraries = ''
And my pgbouncer.ini file:
pool_mode = session
max_client_conn = 100
default_pool_size = 45
reserve_pool_size = 1
It still hits the connection limit at peak times on the site.
Can you help me tune these settings?
Thanks in advance.
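A rough way to check these numbers: PgBouncer keeps a separate pool per (database, user) pair, and each pool may open up to default_pool_size server connections, plus up to reserve_pool_size more when clients have waited longer than reserve_pool_timeout. In session mode a client holds its server connection for the whole session, so only that many clients per pool can be active at once, and max_client_conn = 100 caps how many clients PgBouncer accepts at all. The Go sketch below compares that worst case against max_connections minus superuser_reserved_connections (assumed to be the default of 3). The number of pools is a hypothetical input, since the question doesn't say how many database/user pairs the application uses.

```go
package main

import "fmt"

// serverConnsNeeded returns the worst-case number of PostgreSQL server
// connections PgBouncer may open: each (database, user) pair gets its own
// pool of default_pool_size connections plus up to reserve_pool_size extra.
func serverConnsNeeded(pools, defaultPoolSize, reservePoolSize int) int {
	return pools * (defaultPoolSize + reservePoolSize)
}

// usableServerConns is max_connections minus the slots PostgreSQL keeps
// for superusers (superuser_reserved_connections; 3 is assumed below).
func usableServerConns(maxConnections, superuserReserved int) int {
	return maxConnections - superuserReserved
}

func main() {
	usable := usableServerConns(100, 3) // max_connections from postgresql.conf above
	// pools is hypothetical: the number of distinct (database, user) pairs.
	for pools := 1; pools <= 3; pools++ {
		need := serverConnsNeeded(pools, 45, 1) // values from pgbouncer.ini above
		fmt.Printf("pools=%d need=%d usable=%d fits=%v\n", pools, need, usable, need <= usable)
	}
}
```

With the values from the question, one or two pools fit (46 and 92 server connections against 97 usable), while a third pool would need 138.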
So I am running a k8s cluster with a 3-pod Postgres cluster fronted by a 3-pod PgBouncer cluster. Connecting to it is a batch job with multiple parallel workers that stream data into the database via PgBouncer. If I run 10 of these batch job pods, everything works smoothly. If I go up an order of magnitude to 100 job pods, a large portion of them fail to connect to the database with the error "got error driver: bad connection". Multiple workers run on the same node (5 worker pods per node), so it's only ~26 pods in the k8s cluster.
What's maddening is that I'm not seeing any Postgres or PgBouncer error/warning logs in Kibana, and their pods aren't failing. Prometheus also shows the connection count well under max_connections.
Below are the postgres and pgbouncer configs along with the connection code of the workers.
Relevant Connection Code From Worker:
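// Retry opening the GORM connection with exponential backoff;
// err is declared earlier in the enclosing function.
err = backoff.Retry(func() error {
	p.connection, err = gorm.Open(postgres.New(postgres.Config{
		DSN: p.postgresUrl,
	}), &gorm.Config{Logger: newLogger})
	return err
}, backoff.NewExponentialBackOff())
if err != nil {
	// The backoff policy has given up; abort the worker.
	log.Panic(err)
}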
Postgres Config:
postgresql:
parameters:
max_connections = 200
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 6990kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 6
max_parallel_workers_per_gather = 3
max_parallel_workers = 6
max_parallel_maintenance_workers = 3
PgBouncer Config:
[databases]
* = host=loot port=5432 auth_user=***
[pgbouncer]
listen_port = 5432
listen_addr = *
auth_type = md5
auth_file = /pgconf/users.txt
auth_query = SELECT username, password from pgbouncer.get_auth($1)
pidfile = /tmp/pgbouncer.pid
logfile = /dev/stdout
admin_users = ***
stats_users = ***
default_pool_size = 20
max_client_conn = 600
max_db_connections = 190
min_pool_size = 0
pool_mode = session
reserve_pool_size = 0
reserve_pool_timeout = 5
query_timeout = 0
ignore_startup_parameters = extra_float_digits
Screenshot of Postgres DB Stats
Things I've tried:
Having the jobs connect directly to the cluster IP of the PgBouncer service, to rule out DNS.
Increasing the PgBouncer connection pool size.
I'm not sure what the issue is, since I have no errors on the DB side to go on and only a generic error message on the job side. Any help would be appreciated, and I can add more context if a key piece is missing.
This ended up being an issue of Postgres not actually using the ConfigMap I had set. The map specified 200 connections, but the running database was still at the default of 100.
Not much to learn here other than this: check that the configs you set actually propagate to the running service.
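One way to act on that lesson is to compare what you meant to configure with what the running server reports, for example via SHOW max_connections or SELECT name, setting FROM pg_settings. The sketch below assumes those values have already been collected into a map (the querying is left out) and that expected values are written the way SHOW prints them (SHOW reports memory settings with units, such as 4GB). It also checks PgBouncer's max_db_connections (190 in the config above) against the real max_connections minus superuser_reserved_connections, assumed to be the default of 3. With the actual limit of 100, PgBouncer could try to open more server connections than Postgres accepts.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// diffSettings compares intended values (e.g. from a ConfigMap) with the
// values the running server reports via SHOW <name>. It returns one message
// per mismatch, sorted by setting name.
func diffSettings(expected, actual map[string]string) []string {
	names := make([]string, 0, len(expected))
	for name := range expected {
		names = append(names, name)
	}
	sort.Strings(names)

	var diffs []string
	for _, name := range names {
		got, ok := actual[name]
		switch {
		case !ok:
			diffs = append(diffs, fmt.Sprintf("%s: not reported by server", name))
		case got != expected[name]:
			diffs = append(diffs, fmt.Sprintf("%s: expected %s, server has %s", name, expected[name], got))
		}
	}
	return diffs
}

// bouncerFits reports whether PgBouncer's max_db_connections fits inside the
// server's real max_connections once the superuser slots are removed.
func bouncerFits(actualMaxConns string, superuserReserved, maxDBConns int) (bool, error) {
	n, err := strconv.Atoi(actualMaxConns)
	if err != nil {
		return false, err
	}
	return maxDBConns <= n-superuserReserved, nil
}

func main() {
	expected := map[string]string{"max_connections": "200"}
	// Hypothetical SHOW output matching the situation described above.
	actual := map[string]string{"max_connections": "100"}
	for _, d := range diffSettings(expected, actual) {
		fmt.Println(d)
	}
	ok, err := bouncerFits(actual["max_connections"], 3, 190)
	fmt.Println("max_db_connections fits:", ok, err)
}
```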
I have a dedicated PostgreSQL server and a PgBouncer server. All connections are established through PgBouncer.
I tested the system using Apache JMeter and PHP, and the results are odd: throughput with 500 connections is fine, but when I test with more connections it drops.
This is the test result:
The pgbouncer config:
[databases]
maindb = host=212.212.322.323 port=5432 user=myuser dbname=mydb pool_size=1000 pool_mode=transaction
[pgbouncer]
logfile = /var/log/postgresql/pgbouncer.log
pidfile = /var/run/postgresql/pgbouncer.pid
listen_addr = *
listen_port = 6432
auth_type = trust
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
max_client_conn = 1000
default_pool_size = 20
This is a known limitation of PgBouncer. It uses only one CPU and doesn't work well with a very high number of connections (with fewer connections it is fast and effective). There are newer projects that can be used instead of PgBouncer for this purpose: Odyssey or pgagroal.
I started a project that is already in production, and a few clients have already registered and paid for the service (SaaS business).
We use an Ubuntu VPS running PostgreSQL with the default postgresql.conf settings.
Based on PGTune, to optimize the database I should set these settings in postgresql.conf:
# DB Version: 10
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 1 GB
# CPUs num: 1
# Data Storage: ssd
max_connections = 200
shared_buffers = 256MB
effective_cache_size = 768MB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.7
wal_buffers = 7864kB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 655kB
min_wal_size = 1GB
max_wal_size = 4GB
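The memory-related numbers above follow simple ratios of the 1 GB of RAM. The Go sketch below reproduces them; the ratios are inferred from this output (shared_buffers at 1/4 of RAM, effective_cache_size at 3/4, maintenance_work_mem at 1/16, wal_buffers at 3% of shared_buffers capped at 16MB), not taken from PGTune's source. work_mem is left out because it also depends on max_connections.

```go
package main

import "fmt"

// memSettings holds sizes in kB, the unit PostgreSQL uses for most memory settings.
type memSettings struct {
	SharedBuffersKB, EffectiveCacheKB, MaintenanceWorkMemKB, WALBuffersKB int
}

// pgtuneLikeSizes applies ratios inferred from the PGTune output above.
// It is an illustrative sketch, not PGTune's actual implementation.
func pgtuneLikeSizes(ramKB int) memSettings {
	shared := ramKB / 4
	wal := shared * 3 / 100 // 3% of shared_buffers...
	if wal > 16*1024 {
		wal = 16 * 1024 // ...capped at 16MB
	}
	return memSettings{
		SharedBuffersKB:      shared,
		EffectiveCacheKB:     ramKB * 3 / 4,
		MaintenanceWorkMemKB: ramKB / 16,
		WALBuffersKB:         wal,
	}
}

func main() {
	fmt.Printf("%+v\n", pgtuneLikeSizes(1024*1024))    // 1 GB VPS
	fmt.Printf("%+v\n", pgtuneLikeSizes(16*1024*1024)) // 16 GB host
}
```

For 1 GB this gives 256MB, 768MB, 64MB and 7864kB, matching the list above; for 16 GB the same ratios give 4GB, 12GB, 1GB and 16MB, the values in the Kubernetes Postgres config earlier on this page.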
At the moment I have the following settings:
max_connections = 100
shared_buffers = 128MB
And all other settings recommended by PGTune are commented out.
Is it safe to change/adjust these settings while the database is in production and up and running?
You cannot change max_connections and shared_buffers without restarting PostgreSQL, so you will need a short downtime.
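Whether a setting needs a restart is recorded in the context column of pg_settings: postmaster-context settings (max_connections, shared_buffers, and also wal_buffers from the PGTune list) need a restart, while most others only need a configuration reload. On PostgreSQL 9.5 and later, the pending_restart column also flags settings changed in the file but not yet applied. The Go sketch below groups settings by that rule; the contexts in main are what we understand them to be for these parameters and should be confirmed on your own server.

```go
package main

import (
	"fmt"
	"sort"
)

// applyAction maps a pg_settings.context value to what it takes for a
// value edited in postgresql.conf to take effect.
func applyAction(context string) string {
	switch context {
	case "postmaster":
		return "restart"
	case "internal":
		return "fixed" // cannot be changed at all
	default:
		// sighup, superuser-backend, backend, superuser, user: a reload
		// (e.g. SELECT pg_reload_conf()) is enough; backend-level settings
		// only apply to sessions started after the reload.
		return "reload"
	}
}

// splitByAction groups setting names by required action, each group sorted.
func splitByAction(contexts map[string]string) map[string][]string {
	out := make(map[string][]string)
	for name, ctx := range contexts {
		action := applyAction(ctx)
		out[action] = append(out[action], name)
	}
	for _, names := range out {
		sort.Strings(names) // sorts the shared backing array in place
	}
	return out
}

func main() {
	// Assumed contexts; verify with SELECT name, context FROM pg_settings.
	contexts := map[string]string{
		"max_connections":              "postmaster",
		"shared_buffers":               "postmaster",
		"wal_buffers":                  "postmaster",
		"effective_cache_size":         "user",
		"maintenance_work_mem":         "user",
		"work_mem":                     "user",
		"default_statistics_target":    "user",
		"random_page_cost":             "user",
		"effective_io_concurrency":     "user",
		"checkpoint_completion_target": "sighup",
		"min_wal_size":                 "sighup",
		"max_wal_size":                 "sighup",
	}
	plan := splitByAction(contexts)
	fmt.Println("restart:", plan["restart"])
	fmt.Println("reload: ", plan["reload"])
}
```

One design consequence: if all the PGTune changes are written to postgresql.conf at once, a single restart during the short downtime applies the restart-only settings and the reloadable ones together.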
Raising max_connections above 100 is a bad idea. If you need many client connections, use pgBouncer.
I have gotten a POSKeyError many times. I think our PostgreSQL parameters are not sufficient, because the system changed its storage from MySQL to PostgreSQL. I got the error many times before the change.
Please let me know the specific settings to change, or any other pointers.
Versions in use:
Plone 4.3.1
RelStorage 1.5.1 with PostgreSQL on RDS, AWS
shared-blob-dir true (stored on the filesystem)
Plone Quick Upload 1.8.2
Here are some PostgreSQL tune-ups within postgresql.conf:
# shared_buffers and effective_cache_size should be 30%-50%
# of your machine free memory
shared_buffers = 3GB
effective_cache_size = 2GB
checkpoint_segments = 64
checkpoint_timeout = 1h
max_locks_per_transaction = 512
max_pred_locks_per_transaction = 512
# If you know what you're doing you can uncomment and adjust the following values
#cpu_tuple_cost = 0.0030
#cpu_index_tuple_cost = 0.0001
#cpu_operator_cost = 0.0005
And here they are, as explained by Jens W. Klein:
most important: shared_buffers = 3GB (set it to 30%-50% of your machine's free memory)
checkpoint_segments = 64,
checkpoint_timeout = 1h (decreases logging overhead)
max_locks_per_transaction = 512,
max_pred_locks_per_transaction = 512 (RelStorage needs lots of them)
effective_cache_size = 4GB (adjust to ~50% of your memory)
Just for the import you could disable fsync in the config; then it should be really fast, but don't switch off the machine.
CPU tweaks. We didn't touch the default values for these, but if you know what you're doing, go for it. Below are some recommended values:
cpu_tuple_cost = 0.0030,
cpu_index_tuple_cost = 0.001,
cpu_operator_cost = 0.0005 (query planning optimizations: the defaults are some years old and current CPUs are faster, so these are better estimates, but I don't know how to get "real" values here)
You should also read https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
And here is our buildout.cfg:
[instance1]
recipe = plone.recipe.zope2instance
rel-storage =
    type postgresql
    host 10.11.12.13
    dbname datafs
    user zope
    password secret
    blob-dir /var/sharedblobstorage/blobs
    blob-cache-size 350MB
    poll-interval 0
    cache-servers 10.11.12.14:11211
    cache-prefix datafs
My server has the following resources:
[postgres@srv2813 ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:         15929      15118        810        142         12        219
-/+ buffers/cache:      14885       1043
Swap:         8031       2007       6024
[postgres@srv2813 ~]$ cat /proc/cpuinfo | grep processor | wc -l
8
[root@srv2813 postgres]# sysctl kernel.shmall
kernel.shmall = 4194304
[root@srv2813 postgres]# sysctl kernel.shmmax
kernel.shmmax = 17179869184
And my PostgreSQL config:
default_statistics_target = 100
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = 12GB
work_mem = 32MB
wal_buffers = 16MB
shared_buffers = 3840MB
max_connections = 500
fsync = off
temp_buffers=32MB
But it's getting a "too many connections" error. The webserver's nginx_status page shows around 500 active connections when this happens. The server hosts an API server, so every HTTP request invariably triggers a database read. The workload is not write-heavy, but very read-heavy.
It's possible that I've maxed out our server, but I still expected a little more from a 16GB/8-core box, given the read-only nature of the application. Can I push PostgreSQL further in any other direction?
PostgreSQL is process-based rather than thread-based, so it generally does not work well with a large number of connections.
I would look at using something like PgBouncer. PgBouncer is a lightweight connection pooler for PostgreSQL.
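Some arithmetic behind that advice: with max_connections = 500 and work_mem = 32MB, a rough worst case of shared_buffers plus one work_mem per connection already exceeds the 15929 MB total that free reports. Counting one work_mem per connection understates real usage, since a single query can use several (one per sort or hash step). The pool size of 20 in main is a hypothetical figure for PgBouncer's server-side pool, not a recommendation from the answer.

```go
package main

import "fmt"

// worstCaseMB is a rough lower bound on peak memory PostgreSQL could need:
// shared_buffers plus one work_mem allocation per connection.
func worstCaseMB(sharedBuffersMB, workMemMB, connections int) int {
	return sharedBuffersMB + workMemMB*connections
}

func main() {
	const ramMB = 15929 // "total" from free -m above
	// 500 = current max_connections; 20 = hypothetical pooled server connections.
	for _, conns := range []int{500, 20} {
		need := worstCaseMB(3840, 32, conns)
		fmt.Printf("connections=%d worst-case=%dMB fits in RAM=%v\n", conns, need, need <= ramMB)
	}
}
```

At 500 connections this comes to 19840 MB, more than the machine has; with 20 pooled server connections it is 4480 MB, and PgBouncer queues the remaining clients instead of Postgres forking a backend for each.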