Are these parameters correct for pgbouncer.ini and postgresql.conf? - postgresql

I have a pgbouncer.ini file with the below configuration:
[databases]
test_db = host=localhost port=5432 dbname=test_db
[pgbouncer]
logfile = /var/log/postgresql/pgbouncer.log
pidfile = /var/run/postgresql/pgbouncer.pid
listen_addr = 0.0.0.0
listen_port = 5433
unix_socket_dir = /var/run/postgresql
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
admin_users = postgres
#pool_mode = transaction
pool_mode = session
server_reset_query = RESET ALL;
ignore_startup_parameters = extra_float_digits
max_client_conn = 25000
autodb_idle_timeout = 3600
default_pool_size = 250
max_db_connections = 250
max_user_connections = 250
and I have this in my postgresql.conf file:
max_connections = 2000
Does this affect performance badly because of max_connections in my postgresql.conf, or doesn't it matter since the connections are already handled by PgBouncer?
One more question: in the PgBouncer configuration, is listen_addr = 0.0.0.0 correct, or should it be listen_addr = *?
Is it better to set default_pool_size in PgBouncer equal to the number of CPU cores available on this server?
Should default_pool_size, max_db_connections, and max_user_connections all be set to the same value?

So the idea of using PgBouncer is to pool connections when you can't afford a higher max_connections in PostgreSQL itself.
NOTE: Please DO NOT set max_connections to a number like 2000 just like that.
Let's start with an example: if you have a connection limit of 20 and your app or organization wants 1000 connections at a given time, that is where a pooler comes into the picture; in this specific case you want the 20 server connections to serve those 1000 coming in from the application.
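In PgBouncer terms, that example maps to something like this (an illustrative fragment, not the asker's exact config):
[pgbouncer]
default_pool_size = 20   # actual server connections opened to PostgreSQL
max_client_conn = 1000   # client connections PgBouncer will accept and queue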
To understand how this actually works, let's take a step back and look at what happens when you do not have a connection pooler and rely only on the PostgreSQL setting for max connections, which in our case is 20.
When a connection comes in from a client/application, the main PostgreSQL process (the postmaster) forks a child process for it. So each new connection spawns a child process under the main postgres process, like so:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24379 postgres 20 0 346m 148m 122m R 61.7 7.4 0:46.36 postgres: sysbench sysbench ::1(40120)
24381 postgres 20 0 346m 143m 119m R 62.7 7.1 0:46.14 postgres: sysbench sysbench ::1(40124)
24380 postgres 20 0 338m 137m 121m R 57.7 6.8 0:46.04 postgres: sysbench sysbench ::1(40122)
24382 postgres 20 0 338m 129m 115m R 57.4 6.5 0:46.09 postgres: sysbench sysbench ::1(40126)
So once a connection request is sent, it is received by the postmaster process, which creates a child process at the OS level under the main parent process. This connection then has an unlimited lifespan unless it is closed by the application or you have set a timeout for idle connections in PostgreSQL.
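For example, on a Linux host you can see one backend process per connection hanging under the postmaster (a quick check, assuming the server runs as the postgres OS user):
ps -f --forest -u postgres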
Here is where managing connections on a given amount of compute becomes a very costly affair once they exceed a certain limit: serving n connections has a given compute cost, and beyond some point the OS can no longer keep up with a huge number of connections, causing contention at every level (memory, CPU, I/O).
What if you could reuse the already-spawned child processes (backends) when they are not doing any work? You would save the time of forking a new backend as well as its additional cost (which can vary). This is where a pool of connections that are kept open to serve different client requests comes in; this is called pooling.
So you have only n server connections available, but the pooler can manage n+i client connections to serve the requests.
This is where PgBouncer helps reuse connections. It can be configured with three types of pooling: session pooling, statement pooling, and transaction pooling. PgBouncer returns a connection to the pool once the statement-level or transaction-level work is done; only with session pooling does it keep the connection until the client disconnects.
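For reference, the mode is a one-line choice in pgbouncer.ini:
[pgbouncer]
#pool_mode = statement    # server connection returned after every statement
#pool_mode = transaction  # returned after every transaction
pool_mode = session       # kept until the client disconnects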
So, in short: lower the number of connections in postgresql.conf and tune the settings in pgbouncer.ini.
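For example, against the asker's numbers that could look something like this (illustrative values only, not a sizing recommendation):
# postgresql.conf -- keep the hard limit modest rather than 2000
max_connections = 300
# pgbouncer.ini -- let the pooler absorb the client load
default_pool_size = 250     # <= max_connections, leaving headroom for admin sessions
max_db_connections = 250
max_client_conn = 25000     # clients beyond the pool size simply queue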
To answer the second part:
one more question: in the PgBouncer configuration, is listen_addr = 0.0.0.0 correct, or should it be listen_addr = *?
It depends on your deployment: whether PgBouncer runs on the database server itself, standalone, etc.
Basically, if it is on the server itself and you want to allow incoming connections from everywhere, use "*"; if you want to allow only local connections, use "127.0.0.1".
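In pgbouncer.ini terms, the options look like this (an illustrative fragment):
[pgbouncer]
listen_addr = *           # all interfaces (IPv4 and IPv6)
#listen_addr = 0.0.0.0    # all IPv4 interfaces only
#listen_addr = 127.0.0.1  # local connections only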
For the rest of your questions check this link: pgbouncer docs
I have tried to share a little of what I know; feel free to ask if anything was unclear, or to correct anything that was stated incorrectly.

Related

Why do some of my kubernetes nodes fail to connect to my postgres cluster while others succeed?

So I am running a k8s cluster with a 3-pod postgres cluster fronted by a 3-pod pgbouncer cluster. Connecting to that is a batch job with multiple parallel workers which stream data into the database via pgbouncer. If I run 10 of these batch job pods, everything works smoothly. If I go up an order of magnitude to 100 job pods, a large portion of them fail to connect to the database with the error got error driver: bad connection. Multiple workers run on the same node (5 worker pods per node), so it's only ~26 pods in the k8s cluster.
What's maddening is I'm not seeing any postgres or pgbouncer error/warning logs in Kibana, and their pods aren't failing. Also, Prometheus monitoring shows the connection count to be well under the max connections.
Below are the postgres and pgbouncer configs along with the connection code of the workers.
Relevant Connection Code From Worker:
// Retry the initial connection with exponential backoff.
err = backoff.Retry(func() error {
    p.connection, err = gorm.Open(postgres.New(postgres.Config{
        DSN: p.postgresUrl,
    }), &gorm.Config{Logger: newLogger})
    return err
}, backoff.NewExponentialBackOff())
if err != nil {
    log.Panic(err)
}
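One thing worth checking (a sketch, assuming gorm v2, which exposes the underlying *sql.DB via DB()) is to ping inside the retry closure, so a connection that opened but is already broken fails fast and triggers another backoff attempt:
err = backoff.Retry(func() error {
    p.connection, err = gorm.Open(postgres.New(postgres.Config{
        DSN: p.postgresUrl,
    }), &gorm.Config{Logger: newLogger})
    if err != nil {
        return err
    }
    // Verify the connection actually works before declaring success;
    // a failure here is retried by the surrounding backoff.Retry.
    sqlDB, err := p.connection.DB()
    if err != nil {
        return err
    }
    return sqlDB.Ping()
}, backoff.NewExponentialBackOff())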
Postgres Config:
postgresql:
  parameters:
    max_connections = 200
    shared_buffers = 4GB
    effective_cache_size = 12GB
    maintenance_work_mem = 1GB
    checkpoint_completion_target = 0.7
    wal_buffers = 16MB
    default_statistics_target = 100
    random_page_cost = 4
    effective_io_concurrency = 2
    work_mem = 6990kB
    min_wal_size = 1GB
    max_wal_size = 4GB
    max_worker_processes = 6
    max_parallel_workers_per_gather = 3
    max_parallel_workers = 6
    max_parallel_maintenance_workers = 3
PgBouncer Config:
[databases]
* = host=loot port=5432 auth_user=***
[pgbouncer]
listen_port = 5432
listen_addr = *
auth_type = md5
auth_file = /pgconf/users.txt
auth_query = SELECT username, password from pgbouncer.get_auth($1)
pidfile = /tmp/pgbouncer.pid
logfile = /dev/stdout
admin_users = ***
stats_users = ***
default_pool_size = 20
max_client_conn = 600
max_db_connections = 190
min_pool_size = 0
pool_mode = session
reserve_pool_size = 0
reserve_pool_timeout = 5
query_timeout = 0
ignore_startup_parameters = extra_float_digits
Screenshot of Postgres DB Stats
Things I've tried:
Having the jobs connect directly to the cluster IP of the Pgbouncer service to rule out DNS.
Increasing PgBouncer connection pool
I'm not sure what the issue is here, since I don't have any errors from the DB side to fix and only a generic error message from the job side. Any help would be appreciated, and I can add more context if a key piece is missing.
This ended up being an issue of postgres not actually using the ConfigMap I had set. The map specified 200 connections, but the actual DB was still at the default of 100.
Not much to learn here other than: make sure the configs you set actually propagate to the running service.
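A quick way to verify which value the running server actually loaded is to ask it directly:
SHOW max_connections;
-- or equivalently:
SELECT setting FROM pg_settings WHERE name = 'max_connections';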

PgBouncer performance drops when connection count increases

I have a dedicated PostgreSQL server and a pgBouncer server. All connections are established through pgBouncer.
I tested the system using Apache JMeter and PHP. The results are weird: the throughput with 500 connections is not bad, but when I test with more connections it drops.
This is the test result:
The pgbouncer config:
[databases]
maindb = host=212.212.322.323 port=5432 user=myuser dbname=mydb pool_size=1000 pool_mode=transaction
[pgbouncer]
logfile = /var/log/postgresql/pgbouncer.log
pidfile = /var/run/postgresql/pgbouncer.pid
listen_addr = *
listen_port = 6432
auth_type = trust
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
max_client_conn = 1000
default_pool_size = 20
It is a known limit of pgbouncer: it uses only one CPU and doesn't work well under a very high number of connections (for a smaller number of connections it is fast and effective). There are newer projects that can be used instead of pgbouncer for this purpose, such as Odyssey or pgagroal.
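If you stay on pgbouncer, one common mitigation is to run several pgbouncer processes on the same port and let the kernel spread clients across them (a sketch; assumes PgBouncer 1.12+ on an OS with SO_REUSEPORT support, such as Linux):
[pgbouncer]
listen_addr = *
listen_port = 6432
so_reuseport = 1   # allows multiple pgbouncer processes to bind the same port
Start roughly one such process per CPU core and the kernel will distribute incoming client connections among them.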

psql: FATAL: Could not obtain a transaction ID from GTM. The GTM might have failed or lost connectivity

I want to create a postgres-xl cluster. The cluster includes 5 nodes: 1 GTM, 2 coordinators and 2 datanodes. The following are the details of the nodes:
GTM:
hostname=localhost
nodename=gtm
IP=127.0.0.1
port=20001
Coordinator1:
hostname=localhost
nodename=coord1
IP=127.0.0.1
pooler_port=30011,port=30001
Coordinator2:
hostname=host2
nodename=coord2
IP=10.4.6.36
pooler_port=30012,port=30002
Datanode1:
hostname=localhost
nodename=dn1
IP=127.0.0.1
pooler_port=40011, port=40001
Datanode2:
hostname=host2
nodename=dn2
IP=10.4.6.36
pooler_port=40012, port=40002
I have installed pgxc_ctl and added /usr/local/pgsql/bin to the PATH for the postgres user. I have configured ssh key authentication to avoid entering a password for pgxc_ctl. I have edited postgresql.conf and pg_hba.conf on both nodes.
Then I built the cluster as follows:
$ pgxc_ctl
PGXC$ add gtm master gtm localhost 20001 $dataDirRoot/gtm
PGXC$ add coordinator master coord1 localhost 30001 30011 $dataDirRoot/coord_master.1 none none
PGXC$ add coordinator master coord2 10.4.6.36 30002 30012 $dataDirRoot/coord_master.2 none none
After adding coord2, I got the following:
psql: FATAL: Could not obtain a transaction ID from GTM. The GTM might have failed or lost connectivity
PGXC$ add datanode master dn1 localhost 40001 40011 $dataDirRoot/dn_master.1 none none none
PGXC$ add datanode master dn2 10.4.6.36 40002 40012 $dataDirRoot/dn_master.2 none none none
After adding dn2, I got the following error:
ERROR: Failed to get pooled connections
HINT: This may happen because one or more nodes are currently unreachable, either because of node or network failure.
It's also possible that the target node may have hit the connection limit or the pooler is configured with low connections.
Please check if all nodes are running fine and also review max_connections and max_pool_size configuration parameters
But when I monitor all the nodes, it shows
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
I could not connect to coord2 by running
psql -h 10.4.6.36 -p 30002 -U user -d postgres
It shows
psql: FATAL: Could not obtain a transaction ID from GTM. The GTM might have failed or lost connectivity
But I could connect to the coord1 by running
psql -p 30001 -U user -d postgres
I could reach host2 from my localhost over ssh without a password.
I need to resolve the above errors. Any help?
Adding the configuration:
pgxcInstallDir=$HOME/pgxc
pgxcOwner=$USER
pgxcUser=$pgxcOwner
tmpDir=/tmp
localTmpDir=$tmpDir
configBackup=n
configBackupHost=pgxc-linker
configBackupDir=$HOME/pgxc
configBackupFile=pgxc_ctl.bak
dataDirRoot=$HOME/DATA/pgxl/nodes
#---- Coordinators ----------------------------------------------------------------------------------------------------
coordMasterDir=$dataDirRoot/coord_master
coordSlaveDir=$HOME/coord_slave
coordArchLogDir=$HOME/coord_archlog
coordExtraConfig=coordExtraConfig
cat > $coordExtraConfig <<EOF
#================================================
# Added to all the coordinator postgresql.conf
# Original: $coordExtraConfig
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
max_pool_size=300
max_connections=200
hot_standby = off
EOF
#---- Datanodes -------------------------------------------------------------------------------------------------------
datanodeMasterDir=$dataDirRoot/dn_master
datanodeSlaveDir=$dataDirRoot/dn_slave
datanodeArchLogDir=$dataDirRoot/datanode_archlog
datanodeExtraConfig=datanodeExtraConfig
cat > $datanodeExtraConfig <<EOF
#================================================
# Added to all the datanode postgresql.conf
# Original: $datanodeExtraConfig
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
max_pool_size=300
max_connections=200
hot_standby = off
EOF
#---- GTM ------------------------------------------------------------------------------------
gtmName=gtm
gtmMasterServer=localhost
gtmMasterPort=20001
gtmMasterDir=$dataDirRoot/gtm
coordNames=( coord1 coord2 )
coordMasterServers=( localhost 10.4.6.36 )
coordPorts=( 30001 30002 )
poolerPorts=( 30011 30012 )
coordMasterDirs=( $dataDirRoot/coord_master.1 $dataDirRoot/coord_master.2 )
coordMaxWALSenders=( 5 5 )
coordSlave=n
coordSlaveServers=( none none )
coordSlavePorts=( none none )
coordSlavePoolerPorts=( none none )
coordSlaveDirs=( none none )
coordArchLogDirs=( none none )
coordSpecificExtraConfig=( coordExtraConfig coordExtraConfig )
coordSpecificExtraPgHba=( none none )
datanodeNames=( dn1 dn2 )
datanodeMasterServers=( localhost 10.4.6.36 )
datanodePorts=( 40001 40002 )
datanodePoolerPorts=( 40011 40012 )
datanodeMasterDirs=( $dataDirRoot/dn_master.1 $dataDirRoot/dn_master.2 )
datanodeMasterWALDirs=( none none )
datanodeMaxWALSenders=( 5 5 )
datanodeSpecificExtraConfig=( datanodeExtraConfig datanodeExtraConfig )
datanodeSpecificExtraPgHba=( none none )
Could you show us your configuration?
What are your max_connections and max_pool_size? What did initdb show for your kernel? My guess is that when you add datanode2 (dn2) you don't have enough connections.
You have:
The cluster includes 5 nodes: 1 GTM, 2 coordinators and 2 datanodes. The following are the details of the nodes.
Postgres-xl specific:
max_pool_size=300
max_coordinators=2
max_datanodes=2
In the case of the Coordinator (minimal settings):
max_connections = 100              # number of connections accepted from application(s)
max_prepared_transactions = 100    # same as the number of connections
In the case of the Datanode (minimal settings):
max_connections = 200              # 2 coordinators x 100 connections each
max_prepared_transactions = 2      # at least the total number of Coordinators in the cluster
Excerpt from the Postgres(-XL) documentation:
max_connections (integer)
Determines the maximum number of concurrent connections to the database server. The default is typically 100 connections, but might be less if your kernel settings will not support it (as determined during initdb). This parameter can only be set at server start.
When running a standby server, you must set this parameter to the same or a higher value than on the master server. Otherwise, queries will not be allowed on the standby server.
In the case of the Coordinator, this parameter determines how many connections each Coordinator can accept.
In the case of the Datanode, the number of connections to each Datanode may become as large as max_connections multiplied by the number of Coordinators.
max_pool_size (integer)
Specifies the maximum connection pool from the Coordinator to the Datanodes. Because each transaction can involve all the Datanodes, this parameter should be at least max_connections multiplied by the number of Datanodes.
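Applying that to this cluster (2 coordinators, 2 datanodes, coordinator max_connections = 100):
datanode max_connections  >= 100 connections x 2 coordinators = 200
coordinator max_pool_size >= 100 max_connections x 2 datanodes = 200
The values below leave some headroom above those minimums.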
Edit - for the updated question configuration:
Try this:
Coordinator
max_connections=100
max_pool_size=300
Datanode (you have 2 datanodes defined)
max_connections=200
max_pool_size=500

How to find the optimal value for mongo.options.connectionsPerHost

Currently I am using Grails and I am running several servers connecting to a single mongo server.
options {
    autoConnectRetry = true
    connectTimeout = 3000
    connectionsPerHost = 100
    socketTimeout = 60000
    threadsAllowedToBlockForConnectionMultiplier = 10
    maxAutoConnectRetryTime = 5
    maxWaitTime = 120000
}
Unfortunately, when I run 50 servers, the total number of connections goes up to 5k (50 servers x 100 connectionsPerHost = 5000). After a bit of research I found that this was a simple config in DataSource.groovy.
I am sure that my programs do not need 100 mongo connections, but I am unsure what value I should set this to.
I have two doubts:
First, how do I determine the optimal value for connectionsPerHost?
Second, are all these 100 connections created at once and then pooled?

"Lost connection to MySQL server during query" in Google Cloud SQL

I am having a weird, recurring but not constant, error where I get "2013, 'Lost connection to MySQL server during query'". These are the premises:
a Python app runs around 15-20minutes every hour and then stops (hourly scheduled by cron)
the app is on a GCE n1-highcpu-2 instance, the db is on a D1 with a per package pricing plan and the following mysql flags
max_allowed_packet 1073741824
slow_query_log on
log_output TABLE
log_queries_not_using_indexes on
the database is accessed only by this app and this app only so the usage is the same, around 20 consecutive minutes per hour and then nothing at all for the other 40 minutes
the first query it does is
SELECT users.user_id, users.access_token, users.access_token_secret, users.screen_name, metadata.last_id
FROM users
LEFT OUTER JOIN metadata ON users.user_id = metadata.user_id
WHERE users.enabled = 1
the above query joins two tables that are each around 700 rows long and do not have indexes
after this query (which takes 0.2 seconds when it runs without problems) the app starts without any issues
Looking at the logs I see that each time this error presents itself the interval between the start of the query and the error is 15 minutes.
I've also enabled the slow query log and those query are registered like this:
start_time: 2014-10-27 13:19:04
query_time: 00:00:00
lock_time: 00:00:00
rows_sent: 760
rows_examined: 1514
db: foobar
last_insert_id: 0
insert_id: 0
server_id: 1234567
sql_text: ...
Any ideas?
If your connection is idle for the 15-minute gap, then you are probably seeing GCE disconnect your idle TCP connection, as described at https://cloud.google.com/compute/docs/troubleshooting#communicatewithinternet. Try the workaround that page suggests:
sudo /sbin/sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5
(You may need to put this configuration into /etc/sysctl.conf to make it permanent)
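The permanent version would look something like this (same values as the command above; apply with sudo sysctl -p after editing):
# /etc/sysctl.conf -- keep idle TCP connections alive so GCE does not drop them
net.ipv4.tcp_keepalive_time = 60      # start sending keepalives after 60s of idle
net.ipv4.tcp_keepalive_intvl = 60     # then probe every 60s
net.ipv4.tcp_keepalive_probes = 5     # declare the connection dead after 5 unanswered probes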