Optimal Database Connection Pool Size

Optimal Database Connection Pool Size - postgresql

I have read that you should keep the number of connections in your database connection pool lower than the number of threads running in the application server and that might use that pool correct?
I have read too that having a high number of connections is not good but I don't really know why? Would it use more memory?
Right now during pick times my server is running out of connections and I don't know it would be good just to increase the number of connections.
Thank you

With a Small Connection pool you have faster access on the connection table but may not have enough connections to satisfy requests.
In the other hand with a Large Connection pool there are more connections to fulfill requests and requests will spend less (or no) time in the queue but access on the connection table is slower.
http://docs.oracle.com/cd/E19316-01/820-4343/abehs/index.html

core_count = 4
connections = ((core_count * 2) + effective_spindle_count) = 9

Related

Connection Pool Capabilities in DigitalOcean PostgreSQL Managed Databases

I have connection pools setup for my system to handle concurrent connections for my managed database clusters in DigitalOcean.
Overall, each client I have, has their own DB, then I create a pool for that connection to avoid the error:
FATAL: remaining connection slots are reserved for non-replication superuser connections
Yesterday I ran into connection issues with a default database that my system also uses, I hadn't thought the connection pooling was needed for whatever dumb reason or another. No worries, I started getting flooded with error emails and then fixed the system to use the correct pooling mechanism.
This is where my question comes in, with the pooling on DigitalOcean they give you a specific "size" depending on your subscription, my subscription has an available "size" for the clusters of 97. As my clients grow I will be creating new pools and databases for them, so eventually I will run out of slots to assign a pool...what does this "size" dictate?
For example 1 client I have has an allotted size of 10 to their connection pool. Speaking to support:
The connection pool with a size of 1 will only allow 1 connection at a time. As for how you can estimate the number of simultaneous users, this is something you'll need to look over as your user and application grow. We don't have a way to give you that estimate from our back end.
So with that client that has a size of 10 alloted to their pool, they have 88 staff users that use the system simultaneously throughout the day, then they have about 4,000 users that they manage that can all sign in theoretically at once.
This is a lot more than 10 connections, and I get no errors on connection size at least that I've seen so far.
Given that I have a limited amount, how do I determine the appropriate size to use, does anybody have experience with this in production?
For example, with the connections listed above, is 10 too much, too little, just right?
Update 2/14/23
I have tested the capabilities bit because I was curious and can't get any semi-logical answer. When I use 1 connection pool for my 4,000 user client (although all users would not hit their DB/pool at the same time), I get connection errors (specifically when running background tasks from django-celery and Celery in the middle of the night).
Here are those errors, overall just connection already closed from here:
File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 269, in create_cursor cursor = self.connection.cursor()
This issue happened concurrently on 2 nights, but never during the day during normal user activity.
Once I upped the connection pool for said 4,000 user client to 2 instead of 1 the connection already closed error never occurred again.

Recommended connection pool size for HikariCP

As written in HikariCP docs the formula for counting connection pool size is connections = ((core_count * 2) + effective_spindle_count). But which core count is this: my app server or database server?
For example: my is app running on 2 CPUs, but database is running on 16 CPUs.

This is Kevin's formula for connection pool size, where cores and spindles (you can tell is is an old formula) are the database server's.
This assumes that the connections are kept fairly busy. If you have transactions with longer idle times, you might need to make the pool bigger.
In the end, only trial and error can find the ideal pool size.

The quote is from PostgreSQL wiki which is related to database cores/server
database server only has so many resources, and if you don't have enough connections active to use all of them, your throughput will generally improve by using more connections.
Notice that this formula may be outdated (comment by #mustaccio)
That wiki page was last updated nearly 5 years ago, and the advice in question is even older. I/O queue depth might be more relevant today than the number of spindles, even if the latter are actually present

How to know the exactly total of current connections to PostgreSQL?

My Azure PostgreSQL server has total connections is 480.
I want to check the total of current connections is accessing to database by perform below SQL:
select * from pg_stat_activity;
I can see the output list includes all users (superuser,...) and with idle and active status. So is this correct to check total of current connections? Or I should exclude "idle" connections to know the exactly the result?
Thank you so much,

"idle" connection is real connection. Because Postgres has not any internal executor pool (like thread pool of MySQL), any "idle" connection can process any commands. At this moment, the "idle" connection doesn't require too much sources, but when you calculate save memory limits (against using swap), you should to calculate with "idle" connections too - because any connection can be active connection sometimes.
480 connections is usually much - good number is 10-20 x CPU cores for max_connections. If you have too high max_connection, then you have to have low work_mem, what can has negative impact on performance, or your configuration should not be safe against overloading.
share buffers + (max connection * work_mem * 2) + ram for operation system
+ ram for filesystem < RAM

Massive holding open of TCP/IP sockets

I would like to create a server infrastructure that allows 500 clients to connect all at the same time and have an indefinite connection. The plan is to have clients connect to TCP/IP sockets on the server and have them stay connected, with the server sending out data to clients randomly and clients sending data to the server randomly, similar to a small MMOG, but with scarcely any data. I came up with this plan in comparison to TCP polling every 15-30 seconds from each client.
My question is, in leaving these connections open, will this cause a massive sum of server bandwidth usage at idle? Is this the best approach without getting into the guts of TCP?

TCP uses no bandwidth when idle, except maybe a few bytes every so often (default is 2 hours) if "keep-alive" is enabled.
500 connections is nothing, though epoll() is a good choice to reduce system overhead. 5000 connections might start to become an issue.

Bandwidth isn't your major concern, but there's a limit to the number of connections you can have open (though it is quite high).
But if polling every 15 seconds would be fast enough, I'd consider it a waste to keep the connection open.

A good PgPool II configuration

I have been trying to configure PgPool to accept a requests of about 150. Postgres server is configured to accept only 100 connections. Anything beyond 100 need to be pooled by PgPool. I don't seem to get that. I only require PgPool to queue the requests, my current configuration does not do that. From my JMeter test, when I try to get connection beyond 100, postgres gives me an error saying PSQL error: sorry, too many clients.
I only have configured PGPool with the following parameters :
listen_address = 'localhost'
port = 9999
backend_hostname0 = 'localhost'
backend_port0 = 5432
num_init_children = 100
max_pool = 4
child_life_time =120
child_max_connections = 0
connections_life_tome = 120
client_idle_limit = 0
Since I only require PgPool to Queue the extra connections requests, is the above configuration correct?
Please advise on the proper configuration.

The 'child_max_connections' in pgpool is NOT the maximum allowed connections to the DB. It is the number of times a pooled connection can be used before it terminates and restarts. It is there to recycle connection threads and stop memory leaks.
The formula of max_pool x num_init_children describes the maximum number of connections that pgpool will make to Postgresql. Obviously, this needs to be less than the 'max_connections' set in postgresql, otherwise pgpool marks the DB as an unavailable backend. And if you have some DB connections reserved for admin use, you need to reduce the number of pgpool connections further.
So, what I am saying is that the 'max_connections' in the formula is the parameter set in postgresql.conf. Setting 'child_max_connections' to 100 in the comment above just means that the pgpool connection is closed and reopened every 100 times it is used.

The first thing is to figure out what you want as your maximum pool size. PostgreSQL performance (both in terms of throughput and latency) is usually best when the maximum number of active connections is somewhere around ((2 * number-of-cores) + effective-spindle-count). The effective spindle count can be tricky to figure -- if your active data set is fully cached, count it as zero, for example. Don't count any extra threads from hyperthreading as cores for this calculation. Also note that due to network latency issues, you may need a pool slightly larger than the calculated number to keep that number of connections active. You may need to do some benchmarks to find the sweet spot for your hardware and workload.
The setting you need to adjust is child_max_connections, with num_init_children kept less than or equal to that.