Recommended connection pool size for HikariCP - postgresql

As written in HikariCP docs the formula for counting connection pool size is connections = ((core_count * 2) + effective_spindle_count). But which core count is this: my app server or database server?
For example: my is app running on 2 CPUs, but database is running on 16 CPUs.

This is Kevin's formula for connection pool size, where cores and spindles (you can tell is is an old formula) are the database server's.
This assumes that the connections are kept fairly busy. If you have transactions with longer idle times, you might need to make the pool bigger.
In the end, only trial and error can find the ideal pool size.

The quote is from PostgreSQL wiki which is related to database cores/server
database server only has so many resources, and if you don't have enough connections active to use all of them, your throughput will generally improve by using more connections.
Notice that this formula may be outdated (comment by #mustaccio)
That wiki page was last updated nearly 5 years ago, and the advice in question is even older. I/O queue depth might be more relevant today than the number of spindles, even if the latter are actually present

Related

knexfile settings when using PgBouncer

We have a setup where multiple Node processes write into the same database (different tables), and as a result, when using Knex, we end up with more connections to the database than desirable. So, I was thinking of using PgBouncer as a middleware for the Knex processes to connect to, but I'm unsure of how Knex's attempts at connection pooling will work with PgBouncer, which will setup its own pool of connections.
Please assume the following:
A 2vCPU database server
10+ Node processes interacting with the database
PgBouncer running with a pool size of 5
Questions:
If I set min/max size as 1/5 in each Knex setup, will I run out of connections or will PgBouncer somehow be able to "fool" each Knex setup into believing that it has its own pool?
It doesn't feel like I can use a Knex pool in this scenario. Even using min/max pool sizes as 1/1 will leave me out of options if the first five Knex steups I launch claim a connection each.
Is there a way to make Knex drop pooling and open/close connections as needed? This is the ideal setup for me because now PgBouncer won't actually be opening/closing connections but returning them to the pool (unless I'm mistaken about this?).
What strategy should I use? What should my knexfile look like? And would I need to code differently for this? Any help or ideas are welcome!
While it would be ridiculous to allow 32000 connections, it is also ridiculous to allow only 5. I think the lesson from your link should be not that there is a precisely defined magic number of connections, but that you need to look at the waitevents of your performing database, or just do experiments, to see what is going on and whether you have too many connections.
While repeatedly connecting to pgbouncer (which reuses its internal connection to PostgreSQL) might be less expensive than repeatedly connecting all the way through to PostgreSQL, it will still be far more expensive than just re-using an existing connection from knex's internal connection pool. If your connection load is high enough to matter, then bypassing the internal connection pool to just use pgbouncer would be a mistake. Most likely using pgbouncer at all is a mistake, as it just introduces yet another moving piece for no good reason.
Using knex pooler with min:1 and max:5 with 10 different knex app servers and a limit of 5 connections in pgbouncer would mean that only 5 of your app servers could have a connection. The rest would be forced to wait, but it isn't clear what they would be waiting for. Presumably they would wait forever, or until they caught a timeout error, or until one of other app servers exited or shutdown its pool. Pgbouncer would fool them all right, but not in a helpful way. It might make more sense to use this a min:0 (which is now the recommended setting, but still not the default), as that way an app server would at least release its final connection after idleTimeoutMillis, allowing another app to use it.
Using min:1 max:1 could be useful if pgbouncer were not used or used with a large enough pool size, but it could also break entirely. For example, if an app needs at least 2 simultaneous connections to work correctly. That would probably be a poorly written app, but poorly written apps are the rule, not the exception.

Connection Pool Capabilities in DigitalOcean PostgreSQL Managed Databases

I have connection pools setup for my system to handle concurrent connections for my managed database clusters in DigitalOcean.
Overall, each client I have, has their own DB, then I create a pool for that connection to avoid the error:
FATAL: remaining connection slots are reserved for non-replication superuser connections
Yesterday I ran into connection issues with a default database that my system also uses, I hadn't thought the connection pooling was needed for whatever dumb reason or another. No worries, I started getting flooded with error emails and then fixed the system to use the correct pooling mechanism.
This is where my question comes in, with the pooling on DigitalOcean they give you a specific "size" depending on your subscription, my subscription has an available "size" for the clusters of 97. As my clients grow I will be creating new pools and databases for them, so eventually I will run out of slots to assign a pool...what does this "size" dictate?
For example 1 client I have has an allotted size of 10 to their connection pool. Speaking to support:
The connection pool with a size of 1 will only allow 1 connection at a time. As for how you can estimate the number of simultaneous users, this is something you'll need to look over as your user and application grow. We don't have a way to give you that estimate from our back end.
So with that client that has a size of 10 alloted to their pool, they have 88 staff users that use the system simultaneously throughout the day, then they have about 4,000 users that they manage that can all sign in theoretically at once.
This is a lot more than 10 connections, and I get no errors on connection size at least that I've seen so far.
Given that I have a limited amount, how do I determine the appropriate size to use, does anybody have experience with this in production?
For example, with the connections listed above, is 10 too much, too little, just right?
Update 2/14/23
I have tested the capabilities bit because I was curious and can't get any semi-logical answer. When I use 1 connection pool for my 4,000 user client (although all users would not hit their DB/pool at the same time), I get connection errors (specifically when running background tasks from django-celery and Celery in the middle of the night).
Here are those errors, overall just connection already closed from here:
File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 269, in create_cursor cursor = self.connection.cursor()
This issue happened concurrently on 2 nights, but never during the day during normal user activity.
Once I upped the connection pool for said 4,000 user client to 2 instead of 1 the connection already closed error never occurred again.

Maximising concurrent request handling with PostgreSQL / Npgsql client

I have a db and client app that does reads and writes, I need to handle a lot of concurrent reads but be sure that writes get priority, while also respecting my db’s connection limit.
Long version:
I have a single instance pgSQL database which allows 100 connections.
My .net microservice uses Npgsql to connect to the db. It has to do read queries that can take 20-2000ms and writes that can take about 500-2000ms. Right now there are 2 instances of the app, connecting with the same user credentials. I am trusting Npgsql to manage my connection pooling, and am preparing my read queries as there are basically just 2 or 3 variants with different parameter values.
As user requests increased, I started having problems with the database’s connection limit. Errors like ‘Too many connections’ from the db.
To deal with this I introduced a simple gate system in my repo class:
private static readonly SemaphoreSlim _writeGate = new(20, 20);
private static readonly SemaphoreSlim _readGate = new(25, 25);
public async Task<IEnumerable<SomeDataItem>> ReadData(string query, CancellationToken ct)
{
await _readGate.WaitAsync(ct);
// try to get data, finally release the gate
_readGate.Release();
}
public async Task WriteData(IEnumerable<SomeDataItem>, CancellationToken ct)
{
await _writeGate.WaitAsync(ct);
// try to write data, finally release the gate
_writeGate.Release();
}
I chose to have separate gates for read and write because I wanted to be confident that reads would not get completely blocked by concurrent writes.
The limits are hardcoded as above, a total of limit of 45 on each of the 2 app instances, connecting to 1 db server instance.
It is more important that attempts to write data do not fail than attempts to read. I have some further safety here with a Polly retry pattern.
This was alright for a while, but as the concurrent read requests increase, I see that the response times start to degrade, as a backlog of read requests begins to accumulate.
So, for this question, assume my sql queries and db schema are optimized to the max, what can I do to improve my throughput?
I know that there are times when my _readGate is maxed out, but there is free capacity in the _writeGate. However I don’t dare reduce the hardcoded limits because at other times I need to support concurrent writes. So I need some kind of QoS solution that can allow more concurrent reads when possible, but will give priority to writes when needed.
Queue management is pretty complicated to me but is also quite well known to many, so is there a good nuget package that can help me out? (I’m not even sure what to google)
Is there a simple change to my code to improve on what I have above?
Would it help to have different conn strings / users for reads vs writes?
Anything else I can do with npgsql / connection string that can improve things?
I think that postgresql recommends limiting connections to 100, there's a SO thread on this here: How to increase the max connections in postgres?
There's always a limit to how many simultaneous queries that you can run before the perf would stop improving and eventually drop off.
However I can see in my azure telemetry that my db server is not coming close to fully using cpu, ram or disk IO (cpu doesn't exceed 70% and is often less, memory the same, and IOPS under 30% of its capacity) so I believe there is more to be squeezed out somewhere :)
Maybe there are other places to investigate, but for the sake of this question I'd just like to focus on how to better manage connections.
First, if you're getting "Too many connections" on the PostgreSQL side, that means that the total number of physical connections being opened by Npgsql exceeds the max_connection setting in PG. You need to make sure that the aggregate total of Npgsql's Max Pool Size across all app instances doesn't exceed that, so if your max_connection is 100 and you have two Npgsql instances, each needs to run with Max Pool Size=50.
Second, you can indeed have different connection pools for reads vs. writes, by having different connection strings (a good trick for that is to set the Application Name to different values). However, you may want to set up one or more read replicas (primary/secondary setup); this would allow all read workload to be directed to the read replica(s), while keeping the primary for write operation only. This is a good load balancing technique, and Npgsql 6.0 has introduced great support for it (https://www.npgsql.org/doc/failover-and-load-balancing.html).
Apart from that, you can definitely experiment with increasing max_connection on the PG side - and accordingly Max Pool Size on the clients' side - and load-test what this do to resource utilization.

HikariCP connection pool - 'active' - how to debug?

I am building an app using Spring-Boot/Hibernate with Postgres as the database. I am using Spring 2.0, so Hikari is the default connection pool provider.
Currently, I am trying to load-test the application with a REST end-point that does an 'update-if-exists and insert if new' to an entity in the database. Its a fairly small entity with 'BIGSERIAL' primary key and no constraints on any other field.
The default connection pool size is 10 and I haven't really tweaked any other parameters - either of the HikariCP or for Postgres.
The point at which I am stuck at this moment is to debug connections in 'active' state and what they are doing or why they stuck currently.
When I run '10 simultaneous users', it basically translates into 2 or 3 times that many queries and thus, when I turn on the HikariCP debug logs, it hangs at something like this -
(total=10, active=10, idle=0, waiting=2) and the 'active' connections do not really release the connections, which is what I am trying to find out because the queries are fairly simple and the table itself is just 4 fields (including the primary key).
The best practices from HikariCP folks as well generally is that increasing the connection pool is not the right first step towards scaling.
If I do increase the connection pool size to 20, things start working for 10 simultaneous/concurrent users but then again, its not the root cause/solution for the problem I believe.
Is there any way I can log either Hibernate or Postgres messages that might help in knowing what these 'active' connections are waiting on and why the connection doesn't get released even after I increase the wait-time to a long time?
If it is a connection-leak ( as is reported when the leak-detection-threshold is reduced to a lower value (e.g. 30 seconds) ), then how can I tell if Hibernate is responsible for this connection leak or if it is something else?
If it is a lock/wait at the database level, how can I get the root of this?
UPDATE
After help from #brettw, I took a thread-dump when the connections were exhausted and it pointed in the direction of a connection-leak. The threads on HikariCP issues board - https://github.com/brettwooldridge/HikariCP/issues/1030#issuecomment-347632771 - which points to the Hibernate not closing connections which then pointed me to https://jira.spring.io/browse/SPR-14548, which talks about setting Hibernate's connection closing mode since the default mode holds the connection for too long. After setting spring.jpa.properties.hibernate.connection.handling_mode=DELAYED_ACQUISITION_AND_RELEASE_AFTER_TRANSACTION, the connection pool worked perfectly.
Also, the point made here - https://github.com/brettwooldridge/HikariCP/issues/612#issuecomment-209839908 is right - a connection leak should not be covered up by the pool.
It sounds like you could be hitting a true deadlock in the database. There should be a way to query PostgreSQL for current active queries, and current lock states. You'll have to google it.
Also, I would try a simple thread dump to see where all the threads are blocked. It could be a code-level synchronization deadlock.
If all of the threads are blocked on getConnection(), it is a leak.
If all of the threads are down in the driver, according to the stacktrace for each thread, it is a database deadlock.
If all of the threads are blocked waiting for a lock in your application code, then you have a synchronization deadlock -- likely two locks with inverted acquisition order in different parts of the code.
The HikariCP leakDetectionThreshold could be useful, but it will only show where the connection was acquired, not where the thread is currently stuck. Still, it could provide a clue.

Why does Perl makes the system very slow when I made more than 4,000 database connections?

I was writing a code to find the speed of my database using a Perl script.
My intention was to make a 4,000 database connection after each fork (which would act as a 4,000 different clients) and sleep, and I issue the update command when I get the signal
but the system itself becomes very slow and almost hangs for making the connections itself and even I couldn't send the signal using my terminal.
I am using DBI module, I have 4GB RAM in my system where Postgres 8.3 is running in a different machine.
I'm not entirely clear on whether you're saying you wanted to a) Open 4,000 connections, fork, open 4,000 more connections, etc. or b) Fork 4,000 times and open one connection from each process, but 4,000 database connections or 4,000 processes is some pretty serious resource consumption either way. I'm not at all surprised that it's slowing your system to a crawl - I would expect that to be the end result regardless of the language used.
What are you actually attempting to achieve by creating all of these processes and/or connections? There's probably a better way to do it that won't be quite so resource-intensive.
I've seen pgpool in use on production systems where the number of postgres connections could not be limited to something reasonable. You may wish to look into using this yourself to mitigate against poor application design by your developers.
Essentially, pgpool acts as a proxy to postgres. It multiplexes queries on lots of connections to a much smaller (and manageable) number to the back-end database.
That is relativity speaking a lot of connections to have at once, but not unheard of by any means. How much memory do you have on the database server? Each connection takes resources, if you don't have a database server setup to handle that volume of connections it will be slow no matter what language you use to connect.
A simple analogy would be if you had a Toyota Prius (old days I would had said Ford Pinto) pulling a semi trailer with 80,000 lbs (typical legal weight in a lot of the states) of weight in it. It would burn that little Prius up in a heartbeat like you are seeing. To do it right you need to buy your self a big rig and hook it to that trailer to move that amount of weight.
Ignoring the wisdom of doing 4000 connection forks, you should work through your performance issues with something akin to Devel::NYTProf.
I would alternatively setup persistent workers in gearman and do my gearman client requests. Persistence and your scheduled forks on demand.