The problem: I have a Rails application that runs a few hundred Sidekiq background processes. They all connect to a PostgreSQL database, which is not exactly happy about providing 250 connections. It can, but if all the Sidekiq processes happen to send queries to the db at the same time, it crumbles.
Option 1: I have been thinking about adding pgBouncer in front of the db. However, I cannot currently use its transaction pooling mode, since I depend heavily on setting the search_path at the beginning of each job to determine which "country" (PostgreSQL schema) to work on (apartment gem). I would therefore have to use the session pooling mode. As far as I know, that would require me to disconnect after each job to release the connection back into the pool, and that would be really costly performance-wise, wouldn't it? Am I missing something?
Option 2: Application-layer connection pooling is of course also an option, but I'm not sure how I would do that for PostgreSQL with Sidekiq.
Option 3: Something I have not thought of?
Option 1: You're correct, session pooling would require you to drop and reconnect, and that adds overhead. How costly that is depends on your access pattern, i.e. what fraction of the total work per job is spent on connection setup (TCP handshake etc.), and on what sort of latency you need. It is definitely worth benchmarking, but if the connections are short-lived the overhead will be really noticeable.
Option 2/3: You could rate limit or throttle your sidekiq jobs. There are a few projects here tackling this...
Queue limits
Sidekiq Limit Fetch: Restrict number of workers which are able to run specified queues simultaneously. You can pause queues and resize queue distribution dynamically. Also tracks number of active workers per queue. Supports global mode (multiple sidekiq processes). There is an additional blocking queue mode.
Sidekiq Throttler: Sidekiq::Throttler is a middleware for Sidekiq that adds the ability to rate limit job execution on a per-worker basis.
sidekiq-rate-limiter: Redis backed, per worker rate limits for job processing.
Sidekiq::Throttled: Concurrency and threshold throttling.
I got the above from here
https://github.com/mperham/sidekiq/wiki/Related-Projects
If your application must have one connection per process, and you can't restructure it so that several threads share a connection, then your choices are pgBouncer or application-level connection pooling. Either way, connection pooling will in effect throttle or limit your app in some way in order to protect the DB.
Sidekiq should only require one connection for each worker thread. If you are setting your concurrency to a reasonable value, say 10-25, I don't think you should be using 250 simultaneous database connections. How many worker processes are you running, and what is their concurrency?
Also, you can see on that page that even if you have a high concurrency setting, you can still create a connection pool shared by the threads within that process.
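To make the arithmetic behind this concrete, here is a minimal sketch in Python (not Ruby, purely for illustration). The function name and the fleet sizes are hypothetical; the only thing taken from the answer above is that each worker thread needs at most one connection and that a per-process pool can be smaller than the concurrency setting.

```python
def total_db_connections(processes: int, pool_size_per_process: int) -> int:
    """Upper bound on connections a fleet of worker processes can open."""
    return processes * pool_size_per_process


# Hypothetical fleet shapes, chosen only to show how the total moves.
print(total_db_connections(processes=250, pool_size_per_process=1))  # 250
print(total_db_connections(processes=10, pool_size_per_process=25))  # 250
# Same 10 processes, but threads share a pool of 10 connections each:
print(total_db_connections(processes=10, pool_size_per_process=10))  # 100
```

With a shared pool smaller than the concurrency, threads that cannot get a connection wait for one instead of opening a new one, which is the throttling effect described above.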
Related
We have a setup where multiple Node processes write into the same database (different tables), and as a result, when using Knex, we end up with more connections to the database than desirable. So, I was thinking of using PgBouncer as a middleware for the Knex processes to connect to, but I'm unsure of how Knex's attempts at connection pooling will work with PgBouncer, which will set up its own pool of connections.
Please assume the following:
A 2vCPU database server
10+ Node processes interacting with the database
PgBouncer running with a pool size of 5
Questions:
If I set min/max size as 1/5 in each Knex setup, will I run out of connections or will PgBouncer somehow be able to "fool" each Knex setup into believing that it has its own pool?
It doesn't feel like I can use a Knex pool in this scenario. Even using min/max pool sizes as 1/1 will leave me out of options if the first five Knex setups I launch claim a connection each.
Is there a way to make Knex drop pooling and open/close connections as needed? This is the ideal setup for me because now PgBouncer won't actually be opening/closing connections but returning them to the pool (unless I'm mistaken about this?).
What strategy should I use? What should my knexfile look like? And would I need to code differently for this? Any help or ideas are welcome!
While it would be ridiculous to allow 32000 connections, it is also ridiculous to allow only 5. I think the lesson from your link should be not that there is a precisely defined magic number of connections, but that you need to look at the wait events of your running database, or just do experiments, to see what is going on and whether you have too many connections.
While repeatedly connecting to pgbouncer (which reuses its internal connection to PostgreSQL) might be less expensive than repeatedly connecting all the way through to PostgreSQL, it will still be far more expensive than just re-using an existing connection from knex's internal connection pool. If your connection load is high enough to matter, then bypassing the internal connection pool to just use pgbouncer would be a mistake. Most likely using pgbouncer at all is a mistake, as it just introduces yet another moving piece for no good reason.
Using the knex pooler with min:1 and max:5 on 10 different knex app servers, with a limit of 5 connections in pgbouncer, would mean that only 5 of your app servers could have a connection. The rest would be forced to wait, but it isn't clear what they would be waiting for. Presumably they would wait forever, or until they hit a timeout error, or until one of the other app servers exited or shut down its pool. Pgbouncer would fool them all right, but not in a helpful way. It might make more sense to use this with min:0 (which is now the recommended setting, but still not the default), as that way an app server would at least release its final connection after idleTimeoutMillis, allowing another app to use it.
Using min:1 max:1 could be useful if pgbouncer were not used or used with a large enough pool size, but it could also break entirely. For example, if an app needs at least 2 simultaneous connections to work correctly. That would probably be a poorly written app, but poorly written apps are the rule, not the exception.
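The starvation scenario described above can be sketched with a few lines of Python. This is a toy model, not a description of how Knex or PgBouncer are implemented: it assumes every connection a Knex pool keeps open pins one of PgBouncer's server connections (as the answer does), and the function name is made up for the example.

```python
def apps_left_waiting(apps: int, knex_min: int, bouncer_pool_size: int) -> int:
    """Apps that cannot obtain a single connection while every app sits idle,
    each holding on to its `knex_min` connections."""
    if knex_min == 0:
        # Idle apps release everything, so nobody is locked out at rest.
        return 0
    served = min(apps, bouncer_pool_size // knex_min)
    return apps - served


print(apps_left_waiting(apps=10, knex_min=1, bouncer_pool_size=5))  # 5
print(apps_left_waiting(apps=10, knex_min=0, bouncer_pool_size=5))  # 0
```

The model only covers the idle state; under load, apps with min:0 still compete for the same 5 connections.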
I have a db and client app that does reads and writes, I need to handle a lot of concurrent reads but be sure that writes get priority, while also respecting my db’s connection limit.
Long version:
I have a single instance pgSQL database which allows 100 connections.
My .net microservice uses Npgsql to connect to the db. It has to do read queries that can take 20-2000ms and writes that can take about 500-2000ms. Right now there are 2 instances of the app, connecting with the same user credentials. I am trusting Npgsql to manage my connection pooling, and am preparing my read queries as there are basically just 2 or 3 variants with different parameter values.
As user requests increased, I started having problems with the database’s connection limit. Errors like ‘Too many connections’ from the db.
To deal with this I introduced a simple gate system in my repo class:
private static readonly SemaphoreSlim _writeGate = new(20, 20);
private static readonly SemaphoreSlim _readGate = new(25, 25);
public async Task<IEnumerable<SomeDataItem>> ReadData(string query, CancellationToken ct)
{
    await _readGate.WaitAsync(ct);
    try
    {
        // run the read query and return the data
    }
    finally
    {
        // always release the gate, even if the query throws
        _readGate.Release();
    }
}
public async Task WriteData(IEnumerable<SomeDataItem> items, CancellationToken ct)
{
    await _writeGate.WaitAsync(ct);
    try
    {
        // write the data
    }
    finally
    {
        // always release the gate, even if the write throws
        _writeGate.Release();
    }
}
I chose to have separate gates for read and write because I wanted to be confident that reads would not get completely blocked by concurrent writes.
The limits are hardcoded as above, a total limit of 45 on each of the 2 app instances, connecting to 1 db server instance.
It is more important that attempts to write data do not fail than attempts to read. I have some further safety here with a Polly retry pattern.
This was alright for a while, but as the concurrent read requests increase, I see that the response times start to degrade, as a backlog of read requests begins to accumulate.
So, for this question, assume my sql queries and db schema are optimized to the max, what can I do to improve my throughput?
I know that there are times when my _readGate is maxed out, but there is free capacity in the _writeGate. However I don’t dare reduce the hardcoded limits because at other times I need to support concurrent writes. So I need some kind of QoS solution that can allow more concurrent reads when possible, but will give priority to writes when needed.
Queue management is pretty complicated to me but is also quite well known to many, so is there a good nuget package that can help me out? (I’m not even sure what to google)
Is there a simple change to my code to improve on what I have above?
Would it help to have different conn strings / users for reads vs writes?
Anything else I can do with npgsql / connection string that can improve things?
I think that postgresql recommends limiting connections to 100, there's a SO thread on this here: How to increase the max connections in postgres?
There's always a limit to how many simultaneous queries that you can run before the perf would stop improving and eventually drop off.
However I can see in my azure telemetry that my db server is not coming close to fully using cpu, ram or disk IO (cpu doesn't exceed 70% and is often less, memory the same, and IOPS under 30% of its capacity) so I believe there is more to be squeezed out somewhere :)
Maybe there are other places to investigate, but for the sake of this question I'd just like to focus on how to better manage connections.
First, if you're getting "Too many connections" on the PostgreSQL side, that means that the total number of physical connections being opened by Npgsql exceeds the max_connections setting in PG. You need to make sure that the aggregate total of Npgsql's Max Pool Size across all app instances doesn't exceed that, so if your max_connections is 100 and you have two app instances, each needs to run with a Max Pool Size of at most 50.
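The sizing rule above is simple division; a small Python sketch of it follows. The function name and the `reserved` parameter are assumptions added for illustration: `reserved` stands for any connections you want to keep free for other clients, such as PostgreSQL's superuser_reserved_connections or an admin session.

```python
def max_pool_size_per_instance(max_connections: int, app_instances: int,
                               reserved: int = 0) -> int:
    """Largest Max Pool Size each instance can use without the fleet
    exceeding the server's connection limit."""
    if app_instances < 1:
        raise ValueError("need at least one app instance")
    usable = max_connections - reserved
    return max(usable // app_instances, 0)


print(max_pool_size_per_instance(100, 2))              # 50
print(max_pool_size_per_instance(100, 2, reserved=3))  # 48
```

Whenever the number of app instances changes, the per-instance value has to be recomputed, otherwise the aggregate can exceed the limit again.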
Second, you can indeed have different connection pools for reads vs. writes, by having different connection strings (a good trick for that is to set the Application Name to different values). However, you may want to set up one or more read replicas (primary/secondary setup); this would allow all read workload to be directed to the read replica(s), while keeping the primary for write operation only. This is a good load balancing technique, and Npgsql 6.0 has introduced great support for it (https://www.npgsql.org/doc/failover-and-load-balancing.html).
Apart from that, you can definitely experiment with increasing max_connections on the PG side, and accordingly Max Pool Size on the clients' side, and load-test what this does to resource utilization.
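On the question's request for a gate that lets reads use spare capacity but gives writes priority: one possible shape is a single shared budget with two wait queues, where freed slots go to queued writers first. The sketch below is in Python with asyncio rather than C#, and everything in it (the PriorityGate class, its parameters, the demo job names) is hypothetical; it is not an Npgsql feature or an existing package, only an illustration of the idea.

```python
import asyncio
from collections import deque
from contextlib import asynccontextmanager


class PriorityGate:
    """One shared slot budget; queued writers are always served before readers."""

    def __init__(self, total_slots: int, reserved_for_writes: int = 0):
        self._free = total_slots
        self._reserved = reserved_for_writes  # slots that reads may never take
        self._writers = deque()
        self._readers = deque()

    def _grant(self, queue, floor: int) -> None:
        # Hand free slots above `floor` to waiters, skipping cancelled ones.
        while queue and self._free > floor:
            fut = queue.popleft()
            if not fut.done():
                self._free -= 1
                fut.set_result(None)

    def _release(self) -> None:
        self._free += 1
        self._grant(self._writers, 0)
        if not self._writers:
            self._grant(self._readers, self._reserved)

    async def _acquire(self, write: bool) -> None:
        floor = 0 if write else self._reserved
        if self._free > floor and not self._writers:
            self._free -= 1
            return
        fut = asyncio.get_running_loop().create_future()
        (self._writers if write else self._readers).append(fut)
        try:
            await fut  # the slot is already ours when this returns
        except asyncio.CancelledError:
            if fut.done() and not fut.cancelled():
                self._release()  # granted just before cancellation: give it back
            raise

    @asynccontextmanager
    async def slot(self, *, write: bool):
        await self._acquire(write)
        try:
            yield
        finally:
            self._release()


async def demo() -> list:
    gate = PriorityGate(total_slots=1)
    order = []

    async def job(name: str, write: bool) -> None:
        async with gate.slot(write=write):
            order.append(name)
            await asyncio.sleep(0.01)  # stand-in for the database call

    first = asyncio.create_task(job("read-1", write=False))
    await asyncio.sleep(0)  # let read-1 take the only slot
    # read-2 queues before write-1, but the write is served first.
    rest = [asyncio.create_task(job("read-2", write=False)),
            asyncio.create_task(job("write-1", write=True))]
    await asyncio.gather(first, *rest)
    return order


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['read-1', 'write-1', 'read-2']
```

Compared with two fixed gates, reads can use the whole budget (minus `reserved_for_writes`) when no writes are pending, while a burst of writes pushes waiting reads back. The trade-off is that sustained write load can starve reads entirely, so a real implementation would probably need a timeout or a fairness rule.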
I want to use postgres sequence with cache CREATE SEQUENCE serial CACHE 100.
The goal is to improve performance of 3000 usages per second of SELECT nextval('serial'); by ~500 connection/application threads concurrently.
The issue is that I am doing intensive autoscaling and connections will be disconnected and reconnected occasionally leaving "holes" of unused ids in the sequence each time a connection is disconnected.
Well, the good news might be that I am using a PgBouncer heroku buildpack with transaction pool mode.
My question is: will the transaction pool mode solve the "holes" issues that I described, will it reuse the session in a way that the next application connection will take this session from the pool and continue using the cache of the sequence?
This depends on the setting of server_reset_query. If you have that set to DISCARD ALL, then sequence caches are discarded before a server connection is handed out to a client. But for transaction pooling, the recommended server_reset_query is empty, so you will be able to reuse sequence caches in that case. You can also use a different DISCARD command, depending on your needs.
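To illustrate why reusing the server session avoids the holes, here is a toy Python model of a sequence with CACHE 100. It assumes the usual behaviour that each database session preallocates a block of values and that the unused part of the block is lost when the session ends (or its cache is discarded). The class and method names are invented for the sketch and do not correspond to any PostgreSQL or PgBouncer API.

```python
class CachedSequence:
    """Toy model: every session preallocates `cache` values at a time."""

    def __init__(self, cache: int):
        self.cache = cache
        self.next_block_start = 1

    def open_session(self) -> "Session":
        return Session(self)


class Session:
    def __init__(self, seq: CachedSequence):
        self._seq = seq
        self._next = 0
        self._end = 0  # exclusive bound; the cache starts out empty

    def nextval(self) -> int:
        if self._next >= self._end:
            # Cache exhausted: claim the next block from the shared sequence.
            self._next = self._seq.next_block_start
            self._end = self._next + self._seq.cache
            self._seq.next_block_start = self._end
        value = self._next
        self._next += 1
        return value

    def unused(self) -> int:
        """Values that become a hole if this session ends now."""
        return self._end - self._next


seq = CachedSequence(cache=100)
first = seq.open_session()
print([first.nextval() for _ in range(3)])  # [1, 2, 3]
print(first.unused())                       # 97
second = seq.open_session()                 # a brand-new server session
print(second.nextval())                     # 101
print(first.nextval())                      # 4, if the first session is reused
```

In this model, a pooled server connection that survives client reconnects plays the role of `first`: the next client continues at 4 instead of starting a new block.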
I'm having trouble finding a good summary of the advantages/disadvantages of using pgbouncer for transaction pooling vs session pooling.
Does it mean that a transaction heavy workload is somehow better load balanced? Is it to prevent as many connections being required to connect from pgbouncer to the database?
Transaction-level pooling will help if you have apps that hold idle sessions. PgBouncer won't need to keep sessions open and idle, it just grabs one when a new transaction is started. Those idle sessions only cost you a pgbouncer connection, not a real idle Pg session with a backend sitting around wasting memory & synchronisation overhead doing nothing.
The main reason you'd want session pooling instead of transaction pooling is if you want to use named prepared statements, advisory locks, listen/notify, or other features that operate on a session level not a transaction level.
I have about 11000 hits a second on 10 servers with php-fpm. I'm migrating to Postgres from MySQL, so my question is: does it make sense to use pg_pconnect?
It's better to use a dedicated connection pooler like PgBouncer.
Performance should be comparable to pg_pconnect, but PgBouncer lets you clean up after an error in PHP code. pg_pconnect will not automatically clean up open transactions, locks, prepared statements etc.
Establishing a connection to a PostgreSQL server is expected to be significantly more expensive than to a MySQL server. This is due to different design choices of these databases in how they handle resource allocation and privilege separation between independent connections.
Therefore, for a website, it totally makes sense to reuse connections to PostgreSQL whenever possible.
The generally recommended way is not to use pg_pconnect but rather an external connection pooler like pgBouncer or Pgpool-II, which are better suited for this task. When using PHP-FPM, however, you already have a middleware that gives you some control over the number of open connections through the FPM process manager options, so it may be good enough. You may consider setting pm.max_requests to a non-zero value to make sure that connections get cleaned up at a reasonable frequency and to avoid keeping a pile of unused connections during off-peak hours.
Well, pg_pconnect will mean you have one connection per PHP backend, so it depends how many backends you have. With a traditional Apache mod-php setup it'd be a non-starter but you might get away with it.
The database server can handle hundreds of idle connections, but will almost certainly grind to a halt if they all have queries being issued concurrently. I've seen a rule of thumb of no more than two connections per core, and that's assuming I/O doesn't limit you first.
The common approach is to run a connection pooler like pgbouncer and have php connect per-request. That reduces your connection overhead while keeping concurrency plausible.
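As a rough illustration of the numbers involved, the Python sketch below compares the rule of thumb above with what one persistent connection per PHP backend would add up to. The 10 servers come from the question; the core count and the number of FPM children per server are hypothetical values picked for the example, and the function names are made up.

```python
def busy_connection_budget(cores: int, per_core: int = 2) -> int:
    """Rule-of-thumb ceiling for connections running queries at once."""
    return cores * per_core


def persistent_connections(servers: int, backends_per_server: int) -> int:
    """One persistent connection per PHP backend, as with pg_pconnect."""
    return servers * backends_per_server


# Hypothetical: a 16-core database and 50 FPM children on each of 10 servers.
print(busy_connection_budget(cores=16))                            # 32
print(persistent_connections(servers=10, backends_per_server=50))  # 500
```

The gap between the two numbers is what a pooler absorbs: many client connections on the PHP side, few busy connections on the database side.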