Maximising concurrent request handling with PostgreSQL / Npgsql client

I have a db and a client app that does reads and writes. I need to handle a lot of concurrent reads but be sure that writes get priority, while also respecting my db’s connection limit.
Long version:
I have a single-instance PostgreSQL database which allows 100 connections.
My .NET microservice uses Npgsql to connect to the db. It has to do read queries that can take 20-2000ms and writes that can take about 500-2000ms. Right now there are 2 instances of the app, connecting with the same user credentials. I am trusting Npgsql to manage my connection pooling, and am preparing my read queries as there are basically just 2 or 3 variants with different parameter values.
As user requests increased, I started having problems with the database’s connection limit. Errors like ‘Too many connections’ from the db.
To deal with this I introduced a simple gate system in my repo class:
private static readonly SemaphoreSlim _writeGate = new(20, 20);
private static readonly SemaphoreSlim _readGate = new(25, 25);
public async Task<IEnumerable<SomeDataItem>> ReadData(string query, CancellationToken ct)
{
    await _readGate.WaitAsync(ct);
    try
    {
        // run the read query and return the data
    }
    finally
    {
        // always release the gate, even if the query throws
        _readGate.Release();
    }
}
public async Task WriteData(IEnumerable<SomeDataItem> items, CancellationToken ct)
{
    await _writeGate.WaitAsync(ct);
    try
    {
        // write the data
    }
    finally
    {
        // always release the gate, even if the write throws
        _writeGate.Release();
    }
}
I chose to have separate gates for read and write because I wanted to be confident that reads would not get completely blocked by concurrent writes.
The limits are hardcoded as above, a total limit of 45 on each of the 2 app instances, connecting to 1 db server instance.
It is more important that attempts to write data do not fail than attempts to read. I have some further safety here with a Polly retry pattern.
This was alright for a while, but as the concurrent read requests increase, I see that the response times start to degrade, as a backlog of read requests begins to accumulate.
So, for this question, assume my sql queries and db schema are optimized to the max, what can I do to improve my throughput?
I know that there are times when my _readGate is maxed out, but there is free capacity in the _writeGate. However I don’t dare reduce the hardcoded limits because at other times I need to support concurrent writes. So I need some kind of QoS solution that can allow more concurrent reads when possible, but will give priority to writes when needed.
Queue management is pretty complicated to me but is also quite well known to many, so is there a good nuget package that can help me out? (I’m not even sure what to google)
Is there a simple change to my code to improve on what I have above?
Would it help to have different conn strings / users for reads vs writes?
Anything else I can do with npgsql / connection string that can improve things?
I think that postgresql recommends limiting connections to 100, there's a SO thread on this here: How to increase the max connections in postgres?
There's always a limit to how many simultaneous queries that you can run before the perf would stop improving and eventually drop off.
However I can see in my azure telemetry that my db server is not coming close to fully using cpu, ram or disk IO (cpu doesn't exceed 70% and is often less, memory the same, and IOPS under 30% of its capacity) so I believe there is more to be squeezed out somewhere :)
Maybe there are other places to investigate, but for the sake of this question I'd just like to focus on how to better manage connections.

First, if you're getting "Too many connections" on the PostgreSQL side, that means that the total number of physical connections being opened by Npgsql exceeds the max_connections setting in PG. You need to make sure that the aggregate total of Npgsql's Max Pool Size across all app instances doesn't exceed that, so if your max_connections is 100 and you have two app instances, each needs to run with Max Pool Size=50 at most (slightly lower in practice, since PostgreSQL keeps a few connections reserved for superusers).
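As a rough sketch of that arithmetic, here is the calculation in Python. The function name is made up for illustration, and the default of 3 reserved connections assumes PostgreSQL's default superuser_reserved_connections; a managed service may hold back more, so check your own server.

```python
def max_pool_size_per_instance(max_connections: int, app_instances: int,
                               reserved: int = 3) -> int:
    """Largest per-instance Max Pool Size that keeps the total under the server limit.

    `reserved` is an assumption: connections kept back for superusers/admin tools.
    """
    if app_instances < 1:
        raise ValueError("need at least one app instance")
    usable = max_connections - reserved
    return max(usable // app_instances, 0)


if __name__ == "__main__":
    size = max_pool_size_per_instance(max_connections=100, app_instances=2)
    print(f"Max Pool Size={size}")  # prints "Max Pool Size=48"
```

Any in-app gates (like the semaphores in the question) should then add up to no more than this per-instance figure.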
Second, you can indeed have different connection pools for reads vs. writes, by having different connection strings (a good trick for that is to set the Application Name to different values). However, you may want to set up one or more read replicas (primary/secondary setup); this would allow all read workload to be directed to the read replica(s), while keeping the primary for write operation only. This is a good load balancing technique, and Npgsql 6.0 has introduced great support for it (https://www.npgsql.org/doc/failover-and-load-balancing.html).
Apart from that, you can definitely experiment with increasing max_connections on the PG side - and accordingly Max Pool Size on the clients' side - and load-test what this does to resource utilization.
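On the "writes get priority, reads use spare capacity" part of the question, one possible shape is a single shared gate instead of two fixed ones. The following is a minimal sketch only, shown in Python's asyncio for brevity rather than in C#; the class name PriorityGate and the reserved_for_writes parameter are hypothetical, the gate is per process (each app instance would have its own), and cancellation of waiting callers is deliberately not handled.

```python
import asyncio
from collections import deque


class PriorityGate:
    """Shared permit pool: waiting writers are always served before waiting readers.

    Reads may only take a permit while more than `reserved_for_writes` are free,
    so some headroom is always left for writes.
    """

    def __init__(self, permits: int, reserved_for_writes: int = 0):
        self._free = permits
        self._reserved = reserved_for_writes
        self._writers = deque()  # futures of waiting writers
        self._readers = deque()  # futures of waiting readers

    async def acquire(self, write: bool) -> None:
        floor = 0 if write else self._reserved
        # Fast path: a permit is free, and a reader never jumps ahead of a waiting writer.
        if self._free > floor and (write or not self._writers):
            self._free -= 1
            return
        fut = asyncio.get_running_loop().create_future()
        (self._writers if write else self._readers).append(fut)
        await fut  # release() hands the permit over directly

    def release(self) -> None:
        if self._writers:
            self._writers.popleft().set_result(None)  # writers first
        elif self._readers and self._free >= self._reserved:
            self._readers.popleft().set_result(None)  # then readers, if headroom remains
        else:
            self._free += 1


async def _demo() -> None:
    # 45 permits per app instance, as in the question; 10 held back for writes (assumed).
    gate = PriorityGate(permits=45, reserved_for_writes=10)
    await gate.acquire(write=False)
    try:
        pass  # run the read query here
    finally:
        gate.release()


if __name__ == "__main__":
    asyncio.run(_demo())
```

With this shape reads can use up to 35 of the 45 permits when no writes are running, instead of being capped at 25, while a queued write is always picked before a queued read.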

Related

High CPU usage on Cloud SQL causing timeouts

We have a Postgres database that has billions of records in it.
We have one client that uses our older API to query the database to fetch thousands of records once a day.
I would say close to the top end of the thousands.
The API is currently on a compute engine behind a load balancer and during the allotted time I spin up 6 instances of this to attempt to help handle the load.
What I have found is that the CPU usage on cloud SQL is maxing out at 100% and most of the other stats are fine, it's just the CPU.
This basically renders our API useless as we can't accept connections and it just shits itself.
What can we do to help this?
[Charts not shown: CPU utilisation, connections, reads/writes, memory usage]
You can see in most of the other charts the readings are well within normal for what we expect.
I don't really want to have to beef up the CPU if it isn't really the actual underlying problem.
A further thing to note is we have developed a new endpoint for this client specifically to use, they have not got that in place yet, and there is no guarantee that it will reduce the db load.
High CPU usage can most definitely cause dropped or ignored connections. The database engine and underlying OS are fighting for resources and aren't able to respond to the connection in time.
While you can increase the CPU capacity, it looks like the CPU you have is (usually) enough, except during the periods where it is at 100%. I'd suggest instead finding out why the query is using so much CPU and optimizing it.
You might be interested in something like Cloud SQL Insights to help debug the query.

What is the ideal number of max connections for a postgres database?

I'm currently using the default connection pool in sequelize, which is as follows:
const defaultPoolingConfig = {
  max: 5,
  min: 0,
  idle: 10000,
  acquire: 10000,
  evict: 10000,
  handleDisconnects: true
};
Of late, I'm getting ResourceRequest timed out errors, which are due to the above DB configuration. According to some answers the max pool size should be set to 5, but those who have faced the same ResourceRequest timed out error have suggested increasing the pool size to 30, along with increasing the acquire time.
I need to know what must be the optimum value of max pool size for a web-app.
Edit:
1. Let's say I have 200 concurrent users and 20 concurrent queries. What should the values be then?
2. My database is provided by GCP, with the following configuration:
vCPUs: 1
Memory: 3.75 GB
SSD storage: 10 GB
I'm adding some graphs for CPU utilization, Read / write operations per second and transactions per second.
My workload resources are as follows:
resources:
  limits:
    cpu: 500m
    memory: 600Mi
  requests:
    cpu: 200m
    memory: 500Mi
The number of concurrent connections should be large enough for the number of concurrent running queries or transactions you may have.
If you have a lower limit, then new queries/transactions will have to wait for an available connection.
You may want to monitor currently running queries (see pg_stat_activity for instance) to detect such issues.
However, your database server must be able to handle the number of connections. If you are using a server provided by a third party, it may have set limits. If you are using your own server, then it needs to be configured properly.
Note that to handle more connections, your database server will need more processes and more RAM. Also, if they are long running queries (as opposed to transactions), then you are most probably resource-constrained on the server (often I/O-bound), and adding more queries running at the same time usually won't help with overall performance. You may want to look at configuration of your DB server (buffers etc.), and of course, if you haven't already done so, optimise your queries (make sure they all use indexes). The other pg_stat_* views and EXPLAIN are your friends here.
If you have long-running transactions with lots of idle time, then more concurrent connections may help, though you may have to wonder why you have such long-running transactions.
To summarise, your next steps should be to:
Check the immediate state of your database server using pg_stat_activity and friends.
If you don't already have that, set up monitoring of I/O, CPU, memory, swap, postgresql statistics over time. This will give you a clearer picture of what is going on on your server. If you don't have that, you're just running blind.
If you have long-running transactions, check that you always correctly release transactions/connections, including when errors occur. This is a pretty common issue with node.js-based web servers. Make sure you use try .. catch blocks wherever needed.
If there are any long-running queries, check that they are properly optimised (using indexes). If not, do your utmost to optimise them. This will be the single most useful step you can take if that's where the issue is.
If they are properly optimised and you have enough spare resources (RAM, I/O...), then you can consider raising the number of connections. Otherwise it's just pointless.
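The "always release connections, including when errors occur" step above is the one that most often goes wrong, so here is a minimal sketch of the pattern. It is written in Python rather than Node.js, and TinyPool is a made-up stand-in for a real driver's pool (the "connections" are just strings); only the try/finally shape is the point.

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """Toy fixed-size pool; the 'connections' are placeholder strings."""

    def __init__(self, size: int):
        self._idle = queue.Queue()
        for n in range(size):
            self._idle.put(f"conn-{n}")

    @contextmanager
    def connection(self, timeout: float = 1.0):
        # Raises queue.Empty if no connection frees up within `timeout` seconds.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Runs on success and on error, so the connection is never leaked.
            self._idle.put(conn)

    def idle_count(self) -> int:
        return self._idle.qsize()


if __name__ == "__main__":
    pool = TinyPool(size=2)
    try:
        with pool.connection() as conn:
            raise RuntimeError(f"query failed on {conn}")
    except RuntimeError:
        pass
    print(pool.idle_count())  # prints 2: the connection went back despite the error
```

If a code path skips the release when a query throws, the pool slowly drains and later requests time out while waiting to acquire a connection, even though the database itself is idle.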
Edit
Since you are not operating the database yourself, you won't necessarily have all the visibility you could have on resource usage.
However, you can still:
Check pg_stat_activity. This alone will tell you a lot of things.
Check for connections/transactions that are kept around when they shouldn't
Check queries are properly optimised
GCP has a default maximum concurrent connections limit set to 100 for instances with 3.75 GiB of RAM. So you could indeed increase the size of your pool. But if any of the above issues are present, you are just delaying or moving the issue a bit further, so start by checking those and fixing them if relevant.
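For the pg_stat_activity check, a small sketch of what to look for: sessions sitting in the "idle in transaction" state for a long time usually mean a connection or transaction that was never released. The query uses real pg_stat_activity columns; the helper name, the 30-second threshold, and the assumption that rows arrive as (pid, state, xact_start, query) tuples from whatever driver you use are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Run this with your own database driver; column order matches the helper below.
ACTIVITY_SQL = """
SELECT pid, state, xact_start, left(query, 60) AS query
FROM pg_stat_activity
WHERE datname = current_database()
"""


def stuck_transactions(rows, now, max_age=timedelta(seconds=30)):
    """Return pids of sessions that have been 'idle in transaction' longer than max_age."""
    return [
        pid
        for pid, state, xact_start, _query in rows
        if state == "idle in transaction"
        and xact_start is not None
        and now - xact_start > max_age
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [  # hand-written sample rows, not real output
        (101, "active", now - timedelta(seconds=2), "SELECT ..."),
        (102, "idle in transaction", now - timedelta(minutes=10), "BEGIN"),
        (103, "idle", None, ""),
    ]
    print(stuck_transactions(sample, now))  # prints [102]
```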

How do I manage connection pooling to PostgreSQL from sidekiq?

The problem: I have a Rails application that runs a few hundred Sidekiq background processes. They all connect to a PostgreSQL database which is not exactly happy about providing 250 connections - it can, but if all Sidekiq processes happen to send queries to the db at the same time, it crumbles.
Option 1: I have been thinking about adding pgBouncer in front of the db, however I cannot currently use its transaction mode, since I'm highly dependent upon setting the search_path at the beginning of each job processing for determining which "country" (PostgreSQL schema) to work on (apartment-gem). In this case, I would have to use the session-based connection pooling mode. This however would, as far as I know, require me to close the connections after each job processing, to release the connections back into the pool, and that would be really costly performance-wise, wouldn't it? Am I missing out on something?
Option 2: Application-layer connection pooling is of course also an option, however I'm not really sure how I would be able to do that for PostgreSQL with Sidekiq.
Option 3: Something I have not thought of?
Option 1: You're correct, session mode would require you to drop and reconnect, and that adds overhead. How costly it is depends on the access pattern, i.e. what fraction of the total work done is the connection/TCP handshake etc., and what sort of latency you need. Definitely worth benchmarking, but if the connections are short-lived then the overhead will be really noticeable.
Option 2/3: You could rate limit or throttle your sidekiq jobs. There are a few projects here tackling this...
Queue limits
Sidekiq Limit Fetch: Restrict number of workers which are able to run specified queues simultaneously. You can pause queues and resize queue distribution dynamically. Also tracks number of active workers per queue. Supports global mode (multiple sidekiq processes). There is an additional blocking queue mode.
Sidekiq Throttler: Sidekiq::Throttler is a middleware for Sidekiq that adds the ability to rate limit job execution on a per-worker basis.
sidekiq-rate-limiter: Redis backed, per worker rate limits for job processing.
Sidekiq::Throttled: Concurrency and threshold throttling.
I got the above from here
https://github.com/mperham/sidekiq/wiki/Related-Projects
If your application must have a connection per process and you're unable to break it up where more threads can use a connection then it's pgBouncer or Application based connection pooling. Connection pooling is in effect either going to throttle or limit your app in some way in order to save the DB.
Sidekiq should only require one connection for each worker thread. If you are setting your concurrency to a reasonable value, say 10-25, I don't think you should be using 250 simultaneous database connections. How many worker processes are you running, and what is their concurrency?
Also, you can see on that page that even if you have a high concurrency setting, you can still create a connection pool shared by the threads within that process.
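To make the process/concurrency question concrete, the connection count is just a product. A quick sketch follows; the function name and the example figures of 10 processes, concurrency 25 and a pool of 5 are hypothetical, chosen only because they multiply out to the 250 connections mentioned in the question.

```python
from typing import Optional


def total_connections(processes: int, concurrency: int,
                      pool_size: Optional[int] = None) -> int:
    """Upper bound on database connections opened by a fleet of worker processes.

    Without a shared pool each worker thread holds its own connection; with a
    per-process pool the threads share at most `pool_size` connections.
    """
    per_process = concurrency if pool_size is None else min(concurrency, pool_size)
    return processes * per_process


if __name__ == "__main__":
    print(total_connections(processes=10, concurrency=25))               # prints 250
    print(total_connections(processes=10, concurrency=25, pool_size=5))  # prints 50
```

With a pool smaller than the concurrency setting, threads wait for a free connection instead of opening a new one, which is the throttling effect described above.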

Why does Perl make the system very slow when I make more than 4,000 database connections?

I was writing a code to find the speed of my database using a Perl script.
My intention was to make 4,000 database connections after each fork (which would act as 4,000 different clients), sleep, and then issue the update command when I get the signal,
but the system itself becomes very slow and almost hangs just while making the connections, and I couldn't even send the signal from my terminal.
I am using the DBI module. My system has 4 GB of RAM, and Postgres 8.3 is running on a different machine.
I'm not entirely clear on whether you're saying you wanted to a) Open 4,000 connections, fork, open 4,000 more connections, etc. or b) Fork 4,000 times and open one connection from each process, but 4,000 database connections or 4,000 processes is some pretty serious resource consumption either way. I'm not at all surprised that it's slowing your system to a crawl - I would expect that to be the end result regardless of the language used.
What are you actually attempting to achieve by creating all of these processes and/or connections? There's probably a better way to do it that won't be quite so resource-intensive.
I've seen pgpool in use on production systems where the number of postgres connections could not be limited to something reasonable. You may wish to look into using this yourself to mitigate against poor application design by your developers.
Essentially, pgpool acts as a proxy to postgres. It multiplexes queries on lots of connections to a much smaller (and manageable) number to the back-end database.
That is, relatively speaking, a lot of connections to have at once, but not unheard of by any means. How much memory do you have on the database server? Each connection takes resources; if you don't have a database server set up to handle that volume of connections, it will be slow no matter what language you use to connect.
A simple analogy would be a Toyota Prius (in the old days I would have said Ford Pinto) pulling a semi-trailer loaded with 80,000 lbs (the typical legal weight in a lot of the states). It would burn that little Prius up in a heartbeat, like you are seeing. To do it right you need to buy yourself a big rig and hook it to that trailer to move that amount of weight.
Ignoring the wisdom of doing 4000 connection forks, you should work through your performance issues with something akin to Devel::NYTProf.
I would alternatively set up persistent workers in gearman and do my gearman client requests. Persistence and your scheduled forks on demand.

PostgreSQL consuming large amount of memory for persistent connection

I have a C++ application which is making use of PostgreSQL 8.3 on Windows. We use the libpq interface.
We have a multi-threaded app where each thread opens a connection and keeps using it without ever calling PQfinish on it.
We notice that for each query (especially the SELECT statements) postgres.exe memory consumption would go up. It goes up as high as 1.3 GB. Eventually, postgres.exe crashes and forces our program to create a new connection.
Has anyone experienced this problem before?
EDIT: shared_buffers is currently set to 128MB in our conf file.
EDIT2: a workaround that we have in place right now is to call PQfinish for every transaction. But then, this slows down our processing a bit since establishing a connection every time is quite slow.
In PostgreSQL, each connection has a dedicated backend. This backend not only holds connection and session state, but is also an execution engine. Backends aren't particularly cheap to leave lying around, and they cost both memory and synchronization overhead even when idle.
There's an optimum number of actively working backends for any given Pg server on any given workload, where adding more working backends slows things down rather than speeding it up. You want to find that point, and limit the number of backends to around that level. Unfortunately there's no magic recipe for this, it mostly involves benchmarking - on your hardware and with your workload.
If you need more connections than that, you should use a proxy or pooling system that allows you to separate "connection state" from "execution engine". Two popular choices are PgBouncer and PgPool-II. You can maintain light-weight connections from your app to the proxy/pooler, and let it schedule the workload to keep the database server working at its optimum load. If too many queries come in, some wait before being executed instead of competing for resources and slowing down all queries on the server.
See the postgresql wiki.
Note that if your workload is read-mostly, and especially if it has items that don't change often for which you can determine a reliable cache invalidation scheme, you can also potentially use memcached or Redis to reduce your database workload. This requires application changes. PostgreSQL's LISTEN and NOTIFY will help you do sane cache invalidation.
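The caching idea can be sketched as a read-through cache with explicit invalidation. This is illustration only: a plain in-process dict stands in for memcached or Redis, the class and method names are made up, and the wiring that would call invalidate() when a NOTIFY arrives from PostgreSQL is not shown.

```python
from typing import Any, Callable, Dict, Hashable


class ReadThroughCache:
    """Serve repeated reads from memory; go to the database only on a miss."""

    def __init__(self, load: Callable[[Hashable], Any]):
        self._load = load  # function that actually queries the database
        self._items: Dict[Hashable, Any] = {}

    def get(self, key: Hashable) -> Any:
        if key not in self._items:
            self._items[key] = self._load(key)  # cache miss: one database read
        return self._items[key]

    def invalidate(self, key: Hashable) -> None:
        # Call this when the underlying row changes, e.g. from a NOTIFY handler.
        self._items.pop(key, None)


if __name__ == "__main__":
    db_reads = []

    def load_from_db(key):
        db_reads.append(key)  # stand-in for a real SELECT
        return f"row-{key}"

    cache = ReadThroughCache(load_from_db)
    cache.get(1)
    cache.get(1)          # served from the cache, no database read
    cache.invalidate(1)
    cache.get(1)          # reloaded after invalidation
    print(len(db_reads))  # prints 2
```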
Many database engines have some separation of execution engine and connection state built in to the core database engine's design. Sybase ASE certainly does, and I think Oracle does too, but I'm not too sure about the latter. Unfortunately, because of PostgreSQL's one-process-per-connection model it's not easy for it to pass work around between backends, making it harder for PostgreSQL to do this natively, so most people use a proxy or pool.
I strongly recommend that you read PostgreSQL High Performance. I don't have any relationship/affiliation with Greg Smith or the publisher*, I just think it's great and will be very useful if you're concerned about your DB's performance.
* ... well, I didn't when I wrote this. I work for the same company now.
The memory usage is not necessarily a problem. PostgreSQL uses shared memory for some caching, and this memory does not count towards the size of the process memory usage until it's actually used. The more you use the process, the larger the part of the shared buffers that will be active in its address space.
If you have a large value for shared_buffers, this will happen. If you have it too large, the process can run out of address space and crash, yes.
The problem is probably that you don't close the transaction.
In PostgreSQL, even if you only run SELECTs without any DML, they run inside a transaction, which needs to be ended with a rollback (or commit).
Adding a rollback at the end of each transaction should reduce your memory problem.