We have a multi-tenant setup with one database for each tenant in a single server. Is it possible to maintain a common pool for all databases with pgbouncer? The number of databases in one server can run to a few hundred. While I can have a large number of connections from the application to pgbouncer, I am limited by the number of connections I can have with the postgres server. What is the best approach in this scenario?
Related
I have an AWS Serverless V2 database setup (postgresql) that is being accessed from a compute cluster. The cluster launches a large number of jobs (>1000) and each job independently puts/pulls some data from the database. The Serverless cluster is set up to autoscale from 2 to 32 units as needed.
The code being run by each cluster job is using SQLAlchemy (either the ORM or the core). I am setting up each database connection with a null pool and pessimistic disconnect handling (i.e., pool_pre_ping=True). From my reading of the docs this should be handling disconnects due to being idle mid-connection.
Code is also written to access the DB, get the results, close the connection (to avoid idle connections), and then reopen the connection after processing (5-30 minutes). This is working well because once processing is completed, the new connections are staggered and the DB has scaled up.
My logs show the standard "all connections are taken" error (psycopg2.OperationalError: FATAL: remaining connection slots are reserved for non-replication superuser and rds_superuser connections) until the DB scales the available units high enough.
Questions:
Should I be configuring the SQLAlchemy connection differently? It feels like an anti-pattern to put in a custom retry to grab a connection while waiting for the DB to scale the number of available units as this type of capability seems to be built into SQLAlchemy usually.
Should I be using an RDS Proxy in front of the database? This also seems like an anti-pattern, adding a proxy in front of an autoscaling DB.
PG version is 10.
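The custom retry mentioned in the first question could look roughly like the sketch below. It is a generic illustration, not a built-in SQLAlchemy feature: connect_with_backoff and slots_exhausted are hypothetical helper names, and the delay values are arbitrary assumptions. The connect argument is any zero-argument callable that opens a connection.

```python
import random
import time


def connect_with_backoff(connect, is_retryable, attempts=6, base_delay=1.0,
                         max_delay=60.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off between failed attempts.

    Hypothetical helper: the names and default delays are illustrative only.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except Exception as exc:
            # Give up on non-retryable errors and on the final attempt.
            if not is_retryable(exc) or attempt == attempts - 1:
                raise
            # Exponential backoff with full jitter, so that many jobs
            # do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


def slots_exhausted(exc):
    """Heuristic check for the error text quoted in the question."""
    return "remaining connection slots are reserved" in str(exc)
```

With SQLAlchemy, connect could be the connect method of an engine created with poolclass=NullPool and pool_pre_ping=True, as described in the question, for example connect_with_backoff(engine.connect, slots_exhausted).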
Given a PostgreSQL database that is reasonably configured for its intended load, what factors would contribute to selecting an external/middleware connection pool (e.g. pgBouncer, pgPool) vs. a client-side connection pool (e.g. HikariCP, c3p0)? Lastly, in what instances would you apply both client-side and external connection pooling?
From my experience and understanding, the disadvantages of an external pool are:
additional failure point (including from a security standpoint)
additional latency
additional complexity in deployment
security complications w/ user credentials
In researching the question, I have come across instances where both client-side and external pooling are used. What is the motivation for such a deployment? In my mind that is compounding the majority of disadvantages for a gain that I appear to be missing.
Usually, a connection pool on the application side is a good thing for the reasons you detail. An external connection pool only makes sense if
your application server does not have a connection pool
you have several (many) instances of the application server, so that you cannot effectively limit the number of database connections with a connection pool in the application server
I'm looking for a way to create a connection pool for many DBs on the same DB server (PostgreSQL Aurora).
This means that I need the ability of changing the target DB of a connection at run time.
Currently I'm using HikariCP for connection pooling, in a stack of Spring Boot and JHipster.
Background:
we need to deploy a multi-tenancy micro-service system with a single DB server (to be specific, a single AWS Aurora PostgreSQL instance)
our solution for multi-tenancy is that each tenant has its own DB, and in that DB we have many schemas for each service. All the DBs are in the same AWS Aurora instance.
Our problem:
with this deployment, we have a connection pool for each (tenant x micro-service instance).
This leads to a huge number of connections.
I.e., with a pool size of 50 connections per pool, we need: 500 tenants x 20 micro-service instances x 50 connections/pool = 500,000 connections.
The maximum connections allowed on any Aurora DB is 16000, and actually by default the "max_connections" parameter is typically set to something lower.
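The connection arithmetic above can be checked with a few lines of Python. The numbers are the ones given in the question; the variable names are only illustrative.

```python
# Figures taken from the question.
tenants = 500
instances = 20            # micro-service instances
pool_size = 50            # connections per pool
aurora_limit = 16000      # maximum connections quoted for Aurora

pools = tenants * instances          # one pool per (tenant, instance) pair
worst_case = pools * pool_size       # connections if every pool fills up

# Largest per-pool size that would still fit under the limit.
max_pool_size_that_fits = aurora_limit // pools

print(pools)                     # 10000
print(worst_case)                # 500000
print(max_pool_size_that_fits)   # 1
```

So even at one connection per pool, 10,000 pools already consume most of the 16,000-connection limit, which is why a larger pooling scope is being sought.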
So now I'm looking for a way to make our pooling scope larger, so that many tenants can share the same pool. Since we use only 1 Aurora server instance, I think it's possible to create a connection pool that can be shared between many tenants.
Is there any way to have a connection pool that can switch the DB at run time?
Unless Aurora has done some customization on this, you cannot change the database of a connection once it is established in PostgreSQL. You can still use a pooler, but it will effectively be a separate pool for each database. This is pretty fundamental; there is nothing you can do about it.
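To illustrate what "a separate pool for each database" means in practice, here is a toy model in Python. It is not how pgbouncer or any real pooler is implemented; PerDatabasePooler and its parameters are hypothetical. It only shows that an idle connection can be reused solely for the database it was opened on, while a shared cap can still bound the total number of server connections.

```python
from collections import defaultdict


class PerDatabasePooler:
    """Toy model: idle connections are kept per database name."""

    def __init__(self, open_connection, max_server_connections):
        self._open = open_connection          # callable: dbname -> connection
        self._max = max_server_connections    # cap across all databases
        self._idle = defaultdict(list)        # dbname -> idle connections
        self._total = 0                       # connections currently open

    def acquire(self, dbname):
        # A connection can only be reused for the database it was opened on.
        if self._idle[dbname]:
            return self._idle[dbname].pop()
        if self._total >= self._max:
            # At the cap: close an idle connection of another database, if any.
            if not self._evict_one_idle():
                raise RuntimeError("no server connection slots available")
        conn = self._open(dbname)
        self._total += 1
        return conn

    def release(self, dbname, conn):
        # Return the connection to the idle list of its own database.
        self._idle[dbname].append(conn)

    def _evict_one_idle(self):
        for conns in self._idle.values():
            if conns:
                conns.pop().close()
                self._total -= 1
                return True
        return False
```

The consequence for a multi-tenant setup is that switching tenants costs a close and a fresh connect rather than a cheap reuse, so the benefit of pooling shrinks as the number of databases grows relative to the connection cap.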
We are using an Amazon r3.8xlarge PostgreSQL RDS instance for our production server. I checked the max connections limit of the RDS instance, and it is 8192.
I have a service which is deployed in ECS, and each ECS task can take one database connection. The tasks go up to 2000 during peak load. That means we will have 2000 concurrent connections to the database.
I want to check whether it is OK to have 2000 concurrent connections to the database. Secondly, will it impact the performance of Amazon PostgreSQL RDS?
Having 2000 connections at a time should not cause any performance issue, since AWS manages the performance part. There are many DB load testing tools available if you want to be completely sure about this.
I have 3 postgresql databases (one master and two slaves) behind a pgpool. Each database can handle 200 connections, and I want to be able to get 600 active connections on the pgpool.
My problem is that if I set pgpool to 600 child processes, it can open all 600 connections on only one database (the master, for example, if all connections make a write query), but with 200 child processes I only use about 70 connections on each database.
So is there a way to configure pgpool so that its load balancing scales with the number of databases?
Thanks.
Having 600 connections available in each db is not an ideal solution. I would really look into my application before setting such a high connection limit.
pgpool's load balancing can be spread evenly by setting the backend_weight parameter to the same value for every node, so that the SQL queries are distributed equally among the postgresql nodes.
pgpool also manages its database connection pool using the num_init_children and max_pool parameters.
The num_init_children parameter sets the number of pgpool processes that are spawned to connect to each PostgreSQL backend.
The num_init_children value is also the number of concurrent clients allowed to connect to pgpool.
pgpool tries to make roughly max_pool*num_init_children connections to each postgresql backend.
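The max_pool*num_init_children estimate can be applied to the numbers in the question with a small Python sketch. The function names are hypothetical, and the reserved argument is an assumption standing in for connection slots that PostgreSQL keeps back (for example superuser_reserved_connections); it defaults to 0 here to match the question's round figures.

```python
def backend_connections(num_init_children, max_pool):
    """Rough upper bound on connections pgpool may open to each backend."""
    return num_init_children * max_pool


def largest_num_init_children(max_connections, max_pool, reserved=0):
    """Largest num_init_children that keeps each backend under its limit."""
    return (max_connections - reserved) // max_pool


# Each backend in the question accepts 200 connections.
print(backend_connections(600, 1))          # 600, above the 200 limit
print(largest_num_init_children(200, 1))    # 200
```

Because the bound applies to each backend separately, adding backends does not raise the number of clients pgpool can serve at once. That matches the behaviour described in the question.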