I am trying to create a connection pool for Postgres using HikariCP. The pool should always keep one active session/connection, and the remaining connections should be created on demand, which Hikari will take care of. To do this, I configured the DataSource as below:
private static HikariDataSource dataSource = null;
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://hostname:port/dbname");
config.setUsername("USERNAME");
config.setPassword("PASSWORD");
config.setMinimumIdle(1);
config.setMaximumPoolSize(5);
config.setIdleTimeout(10000);
dataSource = new HikariDataSource(config);
My understanding is that setMinimumIdle(1) always keeps one open connection to the DB until the program or server shuts down, and that the remaining 4 connections are created on demand if they do not already exist in the pool.
setIdleTimeout(10000) removes any connection that has been idle in the pool for more than 10 seconds. I observed that it does not apply to the connections kept for minimumIdle (here, 1).
Is my understanding correct, and does this configuration serve my requirement? I would appreciate your suggestions.
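To make the expected behaviour concrete, here is a minimal sketch of the eviction rule as I understand it from the HikariCP documentation, which says idle connections are not retired once the pool is down to minimumIdle. The class and method names are hypothetical, and this is a toy model, not HikariCP's actual housekeeping code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of idle-connection eviction; this is NOT HikariCP's implementation. */
final class IdleEvictionSketch {

    /**
     * Returns how many idle connections remain after one eviction pass.
     * idleMillis holds, for each idle connection, how long it has been idle.
     */
    static int survivorsAfterEviction(List<Long> idleMillis, long idleTimeoutMs, int minimumIdle) {
        List<Long> sorted = new ArrayList<>(idleMillis);
        sorted.sort((a, b) -> Long.compare(b, a)); // longest-idle first
        int remaining = sorted.size();
        for (long idle : sorted) {
            if (remaining <= minimumIdle) {
                break; // never shrink below minimumIdle
            }
            if (idle > idleTimeoutMs) {
                remaining--; // this connection exceeded idleTimeout and is closed
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Five idle connections, all idle for longer than the 10 second timeout.
        List<Long> idle = List.of(12_000L, 15_000L, 11_000L, 30_000L, 20_000L);
        System.out.println(survivorsAfterEviction(idle, 10_000L, 1));
    }
}
```

With minimumIdle set to 1, the model keeps exactly one connection even though all five exceeded the timeout, which matches what I observed.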
I am trying to understand the difference between 2 configurable parameters while creating a connection pool using r2dbc-pool.
I was able to configure the connection pool with the help of the below post:
Connection pool size with postgres r2dbc-pool
But I wanted to understand the difference between max size and initial size when creating the pool:
ConnectionFactory connectionFactory = ConnectionFactories.get(ConnectionFactoryOptions.builder()
.option(DRIVER, "pool")
.option(PROTOCOL, "postgresql")
.option(HOST, host)
.option(USER, user)
.option(PASSWORD, password)
.option(MAX_SIZE, 30)
.option(INITIAL_SIZE, 10)
.option(DATABASE, database)
.build());
ConnectionPoolConfiguration configuration = ConnectionPoolConfiguration.builder(connectionFactory)
.maxIdleTime(Duration.ofMinutes(30))
.initialSize(initialSize)
.maxSize(maxSize)
.initialSize(20)
.maxCreateConnectionTime(Duration.ofSeconds(1))
.build();
There isn't really a difference; these are just two different ways to create a connection pool.
The initial size is the number of open connections to the database once the application starts.
You can check that with the query:
select * from pg_stat_activity;
You'll see exactly the number you specified as initial size.
Max size is the maximum number of connections that can be open at the same time. You do not control the current number directly; the underlying pool implementation opens more connections as load increases, up to that limit.
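The relationship between the two settings can be sketched with a toy counter. This is only an illustration under simplifying assumptions: the class and its methods are hypothetical and are not part of the r2dbc-pool API, and a real pool would make callers wait at the cap instead of returning false.

```java
/** Toy connection counter; names are hypothetical and this is not the r2dbc-pool API. */
final class PoolSizeSketch {
    private final int maxSize;
    private int open;   // connections currently open (idle plus in use)
    private int inUse;  // connections currently handed out to callers

    PoolSizeSketch(int initialSize, int maxSize) {
        if (initialSize < 0 || initialSize > maxSize) {
            throw new IllegalArgumentException("need 0 <= initialSize <= maxSize");
        }
        this.maxSize = maxSize;
        this.open = initialSize; // opened eagerly at startup
    }

    /** Hands out a connection if possible; returns false once maxSize are in use. */
    boolean acquire() {
        if (inUse < open) {      // an idle connection is available
            inUse++;
            return true;
        }
        if (open < maxSize) {    // grow on demand
            open++;
            inUse++;
            return true;
        }
        return false;            // cap reached (a real pool would make the caller wait)
    }

    int openConnections() {
        return open;
    }

    public static void main(String[] args) {
        PoolSizeSketch pool = new PoolSizeSketch(10, 30);
        System.out.println("open at startup: " + pool.openConnections());
        for (int i = 0; i < 11; i++) {
            pool.acquire();
        }
        System.out.println("open after 11 acquires: " + pool.openConnections());
    }
}
```

Run as written, the counter reports 10 open connections at startup and 11 after the eleventh concurrent acquire, so the initial size is a starting point and the max size is a ceiling.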
I have an Azure Durable Function that interacts with a PostgreSQL database, also hosted in Azure.
The PostgreSQL database has a connection limit of 50, and furthermore, my connection string limits the connection pool size to 40, leaving space for super user / admin connections.
Nonetheless, under some loads I get the error
53300: remaining connection slots are reserved for non-replication superuser connections
This documentation from Microsoft seemed relevant, but it doesn't seem like I can make a static client, and, as it mentions,
because you can still run out of connections, you should optimize connections to the database.
I have this method
private IDbConnection GetConnection()
{
return new NpgsqlConnection(Environment.GetEnvironmentVariable("PostgresConnectionString"));
}
and when I want to interact with PostgreSQL I do this:
using (var connection = GetConnection())
{
connection.Open();
return await connection.QuerySingleAsync<int>(settings.Query().Insert, settings);
}
So I am creating (and disposing) lots of NpgsqlConnection objects, but according to this, that should be fine because connection pooling is handled behind the scenes. But there may be something about Azure Functions that invalidates this thinking.
I have noticed that I end up with a lot of idle connections (as seen in pgAdmin).
Based on that, I've tried fiddling with Npgsql connection parameters like Connection Idle Lifetime, Timeout, and Pooling, but the problem of too many connections seems to persist to one degree or another. Additionally, I've tried limiting the number of concurrent orchestrator and activity functions (see this doc), but that seems to partially defeat the purpose of Azure Functions being scalable. It does help: I get fewer of the "too many connections" errors. Presumably, if I keep testing with lower numbers I may even eliminate them, but again, that seems to defeat the point, and there may be another solution.
How can I use PostgreSQL with Azure Functions without maxing out connections?
I don't have a good solution, but I think I have the explanation for why this happens.
Why is Azure Function App maxing out connections?
Even though you specify a limit of 40 for the pool size, it is only honored within one instance of the function app. Note that a function app can scale out based on load. It can process several requests concurrently in the same function app instance, and it can also create new instances of the app. Concurrent requests in the same instance will honor the pool size setting, but with multiple instances, each instance gets its own pool of up to 40 connections.
Even the concurrency throttles in durable functions don't solve this issue, because they only throttle within a single instance, not across instances.
How can I use PostgreSQL with Azure Functions without maxing out connections?
Unfortunately, the function app doesn't provide a native way to do this. Note that the connection pool size is not managed by the function runtime, but by Npgsql's library code. Copies of this library code running on different instances can't talk to each other.
Note that this is the classic problem of using shared resources. You have 50 of these resources in this case. The most effective way to support more consumers is to reduce the time each consumer holds a resource. Reducing the Connection Idle Lifetime substantially is probably the most effective way. Increasing Timeout does help reduce errors (and is a good choice), but it doesn't increase throughput; it just smooths out the load. Reducing Maximum Pool Size is also good.
Think of it in terms of locks on a shared resource. You would want to take the lock for the minimal amount of time. When a connection is opened, it's a lock on one of the 50 total connections. In general, SQL libraries do pooling, and keep the connection open to save the initial setup time that is involved in each new connection. However, if this is limiting the concurrency, then it's best to kill idle connections asap. In a single instance of an app, the library does this automatically when max pool size is reached. But in multiple instances, it can't kill another instance's connections.
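The lock analogy can be sketched with a counting semaphore. This is only a model under stated assumptions: the 50 permits stand in for the server's connection limit, and the two "instances" are hypothetical app instances that each want a pool of 40. It does not touch a real database.

```java
import java.util.concurrent.Semaphore;

/** Models the server-wide connection limit as a semaphore with 50 permits. */
final class SharedSlotsSketch {

    static String simulate() {
        Semaphore serverSlots = new Semaphore(50);

        // Instance A fills its pool of 40 and keeps every connection open while idle.
        boolean aGotPool = serverSlots.tryAcquire(40);

        // Instance B wants 40 as well, but only 10 slots are left.
        boolean bRefused = !serverSlots.tryAcquire(40);

        // Instance A closes 30 idle connections, which returns their slots.
        serverSlots.release(30);
        boolean bGotPool = serverSlots.tryAcquire(40);

        return aGotPool + " " + bRefused + " " + bGotPool;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

The second instance only gets its connections after the first one gives up slots it was holding idle, which is why a short idle lifetime matters more when several instances share one limit.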
One thing to note is that reducing Maximum Pool Size doesn't necessarily limit the concurrency of your app. In most cases, it just decreases the number of idle connections, at the cost of paying the initial setup time when a new connection needs to be established later.
Update
WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT might be useful. You can set this to 5, and pool size to 8, or similar. I would go this way if reducing Maximum Pool Size and Connection Idle Lifetime is not helping.
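The numbers above come from dividing the connection budget across the maximum number of instances. A minimal sketch of that arithmetic follows; the helper name is hypothetical, and the figure of 10 reserved connections is an assumption taken from the question (a server limit of 50 with 40 left for the app).

```java
/** Splits a server-wide connection budget evenly across app instances. */
final class PoolBudget {

    /**
     * @param serverLimit  total connections the database accepts
     * @param reserved     connections kept free for superuser/admin sessions
     * @param maxInstances upper bound on scaled-out app instances
     * @return the largest per-instance pool size that cannot exceed the budget
     */
    static int perInstancePoolSize(int serverLimit, int reserved, int maxInstances) {
        if (maxInstances <= 0) {
            throw new IllegalArgumentException("maxInstances must be positive");
        }
        return Math.max(0, (serverLimit - reserved) / maxInstances);
    }

    public static void main(String[] args) {
        // 50 total, 10 reserved, at most 5 instances.
        System.out.println(perInstancePoolSize(50, 10, 5));
    }
}
```

With those inputs the result is 8, so 5 instances at full pool size use at most 40 connections.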
This is where Dependency Injection can be really helpful. You can create a singleton client and it will do the job perfectly. If you want to know more about service lifetimes, you can read about them in the docs.
First, add the NuGet package Microsoft.Azure.Functions.Extensions.DependencyInjection.
Now add a new class like the one below and register your client.
using System;
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection;
using Npgsql;

[assembly: FunctionsStartup(typeof(MyFunction.Startup))]
namespace MyFunction
{
    class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            ResolveDependencies(builder);
        }

        private void ResolveDependencies(IFunctionsHostBuilder builder)
        {
            var conStr = Environment.GetEnvironmentVariable("PostgresConnectionString");
            // Register one NpgsqlConnection that is shared for the lifetime of the host.
            builder.Services.AddSingleton((s) =>
            {
                return new NpgsqlConnection(conStr);
            });
        }
    }
}
Now you can easily consume it from any of your functions:
public class FunctionA
{
    private readonly NpgsqlConnection _connection;

    // The connection is supplied by the dependency injection container.
    public FunctionA(NpgsqlConnection conn)
    {
        _connection = conn;
    }

    public async Task<HttpResponseMessage> Run()
    {
        // do something with your _connection
    }
}
Here's an example of using a static HttpClient, which you should consider so that you don't need to manage connections explicitly and can let the client do it instead:
public static class PeriodicHealthCheckFunction
{
private static HttpClient _httpClient = new HttpClient();
[FunctionName("PeriodicHealthCheckFunction")]
public static async Task Run(
[TimerTrigger("0 */5 * * * *")]TimerInfo healthCheckTimer,
ILogger log)
{
string status = await _httpClient.GetStringAsync("https://localhost:5001/healthcheck");
log.LogInformation($"Health check performed at: {DateTime.UtcNow} | Status: {status}");
}
}
I have a Play application using Slick that I want to test using Specs2, but I keep getting the error org.postgresql.util.PSQLException: FATAL: sorry, too many clients already. I have tried to shut down the database connection by using
val mockApp = new GuiceApplicationBuilder()
val db = mockApp.injector.instanceOf[DBApi].database("default")
...
override def afterAll = {
db.getConnection().close()
db.shutdown()
}
But the error persists. The Slick configuration is
slick.dbs.default.driver="slick.driver.PostgresDriver$"
slick.dbs.default.db.driver="org.postgresql.Driver"
slick.dbs.default.db.url="jdbc:postgresql://db:5432/hygge_db"
slick.dbs.default.db.user="*****"
slick.dbs.default.db.password="*****"
getConnection of DBApi either gets a connection from the underlying data source's (JdbcDataSource, I presume) pool or creates a new one. I see no pool specified in your configuration, so I think it always creates a new one for you. So if you didn't close a connection inside the test, calling getConnection in afterAll won't help: it will just create another connection, or take an arbitrary connection from the pool (if pooling is enabled).
So the solution is to either configure connection pooling:
When using a connection pool (which is always recommended in
production environments) the minimum size of the connection pool
should also be set to at least the same size. The maximum size of the
connection pool can be set much higher than in a blocking application.
Any connections beyond the size of the thread pool will only be used
when other connections are required to keep a database session open
(e.g. while waiting for the result from an asynchronous computation in
the middle of a transaction) but are not actively doing any work on
the database.
so you can just set the maximum number of available connections in your config:
maxConnections = 5
Or you can share the same connection (you'll probably have to ensure that the tests run sequentially then):
object SharedConnectionForAllTests{
val connection = db.getConnection()
def close() = connection.close()
}
It's better to inject it with Spring/Guice, of course, so you can conveniently manage the connection's lifecycle.
I am using the Play Framework with Slick in a system that is database I/O heavy. In my application.conf file I have this setting:
play {
akka {
akka.loggers = ["akka.event.slf4j.Slf4jLogger"]
loglevel = WARNING
actor {
default-dispatcher = {
fork-join-executor {
parallelism-factor = 20.0
}
}
}
}
}
This obviously gives me 20 threads per core for the Play application, and as I understand it Slick creates its own thread pool. Does the numThreads field in Slick mean the total number of threads, or is it (numThreads x CPUs)? And is there any best practice for best performance? I currently have my settings configured as:
database {
dataSourceClass = "org.postgresql.ds.PGSimpleDataSource"
properties = {
databaseName = "dbname"
user = "postgres"
password = "password"
}
numThreads = 10
}
numThreads is simply the number of threads in the thread pool. Slick uses this thread pool for executing queries.
The following config keys are supported for all connection pools, both built-in and third-party:
numThreads (Int, optional, default: 20): The number of concurrent threads in the thread pool for asynchronous execution of
database actions. See the HikariCP wiki for more information about
sizing the thread pool correctly. Note that for asynchronous
execution in Slick you should tune the thread pool size (this
parameter) accordingly instead of the maximum connection pool size.
queueSize (Int, optional, default: 1000): The size of the queue for database actions which cannot be executed immediately when all
threads are busy. Beyond this limit new actions fail immediately.
Set to 0 for no queue (direct hand-off) or to -1 for an unlimited
queue size (not recommended).
The pool is tuned for asynchronous execution by default. Apart from the connection parameters you should only have to set numThreads and queueSize in most cases. In this scenario there is contention over the thread pool (via its queue), not over the connections, so you can have a rather large limit on the maximum number of connections (based on what the database server can still handle, not what is most efficient). Slick will use more connections than there are threads in the pool when sequencing non-database actions inside a transaction.
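The interaction between numThreads and queueSize described above can be illustrated with a plain JDK executor. This is an analogy only, not Slick's own executor: a fixed number of worker threads plus a bounded queue, with submissions rejected once both are full. The class and method names are hypothetical, and unlike the Slick setting, this sketch needs a queue size of at least 1 because ArrayBlockingQueue does not accept a capacity of 0.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** JDK analogy for a fixed thread pool with a bounded action queue. */
final class BoundedQueueSketch {

    /** Submits `tasks` blocking tasks and returns how many were rejected outright. */
    static int countRejected(int numThreads, int queueSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                numThreads, numThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize)); // default handler throws when full
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Each task blocks until released, so threads and queue stay occupied.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // all threads busy and the queue is full
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 2 threads + 3 queue slots accept 5 tasks; the remaining 2 of 7 are rejected.
        System.out.println(countRejected(2, 3, 7));
    }
}
```

The contention here is over threads and queue slots, not connections, which is the situation the documentation describes for the asynchronous setup.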
The following config keys are supported for HikariCP:
url (String, required): JDBC URL
driver or driverClassName (String, optional): JDBC driver class to load
user (String, optional): User name
password (String, optional): Password
isolation (String, optional): Transaction isolation level for new connections. Allowed values are: NONE, READ_COMMITTED,
READ_UNCOMMITTED, REPEATABLE_READ, SERIALIZABLE.
catalog (String, optional): Default catalog for new connections.
readOnly (Boolean, optional): Read Only flag for new connections.
properties (Map, optional): Properties to pass to the driver or DataSource.
dataSourceClass (String, optional): The name of the DataSource class provided by the JDBC driver. This is preferred over using
driver. Note that url is ignored when this key is set (You have to
use properties to configure the database connection instead).
maxConnections (Int, optional, default: numThreads * 5): The maximum number of connections in the pool.
minConnections (Int, optional, default: same as numThreads): The minimum number of connections to keep in the pool.
connectionTimeout (Duration, optional, default: 1s): The maximum time to wait before a call to getConnection is timed out. If this
time is exceeded without a connection becoming available, a
SQLException will be thrown. 1000ms is the minimum value.
validationTimeout (Duration, optional, default: 1s): The maximum amount of time that a connection will be tested for aliveness. 1000ms
is the minimum value.
idleTimeout (Duration, optional, default: 10min): The maximum amount of time that a connection is allowed to sit idle in the pool.
A value of 0 means that idle connections are never removed from the
pool.
maxLifetime (Duration, optional, default: 30min): The maximum
lifetime of a connection in the pool. When an idle connection reaches
this timeout, even if recently used, it will be retired from the
pool. A value of 0 indicates no maximum lifetime.
connectionInitSql (String, optional): A SQL statement that will be executed after every new connection creation before adding it to the
pool. If this SQL is not valid or throws an exception, it will be
treated as a connection failure and the standard retry logic will be
followed.
initializationFailFast (Boolean, optional, default: false):
Controls whether the pool will "fail fast" if the pool cannot be
seeded with initial connections successfully. If connections cannot
be created at pool startup time, a RuntimeException will be thrown.
This property has no effect if minConnections is 0.
leakDetectionThreshold (Duration, optional, default: 0): The amount of time that a connection can be out of the pool before a message is
logged indicating a possible connection leak. A value of 0 means leak
detection is disabled. Lowest acceptable value for enabling leak
detection is 10s.
connectionTestQuery (String, optional): A statement
that will be executed just before a connection is obtained from the
pool to validate that the connection to the database is still alive.
It is database dependent and should be a query that takes very little
processing by the database (e.g. "VALUES 1"). When not set, the JDBC4
Connection.isValid() method is used instead (which is usually
preferable).
registerMbeans (Boolean, optional, default: false): Whether or not JMX Management Beans ("MBeans") are registered.
Slick has very transparent configuration settings. As for best practice for good performance, there is no rule of thumb. It depends on your database (how many parallel connections it provides) and your application. It is all about tuning between the database and the application.
I can't find any documentation for the node-postgres driver on setting the maximum connection pool size, or even finding out what it is if it's not configurable. Does anyone know how I can set the maximum number of connections, or what it is by default?
Defaults are defined in node-postgres/lib/defaults: https://github.com/brianc/node-postgres/blob/master/lib/defaults.js
poolSize is set to 10 by default; 0 will disable any pooling.
var pg = require('pg');
pg.defaults.poolSize = 20;
Note that the pool is only used when using the connect method, and not when initiating an instance of Client directly.
Node.js is single-threaded, so why would you want more than 1 connection to the DB per process? Even when you cluster Node.js processes, you should have at most 1 connection per process. Otherwise you are doing something wrong.