I am working on a microservice developed with the Vert.x framework. One of the services receives hundreds of events per second (400 on average) from the event bus and writes them to an MSSQL DB. Queries are executed using JDBCPool, currently configured with a maximum of 40 connections. C3P0 is used for connection pooling.
My problem is that sometimes the pool gets exhausted and a lot of statements end up waiting to be executed, which makes the whole application unresponsive. If I increase the pool size, the DB exhibits slowness, which affects other services as well. So I am planning to write events to a queue, poll events from the queue, and then write them to the DB; this way I can control the number of connections to the DB by increasing or decreasing the number of poller instances.
Current design
Source system -> Event bus -> Async IO -> DB
Proposed design
Source system -> Event bus -> Queue -> Polling -> DB
To keep it simple, I am trying to replace the async DB IO part with something that behaves more like synchronous processing.
Code
void poll() {
    // poll the queue with a 100 ms timeout
    // after getting an event from the queue, call the DAO
    dao.insert(event)
       .onSuccess(statValObj -> {
           // schedule the next iteration only after the insert completes
           poll();
       });
}
The above looks like recursion, so would it impact the Vert.x event loop?
Can the connection pool size limit the number of queries/operations without freezing the entire call stack?
Vert.x version: 4.2.0
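To make the proposed poll loop concrete, here is a minimal sketch. The Event, EventDao, and BlockingQueue names are placeholders of mine, not the original classes; the only assumption about the DAO is that insert() returns a Vert.x Future that completes when the JDBCPool write finishes.

import io.vertx.core.Future;
import io.vertx.core.Vertx;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Placeholder types standing in for the actual event class and DAO.
class Event { }

interface EventDao {
    Future<Void> insert(Event event); // assumed to run the JDBCPool insert and complete when it finishes
}

public class EventPoller {

    private final Vertx vertx;
    private final BlockingQueue<Event> queue; // in-memory buffer fed by the event-bus consumer
    private final EventDao dao;

    public EventPoller(Vertx vertx, BlockingQueue<Event> queue, EventDao dao) {
        this.vertx = vertx;
        this.queue = queue;
        this.dao = dao;
    }

    public void poll() {
        // The blocking poll must not run on the event loop, so hand it off to a worker thread.
        vertx.<Event>executeBlocking(promise -> {
            try {
                promise.complete(queue.poll(100, TimeUnit.MILLISECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                promise.fail(e);
            }
        })
        .compose(event -> event == null ? Future.<Void>succeededFuture() : dao.insert(event))
        .onComplete(ar -> poll()); // the next iteration starts from a fresh dispatch, not a deeper stack frame
    }
}

In this sketch, executeBlocking and dao.insert complete asynchronously, so each onComplete handler runs from a new dispatch after the previous poll() call has already returned; under that assumption the pattern behaves like a loop rather than stack-growing recursion, and the number of poller instances caps how many inserts (and therefore pool connections) are in flight at once.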
I am using an Atomikos XA configuration for Oracle.
This is my code for creating the datasource connection.
OracleXADataSource oracleXADataSource = new OracleXADataSource();
oracleXADataSource.setURL(sourceURL);
oracleXADataSource.setUser(UN);
oracleXADataSource.setPassword(PS);
AtomikosDataSourceBean sourceBean = new AtomikosDataSourceBean();
sourceBean.setXaDataSource(oracleXADataSource);
sourceBean.setUniqueResourceName(resourceName);
sourceBean.setMaxPoolSize(max-pool-size); // 10
atomikos:
datasource:
resourceName: insight
max-pool-size: 10
min-pool-size: 3
transaction-manager-id: insight-services-tm
This configuration is fine for a medium user load of around 5000 requests.
But when the user count increases, say to more than 10000 requests, the method com.atomikos.jdbc.AbstractDataSourceBean.getConnection starts consuming more time than normal.
It takes approximately 1500 ms, whereas it normally takes less than 10 ms. I understand that as user demand increases, getConnection goes into a wait state until a free connection becomes available in the pool. So if I increase my max-pool-size, will that solve my problem, or is there another option or feature available to address it?
Try setting concurrentConnectionValidation=true on your datasource.
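For example, on the AtomikosDataSourceBean from your snippet that could look like the lines below. This is only a sketch: I am assuming your Atomikos release exposes the property through the usual JavaBean setter, so please verify it against the javadoc of the version you are running.

AtomikosDataSourceBean sourceBean = new AtomikosDataSourceBean();
sourceBean.setXaDataSource(oracleXADataSource);
sourceBean.setUniqueResourceName(resourceName);
sourceBean.setMaxPoolSize(10);
// Assumed setter name, derived from the 'concurrentConnectionValidation' property;
// check that it exists in your Atomikos version before relying on it.
sourceBean.setConcurrentConnectionValidation(true);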
If that does not help then consider a free trial of the commercial Atomikos product:
https://www.atomikos.com/Main/ExtremeTransactionsFreeTrial
Best
I currently use two Azure Functions: one to receive webhook requests and add them to an Azure Service Bus queue, and another to process the queue. The second function reads from and then writes to a MongoDB Atlas database.
My queue-processing Azure Function app does cache the MongoDB client, so when each function host executes the script, it reuses the connection if possible. However, Azure Functions is presumably creating new instances under load. For reference, here is the caching code:
const mongodb = require('mongodb');
const MongoClient = mongodb.MongoClient;
const uri = process.env["MONGODB_URI"];

let dbInstance;

module.exports = async function () {
  if (!dbInstance) {
    // connect once per instance and cache the database handle for later invocations
    const client = await MongoClient.connect(uri);
    dbInstance = client.db();
  }
  return dbInstance;
};
Yesterday, I had an Atlas email notification stating I was nearing the connection limit. Here is the connection spike:
As you can see, it nears my MongoDB Atlas limit of 500 connections.
Is there any way to terminate these zombie connections, or perhaps reduce the connection TTL?
Alternatively, would it just make more sense to run this queue processor on a traditional server that polls the queue forever? I am currently dealing with ~500 executions a minute, and I simply assumed serverless would be much more scalable. But I am beginning to think a traditional server could handle that load without carrying the risk of overusing DB connections.
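For what it's worth, the "reduce the connection TTL" idea would presumably be handled through client options when the cached client is created. A sketch (option names are from the MongoDB Node driver; maxPoolSize applies to driver 4.x and later, while 3.x uses poolSize, and the numbers here are only illustrative):

const mongodb = require('mongodb');
const MongoClient = mongodb.MongoClient;
const uri = process.env["MONGODB_URI"];

let dbInstance;

module.exports = async function () {
  if (!dbInstance) {
    const client = await MongoClient.connect(uri, {
      maxPoolSize: 10,      // cap connections per function app instance (poolSize on driver 3.x)
      maxIdleTimeMS: 60000  // close connections that have been idle for more than a minute
    });
    dbInstance = client.db();
  }
  return dbInstance;
};

Each scaled-out instance still gets its own pool, so the cap is per instance, not global.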
I have a basic pgbouncer configuration set up on an Amazon EC2 instance.
My client code (an AWS Lambda function, or a localhost webserver when developing) is making SQL queries to my database through the pgbouncer.
Currently, each query is taking 150-200ms to execute, with about 80% of that being the time it takes to get the connection.
Here's how I'm getting a connection:
long start = System.currentTimeMillis();
Connection conn = DriverManager.getConnection(this.url, this.username, this.password);
log.info("Got connection in " + (System.currentTimeMillis() - start) + "ms");
this.url is simply the location of the pgbouncer instance. Here's what the measured latency looks like, where Got connection is from the above code snippet and Executed in is another timing that measures the elapsed duration after a PreparedStatement has been executed. The first connection is usually a bit slow, which is fine; subsequent ones take around 100ms pretty consistently.
DBManager - Got connection in 190ms
DBManager - Executed in 232ms
DBManager - Got connection in 108ms
DBManager - Executed in 132ms
DBManager - Got connection in 108ms
DBManager - Executed in 128ms
Is there any way to make this faster? Or am I basically stuck with a minimum ~100ms latency on my requests? I get similar speeds from Lambda and localhost, and unfortunately I can't throw my Lambda into the same VPC because of the occasional 8-10 second cold start delay from setting up a new Elastic Network Interface when using a Lambda in a VPC.
This is my first time working with this kind of setup so I don't really know where to start. Could I squeeze out higher speed by adding more power (RAM/CPU) to the database or pgbouncer? Should I not get a new connection for every request (but this would mean having a connection pool per Lambda and then a separate pgbouncer pool)?
I feel like this is surely a pretty common problem so there must be some good ways of solving it, but I haven't been able to find anything.
You'd have to ask the vendor to figure out what part of the time is spent on the route between you and pgBouncer versus between pgBouncer and the database server. I'd guess it is the first part.
If you want low latency, a hosted database might not be perfect for you.
My suggestion would be to build a connection pool into your application or run pgBouncer locally, so that you don't have to establish connections to the hosted systems all the time.
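To illustrate the in-application pool option, here is a minimal sketch using HikariCP (my choice of library for the example, not something from the question; the environment variable names and the SELECT 1 query are placeholders). The pool lives for the lifetime of the process, so a warm Lambda container reuses open connections instead of paying the setup cost on every request:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PooledDb {

    // One pool per process; getConnection() hands out already-open connections.
    private static final HikariDataSource DATA_SOURCE = createDataSource();

    private static HikariDataSource createDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(System.getenv("PGBOUNCER_JDBC_URL")); // placeholder: the JDBC URL pointing at pgbouncer
        config.setUsername(System.getenv("DB_USER"));
        config.setPassword(System.getenv("DB_PASSWORD"));
        config.setMaximumPoolSize(5); // keep this small: every warm Lambda container holds its own pool
        return new HikariDataSource(config);
    }

    public static int selectOne() throws Exception {
        try (Connection conn = DATA_SOURCE.getConnection();
             PreparedStatement ps = conn.prepareStatement("SELECT 1");
             ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getInt(1);
        }
    }
}

Since every concurrent Lambda container holds its own small pool, the per-container pool size and pgbouncer's client connection limits still need to be sized together.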
I have an Azure Durable Function that interacts with a PostgreSQL database, also hosted in Azure.
The PostgreSQL database has a connection limit of 50, and furthermore, my connection string limits the connection pool size to 40, leaving space for super user / admin connections.
Nonetheless, under some loads I get the error
53300: remaining connection slots are reserved for non-replication superuser connections
This documentation from Microsoft seemed relevant, but it doesn't seem like I can make a static client, and, as it mentions,
because you can still run out of connections, you should optimize connections to the database.
I have this method
private IDbConnection GetConnection()
{
return new NpgsqlConnection(Environment.GetEnvironmentVariable("PostgresConnectionString"));
}
and when I want to interact with PostgreSQL I do it like this
using (var connection = GetConnection())
{
connection.Open();
return await connection.QuerySingleAsync<int>(settings.Query().Insert, settings);
}
So I am creating (and disposing) lots of NpgsqlConnection objects, but according to this, that should be fine because connection pooling is handled behind the scenes. But there may be something about Azure Functions that invalidates this thinking.
I have noticed that I end up with a lot of idle connections (from pgAdmin):
Based on that I've tried fiddling with Npgsql connection parameters like Connection Idle Lifetime, Timeout, and Pooling, but the problem of too many connections seems to persist to one degree or another. Additionally, I've tried limiting the number of concurrent orchestrator and activity functions (see this doc), but that seems to partially defeat the purpose of Azure Functions being scalable. It does help: I get fewer of the "too many connections" errors. Presumably if I keep lowering the limits I may even eliminate the error, but again, that seems to defeat the point, and there may be another solution.
How can I use PostgreSQL with Azure Functions without maxing out connections?
I don't have a good solution, but I think I have the explanation for why this happens.
Why is Azure Function App maxing out connections?
Even though you specify a limit of 40 for the pool size, it is only honored within one instance of the function app. Note that a function app can scale out based on load: it can process several requests concurrently in the same instance, and it can also create new instances of the app. Concurrent requests in the same instance will honor the pool size setting, but with multiple instances, each instance ends up using a pool size of 40. With 40 per instance, just two instances can already exceed the database's 50-connection limit.
Even the concurrency throttles in durable functions don't solve this issue, because they only throttle within a single instance, not across instances.
How can I use PostgreSQL with Azure Functions without maxing out connections?
Unfortunately, the function app doesn't provide a native way to do this. Note that the connection pool size is not managed by the Functions runtime but by Npgsql's library code, and the library code running on different instances can't talk to each other.
Note that this is the classic problem of sharing a limited resource; you have 50 of them (connections) in this case. The most effective way to support more consumers is to reduce the time each consumer holds the resource. Reducing Connection Idle Lifetime substantially is probably the most effective way. Increasing Timeout does help reduce errors (and is a good choice), but it doesn't increase the throughput; it just smooths out the load. Reducing Maximum Pool Size is also good.
Think of it in terms of locks on a shared resource. You would want to take the lock for the minimal amount of time. When a connection is opened, it's a lock on one of the 50 total connections. In general, SQL libraries do pooling, and keep the connection open to save the initial setup time that is involved in each new connection. However, if this is limiting the concurrency, then it's best to kill idle connections asap. In a single instance of an app, the library does this automatically when max pool size is reached. But in multiple instances, it can't kill another instance's connections.
One thing to note is that reducing Maximum Pool Size doesn't necessarily limit the concurrency of your app. In most cases, it just decreases the number of idle connections, at the cost of paying the initial setup time again when a new connection needs to be established later.
Update
WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT might be useful. You can set this to 5 and the pool size to 8, or similar. I would go this way if reducing Maximum Pool Size and Connection Idle Lifetime does not help.
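A sketch of how those two settings could fit together here (illustrative values; the connection string keywords are standard Npgsql ones, and the host/credential parts are placeholders): with at most 5 instances and at most 8 pooled connections per instance, the worst case is 5 × 8 = 40 connections, which stays under the 50-connection limit.

WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT = 5   (function app application setting, caps scale-out)

PostgresConnectionString:
Host=<host>;Username=<user>;Password=<password>;Database=<db>;Maximum Pool Size=8;Connection Idle Lifetime=15;Timeout=30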
This is where Dependency Injection can be really helpful. You can register a singleton client and it will do the job perfectly. If you want to know more about service lifetimes, you can read about them in the docs.
First, add the NuGet package Microsoft.Azure.Functions.Extensions.DependencyInjection.
Now add a new class like the one below and register your client.
using System;
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection;
using Npgsql;

[assembly: FunctionsStartup(typeof(MyFunction.Startup))]

namespace MyFunction
{
    class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            ResolveDependencies(builder);
        }

        public void ResolveDependencies(IFunctionsHostBuilder builder)
        {
            var conStr = Environment.GetEnvironmentVariable("PostgresConnectionString");
            builder.Services.AddSingleton((s) =>
            {
                return new NpgsqlConnection(conStr);
            });
        }
    }
}
Now you can easily consume it from any of your functions:
public class FunctionA
{
    private readonly NpgsqlConnection _connection;

    public FunctionA(NpgsqlConnection conn)
    {
        _connection = conn;
    }

    public async Task<HttpResponseMessage> Run()
    {
        // do something with your _connection
    }
}
Here's an example of using a static HttpClient, something you should consider so that you don't need to explicitly manage connections and can instead let the client do it:
public static class PeriodicHealthCheckFunction
{
private static HttpClient _httpClient = new HttpClient();
[FunctionName("PeriodicHealthCheckFunction")]
public static async Task Run(
[TimerTrigger("0 */5 * * * *")]TimerInfo healthCheckTimer,
ILogger log)
{
string status = await _httpClient.GetStringAsync("https://localhost:5001/healthcheck");
log.LogInformation($"Health check performed at: {DateTime.UtcNow} | Status: {status}");
}
}