"PSQLException: FATAL: sorry, too many clients already" error in integration tests with jOOQ & Spring Boot - postgresql

There are already similar questions about this error, along with suggested solutions, e.g. increasing max_connections in postgresql.conf and/or lowering the maximum number of connections your app requests. My question, however, is specific to using jOOQ in a Spring Boot application.
I integrated jOOQ into my application as in the example on GitHub. Namely, I am using DataSourceConnectionProvider with TransactionAwareDataSourceProxy to handle database connections, and I inject the DSLContext in the classes that need it.
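For reference, the wiring from that example looks roughly like this (a sketch; the bean names and the POSTGRES dialect are illustrative, not taken from my actual code):

import javax.sql.DataSource;
import org.jooq.SQLDialect;
import org.jooq.impl.DataSourceConnectionProvider;
import org.jooq.impl.DefaultDSLContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.TransactionAwareDataSourceProxy;

@Configuration
public class JooqConfig {

    // Wrap the pooled DataSource so jOOQ participates in Spring-managed transactions
    @Bean
    public DataSourceConnectionProvider connectionProvider(DataSource dataSource) {
        return new DataSourceConnectionProvider(new TransactionAwareDataSourceProxy(dataSource));
    }

    // The DSLContext that gets injected into the classes that need it
    @Bean
    public DefaultDSLContext dsl(DataSourceConnectionProvider connectionProvider) {
        return new DefaultDSLContext(connectionProvider, SQLDialect.POSTGRES);
    }
}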
My application provides various web services to front-ends, and I had never encountered that PSQLException in dev or test environments before. I only started getting the error when running all integration tests (around 1000) locally. I don't expect a leak in connection handling, since Spring and jOOQ manage the resources; nevertheless, the error got me worried that it might also happen in production.
Long story short, is there a better alternative to using DataSourceConnectionProvider to manage connections? Note that I already tried using DefaultConnectionProvider as well, and tried setting spring.datasource.max-active to less than the max_connections allowed by Postgres. Neither has fixed my problem so far.

Since your question seems not to be about the generally best way to work with PostgreSQL connections / data sources, I'll answer the part about jOOQ and using its DataSourceConnectionProvider:
Using DataSourceConnectionProvider
There is no better alternative in general. In order to understand DataSourceConnectionProvider (the implementation), you have to understand ConnectionProvider (its specification). It is an SPI that jOOQ uses for two things:
to acquire() a connection prior to running a statement or a transaction
to release() a connection after running a statement (and possibly, fetching results) or a transaction
The DataSourceConnectionProvider does so by acquiring a connection from your DataSource through DataSource.getConnection() and by releasing it through Connection.close(). This is the most common way to interact with data sources, in order to let the DataSource implementation handle transaction and/or pooling semantics.
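To illustrate, here is a minimal sketch of what DataSourceConnectionProvider does (the class name is hypothetical; the real implementation lives in org.jooq.impl):

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;
import org.jooq.ConnectionProvider;
import org.jooq.exception.DataAccessException;

public class SketchConnectionProvider implements ConnectionProvider {

    private final DataSource ds;

    public SketchConnectionProvider(DataSource ds) {
        this.ds = ds;
    }

    // Fetch a connection from the DataSource before a statement or transaction runs
    @Override
    public Connection acquire() throws DataAccessException {
        try {
            return ds.getConnection();
        }
        catch (SQLException e) {
            throw new DataAccessException("Error acquiring connection", e);
        }
    }

    // Hand the connection back (to the pool, if the DataSource is pooled) when done
    @Override
    public void release(Connection connection) throws DataAccessException {
        try {
            connection.close();
        }
        catch (SQLException e) {
            throw new DataAccessException("Error releasing connection", e);
        }
    }
}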
Whether this is a good idea in your case may depend on individual configurations that you have made. It generally is a good idea because you usually don't want to manually manage connection lifecycles.
Using DefaultConnectionProvider
This can certainly be used instead, in which case jOOQ will not close() your connection for you; you'll do that yourself. I expect this to have no effect in your particular case, as you'd simply be implementing the DataSourceConnectionProvider semantics manually, e.g.
try (Connection c = ds.getConnection()) {
    // Implicitly using a DefaultConnectionProvider
    DSL.using(c).select(...).fetch();
    // Implicit call to c.close()
}
In other words: this is likely not a problem related to jOOQ, but to your data source.

Related

Postgres SET configuration variables with TypeORM, how to persist variable during the life of the connection between calls

I have an Express server with TypeORM, and I use Row Security Policies (https://www.postgresql.org/docs/current/ddl-rowsecurity.html). I have an issue with Postgres configuration settings: because connections are pooled under the hood, when I set configuration values on the default connection I can't be sure they persist for the lifetime of all calls. I have considered a few approaches to implementing row security policies in TypeORM:
Avallone library https://github.com/Avallone-io/rls
Own implementation
EventSubscribers https://orkhan.gitbook.io/typeorm/docs/listeners-and-subscribers
Avallone has too many new dependencies.
EventSubscribers seems to be the approach requiring the least code, except that I would depend on something like #BeforeLoad (analogous to the #AfterLoad we have at https://orkhan.gitbook.io/typeorm/docs/listeners-and-subscribers#afterload). My question is: how can we achieve this?
I've seen a similar discussion here: Postgres SET runtime variables with TypeORM, how to persist variable during the life of the connection between calls

How to use "Try" Postgres Advisory Locks

I am experiencing some unexpected (to me) behavior using pg_try_advisory_lock. I believe this may be connection pooling / timeout related.
pg_advisory_lock is working as expected. When I call the function and the desired lock is already in use, my application waits until the specified command timeout on the function call.
However, when I replace it with pg_try_advisory_lock and instead check the result of this function (true/false) to determine whether the lock was acquired, some scenario is allowing multiple processes (single-threaded .NET Core deployed to ECS) to acquire "true" on the same lock key at the same time.
In C# code, I have implemented this within an IDisposable, and I make my call to release the lock and dispose of the underlying connection on disposal. This is the case both for my calls to pg_advisory_lock and pg_try_advisory_lock. All of the work that needs to be synchronized happens inside a using block.
My operating theory is that the settings around connection pooling / timeouts are at play here. Since the try call doesn't block, the session context for the lock "disposes" on the Postgres side, perhaps as a result of the connection being idle(?).
If that is the cause, the simplest solution seems to be to disable any kind of pooling for the connections used in try locking. But since pooling is just a theory at this point, it seems a bit early to start targeting a specific solution.
Any ideas what may be the cause?
Example of the C#:
using (Api.Instance.Locking.TryAcquire(someKey1, someKey2, out var acquired))
{
    if (acquired)
    {
        // do some locked work
    }
}
Under the hood, TryAcquire is calling:
select pg_try_advisory_lock as acquired from pg_try_advisory_lock(#key1,#key2)
This turned out to be kind of dumb. No changes to pooling were required.
I am using the Dapper and Npgsql libraries. An NpgsqlConnection returns to a closed state after being used by .Query() unless the connection was explicitly opened first, and closing the connection releases the session-scoped advisory lock.
This was impacting my calls to both the try and blocking versions of the advisory lock calls, albeit in a less adverse way for the blocking version.
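The underlying rule is driver-independent: advisory locks are scoped to the database session, so the connection that acquired the lock must stay open until the lock is released. A sketch of that pattern in plain JDBC (Java here rather than C#; the keys and connection details are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class AdvisoryLockSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details
        try (Connection c = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "pass")) {

            boolean acquired;
            try (PreparedStatement ps = c.prepareStatement("select pg_try_advisory_lock(?, ?)")) {
                ps.setInt(1, 42);
                ps.setInt(2, 7);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    acquired = rs.getBoolean(1);
                }
            }

            if (acquired) {
                try {
                    // do some locked work while c stays open;
                    // closing (or returning) c would silently drop the lock
                }
                finally {
                    try (PreparedStatement ps = c.prepareStatement("select pg_advisory_unlock(?, ?)")) {
                        ps.setInt(1, 42);
                        ps.setInt(2, 7);
                        ps.execute();
                    }
                }
            }
        }
    }
}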

How do I disable legacy application from using XA datasources?

I have this legacy application that often fails when importing data, probably because some transactions span too many SQL statements. These long transactions are really not needed, so I'm trying to get rid of them and just use normal lookups and commits.
I'm not very familiar with XA datasources and don't really understand what controls whether an XA or non-XA datasource is used. I have found places in the code that choose between XA and non-XA, but after setting this to always use non-XA, I'm still getting the errors.
I have also unchecked "Support two phase commit protocol" under "Queue connection factories" on my server, also without luck.
My server has datasources registered for both XA and non-XA.
Any help on how and where to disable the use of XA datasources would be appreciated.
LocalTransact E J2CA0030E: Method enlist caught com.ibm.ws.Transaction.IllegalResourceIn2PCTransactionException: Illegal attempt to enlist multiple 1PC XAResources
at com.ibm.ws.tx.jta.RegisteredResources.enlistResource(RegisteredResources.java:871)
at com.ibm.ws.tx.jta.TransactionImpl.enlistResource(TransactionImpl.java:1835)
at com.ibm.tx.jta.embeddable.impl.EmbeddableTranManagerSet.enlistOnePhase(EmbeddableTranManagerSet.java:202)
at com.ibm.ejs.j2c.LocalTransactionWrapper.enlist(LocalTransactionWrapper.java:624)
at com.ibm.ejs.j2c.ConnectionManager.lazyEnlist(ConnectionManager.java:2697)
at com.ibm.ws.rsadapter.spi.WSRdbManagedConnectionImpl.lazyEnlist(WSRdbManagedConnectionImpl.java:2605)
at com.ibm.ws.rsadapter.jdbc.WSJdbcConnection.beginTransactionIfNecessary(WSJdbcConnection.java:743)
at com.ibm.ws.rsadapter.jdbc.WSJdbcConnection.prepareStatement(WSJdbcConnection.java:2792)
at com.ibm.ws.rsadapter.jdbc.WSJdbcConnection.prepareStatement(WSJdbcConnection.java:2745)
Before answering this, I want to point out that changing transactional logic without full awareness of what you are doing can put your application at risk of data integrity issues, so proceed with caution.
If you look at the part of the stack that follows what you posted, it should show which application code is using the java.sql.Connection object. Follow the code back to the point where it obtains the Connection from a DataSource, and identify the JNDI name of the DataSource that it is using. Switch your code to instead use the JNDI name of a ConnectionPoolDataSource (non-XA) rather than an XADataSource.
Once you do this, you might see errors about enlisting multiple one-phase resources in a transaction. If so, your application was relying on two-phase commit, which is only possible with XA, and you will need to completely refactor it (if that is even possible) to avoid the use of two-phase commit.
Alternatively, if it was truly the intent that this data source should not enlist in JTA transactions, then you can mark it as transactional=false (if using Liberty) or nonTransactionalDataSource=true (WAS traditional), in which case it will avoid enlisting in JTA transactions and thus will not participate as a two-phase (XA) resource.
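For example, the switch might be as small as changing which JNDI name the code resolves; a sketch with hypothetical JNDI names:

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class DataSourceLookup {

    // JNDI names here are hypothetical; use the ones registered on your server
    static DataSource lookupNonXA() throws NamingException {
        // Before: lookup("jdbc/myXADataSource") resolved to an XA-capable datasource
        // After: resolve the ConnectionPoolDataSource-backed (non-XA) datasource instead
        return (DataSource) new InitialContext().lookup("jdbc/myNonXADataSource");
    }
}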
Before you make changes that you do not understand, you might be better advised to assess whether simply fixing or avoiding the (unspecified) errors may be less risky and less work than changing from XA to non-XA behaviour.
At the very least for such a change from XA to non-XA you should engage a subject matter expert who can advise on the technical and business impacts of such a change specifically for the application involved.
You should edit your question to specify the exact errors (for example, sqlcodes or sqlstates) that the application receives in response to which kind of SQL actions. Sometimes simple low risk configuration changes can resolve those errors.

What is the best practice in EF Core for using parallel async calls with an Injected DbContext?

I have a .NET Core 1.1 API with EF Core 1.1 and using Microsoft's vanilla setup of using Dependency Injection to provide the DbContext to my services. (Reference: https://learn.microsoft.com/en-us/aspnet/core/data/ef-mvc/intro#register-the-context-with-dependency-injection)
Now, I am looking into parallelizing database reads as an optimization using WhenAll.
So instead of:
var result1 = await _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var result2 = await _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp);
I use:
var repositoryTask1 = _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var repositoryTask2 = _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp);
(var result1, var result2) = await (repositoryTask1, repositoryTask2).WhenAll();
This is all well and good, until I use the same strategy outside of these DB Repository access classes and call these same methods with WhenAll in my controller across multiple services:
var serviceTask1 = _service1.GetSomethingsFromDb(Id);
var serviceTask2 = _service2.GetSomeMoreThingsFromDb(Id);
(var dataForController1, var dataForController2) = await (serviceTask1, serviceTask2).WhenAll();
Now when I call this from my controller, randomly I will get concurrency errors like:
System.InvalidOperationException: ExecuteReader requires an open and available Connection. The connection's current state is closed.
The reason, I believe, is that sometimes these threads try to access the same tables at the same time. I know that this is by design in EF Core, and if I wanted to I could create a new DbContext every time, but I am trying to see if there is a workaround. That's when I found this good post by Mehdi El Gueddari: http://mehdi.me/ambient-dbcontext-in-ef6/
In which he acknowledges this limitation:
an injected DbContext prevents you from being able to introduce multi-threading or any sort of parallel execution flows in your services.
And offers a custom workaround with DbContextScope.
However, he presents a caveat even with DbContextScope in that it won't work in parallel (what I'm trying to do above):
if you attempt to start multiple parallel tasks within the context of a DbContextScope (e.g. by creating multiple threads or multiple TPL Tasks), you will get into big trouble. This is because the ambient DbContextScope will flow through all the threads your parallel tasks are using.
His final point here leads me to my question:
In general, parallelizing database access within a single business transaction has little to no benefits and only adds significant complexity. Any parallel operation performed within the context of a business transaction should not access the database.
Should I not be using WhenAll in this case in my Controllers and stick with using await one-by-one? Or is dependency-injection of the DbContext the more fundamental problem here, therefore a new one should instead be created/supplied every time by some kind of factory?
Using any context.XyzAsync() method is only useful if you either await the called method or return control to a calling thread that doesn't have the context in its scope.
A DbContext instance isn't thread-safe: you should never, ever use it in parallel threads. Which means, just to be safe, never use it in multiple threads at all, even if they don't run in parallel. Don't try to work around it.
If for some reason you want to run parallel database operations (and think you can avoid deadlocks, concurrency conflicts, etc.), make sure each one has its own DbContext instance. Note, however, that parallelization is mainly useful for CPU-bound processes, not IO-bound processes like database interaction. Maybe you can benefit from parallel independent read operations, but I would certainly never execute parallel write processes. Apart from deadlocks etc., it also makes it much harder to run all operations in one transaction.
In ASP.Net core you'd generally use the context-per-request pattern (ServiceLifetime.Scoped, see here), but even that can't keep you from transferring the context to multiple threads. In the end it's only the programmer who can prevent that.
If you're worried about the performance costs of creating new contexts all the time: don't be. Creating a context is a light-weight operation, because the underlying model (store model, conceptual model + mappings between them) is created once and then stored in the application domain. Also, a new context doesn't create a physical connection to the database. All ASP.Net database operations run through the connection pool that manages a pool of physical connections.
If all this implies that you have to reconfigure your DI to align with best practices, so be it. If your current setup passes contexts to multiple threads there has been a poor design decision in the past. Resist the temptation to postpone inevitable refactoring by work-arounds. The only work-around is to de-parallelize your code, so in the end it may even be slower than if you redesign your DI and code to adhere to context per thread.
It came to the point where really the only way to answer the debate was to run a performance/load test to get comparable, empirical, statistical evidence, so I could settle this once and for all.
Here is what I tested:
Cloud load test with VSTS @ 200 users max for 4 minutes against a Standard Azure web app.
Test #1: 1 API call with dependency injection of the DbContext and async/await for each service.
Results for Test #1: [results chart omitted]
Test #2: 1 API call with a new DbContext created within each service method call, using parallel task execution with WhenAll.
Results for Test #2: [results chart omitted]
Conclusion:
For those who doubt the results: I ran these tests several times with varying user loads, and the averages were basically the same every time.
The performance gains from parallel processing are, in my opinion, insignificant, and do not justify abandoning dependency injection, which would create development overhead/maintenance debt, potential for bugs if handled incorrectly, and a departure from Microsoft's official recommendations.
One more thing to note: as you can see, there were actually a few failed requests with the WhenAll strategy, even when ensuring a new context is created every time. I am not sure of the reason, but I would much prefer no 500 errors over a 10 ms performance gain.

How to initialize EclipseLink connection

Sorry if my question is quite simple, but I really couldn't find an answer by googling. I have this project with JPA 2.0 (EclipseLink); it's working fine, but I want to ask if there's a way to initialize the database connection up front.
Currently it happens whenever the user tries to access any module that requires a query, which is quite annoying because connecting can take a few seconds and the app freezes for a moment while it's connecting.
I could make some random query in the main method to "turn it on", but that's an unnecessary query and not the solution I want to use.
Thanks beforehand!
The problem will be that the deployment process is lazy. This avoids the cost of initializing and connecting to unused/unneeded persistence units, but it means everything within a persistence unit is processed the very first time it is accessed.
This can be configured on a persistence unit by using the "eclipselink.deploy-on-startup" property:
http://www.eclipse.org/eclipselink/documentation/2.4/jpa/extensions/p_deploy_on_startup.htm
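For example, the property can be passed when creating the EntityManagerFactory (a sketch; the persistence unit name "myPU" is hypothetical, and the same property can also be set in persistence.xml):

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class StartupConnectionExample {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Deploy (and connect) the persistence unit eagerly instead of on first use
        props.put("eclipselink.deploy-on-startup", "true");
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("myPU", props);
        // ... the persistence unit is now initialized before the first query
        emf.close();
    }
}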
Not sure if this is what you are looking for, but I found the property eclipselink.jdbc.exclusive-connection.is-lazy, which defaults to true.
According to the Javadoc, the "property specifies when write connection is acquired lazily".