How to use "Try" Postgres Advisory Locks - postgresql

I am experiencing some unexpected (to me) behavior using pg_try_advisory_lock. I believe this may be connection pooling / timeout related.
pg_advisory_lock is working as expected: when I call the function and the desired lock is already in use, my application waits until the specified command timeout on the function call expires.
However, when I replace it with pg_try_advisory_lock and instead check the result of that function (true/false) to determine whether the lock was acquired, some scenario is allowing multiple processes (single-threaded .NET Core deployed to ECS) to get "true" for the same lock key at the same time.
In my C# code, I have implemented this within an IDisposable, and I make my call to release the lock and dispose of the underlying connection on disposal. This is the case both for my calls to pg_advisory_lock and pg_try_advisory_lock. All of the work that needs to be synchronized happens inside a using block.
My operating theory is that the settings around connection pooling / timeouts are at play here. Since the try call doesn't block, the session context for the lock is "disposed" on the Postgres side, perhaps as a result of the connection going idle(?).
If that is the cause, the simplest solution seems to be to disable any kind of pooling for the connections used in try locking. But since pooling is just a theory at this point, it seems a bit early to start targeting a specific solution.
Any ideas what may be the cause?
Example of the C#:
using (Api.Instance.Locking.TryAcquire(someKey1, someKey2, out var acquired))
{
    if (acquired)
    {
        // do some locked work
    }
}
Under the hood, TryAcquire is calling:
select pg_try_advisory_lock as acquired from pg_try_advisory_lock(@key1, @key2)

This turned out to be kind of dumb. No changes to pooling were required.
I am using the Dapper and Npgsql libraries. An NpgsqlConnection returns to a closed state after being used by .Query() unless the connection was explicitly opened first.
This was impacting my calls to both the try and blocking versions of the advisory lock calls, albeit in a less adverse way in the blocking case.
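To make that concrete, here is a minimal sketch of the fix, assuming Dapper and Npgsql (the class and member names are hypothetical; the point is the explicit Open(), which keeps the session, and therefore the session-scoped advisory lock, alive until disposal):

using System;
using Dapper;
using Npgsql;

public sealed class TryAdvisoryLock : IDisposable
{
    private readonly NpgsqlConnection _connection;
    private readonly int _key1, _key2;

    public bool Acquired { get; }

    public TryAdvisoryLock(string connectionString, int key1, int key2)
    {
        _key1 = key1;
        _key2 = key2;
        _connection = new NpgsqlConnection(connectionString);

        // The crucial step: open the connection explicitly. If Dapper has to
        // open a closed connection itself, it closes it again after Query(),
        // which ends the session and silently releases the advisory lock.
        _connection.Open();

        Acquired = _connection.QuerySingle<bool>(
            "select pg_try_advisory_lock(@key1, @key2)",
            new { key1 = _key1, key2 = _key2 });
    }

    public void Dispose()
    {
        if (Acquired)
        {
            _connection.Execute(
                "select pg_advisory_unlock(@key1, @key2)",
                new { key1 = _key1, key2 = _key2 });
        }
        _connection.Dispose(); // closing the session releases any remaining locks
    }
}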

Related

EFCore Cancellation in async overloads of IDbContextTransaction

I am currently developing something that relies heavily on locking a small table, and I came to wonder what the use of transaction.RollbackAsync(CancellationToken) (or the analogous transaction.CommitAsync(CancellationToken)) is. Shouldn't a transaction be guaranteed to be rolled back / committed when called for? I don't really see much sense in cancelling such an operation. In a scenario where I acquire a full lock on the table with a transaction using IsolationLevel.Serializable and then cancel amid my following operations on the table, the eventual call to rollback or commit would end up never being executed once the token has set IsCancellationRequested to true (or, worse, it throws on cancellation).
So, just for understanding, can somebody explain to me why these overloads even exist? Are they safe to use, and does it make any sense to use them? Should I even consider committing / rolling back a transaction using await? I mean, the database surely releases the lock and discards the transaction when the client disconnects, but that probably takes longer than cleanly releasing it as intended.
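For what it's worth, one pattern that reconciles cancellation with the guarantee the question asks about, sketched under the assumption of an EF Core version that exposes these overloads (MyDbContext and the method shape are illustrative): pass the caller's token to the work and the commit, but shield the rollback with CancellationToken.None so the cleanup itself can never be cancelled.

using System.Data;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public static class SerializableWorkExample
{
    public static async Task DoLockedWorkAsync(MyDbContext context, CancellationToken cancellationToken)
    {
        await using var transaction = await context.Database
            .BeginTransactionAsync(IsolationLevel.Serializable, cancellationToken);
        try
        {
            // ... operations on the locked table, all honouring cancellationToken ...
            await context.SaveChangesAsync(cancellationToken);
            await transaction.CommitAsync(cancellationToken);
        }
        catch
        {
            // CancellationToken.None: a cancelled caller token must not be able
            // to cancel the rollback itself.
            await transaction.RollbackAsync(CancellationToken.None);
            throw;
        }
    }
}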

Mongo cursorFinalizerEnabled performance effect

I'm using Spring Boot with Mongo 3.4 (in a cluster with mongos).
The mongo client options configuration has the option cursorFinalizerEnabled.
According to the documentation, making this flag true spawns a thread on every new MongoClient which attempts to clean up DBCursors that are not closed (Mongo Template itself closes its cursors).
MongoClientOptions options = MongoClientOptions.builder()
        .cursorFinalizerEnabled(false)
        .build();
What is the best practice, true or false? And what is the performance effect?
The default value of cursorFinalizerEnabled is true (see MongoClientOptions). So, your MongoClient will spawn this thread (and apply this behaviour) unless you choose not to.
This feature provides a safety net for client code which is (or might be) casual about handling cursors. So, depending on how you treat your cursors it might be useful or it might be a no-op.
The standard advice is: if your client code ensures that the close method of DBCursor is always invoked then you can set this to false. Otherwise, just accept the default.
As for the performance implications; it's hard to measure that. If your client code does not leave any open, unused cursors then it's a no-op but if your client code does leave open, unused cursors then this flag will help to reduce the impact on shared resources. Spawning a single thread to run this harvester seems like a low cost so if you are at all unsure about how your client code handles cursors then it's worth enabling it.
And, of course, as with all performance questions, the most reliable way of determining the performance effect (if any) is to test with and without this flag and then compare :)

"PSQLException: FATAL: sorry, too many clients already" error in integration tests with jOOQ & Spring Boot

There are already similar questions about this error and suggested solutions; e.g. increasing max_connections in postgresql.conf and / or adapting the max number of connections your app requests. However, my question is more specific to using jOOQ in a Spring Boot application.
I integrated jOOQ into my application as in the example on GitHub. Namely, I am using DataSourceConnectionProvider with TransactionAwareDataSourceProxy to handle database connections, and I inject the DSLContext in the classes that need it.
My application provides various web services to front-ends and I've never encountered that PSQLException in dev or test environments so far. I only started getting the error when running all integration tests (around 1000) locally. I don't expect a leak in connection handling, as Spring and jOOQ manage the resources; nevertheless, that error got me worried that it might also happen in production.
Long story short, is there a better alternative to using DataSourceConnectionProvider to manage connections? Note that I already tried using DefaultConnectionProvider as well, and tried to make spring.datasource.max-active less than max_connections allowed by Postgres. Neither fixed my problem so far.
Since your question seems not to be about the generally best way to work with PostgreSQL connections / data sources, I'll answer the part about jOOQ and using its DataSourceConnectionProvider:
Using DataSourceConnectionProvider
There is no better alternative in general. In order to understand DataSourceConnectionProvider (the implementation), you have to understand ConnectionProvider (its specification). It is an SPI that jOOQ uses for two things:
to acquire() a connection prior to running a statement or a transaction
to release() a connection after running a statement (and possibly, fetching results) or a transaction
The DataSourceConnectionProvider does so by acquiring a connection from your DataSource through DataSource.getConnection() and by releasing it through Connection.close(). This is the most common way to interact with data sources, in order to let the DataSource implementation handle transaction and/or pooling semantics.
Whether this is a good idea in your case may depend on individual configurations that you have made. It generally is a good idea because you usually don't want to manually manage connection lifecycles.
Using DefaultConnectionProvider
This can certainly be done instead, in which case jOOQ does not close() your connection for you; you'll do that yourself. I expect this to have no effect in your particular case, as you'll be implementing the DataSourceConnectionProvider semantics manually, e.g.:
try (Connection c = ds.getConnection()) {
    // Implicitly using a DefaultConnectionProvider
    DSL.using(c).select(...).fetch();
    // Implicit call to c.close()
}
In other words: this is likely not a problem related to jOOQ, but to your data source.

What is the best practice in EF Core for using parallel async calls with an Injected DbContext?

I have a .NET Core 1.1 API with EF Core 1.1 and using Microsoft's vanilla setup of using Dependency Injection to provide the DbContext to my services. (Reference: https://learn.microsoft.com/en-us/aspnet/core/data/ef-mvc/intro#register-the-context-with-dependency-injection)
Now, I am looking into parallelizing database reads as an optimization, using WhenAll.
So instead of:
var result1 = await _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var result2 = await _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp);
I use:
var repositoryTask1 = _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var repositoryTask2 = _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp);
(var result1, var result2) = await (repositoryTask1, repositoryTask2).WhenAll();
This is all well and good, until I use the same strategy outside of these DB Repository access classes and call these same methods with WhenAll in my controller across multiple services:
var serviceTask1 = _service1.GetSomethingsFromDb(Id);
var serviceTask2 = _service2.GetSomeMoreThingsFromDb(Id);
(var dataForController1, var dataForController2) = await (serviceTask1, serviceTask2).WhenAll();
Now when I call this from my controller, randomly I will get concurrency errors like:
System.InvalidOperationException: ExecuteReader requires an open and available Connection. The connection's current state is closed.
I believe the reason is that these threads sometimes try to access the same tables at the same time. I know that this is by design in EF Core, and if I wanted to I could create a new DbContext every time, but I am trying to see if there is a workaround. That's when I found this good post by Mehdi El Gueddari: http://mehdi.me/ambient-dbcontext-in-ef6/
In which he acknowledges this limitation:
an injected DbContext prevents you from being able to introduce multi-threading or any sort of parallel execution flows in your services.
And offers a custom workaround with DbContextScope.
However, he presents a caveat even with DbContextScope in that it won't work in parallel (what I'm trying to do above):
if you attempt to start multiple parallel tasks within the context of a DbContextScope (e.g. by creating multiple threads or multiple TPL Task), you will get into big trouble. This is because the ambient DbContextScope will flow through all the threads your parallel tasks are using.
His final point here leads me to my question:
In general, parallelizing database access within a single business transaction has little to no benefits and only adds significant complexity. Any parallel operation performed within the context of a business transaction should not access the database.
Should I not be using WhenAll in this case in my Controllers and stick with using await one-by-one? Or is dependency-injection of the DbContext the more fundamental problem here, therefore a new one should instead be created/supplied every time by some kind of factory?
Using any context.XyzAsync() method is only useful if you either await the called method or return control to a calling thread that doesn't have the context in its scope.
A DbContext instance isn't thread-safe: you should never ever use it in parallel threads. Which means, to be safe, never use it in multiple threads at all, even if they don't run in parallel. Don't try to work around it.
If for some reason you want to run parallel database operations (and think you can avoid deadlocks, concurrency conflicts, etc.), make sure each one has its own DbContext instance. Note, however, that parallelization is mainly useful for CPU-bound processes, not IO-bound processes like database interaction. Maybe you can benefit from parallel independent read operations, but I would certainly never execute parallel write processes. Apart from deadlocks etc., it also makes it much harder to run all operations in one transaction.
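As a minimal sketch of that advice, assuming the asker's model types and a DbContext subclass called AppDbContext whose options can be injected (the service name is illustrative, and AppDbContext is assumed to have the usual constructor taking DbContextOptions):

using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Illustrative: one DbContext instance per parallel operation.
public class ParallelReadService
{
    private readonly DbContextOptions<AppDbContext> _options;

    public ParallelReadService(DbContextOptions<AppDbContext> options)
    {
        _options = options;
    }

    public async Task<(TableModel1, TableModel2)> GetBothAsync(int anId, string aProp)
    {
        using (var context1 = new AppDbContext(_options))
        using (var context2 = new AppDbContext(_options))
        {
            // Safe: each task runs against its own context and connection.
            var task1 = context1.TableModel1.FirstOrDefaultAsync(x => x.SomeId == anId);
            var task2 = context2.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == aProp);

            await Task.WhenAll(task1, task2);
            return (task1.Result, task2.Result);
        }
    }
}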
In ASP.NET Core you'd generally use the context-per-request pattern (ServiceLifetime.Scoped, see here), but even that can't keep you from transferring the context to multiple threads. In the end, it's only the programmer who can prevent that.
If you're worried about the performance costs of creating new contexts all the time: don't be. Creating a context is a light-weight operation, because the underlying model (store model, conceptual model + mappings between them) is created once and then stored in the application domain. Also, a new context doesn't create a physical connection to the database. All ASP.Net database operations run through the connection pool that manages a pool of physical connections.
If all this implies that you have to reconfigure your DI to align with best practices, so be it. If your current setup passes contexts to multiple threads there has been a poor design decision in the past. Resist the temptation to postpone inevitable refactoring by work-arounds. The only work-around is to de-parallelize your code, so in the end it may even be slower than if you redesign your DI and code to adhere to context per thread.
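For reference, a minimal sketch of the standard scoped registration (the provider call and the AppDbContext name are illustrative; AddDbContext uses ServiceLifetime.Scoped by default):

// In Startup.ConfigureServices of an ASP.NET Core app.
public void ConfigureServices(IServiceCollection services)
{
    // Scoped by default: one AppDbContext instance per request.
    services.AddDbContext<AppDbContext>(options =>
        options.UseSqlServer(Configuration.GetConnectionString("DefaultConnection")));

    services.AddMvc();
}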
It came to the point where the only way to resolve the debate was to do a performance / load test to get comparable, empirical, statistical evidence, so I could settle this once and for all.
Here is what I tested:
Cloud load test with VSTS at 200 users max for 4 minutes on a Standard Azure web app.
Test #1: 1 API call with Dependency Injection of the DbContext and async/await for each service.
Results for Test #1:
Test #2: 1 API call with new creation of the DbContext within each service method call and using parallel thread execution with WhenAll.
Results for Test #2:
Conclusion:
For those who doubt the results, I ran these tests several times with varying user loads, and the averages were basically the same every time.
The performance gains from parallel processing are, in my opinion, insignificant, and they do not justify abandoning Dependency Injection, which would create development overhead / maintenance debt, potential for bugs if handled wrong, and a departure from Microsoft's official recommendations.
One more thing to note: as you can see, there were actually a few failed requests with the WhenAll strategy, even when ensuring a new context is created every time. I am not sure of the reason for this, but I would much prefer no 500 errors over a 10 ms performance gain.

What are the options to use JDBC in a non-blocking way in Play?

I wonder what the best (recommended, approved, etc.) way is to do non-blocking JDBC queries in a Play! application using Play's connection pool (in Scala, and to PostgreSQL, if it matters). I understand that JDBC is definitely blocking per se, but surely there are approaches to run the calls on separate threads (e.g. using futures or actors) to avoid blocking the calling thread.
Suppose I decided to wrap the calls in futures; which execution context should I use, Play's default one? Or is it better to create a separate execution context for handling DB queries?
I know that there are some libraries for this like postgresql-async, but I really want to understand the mechanics :)
Suppose I decided to wrap the calls in futures; which execution context should I use, Play's default one? Or is it better to create a separate execution context for handling DB queries?
It is better to use a separate execution context in this case. That way there is no chance that your non-blocking jobs (most of Play's default stuff) submitted to the default execution context will be jammed by blocking JDBC calls in jobs you submit to the same context.
I suggest reading this (especially the second part) to get a general idea of how you can deal with execution contexts in different situations (including the case of blocking database queries), and then referring to this for more details on configuring your scenario in Play.
Suppose I decided to wrap the calls in futures; which execution context should I use, Play's default one?
If you do that, you gain nothing; it's like not using futures at all. Wrapping blocking calls in futures only helps you if you execute them in separate execution contexts.
In Play, you can basically choose between the following two approaches when dealing with blocking IO:
Turn Play into a one-thread-per-request framework by drastically increasing the size of the default execution context. No futures needed; just call your blocking database as always. Simple, but not the intention behind Play.
Create specific execution contexts for your blocking IO calls and gain fine-grained control over what you are doing.
See the docs: "Understanding Play thread pools"