How to get Squeryl to release closed connections back to C3P0? - scala

Sometimes I see the following error repeatedly in my logs:
com.mchange.v2.c3p0.impl.NewPooledConnection - [c3p0] A PooledConnection that has already signalled a Connection error is still in use!
com.mchange.v2.c3p0.impl.NewPooledConnection - [c3p0] Another error has occurred [ org.postgresql.util.PSQLException: This connection has been closed. ] which will not be reported to listeners!
org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:822)
at org.postgresql.jdbc2.AbstractJdbc2Connection.rollback(AbstractJdbc2Connection.java:839)
at com.mchange.v2.c3p0.impl.NewProxyConnection.rollback(NewProxyConnection.java:855)
at org.squeryl.dsl.QueryDsl$class._executeTransactionWithin(QueryDsl.scala:131)
at org.squeryl.dsl.QueryDsl$class.transaction(QueryDsl.scala:78)
at org.squeryl.PrimitiveTypeMode$.transaction(PrimitiveTypeMode.scala:40)
Below that is the trace of my own code up to the point of my transaction {} block.
My software retries the transaction {} after it throws an exception, but it appears to reuse the same (closed) connection, so the next attempt also fails. Strangely, this takes quite some time, sometimes 50 seconds, sometimes up to 2 minutes. One would think a closed connection would fail immediately.
How do I get Squeryl to release this connection to the pool and acquire a new one?

I found out why this was happening. I needed to use JDBC features not supported by Squeryl, so I got the connection from Squeryl and used it directly. But I had a bug where a prepared statement was not being closed, which resulted in the dead connection being reused over and over. I'm not sure how or why this happened, but as soon as I put the closing of the statement in a finally block, everything started working. Now when Squeryl gets to the transaction block a second time, it receives a fresh connection from c3p0.
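For reference, here is roughly the pattern that fixed it, as a minimal sketch; the object, method, and SQL statement are made up for illustration, but Session.currentSession.connection is how Squeryl exposes the underlying JDBC connection:

import org.squeryl.Session
import org.squeryl.PrimitiveTypeMode._

object RawJdbcExample {
  def withAdvisoryLock(id: Long): Unit = transaction {
    // Drop down to plain JDBC for the features Squeryl doesn't support.
    val conn = Session.currentSession.connection
    val stmt = conn.prepareStatement("select pg_advisory_lock(?)")
    try {
      stmt.setLong(1, id)
      stmt.execute()
    } finally {
      // Without this close, the leaked statement kept the dead connection
      // pinned, and c3p0 never handed the next transaction a fresh one.
      stmt.close()
    }
  }
}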
For anyone else seeing the same errors, I also found out that you can get the c3p0 errors (the top two errors in the question text above) even when nothing is wrong. If the thread holding the database connection is busy (a Thread.sleep() in my case, for testing) and c3p0 notices that the connection is dead before you do, then you can get the error about a dead connection still being in use. In that case it is a perfectly normal situation and just a question of which thread sees the problem first - nothing to worry about.

Related

Discrepancy between Redshift Data API DescribeStatement status and console status

I'm loading data into Redshift, which usually takes about an hour when successful, but it seems to time out randomly sometimes. I continue to get a "STARTED" status from DescribeStatement calls for my query, but when I look in the console it says the query was ABORTED and rolled back via an "Undoing 1 transactions on table ..." statement. I'm not finding any errors in STL_LOAD_ERRORS related to the query, or anything useful in STL_UTILITYTEXT for that transaction, though the STL_UNDONE view does show the rollback.
I would've expected DescribeStatement to update to a "FAILED" or "ABORTED" status when this occurred, but that doesn't seem to be the case. Any idea what is causing the load to fail without any errors? Is there a way to catch/handle this via the Redshift Data API? I'm currently thinking of checking STL_UNDONE after a specified time, but I was hoping there's a better solution.
A statement timeout seems like a likely cause. What you are describing sounds like the connection closed out from under the executing statement. There are a number of places this timeout can come from, but a common one is in the cluster configuration, specifically the WLM configuration.
Another possibility is a network timeout. Database connections stay open for the entirety of the session, but while a statement is in flight there is no activity on the connection. Some network equipment sees this, assumes something is wrong, and closes the connection, which closes the session, which aborts the transaction in flight.
If your issue is caused by the connection closing, you may be able to line things up in stl_sessions. There is information in there about timeouts, and you can also check whether the session closed right when the query aborted.
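For example, something along these lines (just a sketch; stl_sessions is the session-history system view, and the time window is a placeholder you'd set to around when the load was aborted):

-- Look for sessions that ended right around the time the load was undone.
select *
from stl_sessions
where endtime between '2022-06-20 14:00' and '2022-06-20 15:00'
order by endtime desc;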
This is just one area that could be causing your issue, but it is more common than people think.
After escalating to AWS support, it was confirmed that there was a bug on their end, related to Data API autoscaling that was sometimes scaling down without waiting for outstanding tasks to complete. There's a temporary fix in place to avoid this happening while they implement a long-term solution, which should hopefully be rolled out by the end of this month, June 2022.

Postgres: processes terminated after connection break / invalidation

I don't understand some of Postgres's mechanisms, and it makes me quite upset.
I usually use DBeaver as my SQL client to query an external Postgres database. If I run create... or insert... queries and the connection is then broken or invalidated for some reason, the pid keeps running and finishes the transaction.
But for some more complicated PL/pgSQL functions we wrote (with temp tables, loops, inserts, etc.), breaking the connection always causes the process to terminate (it disappears from the session list just before making its next SQL operation, e.g. inserting a row into a log table). It doesn't matter whether it's the DBeaver editor or the psql command line.
I know that disconnecting is perhaps a critical problem that should be eliminated, and that maybe I shouldn't expect the process to continue successfully, but I do :) Or I'd at least like to know why it happens and whether it is possible to prevent it.
If the network connection fails, the database server can detect that in two ways:
if it tries to send data to the client, it will figure out pretty quickly that the connection is down
if it tries to receive data from the client, it will only notice when the kernel's TCP keepalive mechanism has determined that the connection is down
When you say that sometimes execution of a function is terminated right away, I would say that is because the function tried to return data to the client.
In the case where a query keeps running, it is not attempting to return any data yet.
There is no cure for the former, but in PostgreSQL v14 and later you can prevent the latter (a query that keeps running long after its client is gone) by setting client_connection_check_interval. In addition, you have to set the PostgreSQL keepalive parameters so that the dead connection is detected quickly.
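A sketch of the relevant postgresql.conf settings (the values are examples, not recommendations):

# PostgreSQL v14+: poll the client socket periodically while a query runs,
# so a query whose client is gone gets cancelled instead of running on.
client_connection_check_interval = 30s

# Make the kernel detect dead TCP connections quickly
# (0 means "use the operating system default").
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 3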
See my article for more.

UpdateOne fails on client due to timeout, but MongoDB processes it anyway

One of my tests for a function that performs increments using the MongoDB driver for Go is randomly breaking in an unexpected way. Here's what the test does:
Create a proxy (with toxiproxy) to a local MongoDB instance.
Disable the proxy, so the database looks like it's down.
Run a function that does an update that increments a field, timing out after 100ms. If it fails, it keeps retrying every 100ms until the command succeeds.
Sleep 1 second.
Enable the proxy.
Wait for the function to complete and assert that the field has been incremented correctly - only once.
This test is randomly breaking because sometimes that field gets incremented twice. I noticed that it happens when an update is retried just as the proxy gets enabled: the client code receives an "incomplete read of message header: context deadline exceeded" error, which makes it retry the command, but the previous attempt actually succeeded, since the field ends up being incremented twice.
I took a look at the driver code and I guess it's timing out while reading the server response - perhaps the proxy is enabled just after the update has started and there isn't much timeout left for both write and read operations to complete.
Is there anything that I can do on my side to prevent this from happening? I tried to find a specific error to catch, but I couldn’t find any. Or is this something the driver itself is supposed to handle?
Any help is appreciated.
UPDATE: I looked closely at the error messages and noticed that, while the MongoDB instance was down, all errors were handshake failures. So I made sure the test pings the database before disabling the proxy to get the handshake out of the way, and the test stopped randomly breaking; it ran 1000 times flawlessly, at least. I assume the handshake itself takes time to complete, and that contributes to the command timeout.
In general, even if you know the command went through to the server, if you can't read the response you can't assume anything about its success.
In some cases it only matters whether the command reached the server; if that's your situation, read on.
Unfortunately, the current state of the driver (v1.7.1) is not "sophisticated" enough to easily tell whether the error happened while reading the response.
I was able to reproduce your issue locally. Here is the error when a timeout happens reading the response:
mongo.CommandError{Code:0, Message:"connection(localhost:27017[-30]) incomplete read of message header: context deadline exceeded", Labels:[]string{"NetworkError", "RetryableWriteError"}, Name:"", Wrapped:topology.ConnectionError{ConnectionID:"localhost:27017[-30]", Wrapped:context.deadlineExceededError{}, init:false, message:"incomplete read of message header"}}
And here is the error when the timeout happens while writing the command:
mongo.CommandError{Code:0, Message:"connection(localhost:27017[-31]) unable to write wire message to network: context deadline exceeded", Labels:[]string{"NetworkError", "RetryableWriteError"}, Name:"", Wrapped:topology.ConnectionError{ConnectionID:"localhost:27017[-31]", Wrapped:context.deadlineExceededError{}, init:false, message:"unable to write wire message to network"}}
As you can see, in both cases a mongo.CommandError is returned, with identical Code and Labels fields, which leaves you having to analyze the error string (which is ugly and may break with future changes).
So the best you can do is check whether the error string contains "incomplete read of message header", and if so, skip the retry. Hopefully this (error support and analysis) improves in the future.
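A minimal sketch of that check, assuming the error text above stays stable across driver patch releases (the function name and the retry policy are just an illustration):

package main

import "strings"

// shouldRetry: if the timeout happened while reading the response, the server
// may already have applied the write, so a blind retry risks a double
// increment; if writing the command itself timed out, the server most likely
// never saw it and retrying is the usual choice.
func shouldRetry(err error) bool {
	if err == nil {
		return false
	}
	if strings.Contains(err.Error(), "incomplete read of message header") {
		return false // response-read timeout: don't retry automatically
	}
	return true // e.g. "unable to write wire message to network"
}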
If you are using retryable writes as implemented by MongoDB 3.6+ and the respective drivers, this shouldn't happen. Each write is accompanied by a transaction number (not to be confused with client-side transactions as introduced in MongoDB 4.0+), and if the same transaction number is used in two consecutive writes, only one write is performed by the server.
This functionality has been around for years so unless you are using an ancient driver version you should already have it.
If you are performing write retries manually in your application rather than using the driver's retryable write functionality, you can end up writing twice, as you found out. The solution is to use the driver's retryable writes.
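For example, with the Go driver (the URI is a placeholder, and note that retryable writes require a replica set or sharded cluster rather than a standalone mongod):

package main

import (
	"context"
	"time"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func connect() (*mongo.Client, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	// Retryable writes are on by default in current driver versions; setting
	// the option (or adding retryWrites=true to the URI) makes the intent explicit.
	opts := options.Client().
		ApplyURI("mongodb://localhost:27017").
		SetRetryWrites(true)
	return mongo.Connect(ctx, opts)
}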
I had the same problem (running go.mongodb.org/mongo-driver v1.8.1 against MongoDB 4.4) and will leave my experience with it here.
To add to icza's solution:
You can also get the error context deadline exceeded, so check for that as well.
A check for a cancelled or timed-out context would look something like this:
if strings.Contains(err.Error(), "context") && (strings.Contains(err.Error(), " canceled") || strings.Contains(err.Error(), " deadline exceeded")) {
...
}
My solution to the problem was, instead of first checking whether there was an error, to first check whether there was a result from the operation.
Example:
result, err := database.collection.InsertOne(context, item)
// If the server produced a result, trust it even if err is also set.
if result != nil {
	return result.InsertedID, err
}
return nil, err
If the server did process the operation despite the error, you could add some compensation logic to undo it.

We are seeing a strange behaviour issue in our instance

We are seeing the following strange behaviour in our instance when running our scheduled concurrent programs:
instead of going to the desired manager, the program goes to a different manager and then errors out with the below error.
Error:
Routine CALL_STORED_PROCEDURE cannot initialize concurrent request
Routine AFPEOT cannot initialize concurrent request information
We checked the controlling manager process id in the FND_CONCURRENT_REQUESTS table and we don't see any process; it shows null.
When some of the concurrent programs, e.g. Customer Interface, are assigned to the concurrent manager, the manager's actual processes go down, and when we cancel the program the actual processes come back up again.
The actual process count keeps varying from the target process count, sometimes lower and sometimes 0.
Again, this is an intermittent issue; we could not recognize any particular pattern.
We tried bouncing the database, application, and services, but nothing helped.
Error:
ORA-600 in database

Unexpected behaviour after memcached server restarts. How to configure/rectify it?

I have a pool of persistent connections (Memcached clients). Data is cached in the memcached server. If, after restarting the memcached server, I try to get the cached data using a client from the pool, I get the exception below:
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancelled
at net.spy.memcached.MemcachedClient$OperationFuture.get(MemcachedClient.java:1662)
at net.spy.memcached.MemcachedClient$GetFuture.get(MemcachedClient.java:1708)
at com.eos.gds.cache.CacheClient.get(CacheClient.java:49)
I get this exception only the first time I try to get the cached data after the restart. I did a lot of searching but was unable to find the exact reason for this.
Spymemcached has a bunch of internal queues that operations are placed in before they are actually sent out to memcached. What is happening here is that you issue an operation and then, before that operation is sent over the wire or before a response is received from memcached, Spymemcached realizes that the connection has been lost. As a result, Spymemcached cancels all operations in flight and then reestablishes the connection.
When you call get() on the Future, an exception is thrown because the operation was cancelled by Spymemcached. What I recommend doing here is catching exceptions on every individual operation you do with Spymemcached and then, depending on the error, either retrying the operation or just forgetting about it. If it's a get, for example, and your cluster of memcached servers goes down, you can probably forget about it since the cache will be empty anyway, but you will probably want to retry a set.
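A hedged sketch of that get-versus-set split, using the plain spymemcached API (the class and method names here are illustrative, not part of the library):

import net.spy.memcached.MemcachedClient;

public final class SafeCache {

    private SafeCache() {}

    // A cancelled or failed get is simply treated as a cache miss.
    public static Object getOrNull(MemcachedClient client, String key) {
        try {
            return client.get(key);
        } catch (RuntimeException e) {
            return null; // fall back to the source of truth
        }
    }

    // A set is worth retrying a few times so the cache gets repopulated.
    public static void setWithRetry(MemcachedClient client, String key, int expSeconds, Object value) {
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                client.set(key, expSeconds, value).get(); // wait for the outcome
                return;
            } catch (Exception e) {
                // cancelled or failed; try again
            }
        }
    }
}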
I ran into the exact same problem and fixed it by handling the exception and retrying until success:
while (true) {
    try {
        memcacheclient.get(key);
        break;
    } catch (java.util.concurrent.CancellationException e) {
        log.info("cache cancelled");
    }
}
Run MemcachedClient.getStats() once for each new client and that will fix the cancellation issue.
I had the same issue; I am using the Spymemcached client to connect to the Memcached server. There must be a connection issue.
Ref: https://github.com/couchbase/spymemcached/blob/master/src/main/java/net/spy/memcached/internal/OperationFuture.java
Been searching for days for a solution. Posting in case it helps someone else.
Our implementation of ServletContextListener was getting a new MemcachedClient(...) on contextInitialized, and would then call the MemcachedClient method shutdown() on contextDestroyed. I would always get a CancellationException or ExecutionException on the first request I would send. (The error messaging alluded to both, but an ExecutionException is what I was able to catch.)
Solution: switched from shutdown() to shutdown(1, TimeUnit.SECONDS)
Now the get call succeeds the very first time that it is run.
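For reference, a minimal sketch of such a listener with the timed shutdown (the class name and server address are placeholders, not our actual code):

import java.io.IOException;
import java.util.concurrent.TimeUnit;

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

public class CacheLifecycleListener implements ServletContextListener {

    private MemcachedClient client;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        try {
            // "localhost:11211" is a placeholder address.
            client = new MemcachedClient(AddrUtil.getAddresses("localhost:11211"));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        // shutdown(1, TimeUnit.SECONDS) waits up to a second for queued
        // operations to drain instead of cancelling them immediately.
        client.shutdown(1, TimeUnit.SECONDS);
    }
}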
I cannot explain for sure how the contextDestroyed call was interfering with the regular handling of the request. My best guess is that spymemcached's single thread somehow gets shared between servlets, and so when a servlet was created to handle a request sent by a verification step of our build process, it would get destroyed prior to the first request I would send, and the MemcachedClient my request's servlet was using would then try to use that same thread and get hit with the exceptions from the shutdown.
(Our team had established the need to call shutdown a while back when we learned our web app had too many open connections to our memcached server.)