Reconnection of Couchbase client after node wake up - memcached

I'm studying the following scenario: while get/set operations are running against Couchbase, I shut down a node (power off the virtual machine). After that, I power the machine back on and wait for the Couchbase node to recover. When the node's status changes to "healthy", I expect the client to reconnect and the get/set operations to continue. But sometimes the client reconnects immediately, and sometimes it doesn't reconnect within a few minutes.
So my question is:
Is there any configuration on the server side, or on the client side, that guarantees the client fully reconnects?
I use the Java SDK.
A small addition:
The Couchbase client is based on the spymemcached client. If someone knows any hints for memcached that could solve this problem, I'd be very glad to see them.
Another addition:
The client stops trying to establish a connection after this exception:
Exception in thread "Thread-122" java.lang.IllegalStateException: Got empty SASL auth mech list.
11:59:25,731 ERROR [stderr] (Thread-122) at net.spy.memcached.auth.AuthThread.listSupportedSASLMechanisms(AuthThread.java:99)
11:59:25,731 ERROR [stderr] (Thread-122) at net.spy.memcached.auth.AuthThread.run(AuthThread.java:112)
But I can't understand why this exception happens so irregularly.
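For illustration, the kind of retry wrapper I have in mind around set() while the node recovers; this assumes the 1.x CouchbaseClient API, and the retry count and back-off below are placeholders, not values from my real setup:

    import java.util.concurrent.TimeUnit;

    import com.couchbase.client.CouchbaseClient;

    // Sketch only: keep retrying a set() while the node is recovering.
    public final class RetrySet {
        public static boolean setWithRetry(CouchbaseClient client, String key, Object value)
                throws InterruptedException {
            for (int attempt = 1; attempt <= 10; attempt++) {
                try {
                    // set() returns an OperationFuture<Boolean>; get() blocks until the op completes
                    if (client.set(key, 0, value).get()) {
                        return true;
                    }
                } catch (Exception e) {
                    // node still warming up, or the client has not re-established its connection yet
                }
                TimeUnit.SECONDS.sleep(1); // back off before the next attempt
            }
            return false;
        }
    }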

It turned out to be a small bug in the Couchbase Server response. I discussed the problem with a developer of the spymemcached client, and it was solved when I hard-coded a string in that client.
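If it helps anyone else: one approach that might avoid depending on the server's advertised mechanism list at all is pinning the SASL mechanism through spymemcached's AuthDescriptor. A rough sketch (bucket name, password, address and port are placeholders):

    import java.net.InetSocketAddress;
    import java.util.Arrays;

    import net.spy.memcached.ConnectionFactoryBuilder;
    import net.spy.memcached.MemcachedClient;
    import net.spy.memcached.auth.AuthDescriptor;
    import net.spy.memcached.auth.PlainCallbackHandler;

    public final class PinnedSaslClient {
        public static MemcachedClient connect() throws Exception {
            // Force PLAIN SASL auth instead of asking the server for its mech list.
            AuthDescriptor ad = new AuthDescriptor(new String[] { "PLAIN" },
                    new PlainCallbackHandler("bucket", "password"));

            return new MemcachedClient(
                    new ConnectionFactoryBuilder()
                            .setProtocol(ConnectionFactoryBuilder.Protocol.BINARY)
                            .setAuthDescriptor(ad)
                            .build(),
                    Arrays.asList(new InetSocketAddress("127.0.0.1", 11210)));
        }
    }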

Related

Application keeps crashing because of database

Our application keeps crashing once per day (at the start of the workday). It appears to be caused by the connection to the database.
error: [SSL-QTEH-TD] E01000-SYSTEM_ERROR: [IBM][CLI Driver]
SQL30081N A communication error has been detected. Communication
protocol being used: "TCP/IP". Communication API being used:
"SOCKETS". Location where the error was detected: "000.00.00.00".
Communication function detecting the error: "recv". Protocol specific
error code(s): "104", "*", "0". SQLSTATE=08001
I'm unable to determine why this is happening.
You have a communication-related SQL error at the start of each working day. This implies that the network connectivity between your application and the database server was broken overnight, most likely due to scheduled downtime.
The break could have occurred in one or more of: your app, the server your app runs on, any proxy or firewall servers between your app and the database server, the database itself, or the server the database runs on.
Most likely it is the database, taken down so it can run reorgs and make backups. Next most likely is a firewall shutting down for maintenance. In any case, your app needs to be able to detect the disconnect and recover.
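As a sketch of what "detect the disconnect and recover" can look like in the application (plain JDBC; the SQLSTATE check and the retry policy are illustrative, not DB2-specific advice):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    // Treat connection-class SQLSTATEs (08xxx, e.g. 08001) as "open a fresh connection and retry".
    public final class ReconnectingQuery {
        public static <T> T withRetry(String url, String user, String pass, SqlWork<T> work)
                throws SQLException, InterruptedException {
            SQLException last = null;
            for (int attempt = 1; attempt <= 3; attempt++) {
                try (Connection c = DriverManager.getConnection(url, user, pass)) {
                    return work.run(c);
                } catch (SQLException e) {
                    String state = e.getSQLState();
                    if (state == null || !state.startsWith("08")) {
                        throw e;                   // not a connection failure: don't retry
                    }
                    last = e;
                    Thread.sleep(2000L * attempt); // back off, then reconnect on the next pass
                }
            }
            throw last;
        }

        public interface SqlWork<T> {
            T run(Connection c) throws SQLException;
        }
    }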

HornetQ local connection never timing out

My application, running in a JBoss standalone environment, relies on HornetQ (v2.2.5.Final) middleware to exchange messages between parts of my application in a local environment, not over the network.
The default TTL (time-to-live) value for the connection is 60000 ms. I am thinking of changing that to -1 since, operationally, I expect to keep sending messages through this connection from time to time (at intervals not known in advance). That would also prevent issues like JMS queue connection failure.
The question is: what are the issues with never timing out a connection on the server side in such a context? Is that a good choice? If not, is there a strategy better suited to this situation?
The latest versions of HornetQ automatically disable connection checking for in-vm connections so there shouldn't be any issues if you configure this manually.
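If you prefer to set it explicitly on the client side, a sketch with the core API could look like this (class and package names are as I recall them from HornetQ 2.2.x; check them against your version):

    import org.hornetq.api.core.TransportConfiguration;
    import org.hornetq.api.core.client.HornetQClient;
    import org.hornetq.api.core.client.ServerLocator;
    import org.hornetq.core.remoting.impl.invm.InVMConnectorFactory;

    // Disable the connection TTL and the failure check for an in-vm connection.
    public final class InVmLocatorFactory {
        public static ServerLocator createInVmLocator() {
            ServerLocator locator = HornetQClient.createServerLocatorWithoutHA(
                    new TransportConfiguration(InVMConnectorFactory.class.getName()));
            locator.setConnectionTTL(-1);            // server never times the connection out
            locator.setClientFailureCheckPeriod(-1); // client stops checking for a dead connection
            return locator;
        }
    }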

hornetq fails when we change system time

I have an issue and I hope you can help me a bit.
I have to implement fast-forwarding time because I need to test something. I've written a Python script which increments the system time by 5 seconds for every 1 real second (5 times faster).
Then my JBoss fails with some HornetQ timeouts.
Do you have any ideas how I can fix this?
03/09/18 09:18:00,107 WARN
[org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (hornetq-
failure-check-thread) Connection failure has been detected: Did not
receive data from invm:0. It is likely the client has exited or crashed
without closing its connection, or the network between the server and
client has failed. You also might have configured connection-ttl and
client-failure-check-period incorrectly. Please check user manual for
more information. The connection will now be closed. [code=3]
The underlying issue is that changing the time breaks the connection-failure-detection algorithm used by the broker. The broker thinks it isn't receiving "ping" packets from clients at the proper time because you're forcing time to pass at 5x the normal rate. There is no way to fix this for remote clients aside from disabling or extending the connection TTL. However, for in-vm connections you could apply the fix from https://issues.jboss.org/browse/HORNETQ-1314 (which is not resolved in the version of HornetQ you are using) to the branch of HornetQ you're currently using and rebuild. If you don't want to rebuild you could upgrade to a version of JBoss AS (or Wildfly) which contains this fix.
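For the remote-client case, "disabling or extending the connection TTL" could be as simple as scaling the TTL and failure-check period by the same factor the clock is accelerated by. A sketch (values are illustrative; class and package names may differ slightly between HornetQ versions):

    import org.hornetq.api.core.TransportConfiguration;
    import org.hornetq.api.core.client.HornetQClient;
    import org.hornetq.api.core.client.ServerLocator;
    import org.hornetq.core.remoting.impl.netty.NettyConnectorFactory;

    // Widen the failure-detection windows so they survive a clock running 5x too fast.
    public final class AcceleratedClockLocator {
        private static final long CLOCK_FACTOR = 5; // the test script advances time 5x faster

        public static ServerLocator createLocator() {
            // Host/port parameters for the Netty connector are omitted here for brevity.
            ServerLocator locator = HornetQClient.createServerLocatorWithoutHA(
                    new TransportConfiguration(NettyConnectorFactory.class.getName()));
            locator.setConnectionTTL(CLOCK_FACTOR * 60000);            // default TTL is 60s
            locator.setClientFailureCheckPeriod(CLOCK_FACTOR * 30000); // default check period is 30s
            return locator;
        }
    }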

PeopleSoft Webserver crashing, losing connection to AppServer

On our Webserver, we're seeing a ton of these errors:
Application Server last connected //psoftapp.company.net_8850
bea.jolt.ServiceException: bea.jolt.JoltRemoteService(GetCertificate)call(): Timeout
bea.jolt.SessionException: Connection recv error
bea.jolt.JoltException: [3] NwHdlr.recv(): Timeout Error
and on our Appserver:
PSPUBDSP_dflt.27505 (0) 07/20/11 08:13:33 (JNIUTIL): Java exception thrown: java.net.SocketException: Connection reset
I'm reading some tuning documents from PeopleSoft and I found a suggestion that I've seen in a couple of places: reducing the tcp_wait_time_interval to 60 seconds. I think I sort of understand what this is doing: it seems that network (or socket?) connections that are no longer being used are "recycled" or made available again. Can someone confirm this? Also, why are these connections unused/stale? Is it caused by people not properly logging out of the app (and just closing the browser)?
Thanks!
PSPUBDSP is part of the Integration Broker application messaging framework. You could look at the Tuxedo logs or the Integration Broker Monitor to see what is going on. You may be running a high number of messages and overloading the server, or possibly you have a message with errors that is somehow causing the crashes.

Dealing with intermittent Winsock errors

My client app gets intermittent Winsock errors (10060, 10053) against one particular server we interface with. I have it retrying the request that failed, but sometimes it fails repeatedly, and I give up after 5 retries. Would it be likely to help at all if I closed the socket and created a new one? (I know nothing about the server side.)
Ok, so the errors that you're getting are:
10060 - WSAETIMEDOUT
10053 - WSAECONNABORTED
When do you get them? What are you doing at the time?
You get a WSAECONNABORTED when the remote end of the connection, or possibly an intermediary router, resets the connection and sends an RST. This could simply be the remote end issuing a non-lingering close, or it could be the remote end aborting or crashing.
You can't continue doing anything with a connection that has had a WSAECONNABORTED on it as the connection has been aborted and is no more; it is a dead connection, it has passed on...
Context matters immensely as to why you might get a WSAETIMEDOUT exception and the context will dictate if retrying is sensible or not.
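To the original question about closing the socket and creating a new one: for a WSAECONNABORTED you have no choice, the old connection is gone. The shape of the retry, sketched here in Java purely because the pattern is the same in any socket API (timeouts, retry count and back-off are placeholders):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.util.Arrays;

    // Never reuse a connection after an abort or timeout; build a fresh socket for every attempt.
    public final class FreshSocketRetry {
        public static byte[] requestWithRetry(String host, int port, byte[] request)
                throws IOException, InterruptedException {
            IOException last = null;
            for (int attempt = 1; attempt <= 5; attempt++) {
                try (Socket s = new Socket()) {              // a brand-new socket per attempt
                    s.connect(new InetSocketAddress(host, port), 10000);
                    s.setSoTimeout(10000);                   // read timeout, the analogue of WSAETIMEDOUT
                    s.getOutputStream().write(request);
                    s.getOutputStream().flush();
                    byte[] buf = new byte[4096];
                    int n = s.getInputStream().read(buf);    // single read, for brevity
                    return n <= 0 ? new byte[0] : Arrays.copyOf(buf, n);
                } catch (IOException e) {                    // covers resets/aborts and timeouts
                    last = e;
                    Thread.sleep(1000L * attempt);           // back off before retrying
                }
            }
            throw last;
        }
    }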
One thing I would try is a tracert to your server.
Often when someone is connected through a VPN, you may see this error because your local and remote IP address ranges overlap.
e.g. if your local IP address range is 192.168.1.xxx and the VPN remote range is also 192.168.1.xxx, you will see this error.
sharrajesh