hornetq fails when we change system time - jboss

I have an issue and I hope you can help me a bit.
I have to implement fast forwarding time, because I need to test something. I've wrote a python script which increment the system time with 5 seconds for every 1 real second. (5 times faster).
Then my jboss fails with some hornetq timeouts.
Do you have any ideas how I can fix this?
03/09/18 09:18:00,107 WARN
[org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (hornetq-
failure-check-thread) Connection failure has been detected: Did not
receive data from invm:0. It is likely the client has exited or crashed
without closing its connection, or the network between the server and
client has failed. You also might have configured connection-ttl and
client-failure-check-period incorrectly. Please check user manual for
more information. The connection will now be closed. [code=3]

The underlying issue is that changing the time breaks the connection-failure-detection algorithm used by the broker. The broker thinks it isn't receiving "ping" packets from clients at the proper time because you're forcing time to pass at 5x the normal rate. There is no way to fix this for remote clients aside from disabling or extending the connection TTL. However, for in-vm connections you could apply the fix from https://issues.jboss.org/browse/HORNETQ-1314 (which is not resolved in the version of HornetQ you are using) to the branch of HornetQ you're currently using and rebuild. If you don't want to rebuild you could upgrade to a version of JBoss AS (or Wildfly) which contains this fix.

Related

ADO.NET background pool validation

in Java, application servers like JBoss EAP have the option to periodically verify the connections in a database pool (https://access.redhat.com/documentation/en-us/red_hat_jboss_enterprise_application_platform/6.4/html/administration_and_configuration_guide/sect-database_connection_validation). This has been very useful for removing stale connections.
I'm now looking at a ADO.NET application, and I was wondering if there was any similar functionality that could be used with a Microsoft SQL Server?
I ended up find this post by redgate that describes some of the validation that goes on when connections are taken from the pool:
If the connection has died because a router has decided that it no
longer wants to forward your packets and no other routers like you
either then there is no way to know this unless you try to send some
data and don’t get a response.
If you create a connection and a connection pool is created and
connections are put into the pool and not used, the longer they are in
there, the bigger the chance of something bad happening to it.
When you go to use a connection there is nothing to warn you that a
router has stopped forwarding your packets until you go to use it; so
until you use it, you do not know that there is a problem.
This was an issue with connection pooling that was fixed in the first
.Net 4 reliability update (see issue 14 which vaguely describes this)
with a feature called “Connection Pool Resiliency”. The update meant
that when a connection is about to be taken from the pool, it is
checked for TCP validity and only returned if it is in a good state.

HornetQ local connection never timing out

My application, running in a JBOSS standalone env, relies on a HornetQ (v2.2.5.Final) middleware to exchange messages between parts of my application in a local environment - not over the network.
The default TTL (time-to-live) value for the connection is 60000ms, I am thinking of changing that to -1 since, from an operative point of view, I am looking forward to keep sending messages through such connection from time to time (not known in advance). Also, that would prevent issues like jms queue connection failure.
The question is: what are the issues with never timing out a connection on the server side, in such context? Is that a good choice? If not, is there a strategy that is suited for such situation?
The latest versions of HornetQ automatically disable connection checking for in-vm connections so there shouldn't be any issues if you configure this manually.

PostgreSQL 10 error: server closed the connection unexpectedly

When I run a query that takes a long time on my Postgres server (maybe 30 minutes), I get the error. I've verified the query is running with active status on the server using pgAdmin. I've also verified the correctness of the query, as it runs successfully on a smaller dataset. Server configurations are default, I haven't changed anything. Please help!
Look into the PostgreSQL server log.
Either you'll find a crash report there, which would explain the broken connection, or there is something in your network that cuts connections with no activity after a while.
Investigate your firewalls!
Maybe it is a solution to set the configuration parameter tcp_keepalives_idle to a value shorter than the time when the connection is cut. That will cause the server operating system to send keepalive messages on idle connections, which may be enough to prevent the overzealous connection reaper in your environment from disrupting your work.

TCP keepalive not working

The situation:
Postgres 9.1 on Debian Server
Scala(Java) application using the LISTEN/NOTIFY mechanism to get notified through JDBC
As there can be very long pauses (multipla days) between notifications I ran into the problem that the underlying TCP connection silently got terminated after some time and my application stopped to receive the notifications.
When googeling for a solution I found that there is a parameter tcpKeepAlive that you can set on the connection. So I set it to true and was happy. Until the next day I saw that again my connection was dead.
As I had been suspicious there was a wireshark capture running in parallel which now turns out to be very usefull. Just about exactly two hours after the last successfull communication on the connection of interest my application sends a keepalive packet to the database server. However the server responds with RST as it seems it has already closed the connection.
The net.ipv4.tcp_keepalive_time on the server is set to 7200 which is 2 hours.
Do I need to somehow enable keepalive on the server or increase the keepalive_time?
Is this the way to go about keeping my application connected?
TL;DR: My database connection gets terminated after long inactivity. Setting tcpKeepAlive didnt fix it as server responds with RST. What to do?
As Craig suggested in the comments the problem was very likely related to some piece of network hardware in between the server and the application. The fix was to increase the frequency of the keepalive messages.
In my case the OS was Windows where you have to create a Registry key with the idle time in milliseconds after which the message should be sent. Info on that here
I have set it to 15 minutes which seems to have solved the issue.
UPDATE:
It only seemed like it solved the issue. After about two days of program run time my connection was gone again. I switched to checking the validity my connection every time I use it. This does not seem like it is the solution but it is a solution nonetheless.

PeopleSoft Webserver crashing, losing connection to AppServer

On our Webserver, we're seeing a ton of these errors:
Application Server last connected //psoftapp.company.net_8850
bea.jolt.ServiceException: bea.jolt.JoltRemoteService(GetCertificate)call(): Timeout\nbea.jolt.SessionException: Connection recv error\nbea.jolt.JoltException: [3] NwHdlr.recv(): Timeout Error
and on our Appserver:
PSPUBDSP_dflt.27505 (0) 07/20/11 08:13:33 (JNIUTIL): Java exception thrown: java.net.SocketException: Connection reset
I'm reading some tuning documents from PeopleSoft & I found a suggestion that I've seen in a couple of places -- Reducing the tcp_wait_time_interval to 60 seconds. I think I sort of understand what this is doing - It seems that network (or socket?) connections that are no longer being used are "recycled" or made available? Can someone confirm this? Also, why are these connections unused/stale? Is it caused by people not properly logging out of the app (and just closing the browser)?
Thanks!
PSPUBDP is part of the Integration Broker application messaging framework. You could look at the Tuxedo logs or the Integration Broker Monitor too see what is going on. You may be running a high number of messages and overloading the server or possibly you have a message with errors that is somehow causing the crashes.