PeopleSoft Webserver crashing, losing connection to AppServer - sockets

On our Webserver, we're seeing a ton of these errors:
Application Server last connected //psoftapp.company.net_8850
bea.jolt.ServiceException: bea.jolt.JoltRemoteService(GetCertificate)call(): Timeout
bea.jolt.SessionException: Connection recv error
bea.jolt.JoltException: [3] NwHdlr.recv(): Timeout Error
and on our Appserver:
PSPUBDSP_dflt.27505 (0) 07/20/11 08:13:33 (JNIUTIL): Java exception thrown: java.net.SocketException: Connection reset
I'm reading some tuning documents from PeopleSoft and I found a suggestion that I've seen in a couple of places: reducing the tcp_time_wait_interval to 60 seconds. I think I sort of understand what this is doing: it seems that network (or socket?) connections that are no longer being used are "recycled" or made available again? Can someone confirm this? Also, why are these connections unused/stale? Is it caused by people not properly logging out of the app (and just closing the browser)?
Thanks!

PSPUBDSP is part of the Integration Broker application messaging framework. You could look at the Tuxedo logs or the Integration Broker Monitor to see what is going on. You may be running a high volume of messages and overloading the server, or you may have a message with errors that is somehow causing the crashes.

Related

ADO.NET background pool validation

In Java, application servers like JBoss EAP have the option to periodically verify the connections in a database pool (https://access.redhat.com/documentation/en-us/red_hat_jboss_enterprise_application_platform/6.4/html/administration_and_configuration_guide/sect-database_connection_validation). This has been very useful for removing stale connections.
I'm now looking at an ADO.NET application, and I was wondering whether there is any similar functionality that could be used with Microsoft SQL Server?
I ended up finding this post by Redgate that describes some of the validation that goes on when connections are taken from the pool:
If the connection has died because a router has decided that it no
longer wants to forward your packets and no other routers like you
either then there is no way to know this unless you try to send some
data and don’t get a response.
If you create a connection and a connection pool is created and
connections are put into the pool and not used, the longer they are in
there, the bigger the chance of something bad happening to it.
When you go to use a connection there is nothing to warn you that a
router has stopped forwarding your packets until you go to use it; so
until you use it, you do not know that there is a problem.
This was an issue with connection pooling that was fixed in the first
.Net 4 reliability update (see issue 14 which vaguely describes this)
with a feature called “Connection Pool Resiliency”. The update meant
that when a connection is about to be taken from the pool, it is
checked for TCP validity and only returned if it is in a good state.
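
Since the question starts from the Java/JBoss side, here is a rough Java sketch of the same validate-on-borrow idea that "Connection Pool Resiliency" implements: check a pooled connection before handing it out and replace it if the check fails. The toy pool class below is an illustration only, not part of any driver; Connection.isValid() is the standard JDBC 4 call.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;

// Toy "validate on borrow" pool: before a pooled connection is handed out,
// probe it with Connection.isValid() and replace it if the probe fails.
// Only the validation step is the point; sizing, timeouts etc. are omitted.
public class ValidatingPool {
    private final Deque<Connection> idle = new ArrayDeque<>();
    private final String url; // e.g. "jdbc:sqlserver://..." (placeholder)

    public ValidatingPool(String url) {
        this.url = url;
    }

    public synchronized Connection borrow() throws SQLException {
        Connection conn = idle.poll();
        // A connection whose TCP session was silently dropped by a router or
        // firewall fails this lightweight server round-trip.
        if (conn == null || !conn.isValid(2 /* seconds */)) {
            if (conn != null) {
                try { conn.close(); } catch (SQLException ignored) { }
            }
            conn = DriverManager.getConnection(url);
        }
        return conn;
    }

    public synchronized void giveBack(Connection conn) {
        idle.push(conn);
    }
}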

hornetq fails when we change system time

I have an issue and I hope you can help me a bit.
I have to implement fast-forwarding time, because I need to test something. I wrote a Python script which increments the system time by 5 seconds for every real second (5 times faster).
Then my JBoss fails with some HornetQ timeouts.
Do you have any ideas how I can fix this?
03/09/18 09:18:00,107 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (hornetq-failure-check-thread) Connection failure has been detected: Did not receive data from invm:0. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. You also might have configured connection-ttl and client-failure-check-period incorrectly. Please check user manual for more information. The connection will now be closed. [code=3]
The underlying issue is that changing the system time breaks the connection-failure-detection algorithm used by the broker. The broker thinks it isn't receiving "ping" packets from clients at the proper time because you're forcing time to pass at 5x the normal rate. There is no way to fix this for remote clients aside from disabling or extending the connection TTL. However, for in-vm connections you could apply the fix from https://issues.jboss.org/browse/HORNETQ-1314 (which is not included in the version of HornetQ you are using) to the HornetQ branch you're currently using and rebuild. If you don't want to rebuild, you could upgrade to a version of JBoss AS (or WildFly) which contains this fix.
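
For completeness, a minimal Java sketch of the TTL workaround mentioned above, assuming the HornetQ core client is used directly (with the JMS connection factories managed by JBoss, the equivalent connection-ttl and client-failure-check-period settings live in the server configuration instead). A value of -1 disables the checks:

import org.hornetq.api.core.TransportConfiguration;
import org.hornetq.api.core.client.ClientSessionFactory;
import org.hornetq.api.core.client.HornetQClient;
import org.hornetq.api.core.client.ServerLocator;
import org.hornetq.core.remoting.impl.netty.NettyConnectorFactory;

public class LenientTtlClient {
    public static void main(String[] args) throws Exception {
        // Locator for a remote (Netty) connector; host/port transport params
        // are omitted here for brevity.
        ServerLocator locator = HornetQClient.createServerLocatorWithoutHA(
                new TransportConfiguration(NettyConnectorFactory.class.getName()));

        // -1 disables the broker-side TTL check for connections from this
        // locator, so they are not dropped when "pings" seem to arrive late
        // (e.g. while the clock is being fast-forwarded).
        locator.setConnectionTTL(-1);
        // -1 also stops the client itself from checking for failed connections.
        locator.setClientFailureCheckPeriod(-1);

        ClientSessionFactory factory = locator.createSessionFactory();
        // ... create sessions, producers and consumers as usual ...
        factory.close();
        locator.close();
    }
}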

TCP keepalive not working

The situation:
Postgres 9.1 on Debian Server
Scala (Java) application using the LISTEN/NOTIFY mechanism to get notified through JDBC
As there can be very long pauses (multiple days) between notifications, I ran into the problem that the underlying TCP connection silently got terminated after some time and my application stopped receiving the notifications.
When googling for a solution I found that there is a parameter tcpKeepAlive that you can set on the connection. So I set it to true and was happy, until the next day I saw that my connection was dead again.
As I had been suspicious, there was a Wireshark capture running in parallel, which now turns out to be very useful. Just about exactly two hours after the last successful communication on the connection of interest, my application sends a keepalive packet to the database server. However, the server responds with RST, as it seems it has already closed the connection.
The net.ipv4.tcp_keepalive_time on the server is set to 7200 which is 2 hours.
Do I need to somehow enable keepalive on the server or increase the keepalive_time?
Is this the way to go about keeping my application connected?
TL;DR: My database connection gets terminated after long inactivity. Setting tcpKeepAlive didn't fix it, as the server responds with RST. What to do?
As Craig suggested in the comments, the problem was very likely related to some piece of network hardware between the server and the application. The fix was to increase the frequency of the keepalive messages.
In my case the OS was Windows, where you have to create a registry key with the idle time in milliseconds after which the message should be sent. Info on that here
I have set it to 15 minutes which seems to have solved the issue.
UPDATE:
It only seemed like it solved the issue. After about two days of program run time my connection was gone again. I switched to checking the validity of my connection every time I use it. This does not seem like the ideal solution, but it is a solution nonetheless.
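
In Java/JDBC terms, the two client-side measures discussed here (the driver's tcpKeepAlive flag and a validity check before every use) could look roughly like the sketch below. The URL and credentials are placeholders; tcpKeepAlive only enables SO_KEEPALIVE on the socket, with the interval still governed by the OS, which is why the isValid() check remains the safety net:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class ResilientDbConnection {
    // Placeholder connection details.
    private static final String URL = "jdbc:postgresql://dbhost:5432/mydb";

    private Connection conn;

    private Connection open() throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", "myuser");
        props.setProperty("password", "secret");
        // Ask the PostgreSQL JDBC driver to enable SO_KEEPALIVE on the socket.
        // How often keepalives are sent is still controlled by the OS
        // (net.ipv4.tcp_keepalive_* on Linux, registry values on Windows).
        props.setProperty("tcpKeepAlive", "true");
        return DriverManager.getConnection(URL, props);
    }

    // Check the connection before every use and reconnect if it has died,
    // which is the workaround the update above ends up with.
    public synchronized Connection getConnection() throws SQLException {
        if (conn == null || !conn.isValid(5 /* seconds */)) {
            if (conn != null) {
                try { conn.close(); } catch (SQLException ignored) { }
            }
            conn = open();
        }
        return conn;
    }
}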

My Netty TCP/IP server stops listening a few hours after starting

I have written a TCP/IP server using Netty 4.0, running on a Linux machine and listening to small GPS tracking devices. I have been facing a weird problem: the server suddenly stops listening to them several hours after I start it. There is no error log I can see, and the server is still running; it looks like only the channel is not working. When I run a client to do a health check, the client socket is still alive and keeps sending packets to the server, but the server does not receive them.
If you have any idea how to solve it, please tell me about it. It would be appreciated.
It is impossible to tell without more information. I would check different things, like whether there was an OOM exception, or with telnet whether the server really refuses connections, etc. Also, jstack may show you if there is some deadlock.
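
One way to gather more information is to make the server log channel lifecycle events and otherwise-swallowed exceptions, so a channel that dies silently leaves a trace. This is only an illustrative sketch using standard Netty 4 handlers; the initializer and the idle timeout below are assumptions, not taken from the original server:

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

// Adds diagnostics to each accepted channel: IdleStateHandler fires an event
// when nothing has been read for 10 minutes, and the logging handler reports
// closed channels and exceptions that would otherwise vanish.
public class DiagnosticInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        ch.pipeline().addLast(new IdleStateHandler(600, 0, 0)); // reader idle after 600 s
        ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
            @Override
            public void channelInactive(ChannelHandlerContext ctx) throws Exception {
                System.err.println("Channel closed: " + ctx.channel());
                super.channelInactive(ctx);
            }

            @Override
            public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
                if (evt instanceof IdleStateEvent) {
                    System.err.println("No data read for 10 minutes on " + ctx.channel());
                }
                super.userEventTriggered(ctx, evt);
            }

            @Override
            public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
                cause.printStackTrace();
                ctx.close();
            }
        });
        // ... followed by the application's real decoders and handlers ...
    }
}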

Session getting disconnected in the middle of working

Sessions are getting disconnected automatically (in the middle of working).
Disconnection happens when users are working over a telnet connection to the Linux server via the PuTTY telnet application.
During the disconnection, the network bandwidth utilization is high, and there is no limit on the total number of users in the network.
Error "Hangup signal received (562)"
Any idea about this?
The network connection was interrupted or a hangup signal was sent via "kill".
You mention network utilization being "high" when disconnects happen. How do you know that? What measurement are you looking at that tells you it is "high"? That might be a symptom of a networking issue that is at the root of the problem.
There are a few directions:
OpenEdge has published this article with links to implementing keep-alive packets:
https://knowledgebase.progress.com/articles/Article/Telnet-connection-times-out-after-15-minutes
Increase the number of "instances" in xinetd.conf, and then restart the service.
Make sure that the database watchdog is up and running: https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/dmadm/prowdog-command.html
Check the database log file, to find out what happened just before the hangup (https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/gsins/openedge-database-log-file.html)