ActiveMQ Artemis: How to troubleshoot a blocked acceptor?

We occasionally experience blocked Artemis (v2.19.1) TLS acceptors. Blocked means that a client can establish a TCP connection but no further data is exchanged; the TLS handshake never starts. There is no helpful information in the Artemis logs. Our only workaround at the moment is to restart Artemis.
Only the artemis-tls acceptor is affected. The stomp-tls and mqtt-tls acceptors continue to work, but they carry no load.
This is our acceptor configuration:
<acceptors>
<acceptor name="artemis">tcp://127.0.0.1:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true;connectionsAllowed=10000;connectionsAllowed=10000</acceptor>
<acceptor name="artemis-tls">tcp://0.0.0.0:61617?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true;sslEnabled=true;keyStorePath=/var/lib/artemis/certs/keystore.jks;keyStorePassword=${keyStorePassword};enabledProtocols=TLSv1.2,TLSv1.3</acceptor>
<acceptor name="stomp-tls">tcp://0.0.0.0:61612?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true;anycastPrefix=/queue/;multicastPrefix=/topic/;sslEnabled=true;keyStorePath=/var/lib/artemis/certs/keystore.jks;keyStorePassword=${keyStorePassword};enabledProtocols=TLSv1.2,TLSv1.3</acceptor>
<acceptor name="mqtt-tls">tcp://0.0.0.0:8883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true;sslEnabled=true;keyStorePath=/var/lib/artemis/certs/keystore.jks;keyStorePassword=${keyStorePassword};enabledProtocols=TLSv1.2,TLSv1.3</acceptor>
</acceptors>
And here’s the log config:
# Root logger level
logger.level=INFO
# ActiveMQ Artemis logger levels
logger.org.apache.activemq.artemis.core.server.level=INFO
logger.org.apache.activemq.artemis.journal.level=INFO
logger.org.apache.activemq.artemis.utils.level=INFO
# If you have issues with the CriticalAnalyzer, setting this to TRACE will give you extra troubleshooting information,
# but do not use it regularly as it incurs some extra CPU usage for this diagnostic.
logger.org.apache.activemq.artemis.utils.critical.level=INFO
logger.org.apache.activemq.artemis.jms.level=INFO
logger.org.apache.activemq.artemis.integration.bootstrap.level=INFO
logger.org.eclipse.jetty.level=WARN
Any hint on how to get more information about what is going on with the acceptor would be highly appreciated. Unfortunately, there is nothing helpful in the docs.

ActiveMQ Artemis delegates low-level TCP work to Netty, which in turn delegates to the JVM and ultimately to the operating system. There is no logging in the broker that exposes these low-level details.
This sounds like some kind of environmental issue. I would recommend using the SSL debug logging provided by the JVM, as @ehsavoie suggested in the comments. For example, set this on the broker's command line:
-Djavax.net.debug=ssl,handshake
You can do this by editing artemis.profile in the broker instance's etc directory. You can find more details about this logging here.
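One way to do that, assuming the stock artemis.profile layout where JAVA_ARGS is a plain shell variable, is to append the flag at the end of the file:
# etc/artemis.profile: enable JSSE debug output for the TLS handshakes on the next broker start
JAVA_ARGS="$JAVA_ARGS -Djavax.net.debug=ssl,handshake"
The handshake details then appear in the broker's console/log output.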
You might also try a packet capture (e.g. via tcpdump, Wireshark, etc.) to see exactly what is happening at the TCP level.
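For example, a rough capture of the artemis-tls port (61617 in the configuration above) that can later be opened in Wireshark:
# capture traffic to/from the TLS acceptor into a pcap file for offline analysis
tcpdump -i any -w artemis-tls.pcap port 61617
If the capture shows the TCP handshake completing and a ClientHello going out but no ServerHello coming back, that points at the broker side rather than the network.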

Metrics indicating a blocked thread in Artemis, and a closer look at that thread, pointed us in the right direction: the acceptor was blocked by our custom login module, which itself was blocked by a REST call to an external system. We have now introduced some safeguarding against such external influences, and things look stable again.
Addendum: for the stack traces, see the comments from @JustinBertram.
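For anyone hitting something similar: a plain JDK thread dump is usually enough to see what an acceptor thread is stuck on. A minimal sketch (the PID and file name are placeholders):
# take a couple of thread dumps a few seconds apart and compare which threads stay in the same frame
jstack <artemis-broker-pid> > artemis-threads-1.txt
A thread that is repeatedly blocked inside a login module or an outbound HTTP call is a strong hint that authentication, not the acceptor itself, is the bottleneck.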

Related

How to edit HTTP connections in WildFly 8.2.1 on a Linux machine

I have deployed a simple servlet web application on WildFly 8.2.1 on RHEL 6.9. The application just takes POST requests and responds with 200 OK.
When the client (a Java client using the Apache Commons HTTP client) posts data to the web application, the application accepts the requests, but many requests also fail on the client side with the error "Caused by java.net.ConnectException: Connection timed out (Connection timed out)".
My assumption is that WildFly has some default limit on the number of HTTP connections that can be open at any point in time, and that if further requests arrive which require opening a new connection, the web server rejects them.
Could anyone please help me with the questions below?
How can we check the live open HTTP connections in RHEL 6.9, i.e. which command shows how many connections are open on port 8080?
How can we tweak the default HTTP connection limit in WildFly?
Are the HTTP connection limit and the max thread count linked to each other? If so, please let me know how they should be updated in the WildFly configuration (standalone.xml).
How many requests can WildFly keep in its queue, and what happens to requests arriving at the server when the queue is full?
NOTE: This is a kind of load test for the web server where traffic is high; I am not sure about the exact rate, but it is high.
You're getting into some system administration topics, but I'll answer what I can. First and foremost, WildFly 8.2.1 is part of the very first WildFly release, and I'd strongly encourage upgrading to a newer version.
To check the number of connections in a Unix-like environment you'll want to use the netstat command. In your case, something like:
netstat -na | grep 8080 | grep EST
This will show you all the connections that are ESTABLISHED to port 8080, giving you a snapshot of the number of connections. Pipe that to wc to get a count.
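For example:
# count the established connections on port 8080
netstat -na | grep 8080 | grep EST | wc -l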
Next, finding documentation on WildFly 8.2.1 is a bit challenging now, but WildFly 8 uses Undertow for the socket I/O, which in turn uses XNIO. I found a thread that goes into some detail about configuring the I/O subsystem. Note that WildFly 8.2.1 uses Undertow 1.1.8, which isn't documented anywhere I could find.
For your last two questions, I believe they're related to the second one: the XNIO configuration includes settings like:
<subsystem xmlns="urn:jboss:domain:io:1.0">
<worker name="default" io-threads="5" task-max-threads="50"/>
<buffer-pool name="default" buffer-size="16384" buffers-per-slice="128"/>
</subsystem>
but you'll need to dig deeper into the docs for details.
In WildFly 19.1.0.Final the configuration looks similar to the snippet above, except that the schema version is now 3.0.
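Roughly like this in a current standalone.xml (a sketch only; double-check the attribute names against your WildFly version's io subsystem schema):
<subsystem xmlns="urn:jboss:domain:io:3.0">
<worker name="default" io-threads="5" task-max-threads="50"/>
<buffer-pool name="default" buffer-size="16384"/>
</subsystem>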

HornetQ local connection never timing out

My application, running in a standalone JBoss environment, relies on HornetQ (v2.2.5.Final) middleware to exchange messages between parts of my application in a local environment, not over the network.
The default TTL (time-to-live) value for the connection is 60000 ms. I am thinking of changing that to -1 since, from an operational point of view, I expect to keep sending messages through this connection from time to time (at intervals not known in advance). That would also prevent issues like JMS queue connection failures.
The question is: what are the drawbacks of never timing out a connection on the server side in such a context? Is that a good choice? If not, is there a strategy better suited to this situation?
The latest versions of HornetQ automatically disable connection checking for in-vm connections so there shouldn't be any issues if you configure this manually.
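If you do want to set it explicitly, the TTL is configured on the connection factory. A sketch for the JBoss messaging subsystem (element names and placement should be verified against your AS 7 / HornetQ 2.2.x schema):
<connection-factory name="InVmConnectionFactory">
<connectors>
<connector-ref connector-name="in-vm"/>
</connectors>
<entries>
<entry name="java:/ConnectionFactory"/>
</entries>
<!-- -1 disables the server-side idle check for connections created from this factory -->
<connection-ttl>-1</connection-ttl>
</connection-factory>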

HornetQ fails when we change the system time

I have an issue and I hope you can help me a bit.
I have to implement fast-forwarding time because I need to test something. I've written a Python script which increments the system time by 5 seconds for every real second (5 times faster).
Then my JBoss fails with some HornetQ timeouts.
Do you have any ideas how I can fix this?
03/09/18 09:18:00,107 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (hornetq-failure-check-thread) Connection failure has been detected: Did not receive data from invm:0. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. You also might have configured connection-ttl and client-failure-check-period incorrectly. Please check user manual for more information. The connection will now be closed. [code=3]
The underlying issue is that changing the time breaks the connection-failure-detection algorithm used by the broker. The broker thinks it isn't receiving "ping" packets from clients at the proper time because you're forcing time to pass at 5x the normal rate. There is no way to fix this for remote clients aside from disabling or extending the connection TTL. However, for in-vm connections you could apply the fix from https://issues.jboss.org/browse/HORNETQ-1314 (which is not resolved in the version of HornetQ you are using) to the branch of HornetQ you're currently using and rebuild. If you don't want to rebuild you could upgrade to a version of JBoss AS (or Wildfly) which contains this fix.
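For the remote-client workaround, the relevant knobs are the ones named in the warning. A hypothetical server-wide sketch for hornetq-configuration.xml (or the messaging subsystem); the value is a placeholder:
<!-- give every client connection a very generous TTL so the accelerated clock does not trip failure detection -->
<connection-ttl-override>600000</connection-ttl-override>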

Behavior of syslog handler in Wildfly 8

I want to use the syslog handler in Wildfly 8 to send the application logs to logstash (and I know this may not be best practice at the moment).
Does anyone know how the syslog handler acts if the syslog/logstash server is not available?
Is there any buffering (memory, files), does it consume endless resources in a reconnect loop, in short: Does anyone have experience with the syslog handler?
Thanks,
Michael
The syslog-handler will not buffer messages if the socket can't connect. If you use UDP then it will attempt to connect each time. With TCP it depends on the version of the log manager. I think with the version in WildFly 8 it will attempt to reconnect, but you'll lose any messages sent while the syslog server is down.
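For reference, wiring the handler up via jboss-cli might look roughly like this (handler name, host and port are placeholders; check the attribute names against the WildFly 8 logging subsystem):
# register a syslog handler pointing at logstash and attach it to the root logger
/subsystem=logging/syslog-handler=LOGSTASH:add(server-address=logstash.example.com, port=5514, level=INFO)
/subsystem=logging/root-logger=ROOT:add-handler(name=LOGSTASH)
Given the lack of buffering described above, it is worth keeping a local file handler attached as well so nothing is lost while logstash is down.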

Mule ESB CE 3.5.0 TCP Reconnection Strategies

I am working with Mule ESB CE 3.5.0 and am seeing what I believe is a resource leak on the TCP connections. I am hooking up VisualVM and checking the memory. I see that it increases over time without ever decreasing.
My scenario is that I have messages being sent to Mule, Mule does its thing, and then dispatches to a remote TCP endpoint (on the same box, usually). What I did was not start up the program that would receive a message from Mule's TCP outbound endpoint. So there is nothing listening for Mule's dispatched message.
I configure my TCP connectors as following:
<tcp:connector name="TcpConnector" keepAlive="true" keepSendSocketOpen="true" sendTcpNoDelay="true" reuseAddress="true">
<reconnect-forever frequency="2000" />
<tcp:xml-protocol />
</tcp:connector>
<tcp:endpoint name="TcpEndpoint1" responseTimeout="3000" connector-ref="TcpConnector" host="${myHost}" port="${myPort}" exchange-pattern="one-way" />
My questions are:
When a flow fails to send to the TCP outbound endpoint, what happens to the message? Is the message kept in memory somewhere, and once the TCP connector establishes a connection to the remote endpoint, do all the accumulated messages burst through and get dispatched?
When the reconnection strategy is blocking, I assume it is a dispatcher thread that tries to establish the connection. If we have more messages to dispatch, are more dispatcher threads then tied up attempting the reconnection? What happens if it is non-blocking?
Thanks!
Edit:
If I also understand the threading documentation correctly, does that mean that if I have the default threading profile set to poolExhaustedAction="RUN" and all the dispatcher threads block waiting for a connection, then eventually the flow threads, and then the receiver threads, will block trying to establish the connection? And when the remote application begins listening again, all the backlogged messages from the blocked threads will burst through?
So if the flow receives transient data, it should be configured with non-blocking reconnection, and since it is acceptable (in my use case) to throw away the messages, we can make do with the exception that will be thrown.
I would point you to the documentation:
Non-Blocking Reconnection
By default, a reconnection strategy will block Mule application message processing until it is able to connect/reconnect. When you enable non-blocking reconnection, the application does not need to wait for all endpoints to re-connect before it restarts. Furthermore, if a connection is lost, the reconnection takes place on a thread separate from the application thread. Note that such behavior may or may not be desirable, depending on your application needs.
With blocking reconnection strategies, what you get is that the dispatcher is blocked, waiting for an available connection. The messages are not technically kept anywhere; their flow is just stopped.
Regarding the second question, it varies from transport to transport. In this particular case, given that TCP is a connection-per-request transport, different dispatchers will try to get different sockets from the pool of connections.
With non-blocking strategies you will get an exception. You can probably test it easily.
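For reference, the non-blocking behaviour described in the quoted documentation is typically switched on with a blocking="false" attribute on the reconnection strategy. A sketch based on the connector from the question (verify the attribute against the Mule 3.5 schemas):
<tcp:connector name="TcpConnector" keepAlive="true" keepSendSocketOpen="true" sendTcpNoDelay="true" reuseAddress="true">
<reconnect-forever frequency="2000" blocking="false"/>
<tcp:xml-protocol/>
</tcp:connector>
With that in place a failed dispatch surfaces as an exception in the flow instead of parking the dispatcher thread.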