I am working with Mule ESB CE 3.5.0 and am seeing what I believe is a resource leak on the TCP connections. I am hooking up VisualVM and checking the memory. I see that it increases over time without ever decreasing.
My scenario is that I have messages being sent to Mule, Mule does its thing, and then dispatches to a remote TCP endpoint (on the same box, usually). What I did was not start up the program that would receive a message from Mule's TCP outbound endpoint. So there is nothing listening for Mule's dispatched message.
I configure my TCP connectors as follows:
<tcp:connector name="TcpConnector" keepAlive="true" keepSendSocketOpen="true" sendTcpNoDelay="true" reuseAddress="true">
<reconnect-forever frequency="2000" />
<tcp:xml-protocol />
</tcp:connector>
<tcp:endpoint name="TcpEndpoint1" responseTimeout="3000" connector-ref="TcpConnector" host="${myHost}" port="${myPort}" exchange-pattern="one-way" />
My questions are:
When a flow fails to send to the TCP outbound endpoint, what happens to the message? Is the message kept in memory somewhere and once the TCP connector establishes connections to the remote endpoint, do all the accumulated messages burst through and get dispatched?
When the reconnection strategy is blocking, I assume it is a dispatcher thread that tries to establish the connection. If we have more messages to dispatch, are more dispatcher threads then tied up attempting the reconnection? What happens if it is non-blocking?
Thanks!
Edit:
If I understand the threading documentation correctly, does that mean that if I have the default threading profile set to poolExhaustedAction="RUN", and all the dispatcher threads block waiting for a connection, eventually the flow threads, and then the receiver threads, will block trying to establish the connection? And when the remote application begins listening again, all the backlogged messages from the blocked threads will burst through.
So if the flow receives transient data, it should be configured with non-blocking reconnection, and since it is acceptable to throw away messages in my use case, we can make do with the exception that will be thrown.
I would point you to the documentation:
Non-Blocking Reconnection
By default, a reconnection strategy will block Mule application
message processing until it is able to connect/reconnect. When you
enable non-blocking reconnection, the application does not need to
wait for all endpoints to re-connect before it restarts. Furthermore,
if a connection is lost, the reconnection takes place on a thread
separate from the application thread. Note that such behavior may or
may not be desirable, depending on your application needs.
With a blocking reconnection strategy, what you are going to get is that the dispatcher will be blocked, waiting for an available connection. The messages are not technically kept anywhere; their flow is simply stopped.
Regarding the second question, it varies from transport to transport. In this particular case, given that TCP is a connection-per-request transport, different dispatchers will each try to get a different socket from the pool of connections.
With a non-blocking strategy you will get an exception instead. You can probably test this easily.
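For reference, a non-blocking strategy is configured by setting blocking="false" on the reconnection element. A sketch based on the connector from the question (the frequency value is illustrative):

```xml
<tcp:connector name="TcpConnector" keepAlive="true" keepSendSocketOpen="true"
               sendTcpNoDelay="true" reuseAddress="true">
    <!-- blocking="false" moves reconnection onto a separate thread, so
         dispatchers fail fast with an exception instead of blocking -->
    <reconnect-forever frequency="2000" blocking="false" />
    <tcp:xml-protocol />
</tcp:connector>
```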
Related
I am using nghttp2 to implement a RESTful API server. I have defined two GET APIs:
/api/ping and /api/wait. While the response to the former is sent immediately, the server does some processing before responding to the latter. I allotted 4 threads to the server.
From a client (also implemented using nghttp2), I made a connection to the server and made the API calls one by one, /api/wait first followed by /api/ping. I observed using Wireshark that the two GET requests are sent in two different TCP packets. However, until the server completes processing of /api/wait, it does not process /api/ping, although it has other threads available.
I established two TCP connections from the client and made the two API calls on the different connections and the server processed those in parallel.
Does this mean that nghttp2 processes one TCP connection exclusively on one thread and requests from one TCP connection are processed sequentially by design? Is there any setting in nghttp2 to circumvent this? This may be a good feature for a web application (processing requests sequentially) but not an API server where APIs are independent of each other.
I have the following queries:
1) Does TCP guarantee delivery of packets, and is application-level re-transmission thus ever required if the transport protocol used is TCP? Let's say I have established a TCP connection between a client and server, and the server sends a message to the client. However, the client goes offline and comes back only after, say, 10 hours. Will the TCP stack handle re-transmission and delivering the message to the client, or will the application running on the server need to handle it?
2) Related to the above: is an application-level ACK needed if the transport protocol is TCP? One reason for an application ACK would be that without it, the application would not know when the remote end received the message. Is there any reason other than that? Meaning, is delivery of the message itself guaranteed?
Does TCP guarantee delivery of packets, and is application-level re-transmission thus ever required if the transport protocol used is TCP?
TCP guarantees delivery of message stream bytes to the TCP layer on the other end of the TCP connection. So an application shouldn't have to bother with the nuances of retransmission. However, read the rest of my answer before taking that as an absolute.
However the client goes offline and comes back only after say 10 hours, so will TCP stack handle re-transmission and delivering message to the client or will the application running on the server need to handle it?
No, not really. Even though TCP has some degree of retry logic for individual TCP packets, it cannot perform reconnections if the remote endpoint is disconnected. In other words, it will eventually "time out" waiting to get a TCP ACK from the remote side and do a few retries, but it will eventually give up and notify the application through the socket interface that the remote endpoint connection is in a dead or closed state. The typical pattern is that when a client application detects that it has lost the socket connection to the server, it either reports an error to the user interface of the application or retries the connection. Either way, it's an application-level decision how to handle a failed TCP connection.
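A minimal sketch of how this surfaces at the socket level, using plain Python sockets over loopback: the application only learns the connection is gone when it actually uses the socket, at which point recv() returns an empty byte string for an orderly peer close.

```python
import socket

# Loopback demo: server accepts one connection, then immediately closes it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())

conn, _ = server.accept()
conn.close()                    # server-side close (orderly FIN)

# The application is only notified via the socket interface: recv()
# returns b"" once the peer's FIN arrives. What to do next (report an
# error, retry the connection) is entirely up to the application.
data = client.recv(1024)

client.close()
server.close()
```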
is application level ACK needed if transport protocol is TCP
Yes, absolutely. Most client-server protocols have some notion of a request/response pair of messages. A TCP socket can only indicate to the application that data "sent" by the application was successfully queued to the kernel's network stack. It provides no guarantee that the application on top of the socket at the remote end actually "got it" or "processed it". Your protocol on top of TCP should provide some sort of response indication whenever a message is processed. Take HTTP as a good example: imagine a client sent an HTTP POST message to the server, but there was no acknowledgement (e.g. 200 OK) from the server. How would the client know the server processed it?
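The point can be sketched with plain Python sockets (the payloads here are illustrative): sendall() returning successfully only means the bytes were queued to the local kernel; only the application-level reply confirms the peer processed the request.

```python
import socket
import threading

def server_loop(server):
    # The server replies b"OK" only after it has received and handled
    # the request -- this reply is the application-level acknowledgement.
    conn, _ = server.accept()
    request = conn.recv(1024)
    conn.sendall(b"OK")
    conn.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
threading.Thread(target=server_loop, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())

client.sendall(b"POST data")    # success here != processed by the peer;
                                # the bytes are merely queued in the kernel
ack = client.recv(1024)         # only this tells us the server handled it

client.close()
server.close()
```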
In a world of Network Address Translators (NATs) and proxy servers, TCP connections that are idle (no data between each other) can fail as the NAT or proxy closes the connection on behalf of the actual endpoint because it perceives a lack of data being sent. The solution is to have some sort of periodic "ping" and "pong" protocol by which the applications can keep the TCP connection alive in the absence of application data to send.
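For comparison, this is how TCP-level keepalive is enabled per socket in Python. Only SO_KEEPALIVE is portable; the tuning constants below are Linux-specific, which is why they are guarded (the interval values are illustrative):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Portable: turn on TCP keepalive probes for this socket.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Linux-specific tuning (the constants don't exist on every platform):
# start probing after 60s idle, probe every 10s, give up after 5 failures.
if hasattr(socket, "TCP_KEEPIDLE"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
if hasattr(socket, "TCP_KEEPINTVL"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
if hasattr(socket, "TCP_KEEPCNT"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)

keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Note that this only keeps the TCP session from being idled out by a middlebox; it does not tell the application anything about whether the peer program is still processing messages.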
I am using Camel Netty for full duplex communication over TCP socket.
My application is using the following parameters in the route.
<inOut uri="netty:tcp://{{IP-Port}}?
textline=true&sync=true&decoderMaxLineLength=1000000&autoAppendDelimiter=false&disconnect=false&producerPoolMaxActive=-1&producerPoolMinEvictableIdle=120000&keepAlive=false&noReplyLogLevel=INFO&serverExceptionCaughtLogLevel=INFO&requestTimeout=2500" />
The netty component above receives requests from a preceding wiretap in the flow.
During the day, after about 8-10 hours, some of the connections show as being in the ESTABLISHED state but do not serve any requests. Even at the server end, these connections show as ESTABLISHED but there is no activity for hours.
When we looked at one connection closely, we found that the last request attempted (which was never received by the server) was writing the body to the endpoint and got an exception: org.apache.camel.processor.DefaultErrorHandler - Failed delivery for (MessageId: xxxxx on ExchangeId: ID-xxxx). On delivery attempt: 0
Since netty is being called from a wiretap, after this last request, succeeding requests are not even entertained; they are blocked in the wiretap itself.
I am collecting tcpdump later tonight for more details though.
Questions:
1. Why is producerPoolMinEvictable NOT kicking in to clear such stale connections?
2. How do we clear these stale connections automatically without having to bounce the application?
3. Is there a problem using wiretap?
Appreciate suggestions to resolve this issue. Please ask for any more details needed to answer and I shall be happy to share.
Note:
camel-netty 2.11.2
ELB will auto-close a connection after 60 seconds of idling, with the TCP connection switching to the CLOSE_WAIT state
however, celery doesn't notice and keeps publishing task messages
messages are kept in the send buffer
when the buffer is full, the celery publishing call blocks
Possible damage:
Messages in the send buffer will be lost
A blocking publishing call is very harmful to single-threaded ioloop frameworks, e.g. Tornado
Solutions
BROKER_TRANSPORT_OPTIONS = {'confirm_publish': True} makes celery wait for an ack for each published message; if the ack is not received, it will re-build the connection and send again. This only applies to py-amqp (ref), and performance degrades.
Celery-RabbitMQ heartbeats keep the connection active, avoiding ELB's auto-close. This adds network overhead, and heartbeats might not be delivered to both ends in a bad network environment, which would make this solution ineffective.
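The two options above, as a Celery configuration sketch (old-style uppercase setting names; the heartbeat value is illustrative and must be shorter than ELB's 60-second idle timeout):

```python
# confirm_publish makes the producer wait for a broker ack per message
# (py-amqp transport only, at a performance cost).
BROKER_TRANSPORT_OPTIONS = {'confirm_publish': True}

# BROKER_HEARTBEAT keeps traffic flowing on an otherwise idle connection
# so a middlebox such as ELB does not silently drop it.
BROKER_HEARTBEAT = 30  # seconds; illustrative value
```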
Is there a reason why I should use application level heartbeating instead of TCP keepalives to detect stale connections, given that only Windows and Linux machines are involved in our setup?
It seems that the TCP keepalive parameters can't be set on a per-socket basis on Windows or OS X; that's why.
Edit: All parameters except the number of keepalive retransmissions can in fact be set on Windows (2000 onwards) too: http://msdn.microsoft.com/en-us/library/windows/desktop/dd877220%28v=vs.85%29.aspx
I was trying to do this with zeromq, but it just seems that zeromq does not support this on Windows?
From John Jefferies' response: ZMQ Pattern Dealer/Router HeartBeating
"Heartbeating isn't necessary to keep the connection alive (there is a ZMQ_TCP_KEEPALIVE socket option for TCP sockets). Instead, heartbeating is required for both sides to know that the other side is still active. If either side does detect that the other is inactive, it can take alternative action."
TCP keepalives serve an entirely different function from application-level heartbeating. A keepalive does just that: it keeps the TCP session active rather than allowing it to time out after long periods of silence. This is important and good, and (if appropriate) you should use it in your application. But a TCP session dying due to inactivity is only one way that the connection can be severed between a pair of ZMQ sockets. One endpoint could lose power for 90 minutes and be offline; TCP keepalives wouldn't do squat for you in that scenario.
Application-level heartbeating is not intended to keep the TCP session active; you are expected to rely on keepalives for that function if possible. Heartbeating is there to tell your application that the connection is in fact still active and the peer socket is still functioning properly. When heartbeats stop arriving, that tells you your peer is unavailable so you can behave appropriately: caching messages, throwing an exception, sending an alert, etc.
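The detection side of heartbeating can be sketched transport-agnostically in plain Python (no ZMQ; the class name, interval, and tolerance here are illustrative): the peer is declared dead once no heartbeat has arrived within some multiple of the heartbeat interval.

```python
import time

class HeartbeatMonitor:
    """Declares the peer dead if no heartbeat arrives within interval * tolerance seconds."""

    def __init__(self, interval=1.0, tolerance=3, clock=time.monotonic):
        self.interval = interval
        self.tolerance = tolerance
        self.clock = clock
        self.last_seen = clock()

    def beat(self):
        # Call this whenever a heartbeat (or any message) arrives from the peer.
        self.last_seen = self.clock()

    def peer_alive(self):
        return (self.clock() - self.last_seen) < self.interval * self.tolerance

# Usage with a fake clock so the example is deterministic:
now = [0.0]
mon = HeartbeatMonitor(interval=1.0, tolerance=3, clock=lambda: now[0])
alive_at_start = mon.peer_alive()   # just created: peer considered alive
now[0] = 2.9
alive_before_cutoff = mon.peer_alive()   # still within 3 missed beats
now[0] = 3.1
alive_after_cutoff = mon.peer_alive()    # 3 intervals with no beat: treat peer as gone
```

On a timeout, the application decides what "behave appropriately" means: cache outgoing messages, raise an exception, alert an operator, and so on.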
In short:
a TCP keepalive is intended to keep the connection alive (but doesn't protect against all disconnection scenarios)
an app-level heartbeat is intended to tell your application if the connection is alive