Is it recommended to set idleTimeout to zero in ActiveMQ PooledConnectionFactory? - activemq-artemis

Is it recommended to set idleTimeout to zero in ActiveMQ PooledConnectionFactory? We use ActiveMQ Artemis 2.17.0 as our JMS broker and the ActiveMQ JMS pool 5.16.2 as our connection pool.

The general recommendation with tunable parameters like this is to start with the default and adjust as needed based on your specific use-case.
With an idleTimeout of 0, no connection will ever be considered idle. Typically you wouldn't want this because your pool would never shrink down from its maximum size. A pool is used to make efficient use of connections, and specifically to avoid creating too many connections when load is high. As application load increases and decreases, the pool typically grows and shrinks as well. This prevents the application from overwhelming the back-end service (i.e. the broker in this case) and/or consuming connection resources when they aren't actually needed. If idleTimeout is 0, the pool will never shrink back down after application load increases and then decreases again to the point where fewer connections are required. As noted previously, this typically isn't what you'd want, but perhaps this behavior fits your use-case perfectly.
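For illustration, here is a minimal sketch of wiring the ActiveMQ JMS pool in front of an Artemis connection factory, assuming the 5.16.x org.apache.activemq.jms.pool.PooledConnectionFactory API; the broker URL and pool sizes are placeholders, not recommendations.

```java
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;
import org.apache.activemq.jms.pool.PooledConnectionFactory;

public class PoolConfigSketch {
    public static void main(String[] args) {
        // Underlying Artemis JMS connection factory; the URL is a placeholder.
        ActiveMQConnectionFactory artemisCf =
                new ActiveMQConnectionFactory("tcp://localhost:61616");

        PooledConnectionFactory pooledCf = new PooledConnectionFactory();
        pooledCf.setConnectionFactory(artemisCf);
        pooledCf.setMaxConnections(8); // upper bound the pool can grow to under load

        // The default is 30000 ms: a connection unused for 30 seconds becomes
        // eligible for eviction, letting the pool shrink back down. Setting this
        // to 0 disables the idle check entirely, so the pool never shrinks.
        pooledCf.setIdleTimeout(30_000);
        // pooledCf.setIdleTimeout(0); // only if you really never want idle eviction

        // ... hand pooledCf to your JMS code as a javax.jms.ConnectionFactory ...

        pooledCf.stop(); // releases pooled connections on shutdown
    }
}
```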
Aside from the configuration, there is a general recommendation to use the Messaging Hub PooledJMS implementation instead of the one directly from the ActiveMQ "Classic" code-base like you're using. This is because ActiveMQ Artemis has full support for JMS 2, and the connection pool implementation from ActiveMQ "Classic" does not. The Messaging Hub PooledJMS implementation was forked from the one in the ActiveMQ "Classic" code-base and modified to support JMS 2 as well as a few other updates.
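If you do switch, the pooled-jms equivalent looks roughly like the sketch below. The setter names are from my recollection of the org.messaginghub.pooled.jms.JmsPoolConnectionFactory API and the URL is a placeholder, so check the project documentation before relying on them.

```java
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;
import org.messaginghub.pooled.jms.JmsPoolConnectionFactory;

public class PooledJmsSketch {
    public static void main(String[] args) {
        // Same Artemis factory underneath; the URL is a placeholder.
        ActiveMQConnectionFactory artemisCf =
                new ActiveMQConnectionFactory("tcp://localhost:61616");

        JmsPoolConnectionFactory pooledCf = new JmsPoolConnectionFactory();
        pooledCf.setConnectionFactory(artemisCf);
        pooledCf.setMaxConnections(8);
        // Assumed to be pooled-jms's analogue of the classic pool's idleTimeout (ms).
        pooledCf.setConnectionIdleTimeout(30_000);

        // Because pooled-jms supports JMS 2, JMS 2 calls such as createContext()
        // work through the pool as well.

        pooledCf.stop();
    }
}
```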

Related

WildFly embedded ActiveMQ Artemis: Difference between queue and jms-queue

What is the difference between queue and jms-queue declaration in the server configuration in the activemq-messaging subsystem?
Could the queue be used with a MessageDriven bean instead of a jms-queue?
I'm using WildFly 19 and Artemis 2.12.0.
ActiveMQ Artemis supports the JMS API, but it also supports industry standard protocols like AMQP, STOMP, OpenWire, & MQTT. Therefore the broker's underlying address model is not JMS-specific but rather more generic & flexible in order to support numerous different use-cases.
The bare queue refers to the underlying "core" queue from ActiveMQ Artemis. I believe WildFly exposes this low-level component to support unforeseen use-cases. The queue configuration gives control over the address and routing-type which are the other two main components of the ActiveMQ Artemis address model.
The jms-queue refers to the traditional JMS-based resource which MDBs and other JMS-based application components will use in a Java/Jakarta EE application server. It gives you control of JNDI entries which queue does not. It serves as a kind of familiar "wrapper" around the core queue. That's why there's so much overlap with the attributes and operations between the two.
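As a hypothetical example, an MDB bound to a jms-queue whose JNDI entry is java:/jms/queue/MyQueue might look like the following sketch; the queue name, JNDI entry, and class are made up for illustration.

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Hypothetical MDB; the JNDI entry is assumed to come from a jms-queue such as:
//   <jms-queue name="MyQueue" entries="java:/jms/queue/MyQueue"/>
@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationLookup",
                                  propertyValue = "java:/jms/queue/MyQueue"),
        @ActivationConfigProperty(propertyName = "destinationType",
                                  propertyValue = "javax.jms.Queue")
})
public class MyQueueListener implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                // getBody is a JMS 2 convenience for extracting the payload
                System.out.println("Received: " + message.getBody(String.class));
            }
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}
```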
There's really no reason to use queue in lieu of jms-queue unless you absolutely must. A jms-queue is more straightforward to configure and understand for almost all use-cases. The only reason to use a queue is if you need to control the address and routing-type in a way that isn't allowed by jms-queue, which is highly unlikely for JMS applications.
It is possible, for example, to send messages to or consume messages from a queue, but since queue lacks a way to configure JNDI bindings the JMS client would have to instantiate the queue directly using javax.jms.Session.createQueue(String). However, this method is discouraged as it reduces the portability of the application.
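A rough sketch of what that looks like in client code, assuming a connection factory is available in JNDI under java:/ConnectionFactory and a core queue named my.core.queue (both names are assumptions):

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;

public class CoreQueueSenderSketch {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        // The connection factory can still come from JNDI; its name is an assumption.
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("java:/ConnectionFactory");

        try (Connection connection = cf.createConnection()) {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

            // No JNDI entry exists for a bare <queue/>, so the destination has to be
            // instantiated by its broker-side name, tying the client to that name.
            Queue queue = session.createQueue("my.core.queue");

            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("hello"));
        }
    }
}
```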

Mirror vs. HA Data Replication for disaster recovery

I'm looking at the options in ActiveMQ Artemis for data recovery if we lose an entire data centre. We have two data centres, one on the east coast and one on the west coast.
From the documentation and forums I've found four options:
Disk based methods:
Block-based replication of the data directory between the sites, running Artemis on one site (using Ceph or DRBD with protocol A). In the event of a disaster (or a fail-over test), stop Artemis on the dead site and start it on the live site.
The same thing but with both Artemis servers active, using an ha-policy to indicate the master and the slave using a shared store.
Network replication:
Like number 2, but with data replication enabled in Artemis, so Artemis handles the replication.
Mirror broker connections.
Our IT team uses / is familiar with MySQL replication, NFS, and rsync for our other services. We are currently handling JMS with a JBoss 4 server replicated over MySQL.
My reaction from reading the documentation is that high availability data replication is the way to go, but are there trade-offs I'm not seeing? The only option that mentions DR and cross-site use is the mirror broker connection, but on the surface it looks like a more difficult-to-manage version of the same thing?
Our constraints are that we need high performance on the live cluster (on the order of tens of thousands of messages per second, all small).
We can afford to lose messages (as few as possible preferably) in an emergency fail over. We should not lose messages in a controlled fail over.
We do not want clients in site A connecting to Artemis in site B - we will enable clients on site B in the event of a fail over.
The first thing to note is that the high availability functionality (both shared-store and replication - options #2 & #3) configured via ha-policy is designed for use in a local data center with high speed, low latency network connections. It is not designed for disaster recovery.
The problem specifically with network-based data replication for you is that replication is synchronous which means there's a very good chance it will negatively impact performance since every durable message will have to be sent across the country from one data center to the other. Also, if the replicating broker fails then clients will automatically fail-over to the backup in the other data center.
Using a solution based on block storage (e.g. Ceph or DRBD) is viable, but it's really an independent thing outside the control of ActiveMQ Artemis.
The mirror broker connection was designed with the disaster recovery use-case in mind. It is asynchronous so it won't have nearly the performance impact of replication, and if the mirroring broker fails clients will not automatically fail-over to the mirror.

How to handle back pressure with Kafka REST Proxy

I am creating a service that sends lots of data to kafka-rest-proxy. I am only sending data (producing) to kafka. What I'm finding is that kafka-rest-proxy is easily overwhelmed and runs out of java heap space. I've allocated additional resources, and even horizontally scaled out the number of hosts running kafka-rest-proxy, yet I still encounter dropped connections and memory issues.
I'm not familiar with the internals of kafka-rest-proxy, but my hunch is that it's buffering the records and sending them to Kafka asynchronously. If that is the case then what mechanism does it have to control back pressure? Is there a way to configure it such that it writes records to Kafka synchronously?
Kafka REST Proxy exposes the functionality of the Java producer and consumer clients and the command-line tools over HTTP; the REST Proxy doesn't need any back-pressure concept.
To be more specific, Kafka is capable of delivering messages over the network at a very fast rate.
You need to scale the brokers according to the rate at which you are producing and consuming data.
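If you can produce with the plain Java client instead of (or behind) the REST proxy, the producer's own back pressure comes from its bounded send buffer. The sketch below shows the relevant knobs; the broker address and topic name are placeholders, and whether the REST proxy exposes equivalent tuning is something to verify against its own configuration docs.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BoundedProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // buffer.memory bounds how many bytes of unsent records the producer holds;
        // once it is full, send() blocks for up to max.block.ms instead of letting
        // the heap grow without limit.
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432");
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "60000");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1_000_000; i++) {
                // Blocking on the returned future makes each send effectively
                // synchronous: a hard cap on in-flight data at the cost of throughput.
                producer.send(new ProducerRecord<>("my-topic", Integer.toString(i), "payload"))
                        .get();
            }
        }
    }
}
```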

How to determine how many actors the default Play 2.7 setup will create when responding to requests?

I want to understand how many actors the default settings (execution context) that Play ships with will create when responding to requests.
Once this maximum number of actors is reached, what will happen to requests that don't get served? Will they block up to a certain point?
Do browsers time out after x seconds, or is this a TCP/IP timeout?
Here's some information that may help you understand things a little more. For the purposes of your question, though, you should probably think in terms of threads, not actors.
The default backend is Akka HTTP (previous versions - prior to 2.6 I think - shipped with Netty by default, which is still available in later versions as a configurable alternative backend to Akka HTTP).
Play's default execution context is configured to pool 1 thread per processor core. The docs say processor, but more specifically it is per core. The assumption is that you are building your application in a purely asynchronous and non-blocking fashion - core tenets of Play's architecture. If you need to do blocking work (notably synchronous IO) then you should explore the concept of a custom pool where you control the number of threads available, and/or multiple pools, which give you a way to isolate your blocking work from your non-blocking work, etc. - please refer to the docs.
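For the blocking-work case, the Play docs describe wrapping a dedicated dispatcher in a custom execution context. A minimal Java sketch might look like this, where "blocking-io-dispatcher" is a hypothetical dispatcher you would define in application.conf:

```java
import javax.inject.Inject;
import javax.inject.Singleton;

import akka.actor.ActorSystem;
import play.libs.concurrent.CustomExecutionContext;

// "blocking-io-dispatcher" is a hypothetical dispatcher you would define in
// application.conf with a fixed-size thread pool reserved for blocking work.
@Singleton
public class BlockingIoExecutionContext extends CustomExecutionContext {

    @Inject
    public BlockingIoExecutionContext(ActorSystem actorSystem) {
        super(actorSystem, "blocking-io-dispatcher");
    }
}
```

You can then inject this into a controller and run blocking calls with something like CompletableFuture.supplyAsync(() -> blockingCall(), blockingEc), keeping the default pool free for non-blocking work.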
If your thread pool(s) is/are exhausted then subsequent requests will stack up. Default timeout configuration settings can be found by consulting the play docs, or more specifically, the Akka HTTP config.

Is there a performance difference between pooling connections and channels in RabbitMQ?

I'm a newbie with RabbitMQ (and programming), so sorry in advance if this is obvious. I am creating a pool to share between threads that are working on a queue, but I'm not sure if I should put connections or channels in the pool.
I know I need channels to do the actual work, but is there a performance benefit to having one channel per connection (in terms of more throughput from the queue)? Or am I better off just using a single connection per application and pooling many channels?
Note: because I'm pooling the resources, the initial cost is not a factor; I know connections are more expensive than channels. I'm more interested in throughput.
I found this on the RabbitMQ website; it is near the bottom, so I have quoted the relevant part below.
The tl;dr version is that you should have 1 connection per application and 1 channel per thread.
Connections
AMQP connections are typically long-lived. AMQP is an application
level protocol that uses TCP for reliable delivery. AMQP connections
use authentication and can be protected using TLS (SSL). When an
application no longer needs to be connected to an AMQP broker, it
should gracefully close the AMQP connection instead of abruptly
closing the underlying TCP connection.
Channels
Some applications need multiple connections to an AMQP broker.
However, it is undesirable to keep many TCP connections open at the
same time because doing so consumes system resources and makes it more
difficult to configure firewalls. AMQP 0-9-1 connections are
multiplexed with channels that can be thought of as "lightweight
connections that share a single TCP connection".
For applications that use multiple threads/processes for processing,
it is very common to open a new channel per thread/process and not
share channels between them.
Communication on a particular channel is completely separate from
communication on another channel, therefore every AMQP method also
carries a channel number that clients use to figure out which channel
the method is for (and thus, which event handler needs to be invoked,
for example).
The advice is 1 channel per thread. Channels are thread safe, so you could have multiple threads sending through one channel, but for your application I would suggest sticking with 1 channel per thread.
Additionally, it is advised to have only 1 consumer per channel.
These are only guidelines so you will have to do some testing to see what works best for you.
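To make the guideline concrete, here is a minimal sketch with the RabbitMQ Java client: one shared connection for the application, and each worker thread opening its own channel. The host and queue name are placeholders.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.nio.charset.StandardCharsets;

public class ChannelPerThreadSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // placeholder broker host

        // One long-lived connection shared by the whole application.
        try (Connection connection = factory.newConnection()) {
            Runnable worker = () -> {
                try {
                    // Each thread opens its own channel; channels are cheap
                    // compared to connections and are not shared across threads.
                    Channel channel = connection.createChannel();
                    channel.queueDeclare("work", true, false, false, null);
                    channel.basicPublish("", "work",
                            null, "hello".getBytes(StandardCharsets.UTF_8));
                    channel.close();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            };

            Thread t1 = new Thread(worker);
            Thread t2 = new Thread(worker);
            t1.start();
            t2.start();
            t1.join();
            t2.join();
        }
    }
}
```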
This thread has some insights here and here.
Despite all these guidelines, this post suggests that having multiple connections will most likely not affect performance, though it isn't specific about whether that refers to the client side or the server (RabbitMQ) side, with the one caveat that more connections will of course use more system resources. If that isn't a problem and you want more throughput, it may indeed be better to have multiple connections, as this post suggests multiple connections will allow you more throughput. The reason seems to be that even if there are multiple channels, only one message goes through the connection at a time. Therefore a large message can block the whole connection, or many unimportant messages on one channel may block an important message on the same connection but a different channel. Again, resources are an issue: if you are using up all the bandwidth with one connection, then adding an additional connection will not increase performance over having two channels on the one connection. Also, each connection uses more memory, CPU and file handles, which may well not be a concern but might become an issue when scaling.
In addition to the accepted answer:
If you have a cluster of RabbitMQ nodes with either a load-balancer in front, or a short-lived DNS (making it possible to connect to a different rabbit node each time), then a single, long-lived connection would mean that one application node works exclusively with a single RabbitMQ node. This may lead to one RabbitMQ node being more heavily utilized than the others.
The other concern mentioned above is that publishing and consuming are blocking operations, which leads to messages queueing up. Having more connections helps ensure that 1. the processing time of each message doesn't block other messages and 2. big messages aren't blocking other messages.
That's why it's worth considering a small connection pool (keeping in mind the resource concerns raised above).
The "one channel per thread" might be a safe assumption (I say might as I have not made any research by myself and I have no reason to doubt the documentation :) ) but beware that there is a case where this breaks:
If you use RPC with RabbitMQ Direct reply-to then you cannot reuse the same channel to consume for another RPC request. I asked for details about that in the Google user group, and the answer I got from Michael Klishin (who seems to be actively involved in RabbitMQ development) was that
Direct Reply to is not meant to be used with channel sharing either way.
I've emailed Pivotal to ask them to update their documentation to explain how amq.rabbitmq.reply-to works under the hood, and I'm still waiting for an answer (or an update).
So if you want to stick to "one channel per thread", beware that this will not work well with Direct reply-to.
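For context, a bare-bones direct reply-to call with the Java client looks roughly like the sketch below (the "rpc_queue" name is hypothetical, and a server consuming from it and replying to the replyTo address is assumed). The point is that the reply consumer on amq.rabbitmq.reply-to and the request publish happen on the same channel, which is why that channel can't simply be shared with other concurrent RPC calls.

```java
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DirectReplyToRpcSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // placeholder broker host
        try (Connection connection = factory.newConnection()) {
            Channel channel = connection.createChannel();

            BlockingQueue<String> response = new ArrayBlockingQueue<>(1);

            // The consumer on the pseudo-queue must be registered (in auto-ack mode)
            // on the SAME channel that publishes the request; this is the reason the
            // channel cannot simply be shared across concurrent RPC calls.
            channel.basicConsume("amq.rabbitmq.reply-to", true,
                    (consumerTag, delivery) ->
                            response.offer(new String(delivery.getBody(), StandardCharsets.UTF_8)),
                    consumerTag -> { });

            AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                    .replyTo("amq.rabbitmq.reply-to")
                    .build();
            channel.basicPublish("", "rpc_queue", props, "ping".getBytes(StandardCharsets.UTF_8));

            // Blocks until the assumed server sends a reply to the replyTo address.
            System.out.println("Reply: " + response.take());
        }
    }
}
```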