ActiveMQ Artemis configure standalone brokers with failover and statically assigned queues - activemq-artemis

I am trying to figure out how to use ActiveMQ Artemis to achieve the following topology. I need several producers writing to queues hosted on two standalone Artemis brokers. For the moment every producer creates two connection factories, which handle the connections to the two brokers and create the corresponding queues.
@Bean
public ConnectionFactory jmsConnectionFactoryBroker1() {
    ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(brokerUrl_1, username, password);
    return connectionFactory;
}

@Bean
public ConnectionFactory jmsConnectionFactoryBroker2() {
    ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(brokerUrl_2, username, password);
    return connectionFactory;
}
My main issue is that I need to know which queue is assigned to which broker. At the same time, if one broker goes down for some reason, I need to be able to re-create its queues on the other broker on the fly and avoid losing any further messages. So my approach was to set up the broker URLs as below:
artemis.brokerUrl_1=(tcp://myhost1:61616,tcp://myhost2:61616)?randomize=false
artemis.brokerUrl_2=(tcp://myhost2:61616,tcp://myhost1:61616)?randomize=false
By using a different JmsTemplate for each broker URL, my intention was that the JmsTemplate using brokerUrl_1 would create its queues on myhost1, and the JmsTemplate using brokerUrl_2 would create its queues on myhost2.
I would have expected (due to randomize parameter) that each queue would have some kind of static membership to a broker and in the case of a broker's failure there would be some kind of migration by re-creating the queue from scratch to the other broker.
Instead, what I notice is that queue creation almost never follows this distribution. It happens seemingly at random: the same queue can appear on either broker, which is not desirable for my use case.
How can I create my queues on a predefined broker and still have the fail-safe that, if one broker is down, the producer will create the same queue on the other broker and continue?
Note that having shared state between the brokers is not an option.

The randomize=false parameter doesn't apply to the Artemis core JMS client. It only applies to the OpenWire JMS client distributed with ActiveMQ 5.x. Which connector is selected from the URL is determined by the connection load-balancing policy, as discussed in the documentation. The default connection load-balancing policy is org.apache.activemq.artemis.api.core.client.loadbalance.RoundRobinConnectionLoadBalancingPolicy, which selects a random connector from the URL list and then round-robins connections after that. There are other policies available, and if none of them gives you the behavior you want then you can potentially implement your own.
That said, it sounds like what you really want/need is 2 pairs of brokers where each pair consists of a live and a backup. That way, if the live broker fails, all the clients can fail over to the backup and you won't have to deal with the complexity of the "fake" fail-over functionality you're trying to implement.
Also, since you're using Spring's JmsTemplate, you should be aware that it uses some well-known anti-patterns which may significantly hurt performance.
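To make the idea of implementing your own connection load-balancing policy more concrete, here is a minimal sketch of a policy that starts at the first connector in the list instead of a random one and round-robins from there. It assumes the Artemis policy contract is a single select(int max) method that returns an index into the connector list. ConnectorSelectionPolicy, OrderedRoundRobinPolicy and choose are hypothetical stand-ins so the example runs without Artemis on the classpath; they are not Artemis classes.

```java
import java.util.List;

class PolicySketch {
    // Hypothetical stand-in mirroring the assumed shape of Artemis'
    // ConnectionLoadBalancingPolicy: pick an index in [0, max).
    interface ConnectorSelectionPolicy {
        int select(int max);
    }

    // Starts at index 0 (the first connector in the URL) and then
    // advances by one on every later call, wrapping around at max.
    static final class OrderedRoundRobinPolicy implements ConnectorSelectionPolicy {
        private int next = 0;

        @Override
        public synchronized int select(int max) {
            int chosen = next % max;
            next = (chosen + 1) % max;
            return chosen;
        }
    }

    // Hypothetical helper: resolves the policy's index to a connector URL.
    static String choose(ConnectorSelectionPolicy policy, List<String> connectors) {
        return connectors.get(policy.select(connectors.size()));
    }

    public static void main(String[] args) {
        List<String> url1 = List.of("tcp://myhost1:61616", "tcp://myhost2:61616");
        ConnectorSelectionPolicy policy = new OrderedRoundRobinPolicy();
        // First selection is the first listed host, the next one is the second.
        System.out.println(choose(policy, url1));
        System.out.println(choose(policy, url1));
    }
}
```

In a real client the class would implement the Artemis interface and be registered by class name on the connection factory (to my knowledge via setConnectionLoadBalancingPolicyClassName). How often the client consults the policy, for example per connection or per retry, is not modelled here and should be checked against the documentation for your Artemis version.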

Related

Documentation for HA Strimzi Kafka-Bridge?

We are thinking about using the Strimzi Kafka-Bridge (https://strimzi.io/docs/bridge/latest/#proc-creating-kafka-bridge-consumer-bridge) as an HTTP(S) gateway to an existing Kafka cluster.
The documentation mentions the creation of consumers using arbitrary names for taking part in a consumer-group. These names can subsequently be used to consume messages, seek or sync offsets,...
The question is: Am I right in assuming the following?
The bridge-consumers seem to be created and maintained just in one Kafka-Bridge instance.
If I want to use more than one bridge because of fault-tolerance-requirements, the name-information about a specific consumer will not be available on the other nodes, since there is no synchronization or common storage between the bridge-nodes.
So if the clients of the Kafka-Bridge are not sticky, as soon as a client communicates with another node (e.g. because of round-robin handling by a load balancer), the consumer information will not be available and the HTTP(S) clients must be prepared to reconfigure the consumers on the new node.
The offsets will be lost. Worst case the fetching of messages and syncing their offsets will always happen on different nodes.
Or did I overlook anything?
You are right. The state and the Kafka connections are currently not shared in any way between the bridge instances. The general recommendation is that when using consumers, you should run the bridge with only a single replica (and, if needed, deploy different bridge instances for different consumer groups).

Routing with gRPC microservices and Kubernetes

I have two applications: one is a regular Kafka consumer and the other is a gRPC-based microservice. The Kafka consumer is only responsible for consuming messages; the business logic resides within the microservice. The key for messages in our Kafka topic is null, so Kafka assigns messages to partitions round-robin, which distributes incoming messages evenly across all partitions. At the end of the day I am dealing with non-transactional storage (BigTable), so I have to make sure that only one thread is responsible for reading, updating and writing a given row-key in order to avoid race conditions. My gRPC microservice runs in a Kubernetes cluster on multiple pods and sits behind a load balancer; it is responsible for writing the final output to BigTable. How can I make sure that a message belonging to a particular row-key always goes to the same pod, so that there are no race conditions?
It might not be a solution if you already have a (big) code base, but streaming frameworks like Apache Flink handle this pretty gracefully.
It has an operator keyBy() that does exactly what you want. It will 'sort' the messages by a key defined by you and will guarantee messages with the same key get processed by the same thread.
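The guarantee described above (same key, same thread) can be illustrated without Flink. The following is a rough single-JVM sketch, not Flink's implementation: each key is hashed to one of a fixed number of single-threaded "lanes", so all work for a given row-key runs sequentially on one thread. KeyedRouterSketch, laneFor and the row-key names are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class KeyedRouterSketch {
    // One single-threaded executor per lane: work within a lane never overlaps.
    private final ExecutorService[] lanes;

    KeyedRouterSketch(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Deterministic mapping from key to lane; floorMod keeps the index
    // non-negative even when hashCode() is negative.
    int laneFor(String rowKey) {
        return Math.floorMod(rowKey.hashCode(), lanes.length);
    }

    // All tasks for the same rowKey are queued on the same thread.
    void submit(String rowKey, Runnable work) {
        lanes[laneFor(rowKey)].execute(work);
    }

    void shutdown() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
    }

    public static void main(String[] args) {
        KeyedRouterSketch router = new KeyedRouterSketch(4);
        for (String key : new String[] {"row-a", "row-b", "row-a"}) {
            router.submit(key, () ->
                System.out.println(key + " handled by " + Thread.currentThread().getName()));
        }
        router.shutdown();
    }
}
```

Both "row-a" tasks print the same thread name. Across pods the same idea would require the caller (or the load balancer) to apply a stable hash of the row-key when choosing the target, which this sketch does not cover.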

How to configure Kafka to have consumers and producers recover from failures when replication is disabled?

We have a use case where data loss is acceptable (think 30-50% loss acceptable). In an effort to reduce costs, we want to know if it is possible to configure Kafka with a replication factor of 1 such that consumers and producers can recover from broker failures by simply consuming from and producing to the partitions that are still available.
If this is possible, what are the configurations that need to be set?
There are other broker technologies that inherently behave this way; however, we would like to avoid introducing another technology, as Kafka is already part of our ecosystem.
If you create a new topic via bin/kafka-topics.sh you need to specify the --replication-factor parameter; just set it to 1 to disable replication.
For existing topics, you can change the replication factor using bin/kafka-topics.sh with the --alter parameter.
For producers and consumers you might need to do some extra exception handling. For example, if you specify a dedicated partition when you write a record and the broker is not reachable, you might need to take care of this (maybe just skip this write, or whatever is appropriate). But there is no specific configuration you need to set for your clients.

Kafka Producer - By default supports Multithreading?

I am a newbie to Kafka. I have created sample Kafka sync producer and consumer group programs using kafka_2.9.2-0.8.1.1.
So my question is: do I need to add multithreading code to the producer (like the consumer group class has) to support a huge number of requests? I read that the producer's send method is thread safe.
So does the Kafka producer take care of multithreading internally, or does the developer have to code it explicitly?
Any help would be highly appreciated.
Thanks,
Cdhar
There are two types of producers available with Kafka: (1) SyncProducer and (2) AsyncProducer. If you set the producer.type configuration to async, it will use the AsyncProducer. By default it uses the synchronous producer class.
Once running in async mode, it creates a separate AsyncProducer instance per broker, and each of these AsyncProducer instances maintains its own internal background thread for sending the messages. These threads are called ProducerSendThread.
So there is one thread running per broker, and your parallelism is based on the number of brokers available in the cluster. Adding new brokers to the cluster should therefore give you the flexibility to increase the level of parallelism when producing data with Kafka. But remember that the decision to add a new broker to your cluster should also take other parameters into consideration.

Parallel processing of JMS messages?

Is it possible to create a pool of Message Listeners or Message-Driven Beans to process messages from a JMS queue or topic in parallel?
I am using JBoss and JBoss's JMS.
Yes, if the MDB pool size is greater than one, JBoss should create multiple MDBs to process the messages in parallel.
Absolutely. I've done it with JMS queues to create a multi-server pool of listeners in order to process large numbers of transactions. You can use the Competing Consumers pattern. I used a modified one, since we needed to process messages in order within accounts. We used a lease mechanism to allocate servers to account number ranges, providing failover and scalability.
We were using Tibco's JMS provider, but it works with any JMS provider.
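As a rough illustration of the Competing Consumers pattern mentioned above, here is a small in-memory sketch. A BlockingQueue stands in for the JMS queue and plain threads stand in for the listeners; no JMS provider is involved, and CompetingConsumersSketch and drain are hypothetical names. The point is only that several consumers pull from one queue and each message is handed to exactly one of them.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

class CompetingConsumersSketch {
    // Drains the queue with `workers` threads and returns one
    // "worker-N:message" entry per handled message.
    static List<String> drain(BlockingQueue<String> queue, int workers) {
        List<String> handled = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(workers);
        for (int w = 0; w < workers; w++) {
            final int id = w;
            new Thread(() -> {
                String message;
                // poll() removes the head atomically, so a message is
                // delivered to exactly one of the competing workers.
                while ((message = queue.poll()) != null) {
                    handled.add("worker-" + id + ":" + message);
                }
                done.countDown();
            }).start();
        }
        try {
            done.await(); // wait until every worker has seen an empty queue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 20; i++) {
            queue.add("msg-" + i);
        }
        drain(queue, 4).forEach(System.out::println);
    }
}
```

Which worker handles which message varies from run to run, and ordering across workers is not preserved. That is why the answer above added a lease on account ranges when per-account ordering mattered.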