Parallel processing of JMS messages? - jboss

Is it possible to create a pool of Message Listeners or Message Driven Beans to process messages from a JMS queue or topic in parallel?
I am using JBoss and JBoss's JMS

Yes, if the MDB pool size is greater than one, JBoss should create multiple MDBs to process the messages in parallel.

Absolutely. I've done it with JMS queues to create a multi-server pool of listeners in order to process large numbers of transactions. You can use the Competing Consumers pattern. I used a modified one, since we needed to process messages in order within accounts. We used a lease mechanism to allocate servers to account number ranges, providing failover and scalability.
We were using Tibco's JMS provider, but it works with any JMS provider.
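A rough sketch of the plain Competing Consumers idea, for illustration only: several workers take from one shared queue, and each message is handled by exactly one of them. This is not JMS code. The `LinkedBlockingQueue` stands in for the JMS queue, and the class name, the `POISON` marker and the `process` method are made up for the example. The per-account ordering and lease mechanism mentioned above are not shown.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy Competing Consumers setup: N workers share one queue (a stand-in for a JMS queue). */
class CompetingConsumersSketch {

    // Hypothetical shutdown marker; a real JMS consumer would be closed instead.
    private static final String POISON = "__STOP__";

    /** Pushes messageCount messages through the shared queue and returns how many were handled. */
    static int process(int messageCount, int consumerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger handled = new AtomicInteger();
        ExecutorService consumers = Executors.newFixedThreadPool(consumerCount);

        for (int i = 0; i < consumerCount; i++) {
            consumers.execute(() -> {
                try {
                    while (true) {
                        String message = queue.take(); // blocks until a message is available
                        if (message.equals(POISON)) {
                            return; // this worker is done
                        }
                        handled.incrementAndGet(); // real message handling would go here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        for (int i = 0; i < messageCount; i++) {
            queue.add("message-" + i);
        }
        for (int i = 0; i < consumerCount; i++) {
            queue.add(POISON); // one marker per worker, queued after all real messages
        }

        consumers.shutdown();
        try {
            consumers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println("handled: " + process(1000, 4));
    }
}
```

With MDBs the container does the equivalent of the worker loop for you; the pool size plays the role of `consumerCount` here.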

Related

ActiveMQ Artemis configure standalone brokers with failover and statically assigned queues

I am trying to figure out how to use ActiveMQ Artemis to achieve the following topology. I need several producers writing to queues hosted on two standalone Artemis brokers. For the moment, every producer creates two connection factories, which handle the connections to the two brokers and create the corresponding queues.
@Bean
public ConnectionFactory jmsConnectionFactoryBroker1() {
    ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(brokerUrl_1, username, password);
    return connectionFactory;
}

@Bean
public ConnectionFactory jmsConnectionFactoryBroker2() {
    ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(brokerUrl_2, username, password);
    return connectionFactory;
}
My main issue is that I need to know which queue is assigned to which broker, and at the same time I need to know that if one broker goes down, I can re-create that queue on the other broker on the fly and avoid losing any further messages. So my approach was to set up the broker URLs as below:
artemis.brokerUrl_1=(tcp://myhost1:61616,tcp://myhost2:61616)?randomize=false
artemis.brokerUrl_2=(tcp://myhost2:61616,tcp://myhost1:61616)?randomize=false
My intention was to use a different JmsTemplate for each broker URL, so that the JmsTemplate using brokerUrl_1 would create its queues on myhost1 and the JmsTemplate using brokerUrl_2 would create its queues on myhost2.
I would have expected (due to randomize parameter) that each queue would have some kind of static membership to a broker and in the case of a broker's failure there would be some kind of migration by re-creating the queue from scratch to the other broker.
Instead, I notice that almost every time the queues are not created where I expected but rather randomly: the same queue can appear on either broker, which is not desirable for my use case.
How can I approach this so that my queues are created on a predefined broker, with the fail-safe that if one broker is down the producer creates the same queue on the other broker and continues?
Note that having shared state between the brokers is not an option.
The randomize=false doesn't apply to the Artemis core JMS client. It only applies to the OpenWire JMS client distributed with ActiveMQ 5.x. Which connector is selected from the URL is determined by the connection load-balancing policy as discussed in the documentation. The default connection load-balancing policy is org.apache.activemq.artemis.api.core.client.loadbalance.RoundRobinConnectionLoadBalancingPolicy which will select a random connector from the URL list and then round-robin connections after that. There are other policies available, and if none of them give you the behavior you want then you can potentially implement your own.
That said, it sounds like what you really want/need is 2 pairs of brokers where each pair consists of a live and a backup. That way, if the live broker fails, all the clients can fail over to the backup and you won't have to deal with the complexity of the "fake" fail-over functionality you're trying to implement.
Also, since you're using Spring's JmsTemplate you should be aware of some well-known anti-patterns that it uses which may significantly impact performance in a negative way.

Is Kafka a message queue and can Kafka be used as the database?

Some places describe Kafka as publish-subscribe messaging. Other sources describe Kafka as a message queue. What are the differences between those, and can Kafka be used as a database?
There are 2 patterns, named Publish-Subscribe and Message Queue. Some places discuss the differences between them: here
Kafka supports both of these patterns. For the publish-subscribe pattern, Kafka has publishers and subscribers: the publisher sends messages to a topic, and a subscriber can subscribe to that topic and receive its messages. For the queueing pattern, Kafka has a concept named Consumer Group. Within the same consumer group, the consumers share the work, which balances the workload.
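To make the two patterns concrete, here is a toy model in plain Java. It does not use the Kafka client API: the class, the method and the "partition modulo group size" assignment are simplifications made up for this example. It only shows the semantics described above: every consumer group sees every message (publish-subscribe), while inside one group each message goes to a single consumer (queueing).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of topic delivery: each group gets every message, each message goes to one member per group. */
class ConsumerGroupModel {

    /**
     * Returns the messages received by each consumer, keyed by "group/index".
     * groupSizes maps a group name to its number of consumers (assumed to be at least 1).
     */
    static Map<String, List<String>> deliver(List<String> messages,
                                             int partitions,
                                             Map<String, Integer> groupSizes) {
        Map<String, List<String>> received = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> group : groupSizes.entrySet()) {
            for (int c = 0; c < group.getValue(); c++) {
                received.put(group.getKey() + "/" + c, new ArrayList<>());
            }
        }

        for (int i = 0; i < messages.size(); i++) {
            // Simplification: spread messages over partitions in round-robin order.
            int partition = i % partitions;
            for (Map.Entry<String, Integer> group : groupSizes.entrySet()) {
                // Simplification: a partition is owned by exactly one consumer of each group.
                int owner = partition % group.getValue();
                received.get(group.getKey() + "/" + owner).add(messages.get(i));
            }
        }
        return received;
    }

    public static void main(String[] args) {
        Map<String, Integer> groups = new LinkedHashMap<>();
        groups.put("billing", 2); // two consumers share the work
        groups.put("audit", 1);   // a single consumer sees everything
        System.out.println(deliver(List.of("m0", "m1", "m2", "m3"), 4, groups));
    }
}
```

In the example run, the single "audit" consumer receives all four messages, while the two "billing" consumers split them between themselves.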
Because of its flexible design, Kafka is broadly used to implement many different patterns in system design.
Personally, I would not call Kafka itself a database, but you can use Kafka as storage, especially through mechanisms such as log compaction. Ref1 Ref2
At its base Kafka is storage, like a database but without indexes, where every query is a full scan of your data. Kafka stores data in files that cannot be modified. For example, if you use event sourcing you can save all the events of your system in Kafka and reprocess them all when your system has a bug.
Imagine that Kafka can split a very large file (10 TB or more) across multiple servers and provide a way to read that file in a distributed manner using partitions (the more partitions you have, the more applications can read in parallel).
Because it is storage, Kafka can also be used as a message queue or as a publish-subscribe system.

Routing with gRPC microservices and Kubernetes

I have two applications: one is a regular Kafka consumer and the other is a gRPC-based microservice. The Kafka consumer is only responsible for consuming messages, and the business logic resides within the microservice. The key for messages within our Kafka topic is null, so Kafka does round-robin assignment of messages to the partitions, which distributes the incoming messages evenly across all partitions. At the end of the day I am dealing with non-transactional storage (BigTable), so I have to make sure that only one thread is responsible for reading, updating and writing a given row-key in the storage in order to avoid race conditions. My gRPC microservice runs within a Kubernetes cluster on multiple pods. How can I make sure that a message object belonging to a particular row-key always goes to the same pod within the cluster, so that there are no race conditions? The microservice is responsible for writing the final output to BigTable, and it sits behind a load balancer.
It might not be a solution if you already have a (big) code base, but streaming frameworks like Apache Flink handle this pretty gracefully.
It has an operator keyBy() that does exactly what you want. It will 'sort' the messages by a key defined by you and will guarantee messages with the same key get processed by the same thread.
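To illustrate the idea behind keyed routing (this is not Flink's API, just a plain-Java sketch of the same principle inside one process), the router below hashes a key to one of several single-threaded lanes, so all work for a given row-key runs on the same thread. `KeyedRouter`, `laneFor` and the lane count are names and values made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Routes tasks by key so that tasks with the same key always run on the same single thread. */
class KeyedRouter {

    private final List<ExecutorService> lanes = new ArrayList<>();

    KeyedRouter(int laneCount) {
        for (int i = 0; i < laneCount; i++) {
            lanes.add(Executors.newSingleThreadExecutor()); // one thread per lane
        }
    }

    /** The same key always maps to the same lane index. */
    int laneFor(String rowKey) {
        return Math.floorMod(rowKey.hashCode(), lanes.size());
    }

    /** Queues the task on the lane that owns this key; tasks for one key run in submission order. */
    void submit(String rowKey, Runnable task) {
        lanes.get(laneFor(rowKey)).execute(task);
    }

    /** Stops accepting work and waits for queued tasks to finish. */
    void shutdownAndWait() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        for (ExecutorService lane : lanes) {
            try {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        KeyedRouter router = new KeyedRouter(4);
        for (String key : new String[] {"row-1", "row-2", "row-1"}) {
            router.submit(key, () ->
                System.out.println(key + " handled on " + Thread.currentThread().getName()));
        }
        router.shutdownAndWait();
    }
}
```

This only covers threads within one process. Across pods the same principle would have to be applied at the routing layer, for example by using the row-key as the Kafka message key so that all messages for one row-key land in the same partition.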

Is kafka consumer sequential or parallel?

In my application there are multiple enterprises. Each enterprise logs in and performs some action, such as uploading data; a Kafka producer then takes the data and sends it to the topic. On the other side, a Kafka consumer consumes data from the topic, performs business logic, and persists the result into the database.
Everything works perfectly when a single enterprise logs in, but when multiple enterprises log in, the Kafka consumer processes their data sequentially.
How can I make the processing parallel across multiple client requests?
Thanks in advance.
As mentioned in previous answers, you can use multiple partitions.
Another option is to take advantage of threading (ThreadPoolExecutor), so the flow will be like:
receive message -> hand off to a worker thread to run the required logic -> ack message.
Please make sure you throttle the work (using a bounded thread pool executor) to protect application performance.
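A minimal sketch of the throttled hand-off described above, using only `java.util.concurrent`. The pool size (4), queue capacity (16) and the class and method names are arbitrary choices for illustration, and the Kafka polling loop itself is left out. The bounded queue plus `CallerRunsPolicy` is one way to get back-pressure: when the pool is saturated, the submitting (polling) thread runs the task itself and therefore stops fetching more work for a while.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Bounded worker pool that slows the submitter down instead of queueing without limit. */
class ThrottledWorkerPool {

    static ThreadPoolExecutor newPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                          // fixed number of worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),   // bounded backlog
                new ThreadPoolExecutor.CallerRunsPolicy()  // when full, the caller runs the task
        );
    }

    /** Simulates receiving messageCount messages and returns how many were handled. */
    static int handleAll(int messageCount) {
        ThreadPoolExecutor pool = newPool(4, 16);
        AtomicInteger handled = new AtomicInteger();

        for (int i = 0; i < messageCount; i++) {
            // In a real consumer this would be called for each received record.
            pool.execute(() -> handled.incrementAndGet()); // business logic would go here
        }

        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println("handled: " + handleAll(10_000));
    }
}
```

Note that once work runs on several threads, messages can finish out of order, so the ack step should only happen after the corresponding tasks have completed. How that is wired up depends on the consumer setup and is not shown here.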
If that topic only has one partition, it's sequential on the consumer side. Multiple producers writing to one partition have no guarantees on ordering.
Consumers and producers will batch messages and process them in chunks.
"On the other side, a Kafka consumer consumes data from the topic, performs business logic, and persists into the database."
I suggest not using a regular consumer for this. Please research Kafka Connect and see if your database is supported.

How many messages can a queue hold?

In Celery, what is the upper limit on the number of messages in a queue?
How many messages can wait in a queue to be prefetched/consumed by a worker?
Queue length depends on the broker (and on message size). For example, if you are using RabbitMQ as the broker, you can expect millions of messages (I have seen hundreds of thousands in practice). You can run a simple load test and monitor resources with the RabbitMQ management plugin.
This thread can be helpful.