activemq constantly retrying error messages and not picking up new messages - queue

I have an activemq instance set up with tomcat for background message processing. It is set up to retry failed messages every 10 minutes for a retry period.
Now some dirty data has entered the system because of which the messages are failing. This is ok and can be fixed in the future. However, the problem is that none of the new correct incoming messages are getting processed and the error messages are constantly getting retried.
Any tips on what might be the issue, or how the priority is set? I haven't controlled the priority of the messages manually.
Thanks for your help.
-Pulkit
EDIT: I was able to solve the problem. The issue was that by the time all the dirty messages had been handled, it was already time for them to be retried again, so none of the new messages were ever consumed from the queue.
A dirty message was basically a message that threw an exception due to some dirty data in the system. The redelivery settings were to redeliver every 10 minutes for 1 day:
maximumRedeliveries=144
redeliveryDelayInMillis=600000
acknowledge.mode=transacted

ActiveMQ determines redelivery for a consumer based on the RedeliveryPolicy configured on the ActiveMQConnectionFactory. Local redelivery halts new message dispatch until the rolled-back transaction's messages are successfully redelivered, so if a message causes an error, i.e. you throw an exception or roll back the transaction, it will be redelivered up to the maximum redeliveries setting in the policy. Since your question doesn't provide much information on your setup and what you consider an error message, I can't really direct you to a solution.
You should look at the settings available in the RedeliveryPolicy. You can also configure redelivery not to block new message dispatch using the setNonBlockingRedelivery method.
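The starvation the original question ran into can be sketched with a toy dispatch loop. This is plain Python, not the ActiveMQ API; the function and message names are made up for illustration, and the real broker additionally waits redeliveryDelayInMillis between attempts, so with 144 retries at 10 minutes each the blocking case stalls dispatch for a whole day:

```python
from collections import deque

def consume(messages, is_poison, max_redeliveries, non_blocking=False):
    """Toy model of a transacted consumer. Poison messages are rolled
    back and redelivered; past max_redeliveries they are dead-lettered."""
    processed, dlq = [], []
    attempts = {}
    queue = deque(messages)
    while queue:
        msg = queue.popleft()
        if is_poison(msg):
            attempts[msg] = attempts.get(msg, 0) + 1
            if attempts[msg] > max_redeliveries:
                dlq.append(msg)          # give up: dead-letter the message
            elif non_blocking:
                queue.append(msg)        # retry later, keep dispatching others
            else:
                queue.appendleft(msg)    # blocking: retry before anything else
        else:
            processed.append(msg)
    return processed, dlq

# One poison message ("bad") sitting ahead of two good ones.
blocking, dlq1 = consume(["bad", "m1", "m2"], lambda m: m == "bad",
                         max_redeliveries=3)
nonblocking, dlq2 = consume(["bad", "m1", "m2"], lambda m: m == "bad",
                            max_redeliveries=3, non_blocking=True)
```

In both modes the good messages are eventually processed and the poison message ends up dead-lettered, but in the blocking mode nothing behind the poison message moves until its retries are exhausted, which is exactly the symptom described above.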

Related

Message order issue in single consumer connected to ActiveMQ Artemis queue

Is there any possibility of a message ordering issue with a single queue consumer and multiple producers?
producer1 publishes message m1 at 2021-06-27 02:57:44.513 and producer2 publishes message m2 at 2021-06-27 02:57:44.514 on the same queue worker_consumer_queue. Client code connected to the queue, configured as a single consumer, should receive m1 first and then m2, correct? Sometimes the messages are received in the wrong order. The version is ActiveMQ Artemis 2.17.0.
Although I mentioned multiple producers, the messages are published one after another from the same thread, using the property blockOnDurableSend=false.
I create and close a producer on each message publish, all on the same JVM. My assumption is that messages end up in the queue in publish order, whether from the same thread or from different threads, even with async sends. The timestamp is from getJMSTimestamp(). Does async publish also maintain order in some internal queue?
If you use blockOnDurableSend=false you're basically saying you don't strictly care about the order or even if the message makes it to the broker at all. Using blockOnDurableSend=false basically means "fire and forget."
Furthermore, the JMSTimestamp is not when the message is actually sent, as noted in the javax.jms.Message JavaDoc:
The JMSTimestamp header field contains the time a message was handed off to a provider to be sent. It is not the time the message was actually transmitted, because the actual send may occur later due to transactions or other client-side queueing of messages.
With more than one producer there is no guarantee that the messages will be processed in order.
Multiple producers, ActiveMQ Artemis, and one consumer form a distributed system, and the lack of a global clock is a significant characteristic of distributed systems.
Even if the producers and ActiveMQ Artemis were on the same machine and used the same clock, ActiveMQ Artemis could not be relied on to receive the messages in the same order the producers created and sent them, because the time to create a message and the time to send a message include variable latencies.
The easiest solution is to trust the order in which the messages are received by ActiveMQ Artemis, adding a timestamp with an interceptor or enabling the ingress timestamp; see ARTEMIS-2919 for further details.
If the easiest solution doesn't work, the distributed solution is to implement a total ordering algorithm for distributed systems, such as Lamport timestamps.
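For reference, the Lamport-clock idea mentioned above can be sketched in a few lines of plain Python (no broker involved; the producer names are hypothetical): each producer keeps a counter that is bumped on every local event and merged on every receive, which yields a total order consistent with causality.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event, e.g. creating and sending a message."""
        self.time += 1
        return self.time

    def merge(self, received_time):
        """On receiving a stamped message, jump past the sender's clock."""
        self.time = max(self.time, received_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
m1 = ("m1", p1.tick())   # p1 stamps m1 with Lamport time 1
p2.merge(m1[1])          # p2 observes m1 before sending its own message
m2 = ("m2", p2.tick())   # so m2 is guaranteed a later stamp than m1

# Sort by (timestamp, id) for a deterministic total order,
# regardless of the order in which the broker received them.
ordered = sorted([m2, m1], key=lambda m: (m[1], m[0]))
```

The tie-breaking on message id matters: Lamport timestamps only give a partial order by themselves, and concurrent messages can share a stamp.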
Well, it seems it is not a bug within Artemis; when it comes to a millisecond difference it is more likely network lag or something similar.
As a workaround, you could implement an algorithm in which a received message waits ~100 ms before it is actually processed (whatever you want to do with the message), and check whether another message arrived afterwards that was sent earlier. Basically, keep your own receive queue with a delay.
If there is an earlier message, you could simply move it up in your own ordering. You could also consider rejecting the first message back to your bus; depending on your settings on queues and topics you would be able to receive it again afterwards.
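The delay-and-reorder workaround described above can be sketched with a small buffer keyed on the sender's timestamp (plain Python, stdlib heapq; the timestamps and delay are illustrative, mirroring the millisecond gap from the question):

```python
import heapq

class ReorderBuffer:
    """Hold each message for `delay` time units and release messages
    ordered by send timestamp, so a message that arrived slightly late
    but was sent earlier is still processed first."""
    def __init__(self, delay):
        self.delay = delay
        self.heap = []   # (send_timestamp, message), a min-heap

    def arrive(self, send_ts, message):
        heapq.heappush(self.heap, (send_ts, message))

    def release(self, now):
        """Release all messages whose grace period has expired,
        in send-timestamp order."""
        out = []
        while self.heap and self.heap[0][0] + self.delay <= now:
            out.append(heapq.heappop(self.heap)[1])
        return out

buf = ReorderBuffer(delay=100)
buf.arrive(514, "m2")             # m2 arrives first but was sent at t=514
buf.arrive(513, "m1")             # m1 arrives a moment later, sent at t=513
released = buf.release(now=614)   # after the grace period, order is fixed
```

The trade-off is explicit: every message pays the ~100 ms latency, and a message delayed by more than the grace period can still be released out of order.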

max-delivery-attempts does not work for un-acknowledged messages

I noticed strange behavior in Artemis. I'm not sure if this is a bug or if I don't understand something.
I use the Artemis Core API. I set autoCommitAcks to false. I noticed that if a message is received in a MessageHandler but not acknowledged, and the session is rolled back, then Artemis does not consider the message undelivered; it considers the message as never having been sent to a consumer at all. The max-delivery-attempts parameter does not work in this case: the message is redelivered an infinite number of times, the method org.apache.activemq.artemis.api.core.client.ClientMessage#getDeliveryCount returns 1 each time, and the message shows false in the Redelivered column of the web console. If the message is acknowledged before the session rollback, then max-delivery-attempts works properly.
What exactly is the purpose of message acknowledgement? Does acknowledging mean only that the message was received, or that it was received and processed successfully? Or can I use acknowledgement in both ways, depending on my requirements?
By message acknowledge I mean calling org.apache.activemq.artemis.api.core.client.ClientMessage#acknowledge method.
The behavior you're seeing is expected.
Core clients actually consume messages from a local buffer which is filled with messages from the broker asynchronously. The amount of message data in this local buffer is controlled by the consumerWindowSize set on the client's URL. The broker may dispatch many thousands of messages to various clients that sit in these local buffers and are never actually seen in any capacity by the consumers. These messages are considered to be in delivery and are not available to other clients, but they are not considered to be delivered. Only when a message is acknowledged is it considered to be delivered to a client.
If the client is auto-committing acknowledgements then acknowledging a message will quickly remove it from its respective queue. Once the message is removed from the queue it can no longer be redelivered because it doesn't exist anymore on the broker. In short, you can't get configurable redelivery semantics if you auto-commit acknowledgements.
However, if the client is not auto-committing acknowledgements and the consumer closes (for any reason) without committing the acknowledgements or calls rollback() on its ClientSession then the acknowledged messages will be redelivered according to the configured redelivery semantics (including max-delivery-attempts).
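The distinction the answer draws can be condensed into a toy model (plain Python, not the Artemis API; class and message names are invented): only a rollback of an *acknowledged* message counts as a delivery attempt, so unacknowledged rollbacks never advance the delivery count toward max-delivery-attempts.

```python
class ToyQueue:
    """Toy model of the behavior described above: a message counts as
    delivered only once it has been acknowledged; rolling back
    acknowledged messages bumps the delivery count and can eventually
    dead-letter the message."""
    def __init__(self, max_delivery_attempts):
        self.max = max_delivery_attempts
        self.counts = {}   # message id -> delivery count
        self.dlq = []      # dead-letter queue

    def rollback(self, msg_id, acknowledged):
        if not acknowledged:
            # Never counted as delivered: redelivered forever, count stays 1.
            self.counts[msg_id] = 1
            return "redeliver"
        self.counts[msg_id] = self.counts.get(msg_id, 1) + 1
        if self.counts[msg_id] > self.max:
            self.dlq.append(msg_id)
            return "dead-letter"
        return "redeliver"

q = ToyQueue(max_delivery_attempts=2)
unacked1 = q.rollback("m1", acknowledged=False)  # delivery count stays 1
unacked2 = q.rollback("m1", acknowledged=False)  # ...no matter how often
acked1 = q.rollback("m2", acknowledged=True)     # count 2: one more chance
acked2 = q.rollback("m2", acknowledged=True)     # count 3 > max: dead-letter
```

In other words: to get the configured redelivery semantics, acknowledge the message first and then roll back the session, rather than rolling back without acknowledging.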

When message processing fails, can the consumer put the message back on the same topic?

Suppose one of my programs consumes messages from a Kafka topic. During processing of a message, the consumer accesses a DB. If the DB access fails for some reason, we don't want to abandon the message; we need to park it for later processing. In JMS, when message processing fails, the application container puts the message back on the queue, so it is not lost. In Kafka, once a message is received its offset advances and the next message comes. How do I handle this?
There are two approaches to achieve this.
Set the Kafka acknowledge mode to manual and, in case of an error, terminate the consumer thread without committing the offset (if group management is enabled, a new consumer will be added after rebalancing is triggered and will poll the same batch).
The second approach is simpler: have an error topic and publish messages to it in case of any error, so you can consume them later or keep track of them.
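The second approach reduces to a small wrapper around the processing loop. The sketch below is plain Python with the produce step stubbed out as a callback (the function names and the error-record shape are made up, not a real Kafka client API):

```python
def process_batch(messages, handler, produce_to_error_topic):
    """Never block the main flow: records whose handler raises are
    published to an error topic for later inspection or replay."""
    for msg in messages:
        try:
            handler(msg)
        except Exception as err:
            # Keep the original payload plus the failure reason.
            produce_to_error_topic({"payload": msg, "error": str(err)})

# Simulated error topic: in a real system this would be a producer.send().
error_topic = []

def handler(msg):
    if msg == "bad":
        raise ValueError("db access failed")

process_batch(["m1", "bad", "m2"], handler, error_topic.append)
```

The good messages are processed and their offsets can be committed normally, while the failed record is preserved with enough context to diagnose or replay it later.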

Kafka message loss because of later message

So I have an annoying offset-committing case with my Kafka consumers.
I use 'kafka-node' for my project.
I created a topic.
Created 2 consumers within a consumer group over 2 servers.
Auto-commit is set to false.
For every message my consumers get, they start an async process which can take between 1~20 s; when the process is done, the consumer commits the offset.
My problem is the following scenario:
Consumer 1 gets a message and takes 20 s to process it.
In the middle of that process it gets another message which takes 1 s to process.
It finishes processing the second message, commits the offset, then crashes right away,
causing the first message's processing to fail.
If I rerun the consumer, it does not read the first message again, because the second message already committed an offset greater than the first.
How can I avoid this?
Kafkaconsumer.on('message', async (message) => {
    await SOMETHING_ASYNC_1~20SEC;
    Kafkaconsumer.commit(() => {});
});
You essentially want to throttle messages and handle concurrency using async.queue.
Create an async.queue with a message processor and a concurrency of one (the message processor itself is wrapped with setImmediate so it will not freeze up the event loop).
Set queue.drain to resume the consumer.
The handler for the consumer's message event pauses the consumer and pushes the message to the queue.
The kafka-node README details this here.
An example implementation, similar to your problem, can be found here.
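The pause/push/drain pattern can be modeled synchronously to show why it fixes the bug (this is a plain-Python sketch of the control flow, not kafka-node; the class and method names are invented): with a concurrency of one and the consumer paused while the queue drains, a fast second message can never commit an offset past a slow first one.

```python
from collections import deque

class ThrottledConsumer:
    """Sketch of the pause/queue/drain pattern: on each incoming
    message the consumer pauses, the work queue (concurrency 1)
    drains in order, and each offset is committed only after its
    message has been fully processed."""
    def __init__(self, process):
        self.process = process
        self.pending = deque()
        self.committed = []
        self.paused = False

    def on_message(self, msg):
        self.paused = True           # stop fetching further messages
        self.pending.append(msg)
        self._drain()

    def _drain(self):
        while self.pending:
            msg = self.pending.popleft()
            self.process(msg)             # may take 1-20 s in the real system
            self.committed.append(msg)    # commit only after processing
        self.paused = False               # queue drained: resume fetching

order = []
c = ThrottledConsumer(order.append)
c.on_message("m1")   # processed and committed before m2 is even fetched
c.on_message("m2")
```

A crash between process and commit then re-delivers at most the in-flight message, instead of silently skipping an unfinished earlier one.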

Kafka reset partition re-consume or not

If I consume from my topic and manage the offset myself, some records I process succeed and I move the offset onwards, but occasionally I process records that throw an exception, and I still need to move the offset onwards. At a later point I will need to reset the offset and re-process the failed records. When advancing the offset, is it possible to set a flag saying that if I consume that event again it should be ignored or consumed?
The best way to handle these records is not by resetting the offsets but by using a dead-letter queue: essentially, posting them to another Kafka topic for reprocessing later. That way, your main consumer can focus on processing the records that don't throw exceptions, while some other consumer constantly listens and tries to handle the records that are throwing errors.
If that second consumer is still throwing exceptions when trying to reprocess the messages, you can either opt to repost them to the same queue, if the exception is caused by a transient issue (system temporarily unavailable, database issue, network blip, etc), or simply opt to log the message ID and content, as well as the best guess as to what the problem is, for someone to manually look at later.
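That triage between transient and permanent failures can be sketched as follows (plain Python; the callback names and the choice of exception types standing in for "transient issues" are illustrative):

```python
def reprocess(record, process, repost, log_for_human,
              transient_errors=(ConnectionError, TimeoutError)):
    """Dead-letter consumer logic: repost transient failures to the
    error topic for another attempt; log everything else for a person."""
    try:
        process(record)
        return "done"
    except transient_errors:
        repost(record)               # e.g. database briefly unavailable
        return "reposted"
    except Exception as err:
        log_for_human(record, err)   # permanent failure: needs a human
        return "logged"

reposted, logged = [], []

def flaky(record):
    if record == "net-blip":
        raise TimeoutError("db temporarily unavailable")
    if record == "corrupt":
        raise ValueError("bad payload")

r1 = reprocess("ok", flaky, reposted.append, lambda rec, err: logged.append(rec))
r2 = reprocess("net-blip", flaky, reposted.append, lambda rec, err: logged.append(rec))
r3 = reprocess("corrupt", flaky, reposted.append, lambda rec, err: logged.append(rec))
```

In a real deployment you would also cap the repost count (e.g. via a retry-count header) so a persistently failing "transient" record cannot cycle through the error topic forever.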
Actually, no, this is not possible: Kafka records are read-only. I've seen this use case in practice, though, and I'll try to give you some suggestions:
If you experience an error, just copy the message to a separate error topic and move on. This allows you to replay all error messages at any time from the error topic. That would definitely be my preferred solution: flexible and performant.
When there is an error, make your consumer hang: preferably, enter an infinite loop with an exponential backoff, rereading the same message over and over again. We used this strategy together with good monitoring/alerting and log compaction. When something goes wrong, we either fix the broken consumer and redeploy our service, or, if the message itself was broken, the producer fixes its bug and republishes the message with the same key, and log compaction kicks in: the faulty message is deleted, and we can move our consumers forward at that point. This requires manual intervention in most cases. If the reason for the fault is a networking issue (e.g. the database is down), the consumer may recover by itself.
Use local storage (e.g. a database) to record which offsets failed, then reset the offset and ignore the successfully processed records. This is my least preferred solution.
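The local-storage strategy amounts to a two-pass flow, sketched here in plain Python (function names invented; a real system would persist the failed-offset set in a database rather than memory):

```python
def first_pass(records, process):
    """First pass: process everything, remembering the offsets that
    failed so successes are not repeated later."""
    failed = set()
    for offset, record in records:
        try:
            process(record)
        except Exception:
            failed.add(offset)   # a real system would persist this
    return failed

def replay_pass(records, process, failed):
    """After resetting the offset: re-consume the partition but skip
    every record whose offset is not marked as failed."""
    for offset, record in records:
        if offset in failed:
            process(record)

records = [(0, "a"), (1, "b"), (2, "c")]
attempts = []

def proc(record):
    attempts.append(record)
    if record == "b" and attempts.count("b") == 1:
        raise ValueError("dirty data")   # fails on the first attempt only

failed = first_pass(records, proc)       # only offset 1 fails
replay_pass(records, proc, failed)       # only "b" is processed again
```

The bookkeeping must survive consumer restarts and rebalances, which is part of why this is the least preferred option above.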