Message order issue in single consumer connected to ActiveMQ Artemis queue - activemq-artemis

Is there any possibility of a message ordering issue when a single queue consumer receives from multiple producers?
producer1 publishes message m1 at 2021-06-27 02:57:44.513 and producer2 publishes message m2 at 2021-06-27 02:57:44.514 on the same queue worker_consumer_queue. Client code connected to the queue, configured as a single consumer, should receive m1 first and then m2, correct? Sometimes the messages are received in the wrong order. The version is ActiveMQ Artemis 2.17.0.
Even though I mentioned multiple producers, the messages are published one after another from the same thread, using the property blockOnDurableSend=false.
I create and close a producer on each message publish, all on the same JVM. My assumption is that the order of published messages in the queue is preserved, whether they come from the same thread or from different threads, even with async sends. The timestamp is getJMSTimestamp(). Does async publishing also maintain order in whatever internal queue it uses?

If you use blockOnDurableSend=false you're basically saying you don't strictly care about the order, or even whether the message makes it to the broker at all. Using blockOnDurableSend=false basically means "fire and forget."
Furthermore, the JMSTimestamp is not the time the message is actually sent, as noted in the javax.jms.Message JavaDoc:
The JMSTimestamp header field contains the time a message was handed off to a provider to be sent. It is not the time the message was actually transmitted, because the actual send may occur later due to transactions or other client-side queueing of messages.
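For reference, here is a minimal sketch of how that flag is set on the client side, assuming the Artemis JMS client and a broker listening on localhost:61616 (the URL and queue name are illustrative):

```java
import javax.jms.*;
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class NonBlockingSendExample {
    public static void main(String[] args) throws Exception {
        // blockOnDurableSend=false on the URL makes durable sends asynchronous
        // ("fire and forget"): send() returns before the broker has confirmed anything.
        ConnectionFactory cf = new ActiveMQConnectionFactory(
                "tcp://localhost:61616?blockOnDurableSend=false");

        try (Connection connection = cf.createConnection()) {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("worker_consumer_queue");
            MessageProducer producer = session.createProducer(queue);

            // Returns immediately; if ordering or delivery guarantees matter,
            // keep the default blockOnDurableSend=true instead.
            producer.send(session.createTextMessage("m1"));
        }
    }
}
```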

With more than one producer there is no guarantee that the messages will be processed in order.

Multiple producers, ActiveMQ Artemis, and one consumer form a distributed system, and the lack of a global clock is a significant characteristic of distributed systems.
Even if the producers and ActiveMQ Artemis were on the same machine and used the same clock, ActiveMQ Artemis could not be guaranteed to receive the messages in the same order the producers created and sent them, because the time to create a message and the time to send a message include variable latencies.
The easiest solution is to trust the order of the messages as received by ActiveMQ Artemis, adding a timestamp with an interceptor or enabling the ingress timestamp; see ARTEMIS-2919 for further details.
If the easiest solution doesn't work, the distributed solution is to implement a total-ordering algorithm for distributed systems, such as Lamport timestamps.
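A Lamport timestamp doesn't require any particular broker API: each producer keeps a logical counter, attaches it to every message (e.g. as a message property), and the consumer orders messages by that counter rather than by wall-clock time. A minimal sketch, with illustrative names:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal Lamport clock: each producer keeps its own counter and attaches it
// to every message; the consumer orders messages by (counter, producerId).
public class LamportClock {
    private final AtomicLong counter = new AtomicLong();

    // Called before sending a message; the returned value travels with the message.
    public long tick() {
        return counter.incrementAndGet();
    }

    // Called when a timestamp from another process is observed (e.g. on receive),
    // so the local clock never falls behind.
    public void observe(long remote) {
        counter.updateAndGet(local -> Math.max(local, remote) + 1);
    }
}
```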

Well, as it seems, it is not a bug within Artemis; when it comes to a millisecond difference it is more likely network lag or something like that.
So as a workaround, you could create an algorithm in which a received message waits for ~100ms before it is actually processed (whatever you want to do with the message), and check whether your application received another message afterwards that was sent before it. So basically have your own receive queue with a delay.
If there is a message that was sent before, you could simply move it up in your own algorithm. You could also consider rejecting the first message back to your bus; depending on your settings on queues and topics it would be possible to receive it again afterwards.
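A minimal sketch of such a delay-and-reorder buffer, assuming each message carries a comparable timestamp; the 100 ms window and all names are illustrative:

```java
import java.util.PriorityQueue;

// Buffers incoming messages for a short window and releases them in timestamp
// order, so a message that arrives slightly late can still be processed first.
public class ReorderBuffer<M> {
    private static final long WINDOW_MS = 100;

    private static final class Entry<M> {
        final long timestamp;  // ordering key carried by the message
        final long arrivalMs;  // local arrival time
        final M message;

        Entry(long timestamp, long arrivalMs, M message) {
            this.timestamp = timestamp;
            this.arrivalMs = arrivalMs;
            this.message = message;
        }
    }

    private final PriorityQueue<Entry<M>> buffer =
            new PriorityQueue<>((a, b) -> Long.compare(a.timestamp, b.timestamp));

    public synchronized void offer(long timestamp, M message) {
        buffer.add(new Entry<>(timestamp, System.currentTimeMillis(), message));
    }

    // Returns the oldest buffered message once it has waited at least WINDOW_MS,
    // or null if nothing is ready yet.
    public synchronized M poll() {
        Entry<M> head = buffer.peek();
        if (head != null && System.currentTimeMillis() - head.arrivalMs >= WINDOW_MS) {
            return buffer.poll().message;
        }
        return null;
    }
}
```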

Related

ActiveMQ Artemis JMS Shared Subscription

I have a single node ActiveMQ instance with two competing consumers connected to a topic. The topic subscription is shared as per the JMS 2.0 specification. A shared subscription does guarantee that only one of the subscribers (using the same subscription name) gets a given message. But what I noticed is that it does not guarantee that the second message is delivered only after the first one is acknowledged. If the first consumer takes time to acknowledge its message, the second message is delivered to the free consumer even before the acknowledgement of the first one is sent to the broker. Is this standard behaviour? And is there a way to stop the broker from delivering the second message before the acknowledgement of the first one?
ActiveMQ Artemis supports exclusive queues. These are special queues which route all messages to only one consumer at a time.
Obviously exclusive queues have the drawback that you cannot scale out the consumers to improve consumption, as only one consumer would technically be active.
However, I would suggest taking a look at message grouping to scale out your solution. Message groups are useful when you want all messages with a certain value of a property to be processed serially by the same consumer, without stopping the delivery of messages with different values of that property to other consumers.
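A minimal sketch of message grouping on the producer side, assuming the Artemis JMS client; the broker URL, queue name, and group key are illustrative:

```java
import javax.jms.*;
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class GroupedProducerExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
        try (Connection connection = cf.createConnection()) {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue("orders"));

            TextMessage message = session.createTextMessage("order update");
            // All messages with the same JMSXGroupID are delivered serially to the
            // same consumer; messages in other groups can go to other consumers.
            message.setStringProperty("JMSXGroupID", "customer-42");
            producer.send(message);
        }
    }
}
```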

max-delivery-attempts does not work for un-acknowledged messages

I noted strange behavior in Artemis. I'm not sure if this is a bug or if I don't understand something.
I use the Artemis Core API. I set autoCommitAcks to false. I noted that if a message is received in a MessageHandler but is not acknowledged and the session is rolled back, then Artemis does not consider this message undelivered; Artemis considers it as never having been sent to the consumer at all. The parameter max-delivery-attempts does not work in this case. The message is redelivered an infinite number of times. The method org.apache.activemq.artemis.api.core.client.ClientMessage#getDeliveryCount returns 1 each time. The message shows false in the Redelivered column in the web console. If the message is acknowledged before the session rollback then max-delivery-attempts works properly.
What exactly is the purpose of message acknowledgement? Does acknowledge mean only that the message was received, or that it was received and processed successfully? Maybe I can use acknowledge in both ways and it only depends on my requirements?
By message acknowledge I mean calling the org.apache.activemq.artemis.api.core.client.ClientMessage#acknowledge method.
The behavior you're seeing is expected.
Core clients actually consume messages from a local buffer which is filled with messages from the broker asynchronously. The amount of message data in this local buffer is controlled by the consumerWindowSize set on the client's URL. The broker may dispatch many thousands of messages to various clients that sit in these local buffers and are never actually seen in any capacity by the consumers. These messages are considered to be in delivery and are not available to other clients, but they are not considered to be delivered. Only when a message is acknowledged is it considered to be delivered to a client.
If the client is auto-committing acknowledgements then acknowledging a message will quickly remove it from its respective queue. Once the message is removed from the queue it can no longer be redelivered because it doesn't exist anymore on the broker. In short, you can't get configurable redelivery semantics if you auto-commit acknowledgements.
However, if the client is not auto-committing acknowledgements and the consumer closes (for any reason) without committing the acknowledgements or calls rollback() on its ClientSession then the acknowledged messages will be redelivered according to the configured redelivery semantics (including max-delivery-attempts).
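A minimal sketch of the non-auto-committing case with the Core API, assuming a broker at localhost:61616 and a queue named exampleQueue (both illustrative); note that the message is acknowledged before the rollback so the delivery attempt is counted against max-delivery-attempts:

```java
import org.apache.activemq.artemis.api.core.client.*;

public class CoreRedeliveryExample {
    public static void main(String[] args) throws Exception {
        ServerLocator locator = ActiveMQClient.createServerLocator("tcp://localhost:61616");
        ClientSessionFactory factory = locator.createSessionFactory();

        // autoCommitSends=true, autoCommitAcks=false: acknowledgements only take
        // effect when the session is committed.
        try (ClientSession session = factory.createSession(true, false)) {
            ClientConsumer consumer = session.createConsumer("exampleQueue");
            session.start();

            ClientMessage message = consumer.receive(1000);
            if (message != null) {
                try {
                    process(message);          // application logic
                    message.acknowledge();     // mark as delivered...
                    session.commit();          // ...and make the ack permanent
                } catch (Exception e) {
                    message.acknowledge();     // ack first so the delivery is counted
                    session.rollback();        // redelivery honours max-delivery-attempts
                }
            }
        }
    }

    private static void process(ClientMessage message) { /* ... */ }
}
```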

Can I make sure Kafka doesn't accept two copies of the same message?

I'm writing messages along with timestamps to Kafka. If I retry, the timestamp might change, and so might the producer that's writing, but the message content and message id are the same. The message id is generated before the message gets here, and it's a UUID.
How can I make sure Kafka doesn't accept the second copy if the first one was successfully written to the topic but the ack got lost, so the service up the chain retries? The consumers must never see the duplicate message.
In general there are two cases when the same message can be sent to Kafka:
During normal operation your application intentionally sends messages with the same uuid to Kafka and you want Kafka to do deduplication.
While you are sending a message to Kafka your code or Kafka brokers fail and you want to make sure the message you try to send again isn't duplicated, and also isn't lost.
I assume you are interested in case 2. The Kafka developers call case 2 exactly-once delivery. The latest versions of Kafka support transactions in order to enable exactly-once delivery. A complete explanation of how Kafka does this, along with a code snippet, can be found in this article by Confluent (the Kafka company).
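A minimal sketch of an idempotent, transactional producer using the standard Kafka Java client; the bootstrap server, transactional id, and topic are illustrative:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOnceProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Idempotence deduplicates broker-side retries of the same produce request.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        // A transactional id additionally fences zombie producers across restarts.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "message-writer-1");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("events", "message-uuid", "payload"));
                producer.commitTransaction();
            } catch (KafkaException e) {
                // Fatal errors (e.g. ProducerFencedException) require closing the
                // producer instead; for other failures the transaction is aborted.
                producer.abortTransaction();
            }
        }
    }
}
```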

Kafka Consumes unprocessable messages - How to reprocess broken messages later?

We are implementing a Kafka Consumer using Spring Kafka. If I understand correctly, when processing of a single message fails, there is the option to
Don't care and just ACK
Do some retry handling using a RetryTemplate
If even this doesn't work, do some custom failure handling using a RecoveryCallback
I am wondering what your best practices are for that. I'm thinking of simple application exceptions, such as a DeserializationException (for JSON formatted messages), or longer local storage downtime, etc. That means some extra work is needed, like a hotfix deployment, to fix the broken application so the faulty messages can be re-processed.
Since losing messages (i.e. not processing them) is not an option for us, the only option left is IMO to store the faulty messages in some persistent store, e.g. another "faulty messages" Kafka topic, so that those events can be processed again at a later time and there is no need to stop event processing entirely.
How do you handle these scenarios?
One example is Spring Cloud Stream, which can be configured to publish failed messages to another topic errors.foo; users can then copy them back to the original topic to try again later.
This logic is done in the recovery callback.
We have a use case where we can't drop any messages at all, even faulty ones. So when we encounter a faulty message, we send a default message in its place and at the same time send the original message to a failed-topic for retry later.
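As a sketch of the dead-letter approach with plain Spring Kafka (assuming spring-kafka 2.8+, where DefaultErrorHandler and DeadLetterPublishingRecoverer are available; the ".DLT" suffix shown is the library's default convention, everything else is illustrative):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class FailedMessageConfig {

    // After retries are exhausted, the failed record is published to a dead-letter
    // topic ("<original-topic>.DLT" here) so it can be reprocessed later.
    @Bean
    public DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
        DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(
                template,
                (ConsumerRecord<?, ?> record, Exception ex) ->
                        new TopicPartition(record.topic() + ".DLT", record.partition()));
        // Retry twice with a 1s pause before giving up and recovering.
        return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 2));
    }
}
```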

Email notification when Kafka producer and consumer goes down

I have developed a data pipeline using Kafka. Right now I have one type of producer and two types of consumers set up in the cluster.
Producer: gets the message from a windows server
Consumer: Consumer A uses Spark Streaming to transform and present a real time view. Consumer B stores the RAW data, might be useful for building the schema at a later stage.
For various reasons, starting with the network, the consumers may not receive any data, and it is also possible that the consumer process dies in case of a system failure.
I would be interested in knowing if there is a way to implement something that sends an email notification when a consumer stops receiving messages or the consumer thread dies altogether. Do Kafka or ZooKeeper provide a way of doing it?
Right now I am thinking of checking the target system to see whether it is receiving messages or not. But in the future, if the number of targets increases, it will be really complex to write email notification systems for individual targets.
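A rough sketch of the kind of check I have in mind, using the Kafka AdminClient to compare the consumer group's committed offsets against the latest topic offsets; the bootstrap server, group id, and the alerting hook are placeholders:

```java
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumerLagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets of the consumer group being watched.
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("consumer-a")
                    .partitionsToOffsetAndMetadata()
                    .get();

            // Latest (end) offsets for the same partitions.
            Map<TopicPartition, ListOffsetsResultInfo> latest = admin
                    .listOffsets(committed.keySet().stream()
                            .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest())))
                    .all()
                    .get();

            long totalLag = committed.entrySet().stream()
                    .mapToLong(e -> latest.get(e.getKey()).offset() - e.getValue().offset())
                    .sum();

            // If the lag keeps growing (or the group has no members), trigger the
            // email notification here, e.g. via JavaMail or an external alerting system.
            System.out.println("Total lag for consumer-a: " + totalLag);
        }
    }
}
```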