I have a DLQ with a large amount of "Messages killed". The broker is a single node. There is no cluster.
According to this documentation, "Messages killed" is:
Amount of messages that have been killed on the broker due to exceeding the max delivery attempts and is collected from the sum of subcomponent=queues#MessagesKilled for all queues.
What use case is ActiveMQ Artemis trying to solve? Why on earth does DLQ have any amount of Messages killed? I expect this value to be 0 initially anyway.
Attributes:
Acknowledge attempts 68890
Address DLA
Configuration managed false
Consumer count 0
Consumers before dispatch 0
Dead letter address DLA
Delay before dispatch -1
Delivering count 0
Delivering size 0
Durable true
Durable delivering count 0
Durable delivering size 0
Durable message count 1539
Durable persistent size 2183288
Durable scheduled count 0
Durable scheduled size 0
Enabled true
Exclusive false
Expiry address ExpiryQueue
...
Group buckets -1
Group count 0
Group first key
Group rebalance false
Group rebalance pause dispatch false
Id 1006063
Last value false
Last value key
Max consumers -1
Message count 1539
Messages acknowledged 0
Messages added 70429
Messages expired 0
Messages killed 68890
Name jms.queue.satellitanalys_request.DLQ
Object Name org.apache.activemq.artemis:broker="0.0.0.0",component=addresses,address="DLA",subcomponent=queues,routing-type="multicast",queue="jms.queue.satellitanalys_request.DLQ"
Paused false
Persistent size 2183288
Prepared transaction message count 0
Purge on no consumers false
Retroactive resource false
Ring size -1
Routing type MULTICAST
Scheduled count 0
Scheduled size 0
Temporary false
User
The actual description of "Messages killed" can be acquired from the MBean itself which states:
number of messages removed from this queue since it was created due to exceeding the max delivery attempts
Generally speaking, the use-case being solved here is a message which cannot be consumed (for whatever reason) is being removed from the queue (i.e. killed) so that the consumer can receive, and hopefully successfully process, a different message. The behavior is 100% configurable in broker.xml.
There are a handful of important metrics here:
Acknowledge attempts 68890
Message count 1539
Messages acknowledged 0
Messages added 70429
Messages killed 68890
Acknowledge "attempts" and actual acknowledgements are tracked independently because, for example, a message may be acknowledged in a transaction and then that transaction can be rolled back in which case the message won't actually be acknowledged. That appears to be the case here since there have been 68,890 attempts to acknowledge but 0 actual acknowledgements. I can only assume that the max-delivery-attempts for this queue is 1 since there are also 68,890 killed messages. Notice too that the number of messages added is 70,429 which corresponds to the message count of 1,539 (i.e. 70,429 - 68,890 = 1,539). Everything seems to be accounted for.
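The accounting above can be checked with a few lines of arithmetic. This is only a sketch: the numbers are copied from the attribute listing, and it assumes a message leaves the queue only by being acknowledged, expired, or killed.

```python
# Counter values copied from the queue attributes listed above.
messages_added = 70_429
messages_killed = 68_890
messages_acknowledged = 0
messages_expired = 0
message_count = 1_539

# Assumption for this sketch: a message leaves the queue only by being
# acknowledged, expired, or killed.
removed = messages_killed + messages_acknowledged + messages_expired
remaining = messages_added - removed

print(remaining)                   # 1539
print(remaining == message_count)  # True
```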
My conclusion is that you have (or had) a consumer that is (or was) attempting to consume messages from this queue via a transaction, and that transaction was rolled back in every instance.
Keep in mind that a "dead-letter queue" is just a normal queue like any other. All the same configuration and semantics apply to it as they would apply to any other queue.
If ActiveMQ Artemis is configured with a redelivery-delay > 0 and a JMS listener uses ctx.rollback() or ctx.recover() then the broker will redeliver the message as expected. But if a producer pushes a message to the queue during a redelivery then the receiver gets unordered messages.
For example:
Queue: 1 -> message 1 is redelivered as expected
Push during the redelivery phase
Queue: 2,3 -> the receiver gets 2,3,1
With a redelivery-delay of 0 everything is ok, but the frequency of redeliveries on the consumer side is too high. My expectation is that all delivery to the consumer should stop until the unacknowledged message is either purged from the queue or acknowledged. We are using queues to connect to individual devices. Every device has its own I/O queue with a single consumer. The word "queue" suggests strict ordering to me. It would be nice to make this behavior configurable with something like "strict_redelivery_order".
What you're seeing is the expected behavior. If you use a redelivery-delay > 0 then delivery order will be broken. If you use a redelivery-delay of 0 then delivery order will not be broken. Therefore, if you want to maintain strict order then use a redelivery-delay of 0.
If the broker blocked delivery of all other messages on the queue during a redelivery delay that would completely destroy message throughput performance. What if the redelivery delay were 60 seconds or 10 minutes? The queue would be blocked that entire time. This would not be tenable for an enterprise message broker serving hundreds or perhaps thousands of clients each of whom may regularly be triggering redeliveries on shared queues. This behavior is not configurable.
If you absolutely must maintain message order even for messages that cannot be immediately consumed and a redelivery-delay of 0 causes redeliveries that are too fast then I see a few potential options (in no particular order):
Configure a dead-letter address and set a max-delivery-attempts to a suitable value so after a few redeliveries the problematic message can be cleared from the queue.
Implement a delay of your own in your client. This could be as simple as catching any exception and using a Thread.sleep() before calling ctx.rollback().
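The second option might look roughly like the sketch below. It is written in Python for brevity, and `FakeContext`, `handle_with_delay` and `always_fails` are hypothetical stand-ins rather than part of any client library; in a JMS client the same shape would apply, with Thread.sleep() before ctx.rollback().

```python
import time


class FakeContext:
    """Hypothetical stand-in for a transacted session; it only records calls."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")


def handle_with_delay(ctx, message, process, delay_seconds):
    """Process one message; on failure, wait before rolling back."""
    try:
        process(message)
        ctx.commit()
        return True
    except Exception:
        # Sleeping here throttles redelivery on the client side, so the
        # broker-side redelivery-delay can stay at 0 and order is preserved.
        time.sleep(delay_seconds)
        ctx.rollback()
        return False


def always_fails(message):
    raise ValueError("cannot process " + message)


ctx = FakeContext()
ok = handle_with_delay(ctx, "message 1", always_fails, delay_seconds=0.01)
print(ok, ctx.calls)  # False ['rollback']
```

Because the consumer itself is blocked during the sleep, no other message is delivered to it in the meantime, which is acceptable here only because each queue has a single consumer.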
There is a Kafka (version 2.2.0) cluster of 3 nodes. One node becomes artificially unavailable (network disconnection). Then we have the following behaviour:
We send a record to a producer with the given topic-partition (to the specific partition, let's say #0).
We receive record metadata from the producer, which confirms that the record has been acknowledged.
Immediately after that we poll a consumer assigned to the same topic-partition and an offset taken from the record's metadata. The poll timeout was set to 30 seconds. No data is returned (an empty set is returned).
This happens inconsistently, from time to time (under the described circumstances with one Kafka node failure).
Essentially my question is: should data be immediately available to consumers once it is acknowledged? If not, what is a reasonable timeout?
UPD: some configuration details:
number of partitions for the topic: 1
default replication factor: 3
sync replication factor: 2
acks for producer: all
The default setting of acks on the producer is 1. This means the producer waits for the acknowledgement from the leader replica only. If the leader dies right after acknowledging, the message won't be delivered.
Should data be immediately available for consumers? Yes. In general there should be very little lag by default; without load it should effectively be in the millisecond range.
If you want to make sure that a message can't be lost, you have to configure the producer with acks=all in addition to min.insync.replicas=2. This makes sure that all in-sync replicas acknowledge the message, and that at least 2 nodes do. So you can still lose one node and be fine. Lose 2 nodes and you won't be able to send, but even then messages won't be lost.
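The node-loss arithmetic can be written down as a small check. This is a simplified model, not broker code: it assumes all 3 replicas start in sync and that every failed node drops out of the in-sync set. The function name is made up for this example.

```python
REPLICATION_FACTOR = 3
MIN_INSYNC_REPLICAS = 2


def can_accept_acks_all_write(failed_nodes):
    """Simplified model: every failed node is one replica fewer in sync."""
    in_sync = REPLICATION_FACTOR - failed_nodes
    return in_sync >= MIN_INSYNC_REPLICAS


for failed in range(3):
    print(failed, can_accept_acks_all_write(failed))
# 0 True
# 1 True
# 2 False
```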
I am trying to use a queue with multiple subscribers (each with a unique selector) along with setting the destination.consumer.exclusive flag to true. But when I post a message to this queue, I see that the message is available within the queue but none of the subscribers have picked it up, in spite of it meeting one of the consumers' selector criteria.
I see the following details on the AMQ UI console:
Number of pending messages - 1
Number of consumers - 6
Messages enqueued - 1
Messages dequeued - 0
Although the number of messages pending on the queue is 1, none of the consumers have any "enqueues" on them, in spite of the pending message meeting the selection criteria.
Exclusive consumers would override any selector in terms of Queue load balancing so use either one or the other. Exclusive consumer is named that way for a reason, namely the consumer is the only one that can consume from the Queue until it goes offline. It really doesn't make any sense to mix selectors and exclusive options in the first place.
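A toy model of that explanation is sketched below. It is not the broker's dispatch code; it simply assumes that with the exclusive flag only one consumer (here, the first registered) is eligible, and that a message it does not select stays pending. The names `dispatch`, `consumer-a` and `consumer-b` are hypothetical.

```python
def dispatch(message, consumers, exclusive):
    """Return the name of the consumer that receives the message, or None."""
    # Assumption: with exclusive dispatch only the first consumer is eligible.
    candidates = consumers[:1] if exclusive else consumers
    for name, selector in candidates:
        if selector(message):
            return name
    return None  # the message stays pending on the queue


consumers = [
    ("consumer-a", lambda m: m["type"] == "A"),
    ("consumer-b", lambda m: m["type"] == "B"),
]
message = {"type": "B"}

print(dispatch(message, consumers, exclusive=True))   # None
print(dispatch(message, consumers, exclusive=False))  # consumer-b
```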
If I set Kafka config param at Producer as:
1. retries = 3
2. max.in.flight.requests.per.connection = 5
then it's likely that messages within one partition may not be in send order.
Does Kafka take any extra step to make sure that messages within a partition remain in sent order,
OR
with the above configuration, is it possible to have out-of-order messages within a partition?
Unfortunately, no.
With your current configuration, there is a chance that messages will arrive out of order because of your retries and max.in.flight.requests.per.connection settings.
With retries config set to greater than 0 you will lose ordering in the following case (just an example with random numbers):
You send a message/batch to partition 0 which is located on broker 0, and brokers 1 and 2 are ISR.
Broker 0 fails, broker 1 becomes leader.
Your message/batch returns a failure and needs to be retried.
Meanwhile, you send next message/batch to partition 0 which is now known to be on broker 1, and this happens before your previous batch actually gets retried.
Message/batch 2 gets acknowledged (succeeds).
Message/batch 1 is re sent and now gets acknowledged too.
Order lost.
I might be wrong, but in this case reordering can probably happen even with max.in.flight.requests.per.connection set to 1: during a broker failover, a batch can be sent to the new broker before the previously failed batch figures out that it should go to that broker too.
Regarding max.in.flight.requests.per.connection and retries being set together it's even simpler - if you have multiple unacknowledged requests to a broker, the first one to fail will arrive unordered.
However, please take into account this is only relevant to situations where a message/batch fails to acknowledge for some reason (sent to wrong broker, broker died etc.)
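The interaction between retries and in-flight requests can be sketched with a toy model. This is not the Kafka client's logic; it assumes batches are sent in windows of max_in_flight requests, that a failed batch fails only on its first attempt, and that its retry is sent only after the rest of its window has been handled. It does not model the failover case described above. The function name `partition_log` is made up for this example.

```python
def partition_log(batches, fail_first_attempt, max_in_flight):
    """Toy model: return the order in which batches end up in the partition log."""
    log = []
    pending = list(batches)
    while pending:
        window = pending[:max_in_flight]   # requests sent without waiting
        pending = pending[max_in_flight:]
        retries = []
        for batch in window:
            if batch in fail_first_attempt:
                # The first attempt fails; remember to resend this batch once.
                fail_first_attempt = fail_first_attempt - {batch}
                retries.append(batch)
            else:
                log.append(batch)          # acknowledged on this attempt
        # Retries are resent before any new batch from `pending` is sent.
        pending = retries + pending
    return log


print(partition_log([1, 2, 3], {1}, max_in_flight=5))  # [2, 3, 1]
print(partition_log([1, 2, 3], {1}, max_in_flight=1))  # [1, 2, 3]
```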
Hope this helps
If I am using Kafka Async producer, assume there are X number of messages in buffer.
When they are actually processed on the client, and the broker or a specific partition is down for some time, the Kafka client would retry. If a message fails, would it mark that specific message as failed and move on to the next message (which could lead to out-of-order messages)? Or would it fail the remaining messages in the batch in order to preserve order?
I need to maintain the ordering, so ideally I would want Kafka to fail the batch from the point where it failed, so that I can retry from the failure point. How would I achieve that?
As the Kafka documentation says about retries:
Setting a value greater than zero will cause the client to resend any
record whose send fails with a potentially transient error. Note that
this retry is no different than if the client resent the record upon
receiving the error. Allowing retries will potentially change the
ordering of records because if two records are sent to a single
partition, and the first fails and is retried but the second succeeds,
then the second record may appear first.
So, to answer your title question: no, Kafka doesn't have order guarantees under async sends.
I am updating the answer based on Peter Davis's question.
I think that if you want to send in batch mode, the only way to secure it would be to set max.in.flight.requests.per.connection=1, but as the documentation says:
Note that if this setting is set to be greater than 1 and there are
failed sends, there is a risk of message re-ordering due to retries
(i.e., if retries are enabled).
Starting with Kafka 0.11.0, there is the enable.idempotence setting, as documented.
enable.idempotence: When set to true, the producer will ensure that
exactly one copy of each message is written in the stream. If false,
producer retries due to broker failures, etc., may write duplicates of
the retried message in the stream. Note that enabling idempotence
requires max.in.flight.requests.per.connection to be less than or
equal to 5, retries to be greater than 0 and acks must be all. If
these values are not explicitly set by the user, suitable values will
be chosen. If incompatible values are set, a ConfigException will be
thrown.
Type: boolean Default: false
This will guarantee that messages are ordered and that no loss occurs for the duration of the producer session. Unfortunately, the producer cannot set the sequence id, so Kafka can make these guarantees only per producer session.
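The constraints in the quoted text can be turned into a small pre-flight check. This is a sketch of those rules only, not the client's own validation: `idempotence_conflicts` is a hypothetical helper, unset keys are treated as fine because the quoted text says suitable values are then chosen, and "-1" is accepted as the other spelling of acks=all.

```python
def idempotence_conflicts(config):
    """Return the constraints from the quoted text that this config violates."""
    problems = []
    in_flight = config.get("max.in.flight.requests.per.connection")
    if in_flight is not None and in_flight > 5:
        problems.append("max.in.flight.requests.per.connection must be <= 5")
    retries = config.get("retries")
    if retries is not None and retries <= 0:
        problems.append("retries must be > 0")
    acks = config.get("acks")
    if acks is not None and acks not in ("all", "-1"):
        problems.append("acks must be all")
    return problems


# Settings from the earlier question, with idempotence switched on.
question_config = {
    "enable.idempotence": True,
    "retries": 3,
    "max.in.flight.requests.per.connection": 5,
}
print(idempotence_conflicts(question_config))              # []
print(idempotence_conflicts({"acks": "1", "retries": 0}))
# ['retries must be > 0', 'acks must be all']
```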
Have a look at Apache Pulsar if you need to set the sequence id, which would allow you to use an external sequence id, which would guarantee ordered and exactly-once messaging across both broker and producer failovers.