Do the Kafka producer retries docs make sense?

The current (3.2) producer retry documentation in Kafka is:
Allowing retries while setting enable.idempotence to false and max.in.flight.requests.per.connection to 1 will potentially change the ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second succeeds, then the records in the second batch may appear first.
Previously, the documentation for 2.8 was:
Allowing retries without setting max.in.flight.requests.per.connection to 1 will potentially change the ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second succeeds, then the records in the second batch may appear first.
Do the two docs contradict each other?
From previous experience, setting max.in.flight.requests.per.connection=1 will ensure ordering even for enable.idempotence=false, which is not what the current documentation states.
UPDATE:
I've found that the default acks configurations changed and it might be a clue:
Notable changes in 3.0.0:
The producer has stronger delivery guarantees by default: idempotence is enabled and acks is set to all instead of 1. See KIP-679 for details.
However, it is more related to data loss than to ordering.
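To make the reordering scenario from the quoted docs concrete, here is a minimal toy simulation. It is only a sketch, not Kafka code: the function deliver and its parameters are hypothetical, and it assumes that every request in the in-flight window is sent before any acknowledgement comes back.

```python
def deliver(batches, max_in_flight, fail_first_attempt):
    """Return the order in which batches land in a single partition log.

    batches: batch names in the order the producer sends them.
    max_in_flight: how many unacknowledged requests may be outstanding.
    fail_first_attempt: names of batches whose first send attempt fails.
    """
    pending = list(batches)
    failed_once = set()
    log = []
    while pending:
        window = pending[:max_in_flight]  # requests sent before any ack returns
        retry = []
        for batch in window:
            if batch in fail_first_attempt and batch not in failed_once:
                failed_once.add(batch)
                retry.append(batch)  # transient error: the batch will be resent
            else:
                log.append(batch)  # the broker appended the batch
        # Retried batches go back to the front of the queue, but anything that
        # succeeded in the same window is already in the log ahead of them.
        pending = retry + pending[len(window):]
    return log


# Two requests in flight: B is appended while A is still being retried.
print(deliver(["A", "B"], max_in_flight=2, fail_first_attempt={"A"}))
# One request in flight: B is not sent until A has succeeded.
print(deliver(["A", "B"], max_in_flight=1, fail_first_attempt={"A"}))
```

In this model the first call yields B before A and the second yields A before B, which matches the 2.8 wording and the experience described in the question.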

Related

Handling the Duplicate and Order of the message in Kafka

I am trying to understand the difference between the following configurations for handling message order and duplicate messages in Kafka. I could not find a detailed explanation anywhere.
Could you please help me understand with some use cases?
enable.idempotence=true
Idempotent producers can handle duplicate messages and preserve message order even with request pipelining—there is no message duplication because the broker ignores duplicate sequence numbers, and message ordering is preserved because when there are failures, the producer temporarily constrains to a single message in flight until sequencing is restored.
Set max.in.flight.requests.per.connection=1 to ensure that only one request can be sent to the broker at a time. To preserve message order while allowing request pipelining, set the configuration parameter retries=0 if the application is able to tolerate some message loss.
When you set enable.idempotence=true these configurations are automatically set if you don't set them manually.
retries=Integer.MAX_VALUE
max.in.flight.requests.per.connection=5 (if Kafka version >= 1.0)
acks=all
This is the ideal configuration for an idempotent producer. If you set retries=0, a message that fails because of a network error is never resent, so it may not reach the broker at all.
enable.idempotence: When set to 'true', the producer will ensure that exactly one copy of each message is written in the stream. If
'false', producer retries due to broker failures, etc., may write
duplicates of the retried message in the stream. Note that enabling
idempotence requires max.in.flight.requests.per.connection to be less
than or equal to 5, retries to be greater than 0 and acks must be
'all'. If these values are not explicitly set by the user, suitable
values will be chosen. If incompatible values are set, a
ConfigException will be thrown.
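The rules in the quoted description can be sketched as a small check. This is a toy model and not the Kafka client: resolve_idempotent_config and ConfigError are hypothetical names, the defaults are the ones listed in the answer above, and the function assumes enable.idempotence=true.

```python
class ConfigError(ValueError):
    """Hypothetical stand-in for Kafka's ConfigException."""


# Values chosen when the user does not set them (taken from the answer above).
IDEMPOTENT_DEFAULTS = {
    "retries": 2147483647,  # Integer.MAX_VALUE
    "max.in.flight.requests.per.connection": 5,
    "acks": "all",
}


def resolve_idempotent_config(user_config):
    """Merge user settings with the defaults and reject incompatible values."""
    config = {"enable.idempotence": True, **IDEMPOTENT_DEFAULTS, **user_config}
    if config["max.in.flight.requests.per.connection"] > 5:
        raise ConfigError("max.in.flight.requests.per.connection must be <= 5")
    if config["retries"] <= 0:
        raise ConfigError("retries must be greater than 0")
    if str(config["acks"]) not in ("all", "-1"):  # -1 is equivalent to all
        raise ConfigError("acks must be all")
    return config


# Nothing set explicitly: suitable values are filled in.
print(resolve_idempotent_config({}))
# An explicit but compatible value is kept.
print(resolve_idempotent_config({"max.in.flight.requests.per.connection": 1}))
```

Passing {"retries": 0} to this sketch raises ConfigError, mirroring the "incompatible values" case in the description.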

Apache Kafka the order of messages in partition guarantee

Read this article about message ordering in topic partition: https://blog.softwaremill.com/does-kafka-really-guarantee-the-order-of-messages-3ca849fd19d2
Allowing retries without setting max.in.flight.requests.per.connection
to 1 will potentially change the ordering of records because if two
batches are sent to a single partition, and the first fails and is
retried but the second succeeds, then the records in the second batch
may appear first.
According to it, there are two producer configurations that can achieve the ordering guarantee:
max.in.flight.requests.per.connection=1 // can impact producer throughput
or alternatively
enable.idempotence=true
max.in.flight.requests.per.connection //to be less than or equal to 5
retries // to be greater than 0
acks=all
Can anybody explain how the second configuration achieves the order guarantee? Also, in the second config exactly-once semantics are enabled.
idempotence:(Exactly-once in order semantics per partition)
Idempotent delivery enables the producer to write a message to Kafka exactly
once to a particular partition of a topic during the lifetime of a
single producer without data loss and order per partition.
Idempotence is one of the key features for achieving exactly-once semantics in Kafka. Setting enable.idempotence=true gives exactly-once semantics per partition, meaning no duplicates and no data loss for a particular partition. Even if an error causes the producer to send a message multiple times, it is written to Kafka only once.
The Kafka producer uses the concepts of PID and sequence number to achieve idempotence, as explained below:
PID and Sequence Number
Idempotent producers use a producer ID (PID) and a sequence number while producing messages. The producer increments the sequence number on each message it publishes, and the numbers are tracked per PID. The broker compares each incoming sequence number with the last one it accepted and rejects the message unless the new number is exactly one greater. A number that is not greater means the message is a duplicate; a number that is more than one greater means messages were lost in between.
In a failure scenario the producer keeps the same sequence number when it retries, so duplication is still avoided.
Note: When the producer restarts, new PID gets assigned. So the idempotency is promised only for a single producer session
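The broker-side check described above can be sketched as follows. This is a simplified model and not the actual broker code: PartitionLog and its return strings are hypothetical, and it tracks one sequence number per message as the explanation does.

```python
class PartitionLog:
    """Toy model of one partition that enforces per-PID sequence numbers."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # PID -> last accepted sequence number

    def append(self, pid, seq, value):
        last = self.last_seq.get(pid, -1)
        if seq <= last:
            return "duplicate"  # already written: ignore the retry
        if seq != last + 1:
            return "out_of_order"  # a gap: an earlier message is missing
        self.records.append(value)
        self.last_seq[pid] = seq
        return "appended"


log = PartitionLog()
print(log.append(pid=1, seq=0, value="a"))  # appended
print(log.append(pid=1, seq=0, value="a"))  # duplicate (retry of the same message)
print(log.append(pid=1, seq=2, value="c"))  # out_of_order (seq 1 not seen yet)
print(log.append(pid=1, seq=1, value="b"))  # appended
# After a restart the producer gets a new PID, so the broker cannot tell that
# "a" was already written by the previous session.
print(log.append(pid=2, seq=0, value="a"))  # appended
print(log.records)  # ['a', 'b', 'a']
```

The last call illustrates the note above: deduplication only holds within a single producer session.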
If you are using enable.idempotence=true you can keep max.in.flight.requests.per.connection at up to 5 and still get the ordering guarantee, which allows better parallelism and improves performance.
The idempotence feature was introduced in Kafka 0.11. Before that, some level of guarantee could be achieved using max.in.flight.requests.per.connection together with the retries and acks settings:
max.in.flight.requests.per.connection to 1
retries set to a large number
acks=all
max.in.flight.requests.per.connection=1: to make sure that while messages are retrying, additional messages will not be sent.
This gives an at-least-once guarantee but comes at a cost in performance and throughput, which is what motivated the enable.idempotence feature: it improves performance while still guaranteeing ordering.
exactly_once: To achieve exactly_once along with idempotence, the isolation level must be read_committed, and the following parameters cannot be overridden:
isolation.level:read_committed( Consumers will always read committed
data only)
enable.idempotence=true (Producer will always have idempotency enabled)
MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION=5 (Producer will
have at most five in-flight requests per connection)
enable.idempotence is a newer setting that was introduced as part of kip-98 (implemented in kafka 0.11+). before it users would have to set max.in.flight.requests.per.connection to 1.
the way it works (abbreviated) is that producers now put sequence numbers on outgoing produce batches, and brokers keep track of these sequence numbers per producer connected to them. if a broker receives a batch out of order (say batch 3 after 1) it rejects it and expects to see batch 2 (which the producer will retransmit). for complete details you should read kip-98

Kafka - timestamp order

Assume I'm using log.message.timestamp.type=LogAppendTime.
Also assume number of messages per topic/partition during first read:
topic0:partition0: 5
topic0:partition1: 0
topic0:partition2: 3
topic1:partition0: 2
topic1:partition1: 0
topic1:partition2: 4
and during second read:
topic0:partition0: 5
topic0:partition1: 2
topic0:partition2: 3
topic1:partition0: 2
topic1:partition1: 4
topic1:partition2: 4
If I read first message from each partition, does Kafka guarantee that reading again from each partition won't return a message that's older than those I read during first read?
Focus on topic0:partition1 and topic1:partition1 which didn't have any messages during first read, but have during second read.
Kafka guarantees message ordering at partition level, so your use case perfectly fits kafka's architecture.
There are some concepts to explain in here. First of all, you have the starting consumer position (when you first launch a new consumer group), defined by the auto.offset.reset parameter.
This will kick in only if there's no saved offset for that group, or if a saved offset is not valid anymore (e.g., if it was already deleted by retention policies). You should normally only worry about this if you launch a new consumer group (and you want to decide whether it starts from the oldest messages or from the newest one).
Regarding your example, in normal conditions (there are no consumer shutdowns, etc.), you have nothing to worry about. Consumers within the same consumer group will only read their messages once, no matter the number of partitions or the number of consumers. These consumers remember their last read offset, and periodically save it in the __consumer_offsets topic.
There are 2 properties that define this periodical recording:
enable.auto.commit
Setting it to true (which is the default value) will allow the automatic commit to the __consumer_offsets topic.
auto.commit.interval.ms
Defines how often the offsets are committed. For example, with a value of 10000, your consumer offsets will be stored every 10 seconds.
You can also set enable.auto.commit to false and store your offsets in your own way (e.g., in a database), but this is a more special use case.
The auto offset committing will allow you to stop your consumers and start them again later without losing any messages or reprocessing already processed ones (it's like a bookmark in a book). If you don't stop your consumers (and there are no errors from broker/zookeeper/consumers), there are even fewer worries for you.
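The bookmark idea can be illustrated with a toy model in which a partition is a plain list and the committed offset is an integer. This is only a sketch under those assumptions: consume is a hypothetical helper, not a Kafka client call, and it ignores failures between processing and committing.

```python
def consume(partition, committed_offset, max_records):
    """Read up to max_records starting at the committed offset.

    Returns the records read and the new offset to commit.
    """
    records = partition[committed_offset:committed_offset + max_records]
    return records, committed_offset + len(records)


# A partition that is empty during the first read (like topic0:partition1).
partition = []
offset = 0
first, offset = consume(partition, offset, max_records=1)
print(first, offset)  # nothing to read, the offset stays at 0

# Two messages arrive before the second read; they are appended at the end.
partition.extend(["m0", "m1"])
second, offset = consume(partition, offset, max_records=1)
print(second, offset)  # the oldest unread message, offset moves to 1

third, offset = consume(partition, offset, max_records=1)
print(third, offset)  # continues after the bookmark, never before it
```

Because each read starts at the saved offset and the log is append-only, a later read from the same partition cannot return a record that sits before one already read.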
For more info, you can take a look here: https://docs.confluent.io/current/clients/consumer.html#concepts
Hope it helps!

Kafka: isolation level implications

I have a use case where I need 100% reliability, idempotency (no duplicate messages) as well as order-preservation in my Kafka partitions. I'm trying to set up a proof of concept using the transactional API to achieve this. There is a setting called 'isolation.level' that I'm struggling to understand.
In this article, they talk about the difference between the two options
There are now two new isolation levels in Kafka consumer:
read_committed: Read both kind of messages that are not part of a
transaction and that are, after the transaction is committed.
Read_committed consumer uses end offset of a partition, instead of
client-side buffering. This offset is the first message in the
partition belonging to an open transaction. It is also known as “Last
Stable Offset” (LSO). A read_committed consumer will only read up till
the LSO and filter out any transactional messages which have been
aborted.
read_uncommitted: Read all messages in offset order without
waiting for transactions to be committed. This option is similar to
the current semantics of a Kafka consumer.
The performance implication here is obvious but I'm honestly struggling to read between the lines and understand the functional implications/risk of each choice. It seems like read_committed is 'safer' but I want to understand why.
First, the isolation.level setting only has an impact on the consumer if the topics it's consuming from contains records written using a transactional producer.
If so, if it's set to read_uncommitted, the consumer will simply read everything including aborted transactions. That is the default.
When set to read_committed, the consumer will only be able to read records from committed transactions (in addition to records not part of transactions). It also means that in order to keep ordering, if a transaction is in-flight the consumer will not be able to consume records that are part of that transaction. Basically the broker will only allow the consumer to read up to the Last Stable Offset (LSO). When the transaction is committed (or aborted), the broker will update the LSO and the consumer will receive the new records.
If you don't tolerate duplicates or records from aborted transactions, then you should use read_committed. As you hinted, this creates a small delay in consuming as records are only visible once transactions are committed. The impact mostly depends on the sizes of your transactions, i.e. how often you commit.
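The difference between the two isolation levels can be sketched with a toy log. This is a simplified model, not the consumer implementation: fetch, the tuple layout, and the transaction-state dictionary are all hypothetical.

```python
def fetch(log, txn_state, isolation_level):
    """Return the values a consumer would see in this toy model.

    log: list of (transaction id or None, value) in offset order.
    txn_state: transaction id -> "open", "committed" or "aborted".
    """
    if isolation_level == "read_uncommitted":
        return [value for _, value in log]
    # LSO: offset of the first record that belongs to a still-open transaction.
    lso = len(log)
    for offset, (txn, _) in enumerate(log):
        if txn is not None and txn_state[txn] == "open":
            lso = offset
            break
    # Read only up to the LSO and filter out records of aborted transactions.
    return [value for txn, value in log[:lso]
            if txn is None or txn_state[txn] == "committed"]


log = [(None, "a"), ("t1", "b"), ("t2", "c"), (None, "d"), ("t1", "e")]
state = {"t1": "aborted", "t2": "open"}

print(fetch(log, state, "read_uncommitted"))  # everything, aborted records included
print(fetch(log, state, "read_committed"))    # stops before the open transaction t2

state["t2"] = "committed"
print(fetch(log, state, "read_committed"))    # LSO advanced; t1's records stay hidden
```

While t2 is open, the read_committed consumer in this model cannot see "d" even though it is not transactional, which is the delay mentioned above.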
If you are not using transactions in your producer, the isolation level does not matter. If you are, then you must use read_committed if you want the consumers to honor the transactional nature. Here are some additional references:
https://www.confluent.io/blog/transactions-apache-kafka/
https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit
if so, if it's set to read_uncommitted, the consumer will simply read everything including aborted transactions. That is the default.
To clarify things a bit for readers: this is the default only in the Java Kafka client. It was done to not change the semantics when transactions were introduced back in the day.
It's the opposite in librdkafka which sets the isolation.level configuration to read_committed by default. As a result, all libraries built on top of librdkafka will consume only committed messages by default: confluent-kafka-python, confluent-kafka-dotnet, rdkafka-ruby.
KafkaJS consumers also use read committed by default (readUncommitted set to false).

How is ordering guaranteed during failures in Kafka Async Producer?

If I am using Kafka Async producer, assume there are X number of messages in buffer.
When they are actually processed on the client, and the broker or a specific partition is down for some time, the Kafka client would retry. If a message fails, would it mark that specific message as failed and move on to the next message (this could lead to out-of-order messages)? Or would it fail the remaining messages in the batch in order to preserve order?
I need to maintain the ordering, so ideally I would want Kafka to fail the batch from the place where it failed, so I can retry from the failure point. How would I achieve that?
Like it says in the kafka documentation about retries
Setting a value greater than zero will cause the client to resend any
record whose send fails with a potentially transient error. Note that
this retry is no different than if the client resent the record upon
receiving the error. Allowing retries will potentially change the
ordering of records because if two records are sent to a single
partition, and the first fails and is retried but the second succeeds,
then the second record may appear first.
So, to answer the question in your title: no, Kafka doesn't guarantee order under async sends.
I am updating the answer based on Peter Davis's question.
I think that if you want to send in batch mode, the only way to secure it would be to set max.in.flight.requests.per.connection=1, but as the documentation says:
Note that if this setting is set to be greater than 1 and there are
failed sends, there is a risk of message re-ordering due to retries
(i.e., if retries are enabled).
Starting with Kafka 0.11.0, there is the enable.idempotence setting, as documented.
enable.idempotence: When set to true, the producer will ensure that
exactly one copy of each message is written in the stream. If false,
producer retries due to broker failures, etc., may write duplicates of
the retried message in the stream. Note that enabling idempotence
requires max.in.flight.requests.per.connection to be less than or
equal to 5, retries to be greater than 0 and acks must be all. If
these values are not explicitly set by the user, suitable values will
be chosen. If incompatible values are set, a ConfigException will be
thrown.
Type: boolean Default: false
This will guarantee that messages are ordered and that no loss occurs for the duration of the producer session. Unfortunately, the producer cannot set the sequence id, so Kafka can make these guarantees only per producer session.
Have a look at Apache Pulsar if you need to set the sequence id, which would allow you to use an external sequence id, which would guarantee ordered and exactly-once messaging across both broker and producer failovers.