How does Kafka ack a batch AsyncProducer?

How does Kafka send acks when using a batch async producer? Is the ack per message, per batch, or per sub-batch (i.e. per partition batch)?
Is it recommended to rely on acks in an async batch producer, or is it better to just use the callback mechanism?

How does Kafka send acks when using a batch async producer?
You have two choices: inspect the returned Future, or pass a callback.
java.util.concurrent.Future<RecordMetadata> send(ProducerRecord<K,V> record)
java.util.concurrent.Future<RecordMetadata> send(ProducerRecord<K,V> record, Callback callback)
Is the ack per message, per batch, or per sub-batch (i.e. per partition batch)?
Good question. I think the ack covers the entire request, and therefore all the messages in that request, but I am not sure and could not find documentation confirming it.
Is it recommended to rely on acks in an async batch producer, or is it better to just use the callback mechanism?
These are two different notions. Sends are always asynchronous in Kafka unless you call future.get().
If you want records to be sent in batches, you have to send without blocking, that is, without calling future.get() after each send.

"How does kafka sends ack when using batch async Producer? Is the ack is per message/per batch/per sub-batch(i.e. batch per partition)?"
The ack is sent per batch (which also means per partition). If one or more individual messages within a batch fail, the entire batch is considered failed. Depending on your retries configuration, the batch is then re-sent.
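To make the per-batch behaviour concrete, here is a toy model in plain Java. It is only a sketch and not Kafka's internal code: BatchAckSketch, PartitionBatch and its methods are hypothetical names, and the only assumption is the one stated above, namely that every record gets its own Future while a single broker response settles all records of the batch together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Toy model only: NOT Kafka's internal code. Each record gets its own future,
// but one (simulated) broker response settles the whole batch.
class BatchAckSketch {

    // Hypothetical stand-in for the batch a producer builds for one partition.
    static class PartitionBatch {
        private final List<CompletableFuture<Long>> recordFutures = new ArrayList<>();

        // Called once per record; returns immediately, like send().
        CompletableFuture<Long> append() {
            CompletableFuture<Long> future = new CompletableFuture<>();
            recordFutures.add(future);
            return future;
        }

        // One successful ack: each record gets baseOffset plus its position in the batch.
        void ackSucceeded(long baseOffset) {
            for (int i = 0; i < recordFutures.size(); i++) {
                recordFutures.get(i).complete(baseOffset + i);
            }
        }

        // One failed ack: every record in the batch fails with the same error.
        void ackFailed(Exception error) {
            for (CompletableFuture<Long> future : recordFutures) {
                future.completeExceptionally(error);
            }
        }
    }

    public static void main(String[] args) {
        PartitionBatch ok = new PartitionBatch();
        CompletableFuture<Long> first = ok.append();
        CompletableFuture<Long> second = ok.append();
        ok.ackSucceeded(100L);
        System.out.println(first.join() + " " + second.join()); // prints "100 101"

        PartitionBatch bad = new PartitionBatch();
        CompletableFuture<Long> failed = bad.append();
        bad.ackFailed(new RuntimeException("simulated broker error"));
        System.out.println(failed.isCompletedExceptionally()); // prints "true"
    }
}
```

From the caller's point of view the result still arrives per record (one Future or callback invocation each), even though the broker answered once for the batch.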
"Is it recommended to use the ack in async batch prdocuer? or better just to use the callback mechanism?"
I do not think this is an either-or question. You can set acks=1 or acks=all and use a callback at the same time. The acks setting improves the reliability of your producer independently of any callbacks.
You should use a callback if you need the full details of why a send failed. I have provided more details on producer callback exceptions in another post of mine.
In addition, you can use the callback of your asynchronous producer to handle manual commits while avoiding committing lower offsets due to retries, as described in "Kafka Consumer offset commit check to avoid committing smaller offsets".

Related

kafka - Asynchronous and acks

In the Kafka context, the concepts of asynchronous sending and acks are confusing to me, and I would like to understand them clearly.
If acks = 1 or all, does a Kafka producer have to wait for the ack response from the broker, unable to do anything else in the meantime?
Without the ack response, can the Kafka producer not send the next message to the broker? If so, it looks synchronous, because the producer is waiting for the ack result from the broker.
Could you help me understand these concepts?
Thanks.
The answer lies in the send() method.
As per the official docs:
public Future<RecordMetadata> send(ProducerRecord<K,V> record, Callback callback)
Asynchronously send a record to a topic and invoke the provided
callback when the send has been acknowledged. The send is asynchronous
and this method will return immediately once the record has been
stored in the buffer of records waiting to be sent. This allows
sending many records in parallel without blocking to wait for the
response after each one.
As you can see from the signature, it accepts a callback and also returns a Future object.
When you call this method, the method itself does not care about the acks config. It does not block; it leaves the decision to the caller by returning a Future and accepting a callback. It pushes messages into the buffer as it receives them and leaves the rest to those two, i.e. the Future and the callback.
Once batches leave the buffer, the acks setting is honored, but this happens in parallel on a background thread and does not block the caller of send().
With acks=0, the producer assumes the message is written as soon as it is sent (a.k.a. fire and forget).
With acks=1, the producer considers the write successful only when the leader has written the record. If no acknowledgment arrives, it retries based on your configuration and invokes the callback accordingly.
With acks=all, only the acknowledgment part changes: the producer considers the write successful once the record has been written to all in-sync replicas, and min.insync.replicas sets the minimum number of in-sync replicas required for the write to be accepted. The rest is as with acks=1.
With the Future you received, you can either check it later and continue sending messages, or call its get() method, which blocks.
Or you can use a callback to act as and when the acknowledgement is received.
So, TL;DR:
If acks = 1 or all, does a Kafka producer have to wait for the ack response from the broker, unable to do anything else in the meantime?
It depends on whether you immediately call Future.get(), which blocks.
producer.send(msg).get(); // Waiting...
Or just ignore the return value and delegate the follow-up to a callback:
producer.send(record, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        // Handle the acknowledgement (or the error) here
    }
});
acks = 1 or all does not mean that the producer will not send the next batch until it gets the acknowledgement of the previous batch. It means that if it does not get an acknowledgement from the leader (acks = 1), or from the leader after all in-sync replicas have the record (acks = all), it will attempt a retry.
Even without an acknowledgement of the previous batch, the producer will send the next batch, provided you have configured it to do so (for example, max.in.flight.requests.per.connection is greater than one) and you process the responses asynchronously with a callback on the producer side.

Can sending messages to Kafka asynchronously affect message ordering?

This is my understanding of sending messages to Kafka asynchronously and implementing a callback function:
Scenario: the producer is going to receive 4 batches of messages to send (to the same partition, for the sake of simplicity).
Producer sends batch A.
Producer sends batch B.
Producer receives the replies from the broker and runs the callback: batch A failed with a retriable error and batch B succeeded, so the producer sends batch A again.
Won't this disturb the message ordering, as A is now received by Kafka after B?
If you need message ordering within a partition, you can use the idempotent producer:
enable.idempotence=true
acks=all
max.in.flight.requests.per.connection<=5
retries>0
This will resolve potential duplicates from the producer and maintain the ordering.
If you don't want to use the idempotent producer, it is enough to set max.in.flight.requests.per.connection=1. This is the maximum number of unacknowledged requests the producer sends on a single connection. It means that batch B will not be sent before the acknowledgement for A is received.
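For reference, the two options above could be written as producer properties roughly like this. It is a sketch using only java.util.Properties: the class and method names are made up for illustration, the bootstrap address is a placeholder, and you would still need to add key.serializer and value.serializer before passing the properties to a KafkaProducer.

```java
import java.util.Properties;

// Illustrative helper, not part of any Kafka API.
class OrderedProducerConfigSketch {

    // Option 1: idempotent producer, as listed in the answer above.
    static Properties idempotent() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "all");
        props.setProperty("max.in.flight.requests.per.connection", "5"); // must be <= 5
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE)); // any value > 0
        return props;
    }

    // Option 2: no idempotence, but only one unacknowledged request at a time.
    static Properties singleInFlight() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(idempotent().getProperty("acks"));
        System.out.println(singleInFlight().getProperty("max.in.flight.requests.per.connection"));
    }
}
```

Option 2 keeps the ordering but does not protect against duplicates on retry, which is the part the idempotent producer adds.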

Kafka behavior with acks=0 and synchronous producer

What is the behavior of a synchronous producer when we set acks=0?
producer.send(record).get();
Will the above statement return immediately, or will we wait for the response (RecordMetadata)?
In other words, is it correct to say that with acks=0 and send().get() we still wait for an acknowledgment/response from the server?
Is there an explanation in the Apache Kafka documentation?
The send() method is asynchronous. When called it adds the record to a buffer of pending record sends and immediately returns. This allows the producer to batch together individual records for efficiency.
acks=0 If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.
send() returns a java.util.concurrent.Future<RecordMetadata>.
get()
Waits if necessary for the computation to complete, and then retrieves its result.
So, for your question: get() is not ignored, but it has no acknowledgment to wait for. With acks=0 the producer considers the record sent as soon as it has been handed to the network, so get() returns as soon as that happens, without waiting for the brokers.
So your statement is false.
As said, send() is an async function that returns a Future, and a Future has a get() method that waits for the result. You can read the code of Sender and KafkaProducer in the GitHub project: if acks=0, the result is returned without waiting for an acknowledgment from the brokers.
https://github.com/apache/kafka/blob/98bd590718658f3af2d7f1ff307d1d170624f199/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L579

Reactive kafka producer

I am exploring reactive Kafka and just wanted to confirm whether reactive Kafka is equivalent to a sync producer. With a sync producer, we get a message delivery guarantee with acks=all and the producer sequence is maintained. But delivery and sequencing are not guaranteed with async. Is the reactive producer equivalent to sync or async?
Reactive means async. In the plain Kafka clients API, the KafkaProducer is also asynchronous. It becomes synchronous when you explicitly call kafkaProducer.send().get(), which blocks the execution of the program.
Even with an async producer, message delivery can be guaranteed. It depends on the number of retries and on delivery.timeout.ms.
With acks=all, the data is guaranteed to be replicated among the ISR, so you get fault tolerance with consistency guarantees: if one of the brokers dies, the consumers still see what they are intended to see.
As for the sequence, messages are batched before they are sent. If multiple batches are sent asynchronously, like batch-1, batch-2, batch-3, and for some reason batch-1 fails after batch-2 and batch-3 have been sent, then retrying batch-1 will produce batch-1 after batch-2 and batch-3, making it out of sequence.
If you want to preserve the sequence, set max.in.flight.requests.per.connection to 1 so that only one request per connection can be in flight at any given instant. However, this can have a performance impact. You may want to tune other settings, for example by increasing batch.size, to get better throughput with the in-flight requests set to 1.
So your assumption that delivery and sequencing are not guaranteed with async is false.
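The reordering scenario is easy to reproduce in a small simulation. The following is a toy model in plain Java, not the real client: deliver() and its parameters are hypothetical, idempotence is assumed to be off, and the "broker" simply rejects the first attempt of one chosen batch. It only illustrates why more than one in-flight request plus a retry can reorder batches, and why a limit of 1 cannot.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy simulation only: models in-flight requests and one retry, nothing else.
class InFlightOrderingSketch {

    // Returns the order in which batches land in the simulated partition log.
    // The first attempt of failOnce is rejected; every other attempt succeeds.
    static List<String> deliver(List<String> batches, int maxInFlight, String failOnce) {
        Deque<String> pending = new ArrayDeque<>(batches);
        List<String> log = new ArrayList<>();
        boolean alreadyFailed = false;

        while (!pending.isEmpty()) {
            // Send up to maxInFlight batches before looking at any response.
            List<String> inFlight = new ArrayList<>();
            while (inFlight.size() < maxInFlight && !pending.isEmpty()) {
                inFlight.add(pending.pollFirst());
            }
            // Process the responses in send order.
            List<String> retry = new ArrayList<>();
            for (String batch : inFlight) {
                if (batch.equals(failOnce) && !alreadyFailed) {
                    alreadyFailed = true;
                    retry.add(batch);
                } else {
                    log.add(batch);
                }
            }
            // Failed batches go back to the head of the queue for the next round.
            for (int i = retry.size() - 1; i >= 0; i--) {
                pending.addFirst(retry.get(i));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> batches = List.of("batch-1", "batch-2", "batch-3");
        System.out.println(deliver(batches, 3, "batch-1")); // [batch-2, batch-3, batch-1]
        System.out.println(deliver(batches, 1, "batch-1")); // [batch-1, batch-2, batch-3]
    }
}
```

With three requests in flight, batch-2 and batch-3 are already written when batch-1 is retried; with one request in flight, batch-2 is not sent until batch-1 has succeeded.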

What is the impact of acknowledgement modes in asynchronous send in kafka?

Do the acknowledgement modes ('0', '1', 'all') have any impact when we send messages using the asynchronous send() call?
I have tried measuring the send call latency (that is, by recording the time before and after the call to the send() method) and observed that both (asynchronous send, acks=1) and (asynchronous send, acks=all) took the same time.
However, there is a clear difference in throughput numbers.
producer.send(record, new ProducerCallback(record));
I thought the send call would block when we use acks=all, even in asynchronous mode. Can someone explain how the acknowledgement modes ('0', '1', 'all') work in asynchronous mode?
According to the docs:
public Future<RecordMetadata> send(ProducerRecord<K,V> record, Callback callback)
Asynchronously send a record to a topic and invoke the provided
callback when the send has been acknowledged. The send is asynchronous
and this method will return immediately once the record has been
stored in the buffer of records waiting to be sent. This allows
sending many records in parallel without blocking to wait for the
response after each one.
So one thing is certain: the asynchronous send() does not care about the acks config. All it does is push the message into a buffer. Once this buffer starts getting processed (controlled by the linger.ms and batch.size properties), acks is applied.
If acks=0 -> Just fire and forget. The producer won't wait for an acknowledgement.
If acks=1 -> The broker sends the acknowledgement once the message has been successfully written on the leader.
If acks=all -> The broker sends the acknowledgement once the message has been successfully written on all in-sync replicas.
With 1 and all, the producer does wait for an acknowledgement, but you will not notice this in send() because the waiting happens on a separate background thread. With acks=all, the ack is expected to take a little longer to arrive than with acks=1 (network latency and the number of replicas being the obvious reasons).
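To illustrate why the measured send() latency does not change with acks, here is a small stand-in written with CompletableFuture instead of a real producer. Everything in it is an assumption for the sake of the demo: the async task plays the role of the producer's background thread, the latch plays the broker's reply, and offset 42 is made up.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Stand-in demo, not Kafka code: shows that the caller continues while a
// background thread waits for the (simulated) acknowledgement.
class AsyncAckSketch {

    // Returns the order of events: send() returning vs. the ack callback firing.
    static List<String> run() {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch brokerReplied = new CountDownLatch(1);

        // Background task: waits for the simulated broker reply, then completes.
        CompletableFuture<Long> ack = CompletableFuture.supplyAsync(() -> {
            try {
                brokerReplied.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 42L; // hypothetical offset
        });
        // Plays the role of the Callback passed to send().
        CompletableFuture<Void> callback =
                ack.thenAccept(offset -> events.add("ack received at offset " + offset));

        // The caller is not blocked while the ack is outstanding.
        events.add("send returned");

        brokerReplied.countDown(); // the broker finally answers
        callback.join();           // wait only so the demo can report the result
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [send returned, ack received at offset 42]
    }
}
```

The time the caller spends in the "send" part is the same however long the latch is held, which matches the observation in the question; the slower ack only shows up in how quickly futures complete, i.e. in throughput.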
Further, you should configure the retries property of your async producer so that, if no acknowledgement arrives or the broker reports a retriable error (for example because a request was lost), the producer knows how many times it should try to send the message again. This increases the delivery guarantee.
Lastly: "However, there is a clear difference in throughput numbers." That's true, because of the acknowledgement latency from the broker to the producer thread.
Hope that helps! :)