What is the impact of acknowledgement modes in asynchronous send in Kafka?

Do the acknowledgement modes ('0', '1', 'all') have any impact when we send messages using the asynchronous send() call?
I tried measuring the latency of the send() call (that is, by recording the time before and after the call to the send() method) and observed that (asynchronous send, acks=1) and (asynchronous send, acks=all) took the same time.
However, there is a clear difference in throughput numbers.
producer.send(record, new ProducerCallback(record));
I thought the send call would block when we use acks=all, even in asynchronous mode. Can someone explain how the acknowledgement modes ('0', '1', 'all') work in asynchronous mode?

According to the docs:
public Future<RecordMetadata> send(ProducerRecord<K,V> record,
Callback callback)
Asynchronously send a record to a topic and invoke the provided
callback when the send has been acknowledged. The send is asynchronous
and this method will return immediately once the record has been
stored in the buffer of records waiting to be sent. This allows
sending many records in parallel without blocking to wait for the
response after each one.
So one thing is certain: the asynchronous send() does not really care what the acks config is at the time of the call. All it does is push the message into a buffer. Once this buffer starts getting processed (controlled by the linger.ms and batch.size properties), acks is checked.
If acks=0 -> Just fire and forget. The producer won't wait for an acknowledgement.
If acks=1 -> An acknowledgement is sent by the broker when the message has been successfully written on the leader.
If acks=all -> An acknowledgement is sent by the broker when the message has been successfully written on all replicas.
In the case of 1 and all, the producer's sender thread waits for the acknowledgement, but you might not notice this because it happens on a parallel thread, not on the thread that called send(). With acks=all, it is expected that the ack will take a little longer to arrive than with acks=1 (network latency and the number of replicas being the obvious reasons).
Further, you should configure the retries property in your async producer so that, if an acknowledgement fails (for any reason, e.g. a packet is corrupted or lost), the producer knows how many times it should try to send the message again (increasing the delivery guarantee), as in the sketch below.
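As a rough illustration of how these properties fit together, here is a minimal producer sketch. The broker address, topic name, and property values are illustrative assumptions, not from the question:
import java.util.Properties;
import org.apache.kafka.clients.producer.*;

public class AsyncAckDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("acks", "all");   // wait for all in-sync replicas
        props.put("retries", 3);    // retry failed sends a few times
        props.put("linger.ms", 5);  // how long records may wait before the buffer is drained
        props.put("batch.size", 16384);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            long start = System.currentTimeMillis();
            // send() returns as soon as the record is buffered; the ack arrives later
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"),
                (metadata, exception) -> {
                    // Runs on the I/O thread once the ack (or an error) arrives,
                    // so this measures ack latency rather than send() latency.
                    System.out.println("Acked after " + (System.currentTimeMillis() - start) + " ms");
                });
        }
    }
}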
Lastly: "However, there is a clear difference in throughput numbers." -- That's true, because of the acknowledgement latency from the broker back to the producer's sender thread.
Hope that helps! :)

Related

Should the consumer or the client produce the retry event?

Let's say we have a Kafka consumer polling from a heavily loaded topic, and for each event it makes a client call to a service. The duration of the client call may vary, sometimes fast, sometimes slow. We have a retry topic, so whenever the client call has an issue, we produce a retry event.
Here is an interesting design question: which domain should be responsible for producing the retry event?
If we let the consumer handle the retry produce, we have to make the consumer wait for the client call to finish, which risks consumer lag because our event processing becomes slow.
If we let the service handle the retry produce, that solves the consumer lag issue, since the consumer just sends and forgets. However, if the service tries to produce a retry event but fails, the retry record might be lost forever in the current client call.
I also thought of adding a database to persist retry events, but that raises the same concern: if the DB write fails, we lose the retry just as we would on a Kafka produce error.
The expectation is to keep things resilient so that every failed event gets a chance to be retried, while at the same time avoiding the consumer lag issue.
I'm not sure I completely understand the question, but I will give it a shot. To summarise, you want to ensure the producer retries if the event failed.
The default for the producer's retries is 2147483647. If the produce request fails, it will keep retrying.
However, produce requests will fail before the number of retries is exhausted if the timeout configured by delivery.timeout.ms expires before a successful acknowledgement. The default for delivery.timeout.ms is 2 minutes, so you might want to increase it.
To ensure the producer always sends the record, you also want to look at the acks producer configuration.
If acks=all, all replicas in the ISR must acknowledge the record before it is considered successful. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee.
The above can cause duplicate messages. If you want to avoid duplicates, the idempotent producer (enable.idempotence=true) addresses that; see the sketch below.
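A minimal sketch of such a resilient producer configuration (the broker address and property values are illustrative assumptions, not recommendations from the question):
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class RetryProducerConfig {
    public static KafkaProducer<String, String> build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("acks", "all");                          // strongest delivery guarantee
        props.put("retries", Integer.MAX_VALUE);           // the default: 2147483647
        props.put("delivery.timeout.ms", 600_000);         // raised from the 2-minute default
        props.put("enable.idempotence", true);             // avoids duplicates caused by retries
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(props);
    }
}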
With Spring for Apache Kafka, the DeadLetterPublishingRecoverer (which can be used to publish to your "retry" topic) has a property failIfSendResultIsError.
When this is true (the default), the recovery operation fails, and the DefaultErrorHandler detects the failure and re-seeks the failed consumer record so that it will continue to be retried.
The non-blocking retry mechanism uses this recoverer internally, so the same behavior occurs there too.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#retry-topic
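A minimal wiring sketch of the classes named above, assuming a KafkaTemplate bean and a hypothetical retry-topic name (an illustration, not the project's canonical example):
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

public class RetryErrorHandlerConfig {
    public DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
        // Publish failed records to a retry topic, preserving the original partition
        DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(template,
            (record, ex) -> new TopicPartition("my-retry-topic", record.partition()));
        recoverer.setFailIfSendResultIsError(true); // the default: fail recovery if the publish fails
        // Retry in-memory 3 times, 1s apart, before recovering (publishing to the retry topic)
        return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 3));
    }
}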

Kafka - Asynchronous and acks

In the Kafka context, the asynchronous and acks concepts are confusing to me, but I would like to understand them clearly.
If acks=1 or all, does a Kafka producer need to wait for the ack response from the broker and can do nothing else?
Without the ack response, can the Kafka producer not send the next message to the broker? If so, it looks synchronous, because the producer is waiting for the ack result from the broker.
Could you help me understand these concepts?
Thanks.
The answer lies in the send() method.
As per the official docs:
public Future<RecordMetadata> send(ProducerRecord<K,V> record, Callback callback)
Asynchronously send a record to a topic and invoke the provided
callback when the send has been acknowledged. The send is asynchronous
and this method will return immediately once the record has been
stored in the buffer of records waiting to be sent. This allows
sending many records in parallel without blocking to wait for the
response after each one.
As you can see from the signature, it accepts a callback and also returns a Future object.
Upon calling this method, the method itself doesn't really care about the acks config. It won't block the call; instead it leaves the decision to the caller by returning a Future and accepting a callback. It pushes messages into the buffer as it receives them and leaves the rest to these two, i.e. the Future and the callback.
At the buffer level, acks starts being honored, but this happens in parallel on the sender thread and doesn't block the send() caller.
With acks=0, the producer will assume the message is written as soon as it is sent (aka fire and forget).
With acks=1, the producer will consider the write successful only when the leader receives the record; if no acknowledgment arrives, it will retry based on your configuration and invoke the callback accordingly.
With acks=all, only the acknowledgment part changes: the producer will consider the write successful only when the record has been written to all in-sync replicas, as governed by min.insync.replicas. The rest is as with acks=1.
With the Future you received, you can either check it later and continue sending messages, or call its get() method, which will block.
Or you can use a callback to perform an action as and when the acknowledgement is received.
So, TL;DR:
If acks=1 or all, does a Kafka producer need to wait for the ack response from the broker and can do nothing else?
It depends on whether you immediately call the Future's get() method, which blocks:
producer.send(msg).get(); // Waiting...
Or just ignore the return and delegate actions to a callback
producer.send(record,
new Callback() {
// Do the same thing
}
});
acks=1 or all does not mean that the producer will not send the next batch until it gets the acknowledgement of the previous batch; it means that if it does not get an acknowledgement from the leader (acks=1) or from the leader and all the ISRs (acks=all), it will attempt a retry.
Even without an acknowledgement of the previous batch, the producer will send the next batch, given you have set the configuration for it (i.e. max.in.flight.requests.per.connection greater than one) and you handle the response asynchronously via a callback on the producer side.
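A small sketch of that pipelined behavior, assuming a local broker and a hypothetical topic name (the property values are illustrative):
import java.util.Properties;
import org.apache.kafka.clients.producer.*;

public class PipelinedSends {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("acks", "all");
        props.put("max.in.flight.requests.per.connection", 5); // batches may be in flight before acks return
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // Each send() returns immediately; acks arrive later on the I/O thread
                producer.send(new ProducerRecord<>("demo-topic", Integer.toString(i)),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace(); // handle a failed batch here
                        }
                    });
            }
        } // close() flushes and waits for outstanding acks
    }
}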

Kafka behavior with acks=0 and synchronous producer

What is the behavior of a synchronous producer when we set acks=0?
producer.send(record).get();
Will the above statement return immediately, or will we wait for the response (RecordMetadata)?
In other words, is it correct to say that with acks=0 and send().get() we will still wait for an acknowledgment/response from the server?
Is there an explanation in the Apache Kafka documentation?
The send() method is asynchronous. When called it adds the record to a buffer of pending record sends and immediately returns. This allows the producer to batch together individual records for efficiency.
acks=0 If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.
send() returns a java.util.concurrent.Future:
get()
Waits if necessary for the computation to complete, and then retrieves its result.
So, for your question: get() is not ignored, but it has almost nothing to wait for, because with acks=0 the send completes immediately without waiting for an acknowledgment from the brokers.
So your statement is false.
As said, send() is an async function that returns a Future, and a Future has a method called get() that waits for the answer. You can read the code (Sender and KafkaProducer in the GitHub project): if acks=0 it just returns an answer back without waiting for acknowledgment from the brokers.
https://github.com/apache/kafka/blob/98bd590718658f3af2d7f1ff307d1d170624f199/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L579
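To make that concrete, here is a minimal sketch (the broker address and topic name are assumptions): with acks=0, send().get() completes without a broker response, and the returned offset is always -1.
import java.util.Properties;
import org.apache.kafka.clients.producer.*;

public class AcksZeroDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("acks", "0"); // fire and forget: no broker response expected
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            RecordMetadata metadata =
                producer.send(new ProducerRecord<>("demo-topic", "value")).get();
            // With acks=0 there is no server acknowledgment, so the offset is always -1
            System.out.println("offset = " + metadata.offset());
        }
    }
}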

Kafka Producer Message Delivery with acks=all and flush

Creating a Kafka producer with the "acks=all" config.
Is there any significance in calling flush() with the above config?
Will it wait for flush() to be invoked before sending to the broker?
As per the documentation:
acks=all This means the leader will wait for the full set of in-sync
replicas to acknowledge the record. This guarantees that the record
will not be lost as long as at least one in-sync replica remains
alive. This is the strongest available guarantee. This is equivalent
to the acks=-1 setting.
As per the documentation of
flush():
Invoking this method makes all buffered records immediately available
to send (even if linger_ms is greater than 0) and blocks on the
completion of the requests associated with these records. The
post-condition of flush() is that any previously sent record will have
completed (e.g. Future.is_done() == True). A request is considered
completed when either it is successfully acknowledged according to the
‘acks’ configuration for the producer, or it results in an error.
Other threads can continue sending messages while one thread is
blocked waiting for a flush call to complete; however, no guarantee is
made about the completion of messages sent after the flush call
begins.
flush() will still block the client application until all messages are sent, even with acks=0. The only difference is that it won't wait for an ack; the block lasts only until the buffer is sent out.
flush() with acks=all guarantees that the messages have been sent and replicated on the cluster with the required replication factor.
Finally, to answer your question: will it wait for flush() to be invoked before being sent to the broker?
Answer: not necessarily. The producer keeps sending messages at an interval or by batch size (buffer.memory controls the total amount of memory available to the producer for buffering). But it's always good to call flush() to make sure all messages are sent.
Refer to this link for more information.
Let me first try to call out the distinction between flush() and acks before I get to the 2 questions.
flush() is a method invoked on the producer to push the messages to the brokers from the (configurable) buffer maintained on the producer side. You would invoke either this method or close() to push the messages through to the brokers from the producer buffer. Sending also happens automatically as the buffer fills (as described by Manoj in his answer).
acks=all, however, is a responsibility of the broker, i.e. to send an acknowledgement back to the producer after the messages have been synchronously replicated to other brokers, as per the setting requested by the producer. You would use this setting to tune your message delivery semantics. In this case, as soon as the messages are replicated to the designated in-sync replicas, the broker sends the acknowledgement to the producer saying: "I got your messages".
Now, on your questions, i.e. whether there is any significance in calling flush() with this acks setting, and whether or not the producer will wait for flush() to be invoked before sending to the broker.
Well, the asynchronous nature of the producer ensures that the producer does not wait. If, however, you invoke flush() explicitly, the calling thread will block until all previously sent records have completed, i.e. until the producer has received the acknowledgements required by the acks setting. So the relationship between these 2 is very subtle; a sketch is shown below.
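A minimal sketch of that interaction, assuming a local broker and a hypothetical topic name:
import java.util.Properties;
import org.apache.kafka.clients.producer.*;

public class FlushWithAcksAll {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("acks", "all");   // broker replies only after the ISR has the record
        props.put("linger.ms", 50); // records may sit in the buffer this long
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100; i++) {
                producer.send(new ProducerRecord<>("events", Integer.toString(i))); // buffered, returns immediately
            }
            // Blocks the calling thread until every buffered record has completed,
            // i.e. been acknowledged per acks=all or failed with an error.
            producer.flush();
        }
    }
}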
I hope this helps!

Kafka offset management

We are using Kafka 0.10... I'm seeing some conflicting information online (and in the documentation) regarding how offsets are managed in Kafka when enable.auto.commit is true. Does the same poll() method that retrieves messages also handle the commits at the configured intervals?
If I retrieve messages from poll() in a single-threaded application and process the messages to completion (including handling errors) in the SAME thread, meaning poll() will not be invoked again until after my processing is complete, then I presume there is no fear of losing messages, correct? This only works if poll() attempts the commit at the subsequent invocation (if auto.commit.interval.ms has passed, of course). If the commits are done immediately upon receiving the messages (prior to my app processing them), this will not work for us.
This is important, as I want to be certain we won't lose messages if we use the automatic commit policy. Duplicate messages are tolerable for us; we just have no tolerance for lost data.
Thanks for the clarification!
Does the same poll() method that retrieves messages also handle the commits at the configured intervals?
Yes. (If enable.auto.commit=true.)
If i retrieve messages from poll in a single threaded application, process the messages to completion (including handling errors) in the SAME thread, meaning poll() will not be invoked again until after my processing is complete, then I presume there is no fear in losing messages, correct?
Yes.
This only works if poll() attempts the commit at the subsequent invocation (if the auto.commit.interval.ms has passed, of course)
This is exactly how it is done.
See here for further details: http://docs.confluent.io/current/clients/consumer.html
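A minimal sketch of that single-threaded pattern (the group id, topic, and process() helper are hypothetical; the poll(Duration) overload shown is from newer clients, so on 0.10 use poll(long) instead):
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;

public class AutoCommitLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker
        props.put("group.id", "example-group");            // hypothetical group
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "5000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // finish processing (incl. error handling) before the next poll()
                }
                // The offsets of this batch are committed inside a LATER poll() call,
                // once auto.commit.interval.ms has elapsed -- i.e. after processing is done.
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // hypothetical processing logic
        System.out.println(record.offset() + ": " + record.value());
    }
}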