Can KafkaTemplate lose messages in a message-loop? - apache-kafka

Imagine we need to send a sequence of messages to Kafka using Spring Boot's KafkaTemplate. We use a message loop like this:
list.forEach(value -> kafkaTemplate.send(topic, value));
The Kafka producer is configured with acks=1 to ensure every message is received by the broker (assume, for simplicity, that there is only one broker).
Imagine that in the middle of the loop, starting from the nth message, the broker for some reason starts failing to receive messages (stops sending success acknowledgements to the producer). Later, at the kth iteration of the loop, when sending the (n+k)th message, the broker becomes healthy and is able to receive messages again.
How does KafkaTemplate handle such a situation? Will the Kafka producer remember that messages n to n+k-1 were not sent successfully, buffer them in memory, and somehow implicitly try to send them before sending the (n+k)th message? Or will the unsent messages be lost?
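For reference, one way to observe the outcome of each record is to attach a callback to the future returned by send(). A minimal sketch, assuming Spring Kafka 3.x (where send() returns a CompletableFuture; older versions return a ListenableFuture):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Collect values whose delivery ultimately failed (i.e. after the producer's
// internal retries were exhausted), so the application can decide what to resend.
List<String> failed = Collections.synchronizedList(new ArrayList<>());
list.forEach(value ->
        kafkaTemplate.send(topic, value).whenComplete((result, ex) -> {
            if (ex != null) {
                failed.add(value);
            }
        }));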

Related

Can sending messages to Kafka asynchronously affect message ordering?

This is my understanding of sending messages to Kafka asynchronously with a callback function:
Scenario: the producer is going to receive 4 batches of messages to send (to the same partition, for the sake of simplicity).
Producer sends batch A.
Producer sends batch B.
Producer receives replies from the broker and invokes the callback: batch A was unsuccessful but retriable, batch B was successful, so the producer sends batch A again.
Won't this disturb the message ordering, since A is now received by Kafka after B?
If you need message ordering within a partition, you can use the idempotent producer:
enable.idempotence=true
acks=all
max.in.flight.requests.per.connection<=5
retries>0
This will resolve potential duplicates from the producer and maintain the ordering.
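A minimal sketch of these settings with the Java producer (the broker address and String serializers are assumptions):
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);            // enable.idempotence=true
props.put(ProducerConfig.ACKS_CONFIG, "all");                         // acks=all
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);   // must be <= 5
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);          // retries > 0
KafkaProducer<String, String> producer = new KafkaProducer<>(props);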
If you don't want to use the idempotent producer, it is enough to set max.in.flight.requests.per.connection=1. This is the maximum number of unacknowledged requests on the producer side. It means that batch B will not be sent before the acknowledgement for A is received.

Apache Pulsar vs Kafka - do consumers pull (poll) messages off the topics?

I know that in Kafka, the consumer pulls messages off the broker's topics (pull model).
I get the feeling that Pulsar works the same way, considering that the receive method blocks, but I can't find confirmation. Can someone point me to a reference or correct me?
Thanks
Pulsar's Documentation clearly explains how message consumption works:
The Pulsar Consumer origin reads messages from one or more topics in an Apache Pulsar cluster. The Pulsar Consumer origin subscribes to Pulsar topics, processes incoming messages, and then sends acknowledgements back to Pulsar as the messages are read.
Messages can be received from brokers either synchronously (sync) or asynchronously (async).
The receive method receives messages synchronously. The consumer process will be blocked until a message becomes available. For example:
Message msg = consumer.receive();
An asynchronous receive returns immediately with a CompletableFuture that completes once a new message is available. For example:
CompletableFuture<Message> asyncMessage = consumer.receiveAsync();
In the Pulsar documentation:
There is a queue at the consumer side to receive messages pushed from the broker. You can configure the queue size with the receiverQueueSize parameter. The default size is 1000. Each time consumer.receive() is called, a message is dequeued from the buffer.
So broker pushes messages to a queue on consumer side. When the receive method is invoked, a message will be dequeued and returned.
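Putting the two receive styles and the receiver queue together, a minimal sketch with the Pulsar Java client (the service URL, topic, and subscription names are assumptions):
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

public class PulsarReceiveSketch {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // assumed broker address
                .build();
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-topic")                       // hypothetical topic
                .subscriptionName("my-subscription")     // hypothetical subscription
                .receiverQueueSize(1000)                 // the consumer-side queue discussed above
                .subscribe();

        // Synchronous receive: blocks until a message is dequeued from the queue
        Message<byte[]> msg = consumer.receive();
        consumer.acknowledge(msg);

        // Asynchronous receive: the future completes once a message is available
        Message<byte[]> asyncMsg = consumer.receiveAsync().join();
        consumer.acknowledge(asyncMsg);

        consumer.close();
        client.close();
    }
}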
The Pulsar consumer will regularly send permit requests to the Pulsar broker to ask for more messages once half of the queue has been consumed. This is described here.
In short, as described here
Pulsar also uses a push-based approach but with an API that simulates consumer pulls.

When message process fails, can consumer put back message to same topic?

Suppose one of my programs is consuming messages from a Kafka topic. While processing a message, the consumer accesses a database, and the database access fails for some reason. We don't want to abandon the message; we need to park it for later processing. In JMS, when message processing fails, the application container puts the message back on the queue and it is not lost. In Kafka, once a message is received, the offset advances and the next message comes. How do I handle this?
There are two approaches to achieve this.
Set the Kafka acknowledge mode to manual and, in case of an error, terminate the consumer thread without committing the offset (if group management is enabled, rebalancing is triggered and the newly assigned consumer will poll the same batch).
The second approach is simpler: have an error topic and publish messages to it in case of any error, so you can consume them later or keep track of them.
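A minimal sketch of the second approach with the plain Java clients; the topic names ("orders", "orders.errors") and the failing DB call are hypothetical:
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ErrorTopicSketch {
    public static void main(String[] args) {
        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "db-writer");
        c.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // manual offset handling
        c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c);
             KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    try {
                        writeToDb(rec.value());   // hypothetical DB access that may fail
                    } catch (Exception e) {
                        // Park the message on the error topic for later reprocessing
                        producer.send(new ProducerRecord<>("orders.errors", rec.key(), rec.value()));
                    }
                }
                consumer.commitSync(); // commit only after the whole batch was handled
            }
        }
    }

    private static void writeToDb(String value) { /* ... */ }
}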

What does Broker do when multiple Producers write the same message to Broker? And one more related question (due to 150 character limit)

Producer A writes a message to Broker A (partition 1), and Producer B writes the same message to Broker A (partition 1). What happens to the message on the broker?
I'm guessing that since it is the same message, Producer B drops the duplicate and continues with the next part of the message?
I have one more question: if you want to send a movie file, for example, to a Kafka cluster, can I create 4 producers and have them send different parts of that movie into the cluster? For instance, Producer A sends the first part of the movie, Producer B sends the second part, and so on.
(Because that seems more efficient than a single producer.)
If two producers send the same message to Kafka, the message is written twice. Kafka does not check the content of messages. It's the same as if you were to call send() twice in one producer: you get two messages in Kafka.
If you want to send large amounts of data, it's recommended to use multiple producers to split the work. Also, Kafka is not really designed to handle messages larger than 1 GB, so splitting large files into smaller chunks is a good idea. Just be careful how you split your data, because you may have to reassemble it on the consumer side!
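A minimal sketch of splitting a file into chunks on the producer side; the topic, file name, and chunk size are assumptions, and a "seq" header carries the ordering needed for reassembly:
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Properties;

public class ChunkedFileProducer {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        int chunkSize = 512 * 1024; // 512 KB chunks, well below default broker limits
        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);
             InputStream in = Files.newInputStream(Paths.get("movie.mp4"))) {
            byte[] buf = new byte[chunkSize];
            int n, seq = 0;
            while ((n = in.read(buf)) > 0) {
                // Keying by file name sends all chunks to one partition, preserving order;
                // the "seq" header lets the consumer reassemble the file
                ProducerRecord<String, byte[]> rec =
                        new ProducerRecord<>("movies", "movie.mp4", Arrays.copyOf(buf, n));
                rec.headers().add("seq", Integer.toString(seq++).getBytes(StandardCharsets.UTF_8));
                producer.send(rec);
            }
        }
    }
}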

Which Kafka producer property to use when I no longer want to receive data after certain threshold?

Which Kafka producer property to use to achieve the following?
I am using a UDP-to-Kafka bridge which sends messages from a UDP port to a Kafka topic. If the memory used by the Kafka producer exceeds a certain threshold (say 300 MB), I want to drop all messages (with no retry) and resume receiving messages when the producer memory goes down.
Basically, I am trying to save my server from crashing if the Kafka broker is not able to take any messages.
As long as you don't call get() on the Future returned from the producer's send() call, you are fine. The producer API works in asynchronous mode.
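For example, a fire-and-forget send with a callback never blocks the receiving thread on the broker's reply (the topic name and payload are hypothetical). Note that the producer's memory is bounded by buffer.memory and, when that buffer fills up, send() itself blocks for up to max.block.ms before throwing, so those two properties are the relevant knobs for the memory threshold in the question:
// Fire-and-forget: the callback runs later on the producer's I/O thread
producer.send(new ProducerRecord<>("udp-bridge", payload), (metadata, exception) -> {
    if (exception != null) {
        // Broker unreachable or delivery timed out: log and drop the message
    }
});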