Difference between kafka batch and kafka request - apache-kafka

I was not able to find a satisfactory answer anywhere, so apologies if this question looks trivial:
In Kafka, on producer side, can a request contain multiple batches to different partitions ?
I see the words batch and requests are used as synonyms in the documentation, and I was hoping to find some clarity on this.
If yes, how does this affect the ack policy ?
Are acks on per batch or request basis ?

A Kafka request (and response) is a message sent over the network between a Kafka client and a broker. The Kafka protocol defines many types of requests; you can find them all in the Kafka protocol documentation.
The Produce and Fetch requests are used to exchange records. Both carry Kafka batches in the RECORDS field of the protocol description. A Kafka batch groups several records together and saves some bytes by sharing metadata across all of them. You can find the exact format of a batch in the documentation.
TLDR:
Requests/responses are the full messages exchanged between Kafka clients and brokers. Some requests contain Kafka batches that are groups of records.
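To make the hierarchy concrete, here is a minimal toy model in Python. It is not Kafka client code: the topic name "demo", the LEADERS layout and the function build_produce_requests are all hypothetical, and the model assumes at most one batch per partition in each request.

```python
from collections import defaultdict

# Hypothetical cluster layout: (topic, partition) -> id of the leader broker.
LEADERS = {("demo", 0): 1, ("demo", 1): 2, ("demo", 2): 1}


def build_produce_requests(records, leaders):
    """Group records into per-partition batches, then batches into per-broker requests."""
    # Step 1: records for the same partition go into the same batch.
    batches = defaultdict(list)
    for topic, partition, value in records:
        batches[(topic, partition)].append(value)

    # Step 2: batches whose partitions share a leader travel in one request.
    requests = defaultdict(dict)
    for topic_partition, batch in batches.items():
        requests[leaders[topic_partition]][topic_partition] = batch
    return {broker: dict(per_partition) for broker, per_partition in requests.items()}


records = [("demo", 0, "a"), ("demo", 1, "b"), ("demo", 2, "c"), ("demo", 0, "d")]
requests = build_produce_requests(records, LEADERS)
for broker, per_partition in sorted(requests.items()):
    print(broker, per_partition)
```

In this model the request to broker 1 carries two batches (partitions 0 and 2), which is the case the question asks about: one request, several batches, different partitions.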

I'm not sure whether you are asking about the producer or the consumer side. Here is some information that might answer your question.
On producer side:
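By default, the Kafka producer accumulates records into batches of up to 16 KB (batch.size), one batch per partition.
By default, the producer keeps up to 5 requests in flight per broker connection (max.in.flight.requests.per.connection), and each request can carry batches for several partitions led by that broker. Meanwhile, the producer starts accumulating data for the next batches.
The acks config controls how many replicas must acknowledge a write before the request is considered successful: 0 (none), 1 (the partition leader only) or all (every in-sync replica).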
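As a rough illustration of the size-based batching described above, the following sketch splits the records of a single partition into batches. It is a simplification, not the producer's actual accumulator: the function name accumulate is hypothetical, and the model ignores batch header overhead, compression and linger.ms.

```python
BATCH_SIZE_BYTES = 16 * 1024  # the 16 KB default mentioned above (batch.size)


def accumulate(record_sizes, batch_size=BATCH_SIZE_BYTES):
    """Split the record sizes (in bytes) of ONE partition into batches."""
    batches, current, used = [], [], 0
    for size in record_sizes:
        # Close the current batch when the next record would overflow it.
        if current and used + size > batch_size:
            batches.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        batches.append(current)
    return batches


batches = accumulate([6000, 6000, 6000, 6000, 6000])
print([len(b) for b in batches])
```

The point of the sketch is that a batch is bounded by bytes, so the number of records per batch varies with record size.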
On consumer side:
By default, each call to poll() on the Kafka consumer returns at most 500 records (max.poll.records).
Also by default, the Kafka consumer commits offsets automatically every 5 seconds (auto.commit.interval.ms).
This means the consumer commits the offsets of all the records returned by the calls to poll() made during the last 5 seconds.
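The interaction between poll() and the periodic commit can be sketched with a small simulation. This is a toy model under stated assumptions, not the consumer's real implementation: simulate_auto_commit is a hypothetical name, and the model assumes the commit is triggered inside a poll() call once the interval has elapsed, covering only records returned by earlier polls.

```python
def simulate_auto_commit(poll_times_ms, records_per_poll, interval_ms=5000):
    """Return (time, committed_record_count) for each auto-commit in the toy model."""
    position = 0            # records handed to the application so far
    next_commit = interval_ms
    commits = []
    for now, count in zip(poll_times_ms, records_per_poll):
        if now >= next_commit:
            # Commit everything returned by the previous polls.
            commits.append((now, position))
            next_commit = now + interval_ms
        position += count
    return commits


# Four polls, two seconds apart, each returning the 500-record maximum.
commits = simulate_auto_commit([0, 2000, 4000, 6000], [500, 500, 500, 500])
print(commits)
```

Here the first commit happens during the poll at 6 seconds and covers the 1500 records returned by the three earlier polls.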
Hope this helps!

Related

Kafka set maximum records consumed in a minute

I'm creating a scraper. The producer sends data to Kafka's topic with the information about links to be scraped. The consumer is an AWS Lambda function that will be triggered when a message is received on that topic.
To avoid blocking, I want to add a cap on the maximum number of messages consumed in a given time. For example, I just want to consume only 5 messages in a minute. While the producer should keep pushing the messages to Kafka.
How can I achieve this?

Metadata requests in Kafka producer

How many metadata requests will a Kafka producer make: one per message, one per batch, or one per partition?
How will the acknowledgements be sent to the producer by Kafka: one at a time, as a whole list, or as a list per leader?
The producer makes its first metadata request when it connects to the bootstrap servers set in the client configuration. That can be one broker or several, but not necessarily all the brokers in the cluster (so there is no metadata request per broker). In this way, the producer learns where the partitions of the topics it wants to write to are located.
During its lifetime, the producer can make more metadata requests, for example when it gets an error connecting to the leader of the partition it is writing to. In that case it needs to find out which broker is the new leader, connect to it (if it is not already connected for other topics) and start sending.
How many metadata requests will a Kafka producer make: one per message, one per batch, or one per partition?
None of these: metadata requests are not tied to messages, batches or partitions. Usually one request to one broker is enough to find out the partition leaders.
There could be more if the whole send process takes a long time and the metadata on the producer side expires (the property is metadata.max.age.ms).
How will the acknowledgements be sent to the producer by Kafka: one at a time, as a whole list, or as a list per leader?
Produce requests are sent only to leaders.
As they contain batches of records, you will get one ProduceResponse per ProduceRequest, with a result for each partition included in that request.

Apache Kafka- Algorithm/Strategy used to pull messages from different partitions of a same topic by a Single Consumer

I have been studying Apache Kafka for a while now.
Let's consider the following example.
Say I have a topic with 3 partitions, a single producer and a single consumer. I am producing my messages without specifying the key attribute.
So I know that on the producer side, when I publish a message, the strategy Kafka uses to assign it to one of those partitions is round-robin.
Now, what I want to know is: when I start a single consumer belonging to a certain consumer group and listening to that same topic, what strategy will it use to pull the messages from the different partitions (as there are 3)?
Would it follow a similar round-robin model, where it sends a fetch request to the leader of partition 1, waits for the response, and returns the records for processing, then sends a fetch request to the leader of partition 2, and so on?
If it follows some other strategy/algorithm, I would love to know what it is?
Thank you in advance.
There is no ordering guarantee across partitions, so in a way the algorithm used is moot to the end user and subject to change.
Today, nothing terribly complex happens here. The protocol shows that a fetch request names the partitions it wants, and the consumer sends fetches to the leader of each partition it is assigned, so the order in which records arrive depends on the consumer and on when each broker responds. A partition won't be starved, because fetch requests are issued for all partitions assigned to the consumer.

How to maintain ordering of message in Kafka active active site

I have a business requirement to maintain messages across an active-active site, and I am planning to use Kafka for this.
The producer puts messages into JMS/MQ, from which they will be consumed into Kafka.
So when a batch of 1 million messages is placed in MQ/JMS by the producer, is it possible to maintain the sequence of messages in a geographically distributed active-active Kafka cluster?
(Assuming we have one partition and one consumer per topic.)
Thanks in advance.
Yes, the order of messages within a partition of a topic is preserved. Across partitions or topics there are no guarantees. So if your entire batch is sent to the same single-partition topic by one producer, the order will be preserved. There are some configuration nuances to be aware of: for instance, the ordering guarantee does not hold if max.in.flight.requests.per.connection > 1 and retries are enabled, unless idempotence is also enabled. Recent producer versions (3.0 and later) enable idempotence by default, so the defaults are safe there; on older clients, check these settings explicitly. For more details, look for "max.in.flight.requests.per.connection" in https://kafka.apache.org/documentation/#configuration
If your setup has redundant producers with failover, then you may want to consider enabling idempotence.
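The reordering risk with several in-flight requests and retries can be shown with a toy simulation. This is only a sketch of the failure mode, not how the producer is implemented: send_with_retries and the batch names are hypothetical, and the model assumes a non-idempotent producer where a failed batch is simply retried in the next round.

```python
def send_with_retries(batches, max_in_flight, fails_first_attempt):
    """Return the order in which batches end up in the partition log (toy model)."""
    log = []
    pending = list(batches)
    already_failed = set()
    while pending:
        window = pending[:max_in_flight]   # batches sent concurrently
        retry = []
        for batch in window:
            if batch in fails_first_attempt and batch not in already_failed:
                already_failed.add(batch)  # first attempt is lost
                retry.append(batch)
            else:
                log.append(batch)          # written by the broker
        # Failed batches are retried ahead of batches not yet sent.
        pending = retry + pending[len(window):]
    return log


unsafe = send_with_retries(["B1", "B2", "B3"], max_in_flight=2, fails_first_attempt={"B1"})
safe = send_with_retries(["B1", "B2", "B3"], max_in_flight=1, fails_first_attempt={"B1"})
print(unsafe, safe)
```

With two requests in flight, B2 is written while B1 is still being retried, so B1 lands after it. With a single in-flight request the retry completes before anything else is sent.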

concept of record vs request vs batch in kafka

I've seen these terms used interchangeably while reading about the parameters of the producer API. Do these terms refer to the same thing, or is there a conceptual difference?
A record is the data or message to be sent.
A batch is a group of records for the same partition that are sent together; its size is bounded in bytes (batch.size) rather than by a fixed record count.
A request is the message sent to a broker; a produce request carries one or more batches so that the broker can write them to the topic's partitions.