Increase the number of messages read by a Kafka consumer in a single poll - apache-kafka

The Kafka consumer has a configuration, max.poll.records, which controls the maximum number of records returned in a single call to poll(); its default value is 500. I have set it to a very high number so that I can get all the messages in a single poll.
However, the poll returns only a few thousand messages (roughly 6000) in a single call even though the topic has many more. How can I further increase the number of messages read by a single consumer?

You can increase the consumer poll() batch size by increasing max.partition.fetch.bytes, but per the documentation it is still limited by fetch.max.bytes, which also needs to be increased to the required batch size. The documentation also describes the properties message.max.bytes (broker config) and max.message.bytes (topic config), which restrict the accepted record batch size. So one way is to increase all of these properties based on your required batch size; a configuration sketch is shown after the documentation excerpts below.
In Consumer config max.partition.fetch.bytes default value is 1048576
The maximum amount of data per-partition the server will return. Records are fetched in batches by the consumer. If the first record batch in the first non-empty partition of the fetch is larger than this limit, the batch will still be returned to ensure that the consumer can make progress. The maximum record batch size accepted by the broker is defined via message.max.bytes (broker config) or max.message.bytes (topic config). See fetch.max.bytes for limiting the consumer request size.
In Consumer Config fetch.max.bytes default value is 52428800
The maximum amount of data the server should return for a fetch request. Records are fetched in batches by the consumer, and if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that the consumer can make progress. As such, this is not an absolute maximum. The maximum record batch size accepted by the broker is defined via message.max.bytes (broker config) or max.message.bytes (topic config). Note that the consumer performs multiple fetches in parallel.
In Broker config message.max.bytes default value is 1000012
The largest record batch size allowed by Kafka. If this is increased and there are consumers older than 0.10.2, the consumers' fetch size must also be increased so that they can fetch record batches this large.
In the latest message format version, records are always grouped into batches for efficiency. In previous message format versions, uncompressed records are not grouped into batches and this limit only applies to a single record in that case.
This can be set per topic with the topic level max.message.bytes config.
In Topic config max.message.bytes default value is 1000012
The largest record batch size allowed by Kafka. If this is increased and there are consumers older than 0.10.2, the consumers' fetch size must also be increased so that they can fetch record batches this large.
In the latest message format version, records are always grouped into batches for efficiency. In previous message format versions, uncompressed records are not grouped into batches and this limit only applies to a single record in that case.
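To put the above into practice, here is a minimal sketch of a Java consumer configuration that raises all three consumer-side limits together. The broker address, group id, topic name, and the specific sizes are assumptions for illustration only; the broker's message.max.bytes and the topic's max.message.bytes would also need to allow batches of the size you expect.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LargePollConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "large-poll-group");          // hypothetical group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Raising max.poll.records alone is not enough; the byte limits below cap the fetch first.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100_000);
        // Bytes per partition per fetch (default 1048576 = 1 MB).
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 16 * 1024 * 1024);
        // Bytes per fetch request overall (default 52428800 = 50 MB).
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 64 * 1024 * 1024);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));          // assumed topic name
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            System.out.println("Records returned by one poll: " + records.count());
        }
    }
}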

Most probably your payload is limited by max.partition.fetch.bytes, which is 1MB by default. Refer to Kafka Consumer configuration.
Here's good detailed explanation:
MAX.PARTITION.FETCH.BYTES
This property controls the maximum number of bytes the server will return per partition. The default is 1 MB, which means that when KafkaConsumer.poll() returns ConsumerRecords, the record object will use at most max.partition.fetch.bytes per partition assigned to the consumer. So if a topic has 20 partitions and you have 5 consumers, each consumer will need to have 4 MB of memory available for ConsumerRecords. In practice, you will want to allocate more memory, as each consumer will need to handle more partitions if other consumers in the group fail.

max.partition.fetch.bytes must be larger than the largest message a broker will accept (determined by the message.max.bytes property in the broker configuration), or the broker may have messages that the consumer will be unable to consume, in which case the consumer will hang trying to read them.

Another important consideration when setting max.partition.fetch.bytes is the amount of time it takes the consumer to process data. As you recall, the consumer must call poll() frequently enough to avoid session timeout and subsequent rebalance. If the amount of data a single poll() returns is very large, it may take the consumer longer to process, which means it will not get to the next iteration of the poll loop in time to avoid a session timeout. If this occurs, the two options are either to lower max.partition.fetch.bytes or to increase the session timeout.
Hope it helps!
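As a rough illustration of that trade-off, the fragment below reuses the consumer Properties ("props") from the sketch earlier on this page; the values are illustrative only. Note that on clients from Kafka 0.10.1 onward the time allowed between polls is governed by max.poll.interval.ms, while session.timeout.ms covers background heartbeats.

// Option 1: return less data per poll() by shrinking the per-partition payload.
props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 512 * 1024);   // below the 1 MB default
// Option 2: allow more time between poll() calls before the group kicks the consumer out.
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600_000);           // default 300000 (5 minutes)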

Related

How does one Kafka consumer read from more than one partition?

I would like to know how one consumer consumes from more than one partition, specifically, in what order are messages read from the different partitions?
I had a peek at the source code (Consumer, Fetcher) but I can't really follow all of it.
This is what I thought would happen:
Partitions are read sequentially. That is: all the messages in one partition will be read before continuing to the next. If we reach max.poll.records without consuming the whole partition, the next fetch will continue reading the current partition until it is exhausted, before going on to the next.
I tried setting max.poll.records to a relatively low number and seeing what happens.
If I send messages to a topic and then start a consumer, all the messages are read from one partition before continuing to the next, even if the number of messages in that partition is higher than max.poll.records.
Then I tried to see if I could "lock" the consumer in one partition, by sending messages to that partition continuously (using JMeter). But I couldn't do it: messages from other partitions were also being read.
The consumer is polling for messages from its assigned partitions in a greedy round-robin way.
For example, if max.poll.records is set to 100 and two partitions A and B are assigned, the consumer will try to poll 100 records from A. If partition A doesn't have 100 available messages, it will poll what's left to complete the 100 from partition B.
Although this is not ideal, this way no partition will be starved.
This also explains why ordering is not guaranteed between partitions.
I have read the KIP mentioned in the answer to the question linked in the comments and I think I finally understood how the consumer works.
There are two main configuration options that affect how data is consumed:
max.partition.fetch.bytes: the maximum amount of data that the server will return for a given partition
max.poll.records: the maximum amount of records that are returned each time the consumer polls
The process of fetching from each partition is greedy and proceeds in a round-robin way. Greedy means that as many records as possible will be retrieved from each partition; if all records in a partition occupy less than max.partition.fetch.bytes, then all of them will be fetched; otherwise, only max.partition.fetch.bytes will be fetched.
Now, not all the fetched records will be returned in a poll call. Only max.poll.records will be returned.
The remaining records will be retained for the next call to poll.
Moreover, if the number of retained records is less than max.poll.records, the poll method will start a new round of fetching (pre-fetching) before returning. This means that, usually, the consumer is processing records while new records are being fetched.
If some partitions receive considerably more messages than others, this could lead to the less active partitions not being processed for long periods of time.
The only downside to this approach is that it could lead to some partitions going unconsumed for an extended amount of time when there is a large imbalance between the partition's respective message rates. For example, suppose that a consumer with max messages set to 1 fetches data from partitions A and B. If the returned fetch includes 1000 records from A and no records from B, the consumer will have to process all 1000 available records from A before fetching on partition B again.
In order to prevent this, we could reduce max.partition.fetch.bytes.
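A small experiment in the spirit of what the question describes: run a consumer with a deliberately low max.poll.records against a multi-partition topic that already holds messages, and print which partition each returned record came from. The broker address, group id and topic name are placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PartitionOrderProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "partition-order-probe");   // hypothetical group id
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 5); // kept small so the behaviour is easy to see

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));        // assumed multi-partition topic
            for (int poll = 0; poll < 10; poll++) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                // Already-fetched records are returned before a new fetch is issued, so small
                // consecutive polls tend to drain one partition before records from the next appear.
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("poll %d -> partition %d, offset %d%n", poll, r.partition(), r.offset());
                }
            }
        }
    }
}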

Kafka consumer, which takes effect, max.poll.records or max.partition.fetch.bytes?

I have trouble understanding these two consumer settings. Let's say I have a topic with 20 partitions and all messages are 1 KB in size, and, to simplify this discussion, also suppose that I have only one consumer.
If I set max.partition.fetch.bytes=1024, then, since each partition will give me one message, I will get 20 messages in one go, if I understand it correctly.
But what if I set max.poll.records=10? Could anyone help to provide some explanation? Many thanks.
There is no direct relation between the two configs. max.partition.fetch.bytes specifies the maximum amount of data per partition the server will return. On the other hand, max.poll.records specifies the maximum number of records returned in a single poll().
So in your case, if you have a topic with 20 partitions, each of which holds one record of 1 KB, and there is only one consumer subscribed to this topic,
then every message can be fetched, because the message size doesn't exceed max.partition.fetch.bytes, but you can still get at most 10 messages in one poll.
As a result:
Your consumer will get 10 messages in the first call to poll().
In your second poll() it will get the other 10 records.
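As a rough sketch of that scenario, the fragment below reuses a consumer Properties setup ("props") like the first sketch on this page and only changes the two settings under discussion; the topic name is a placeholder, and the exact counts depend on what each fetch actually returns.

props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1024); // roughly one 1 KB record per partition
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 10);            // caps each poll() at 10 records

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("twenty-partition-topic"));  // placeholder topic
    ConsumerRecords<String, String> first = consumer.poll(Duration.ofSeconds(1));
    ConsumerRecords<String, String> second = consumer.poll(Duration.ofSeconds(1));
    // With 20 partitions holding one 1 KB record each, the expectation is roughly 10 + 10:
    System.out.println("first poll: " + first.count() + ", second poll: " + second.count());
}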

Kafka fetch max bytes doesn't work as expected

I have a topic with 1 GB worth of messages. A Kafka consumer decides to consume these messages. What could I do to prohibit the consumer from consuming all messages at once? I tried to set
fetch.max.bytes on the broker
to 30 MB to allow only 30 MB of messages in each poll. The broker doesn't seem to honor that and tries to give all messages at once to the consumer, causing a consumer out-of-memory error. How can I resolve this issue?
Kafka configurations can be quite overwhelming. Typically in Kafka, multiple configurations can work together to achieve a result. This brings flexibility, but flexibility comes with a price.
From the documentation of fetch.max.bytes:
Records are fetched in batches by the consumer, and if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that the consumer can make progress.
On the consumer side alone, there are more configurations to consider for bounding the consumer's memory usage, including:
max.poll.records: limits the number of records retrieved in a single call to poll. Default is 500.
max.partition.fetch.bytes: limits the number of bytes fetched per partition. This should not be a problem as the default is 1MB.
As per the information in KIP-81, the memory usage in practice should be something like min(num_brokers * fetch.max.bytes, max.partition.fetch.bytes * num_partitions).
Also, in the same KIP:
The consumer (Fetcher) delays decompression until the records are returned to the user, but because of max.poll.records, it may end up holding onto the decompressed data from a single partition for a few iterations.
I'd suggest you also tune these parameters; hopefully this will get you to the desired state.
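A hedged configuration sketch for bounding how much one consumer pulls per poll; the figures are illustrative only, and the fragment reuses the consumer Properties ("props") from the first sketch on this page.

props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 30 * 1024 * 1024);           // ~30 MB per fetch request
props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 2 * 1024 * 1024);  // ~2 MB per partition
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);                       // the default, shown for clarity
// Rough upper bound on fetched-but-unprocessed data, per KIP-81:
// min(num_brokers * fetch.max.bytes, max.partition.fetch.bytes * num_partitions)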

Kafka config replica.fetch.max.bytes on a per-topic level

I would like to set up a Kafka cluster to only allow large messages on a particular topic. From the docs I see that if I wanted to do this at the level of the entire cluster I could do so by setting message.max.bytes to allow a larger amount of data on the broker and replica.fetch.max.bytes to allow it to be replicated, but my understanding is that this would increase memory usage for all topics in my cluster, not just the one that I know can receive large messages. There is also a topic-level setting max.message.bytes that controls the maximum size of messages, but I don't see a topic-level setting controlling the maximum data size of replication operations. It seems strange that one of these closely tied settings is not configurable at a topic level; perhaps I'm missing where such a setting is, or there is another way to accomplish these goals?
replica.fetch.max.bytes can only be set on the broker level. However, you can set max.partition.fetch.bytes on the consumer side:
The maximum amount of data per-partition the server will return. Records are fetched in batches by the consumer. If the first record batch in the first non-empty partition of the fetch is larger than this limit, the batch will still be returned to ensure that the consumer can make progress. The maximum record batch size accepted by the broker is defined via message.max.bytes (broker config) or max.message.bytes (topic config). See fetch.max.bytes for limiting the consumer request size.
Note that this is a per-partition configuration, meaning that if you set it to a large number, it will consume a lot of memory in case you have a lot of partitions too.
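As a sketch of how the two sides fit together for a single large-message topic: the topic-level max.message.bytes override is applied when the topic is created with the AdminClient, and the consumer raises max.partition.fetch.bytes to match. Topic name, sizes and addresses are assumptions; the broker-wide replica.fetch.max.bytes still has to be at least as large as the biggest message so replication keeps working, as noted in the question above.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.config.TopicConfig;

public class LargeMessageTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(adminProps)) {
            // Topic-level override: only this topic accepts record batches up to ~10 MB.
            NewTopic bigTopic = new NewTopic("large-messages", 6, (short) 3)
                    .configs(Collections.singletonMap(TopicConfig.MAX_MESSAGE_BYTES_CONFIG, "10485760"));
            admin.createTopics(Collections.singleton(bigTopic)).all().get();
        }

        // Consumer side: the per-partition fetch limit must cover the largest expected message,
        // in addition to the usual bootstrap servers, group id and deserializers.
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 10 * 1024 * 1024);
    }
}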

Is this possible? producer batch.size * max.request.size > broker max.message.bytes

Average message size is small, but sizes vary.
Average message size: 1KBytes
1 MB messages arrive at an arbitrary rate. So, the producer's max.request.size = 1 MB.
broker's max.message.bytes = 2MBytes
My questions.
To avoid a produce size error, does the user have to set batch.size to no more than 2 MB?
Or does the producer library decide the batch size automatically to avoid the error (even if the user sets a large batch.size)?
Thanks.
Below are the definitions of the related configs in question.
Producer config
batch.size: the producer will attempt to batch records until the batch reaches batch.size before it is sent to Kafka (assuming batch.size is configured to take precedence over linger.ms). Default - 16384 bytes.
max.request.size: The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum record batch size. Default - 1048576 bytes.
Broker config
message.max.bytes: The largest record batch size allowed by Kafka. Default - 1000012 bytes.
replica.fetch.max.bytes: This allows the replicas in the brokers to fetch messages within the cluster and makes sure the messages are replicated correctly.
To answer your questions
To avoid producer send errors, you don't need to set the batch size to 2 MB, as this would delay the transmission of your small messages. You can keep batch.size according to the average message size and how much you want to batch.
If you don't specify a batch size, it takes the default value, which is 16384 bytes.
So basically you will have to configure the producer's max.request.size >= 2 MB, and the broker's message.max.bytes and replica.fetch.max.bytes >= 2 MB.
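A minimal sketch of the producer side of that advice, assuming the 2 MB ceiling from the question; the broker address and topic are placeholders, and the matching broker settings (message.max.bytes, replica.fetch.max.bytes) would still have to be raised to at least 2 MB on the server side as described above.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LargeRecordProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        // Cap on a single produce request, and effectively on the largest record: 2 MB here.
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 2 * 1024 * 1024);
        // batch.size stays near (average message size x desired batching), not the 2 MB ceiling.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "a possibly large value")); // placeholder topic
        }
    }
}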
This query arises because there are various settings available around batching. Let me attempt to make them clear:
Kafka Setting: message.max.bytes and fetch.max.bytes
The Kafka broker limits the maximum size (total size of messages in a batch, if messages are published in batches) of a message that can be produced, configured by the cluster-wide property message.max.bytes (defaults to 1 MB). A producer that tries to send a message larger than this will receive an error back from the broker, and the message will not be accepted. As with all byte sizes specified on the broker, this configuration deals with compressed message size, which means that producers can send messages that are much larger than this value uncompressed, provided they compress it under the configured message.max.bytes size.
Note: This setting can be overridden by a specific topic (but with name max.message.bytes).
The maximum message size, message.max.bytes, configured on the Kafka broker must be coordinated with the fetch.max.bytes property (defaults to 50 MB) on consumer clients. It configures the maximum number of bytes of messages to attempt to fetch for a request. If this value is smaller than message.max.bytes, then consumers that encounter larger messages will fail to fetch those messages, resulting in a situation where the consumer gets stuck and cannot proceed.
The configuration setting replica.fetch.max.bytes (defaults to 1 MB) determines the rough amount of memory you will need for each partition on a broker.
Producer Setting: max.request.size
This setting controls the size of a produce request sent by the producer. It caps both the size of the largest message that can be sent and the number of messages that the producer can send in one request. For example, with a default maximum request size of 1 MB, the largest message you can send is 1 MB, or the producer can batch 1000 messages of 1 KB each into one request.
In addition, the broker has its own limit on the size of the largest message it will accept (message.max.bytes). It is usually a good idea to have these configurations match, so the producer will not attempt to send messages of a size that will be rejected by the broker.
Note that message.max.bytes (broker level) and max.request.size (producer level) put a cap on the maximum size of a batch in a request, but batch.size (which should be lower than the previous two) and linger.ms are the settings which actually govern the size of the batch.
Producer Setting: batch.size and linger.ms
When multiple records are sent to the same partition, the producer will batch them together. The parameter batch.size controls the maximum amount of memory in bytes (not the number of messages!) that will be used for each batch. When a batch is full, all the messages in the batch are sent together. This helps throughput on both the client and the server.
A small batch size will make batching less common and may reduce throughput. A very large size may use memory a bit more wastefully as we will always allocate a buffer of the specified batch size in anticipation of additional messages.
The linger.ms (defaults to 0) setting controls the amount of time to wait for additional messages before sending the current batch.
By default, the producer will send messages as soon as there is a sender thread available to send them, even if there's just one message in the batch (note that batch.size only specifies the maximum limit on the size of a batch). By setting linger.ms higher than 0, we instruct the producer to wait a few milliseconds to add additional messages to the batch before sending it to the brokers, even if a sender thread is available. This increases latency but also increases throughput (because we send more messages at once, there is less overhead per message).
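To make the batching knobs concrete, the fragment below extends the producer Properties ("props") from the sketch earlier on this page; the values are illustrative, not recommendations.

props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024); // batch up to 64 KB per partition before a size-triggered send
props.put(ProducerConfig.LINGER_MS_CONFIG, 20);         // wait up to 20 ms for more records, trading latency for throughput
// A batch is sent when it fills up or when linger.ms expires, whichever happens first.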