Apache Kafka: TimeoutException and then nothing works‏ - apache-kafka

I am trying to send a message from a producer to a kafka node in another DC.
Both the producer and consumer are set with the default 0.10.0.0 configuration and the message sizes are not so small (around 500k).
Most of the time when sending messages I am encountered with these exceptions:
org.apache.kafka.common.errors.TimeoutException: Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers for topic-0
org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time 60000 ms.
And after that no more messages get transferred (even the callback for remaining messages is not getting called).

Just wanted to chime in because I received the exact same errors today. I tried increasing the request.timeout.ms, decreasing batch.size, and even setting the batch.size to zero. However, nothing worked.
It turned out it was because the server couldn't connect to one of the 10 Kafka cluster nodes. So, what I saw were some inappropriate exceptions being thrown. By the way, we are using Kafka 0.9.0.1 if it matters.

According to Kafka documentation:
A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable batching entirely). A very large batch size may use memory a bit more wastefully as we will always allocate a buffer of the specified batch size in anticipation of additional records.
Set batch.size = 0, it will resolve the issue.

Related

How do I usually set "batch.size" and "buffer.memory" in Apache Kafka? [duplicate]

I send String-messages to Kafka V. 0.8 with the Java Producer API.
If the message size is about 15 MB I get a MessageSizeTooLargeException.
I have tried to set message.max.bytesto 40 MB, but I still get the exception. Small messages worked without problems.
(The exception appear in the producer, I don't have a consumer in this application.)
What can I do to get rid of this exception?
My example producer config
private ProducerConfig kafkaConfig() {
Properties props = new Properties();
props.put("metadata.broker.list", BROKERS);
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");
props.put("message.max.bytes", "" + 1024 * 1024 * 40);
return new ProducerConfig(props);
}
Error-Log:
4709 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 214 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
4869 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 217 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
5035 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 220 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
5198 [main] WARN kafka.producer.async.DefaultEventHandler - Produce request with correlation id 223 failed due to [datasift,0]: kafka.common.MessageSizeTooLargeException
5305 [main] ERROR kafka.producer.async.DefaultEventHandler - Failed to send requests for topics datasift with correlation ids in [213,224]
kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
at kafka.producer.async.DefaultEventHandler.handle(Unknown Source)
at kafka.producer.Producer.send(Unknown Source)
at kafka.javaapi.producer.Producer.send(Unknown Source)
You need to adjust three (or four) properties:
Consumer side:fetch.message.max.bytes - this will determine the largest size of a message that can be fetched by the consumer.
Broker side: replica.fetch.max.bytes - this will allow for the replicas in the brokers to send messages within the cluster and make sure the messages are replicated correctly. If this is too small, then the message will never be replicated, and therefore, the consumer will never see the message because the message will never be committed (fully replicated).
Broker side: message.max.bytes - this is the largest size of the message that can be received by the broker from a producer.
Broker side (per topic): max.message.bytes - this is the largest size of the message the broker will allow to be appended to the topic. This size is validated pre-compression. (Defaults to broker's message.max.bytes.)
I found out the hard way about number 2 - you don't get ANY exceptions, messages, or warnings from Kafka, so be sure to consider this when you are sending large messages.
Minor changes required for Kafka 0.10 and the new consumer compared to laughing_man's answer:
Broker: No changes, you still need to increase properties message.max.bytes and replica.fetch.max.bytes. message.max.bytes has to be equal or smaller(*) than replica.fetch.max.bytes.
Producer: Increase max.request.size to send the larger message.
Consumer: Increase max.partition.fetch.bytes to receive larger messages.
(*) Read the comments to learn more about message.max.bytes<=replica.fetch.max.bytes
The answer from #laughing_man is quite accurate. But still, I wanted to give a recommendation which I learned from Kafka expert Stephane Maarek. We actively applied this solution in our live systems.
Kafka isn’t meant to handle large messages.
Your API should use cloud storage (for example, AWS S3) and simply push a reference to S3 to Kafka or any other message broker. You'll need to find a place to save your data, whether it can be a network drive or something else entirely, but it shouldn't be a message broker.
If you don't want to proceed with the recommended and reliable solution above,
The message max size is 1MB (the setting in your brokers is called message.max.bytes) Apache Kafka. If you really needed it badly, you could increase that size and make sure to increase the network buffers for your producers and consumers.
And if you really care about splitting your message, make sure each message split has the exact same key so that it gets pushed to the same partition, and your message content should report a “part id” so that your consumer can fully reconstruct the message.
If the message is text-based try to compress the data, which may reduce the data size, but not magically.
Again, you have to use an external system to store that data and just push an external reference to Kafka. That is a very common architecture and one you should go with and widely accepted.
Keep that in mind Kafka works best only if the messages are huge in amount but not in size.
Source: https://www.quora.com/How-do-I-send-Large-messages-80-MB-in-Kafka
The idea is to have equal size of message being sent from Kafka Producer to Kafka Broker and then received by Kafka Consumer i.e.
Kafka producer --> Kafka Broker --> Kafka Consumer
Suppose if the requirement is to send 15MB of message, then the Producer, the Broker and the Consumer, all three, needs to be in sync.
Kafka Producer sends 15 MB --> Kafka Broker Allows/Stores 15 MB --> Kafka Consumer receives 15 MB
The setting therefore should be:
a) on Broker:
message.max.bytes=15728640
replica.fetch.max.bytes=15728640
b) on Consumer:
fetch.message.max.bytes=15728640
You need to override the following properties:
Broker Configs($KAFKA_HOME/config/server.properties)
replica.fetch.max.bytes
message.max.bytes
Consumer Configs($KAFKA_HOME/config/consumer.properties)
This step didn't work for me. I add it to the consumer app and it was working fine
fetch.message.max.bytes
Restart the server.
look at this documentation for more info:
http://kafka.apache.org/08/configuration.html
I think, most of the answers here are kind of outdated or not entirely complete.
To refer on the answer of Sacha Vetter (with the update for Kafka 0.10), I'd like to provide some additional Information and links to the official documentation.
Producer Configuration:
max.request.size (Link) has to be increased for files bigger than 1 MB, otherwise they are rejected
Broker/Topic configuration:
message.max.bytes (Link) may be set, if one like to increase the message size on broker level. But, from the documentation: "This can be set per topic with the topic level max.message.bytes config."
max.message.bytes (Link) may be increased, if only one topic should be able to accept lager files. The broker configuration must not be changed.
I'd always prefer a topic-restricted configuration, due to the fact, that I can configure the topic by myself as a client for the Kafka cluster (e.g. with the admin client). I may not have any influence on the broker configuration itself.
In the answers from above, some more configurations are mentioned as necessary:
replica.fetch.max.bytes (Link) (Broker config)
From the documentation: "This is not an absolute maximum, if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that progress can be made."
max.partition.fetch.bytes (Link) (Consumer config)
From the documentation: "Records are fetched in batches by the consumer. If the first record batch in the first non-empty partition of the fetch is larger than this limit, the batch will still be returned to ensure that the consumer can make progress."
fetch.max.bytes (Link) (Consumer config; not mentioned above, but same category)
From the documentation: "Records are fetched in batches by the consumer, and if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that the consumer can make progress."
Conclusion: The configurations regarding fetching messages are not necessary to change for processing messages, lager than the default values of these configuration (had this tested in a small setup). Probably, the consumer may always get batches of size 1. However, two of the configurations from the first block has to be set, as mentioned in the answers before.
This clarification should not tell anything about performance and should not be a recommendation to set or not to set these configuration. The best values has to be evaluated individually depending on the concrete planned throughput and data structure.
One key thing to remember that message.max.bytes attribute must be in sync with the consumer's fetch.message.max.bytes property. the fetch size must be at least as large as the maximum message size otherwise there could be situation where producers can send messages larger than the consumer can consume/fetch. It might worth taking a look at it.
Which version of Kafka you are using? Also provide some more details trace that you are getting. is there some thing like ... payload size of xxxx larger
than 1000000 coming up in the log?
For people using landoop kafka:
You can pass the config values in the environment variables like:
docker run -d --rm -p 2181:2181 -p 3030:3030 -p 8081-8083:8081-8083 -p 9581-9585:9581-9585 -p 9092:9092
-e KAFKA_TOPIC_MAX_MESSAGE_BYTES=15728640 -e KAFKA_REPLICA_FETCH_MAX_BYTES=15728640 landoop/fast-data-dev:latest `
This sets topic.max.message.bytes and replica.fetch.max.bytes on the broker.
And if you're using rdkafka then pass the message.max.bytes in the producer config like:
const producer = new Kafka.Producer({
'metadata.broker.list': 'localhost:9092',
'message.max.bytes': '15728640',
'dr_cb': true
});
Similarly, for the consumer,
const kafkaConf = {
"group.id": "librd-test",
"fetch.message.max.bytes":"15728640",
... .. }
Here is how I achieved successfully sending data up to 100mb using kafka-python==2.0.2:
Broker:
consumer = KafkaConsumer(
...
max_partition_fetch_bytes=max_bytes,
fetch_max_bytes=max_bytes,
)
Producer (See final solution at the end):
producer = KafkaProducer(
...
max_request_size=KafkaSettings.MAX_BYTES,
)
Then:
producer.send(topic, value=data).get()
After sending data like this, the following exception appeared:
MessageSizeTooLargeError: The message is n bytes when serialized which is larger than the total memory buffer you have configured with the buffer_memory configuration.
Finally I increased buffer_memory (default 32mb) to receive the message on the other end.
producer = KafkaProducer(
...
max_request_size=KafkaSettings.MAX_BYTES,
buffer_memory=KafkaSettings.MAX_BYTES * 3,
)

Kafka org.apache.kafka.common.errors.TimeoutException for some of the records

Getting below error message when producing record to Azure event hub(kafka enabled)
Expiring 14 record(s) for eventhubname: 30125 ms has passed since batch creation plus linger time
Stack used azure eventhub , Spring kafka
below config present in Kafka producer config
props.put(ProducerConfig.RETRIES_CONFIG, "3");
Would like to know if kafka producer will be retried 3 times incase of above error message
ProducerConfig.RETRIES_CONFIG -> This configuration has no use.
Default retry is set as Integer.MAX_VALUE = 2147483647.
Producer automatically retry in case of failure.
Kindly check following configuration and tune accordingly.
linger.ms=0 ( try with zero, this will send batch request as soon as possible, Non zero value require more tuning along with other parameters)
buffer.memory -> Max memory used by Producer in buffer, try increasing this.
max.block.ms -> check this value, This is probable cause of timeoutException. Increase this as per scenario.

Kafka Timeout Exception:Batch Expired

We are facing following issue on production:
org.apache.kafka.common.errors.TimeoutException: Batch Expired.
Is it because of invalid configuration like batch size, request timeout or any other thing?
The error indicates that some records are put into the queue at a faster rate than they can be sent from the client.
When your Producer sends messages, they are stored in buffer (before sending the to the target broker) and the records are grouped together into batches in order to increase throughput. When a new record is added to the batch, it must be sent within a -configurable- time window which is controlled by request.timeout.ms (the default is set to 30 seconds). If the batch is in the queue for longer time, a TimeoutException is thrown and the batch records will then be removed from the queue and won't be delivered to the broker.
Increasing the value of request.timeout.ms should do the trick for you.

Apache-Kafka, batch.size vs buffer.memory

I'm trying to figure out the difference between the settings batch.size and buffer.memory in Kafka Producer.
As I understand batch.size: It's the max size of the batch that can be sent.
The documentation describes buffer.memory as: the bytes of memory the Producer can use to buffer records waiting to be sent.
I don't understand the difference between these two. Can someone explain?
Thanks
In my opinion,
batch.size: The maximum amount of data that can be sent in a single request. If batch.size is (32*1024) that means 32 KB can be sent out in a single request.
buffer.memory: if Kafka Producer is not able to send messages(batches) to Kafka broker (Say broker is down). It starts accumulating the message batches in the buffer memory (default 32 MB). Once the buffer is full, It will wait for "max.block.ms" (default 60,000ms) so that buffer can be cleared out. Then it's throw exception.
Kafka Producer and Kafka Consumer have many configuration that helps in performance tuning like gaining low latency and High throughput. buffer.memory and batch.size is also one of those and these are specific for Kafka Producer. Let see more details on these configuration.
buffer.memory
This sets the amount of memory the producer will use to buffer messages waiting to be sent to broker. If messages are sent by the application faster than they can be delivered to server, producer may run out of space and additional send() call will either be block or throw exception based on the max.block.ms configuration which allow blocking for a certain time and then throw exception. Another case may be if all broker server are down due to any reason and kafka producer will not able to send messages to broker and producer have to keep these messages in the memory allocated based on buffer.memory configuration but this will be filled up soon if broker not back normal state then as mentioned above mx.block.ms time will be considered to free up the space.
Default value for max.block.ms is 60,000 ms
Default value for buffer.memory is 32 MB (33554432)
batch.size
When multiple records are sent to the same partition, the producer will put them in batch. This configuration controls the amount of memory in bytes (not messages) that will
be used for each batch. When the batch is full, all the messages in the batch will be sent. However this does not means that the producer will wait for batch to become full. The producer will send half full batches and even the batch with just a single message in them. Therefore setting the batch size too large will not cause delays in sending the messages. it will just use memory for the batches. Setting the batch size too small will add extra overhead because producer will need to send messages more frequently.
Default batch size is 16384.
batch.size is also work based on the linger.ms which controls the amount of time to wait for additional messages before sending current batch. As we know that Kafka producer sends a batch of messages either when rge current batch is full or when the linger.ms time is reached. By default prodcuer will send messages as soon as there is a sender thread available to send them even if there is just message in the bacth.
Both of these producer configurations are described on the Confluent documentation page as following:
batch.size
Kafka producers attempt to collect sent messages into batches to improve throughput. With the Java client, you can use batch.size to control the maximum size in bytes of each message batch.
buffer.memory
Use buffer.memory to limit the total memory that is available to the Java client for collecting unsent messages. When this limit is hit, the producer will block on additional sends for as long as max.block.ms before raising an exception.

kafka "stops working" after a large message is enqueued

I'm running kafka_2.11-0.9.0.0 and a java-based producer/consumer. With messages ~70 KB everything works fine. However, after the producer enqueues a larger, 70 MB message, kafka appears to stop delivering the messages to the consumer. I.e. not only is the large message not delivered but also subsequent smaller messages. I know the producer succeeds because I used kafka callback for the confirmation and I can see the messages in the kafka message log.
kafka config custom changes:
message.max.bytes=200000000
replica.fetch.max.bytes=200000000
consumer config:
props.put("fetch.message.max.bytes", "200000000");
props.put("max.partition.fetch.bytes", "200000000");
You need to increase the size of the messages the consumer can consume so it doesn't get stuck trying to read a message that is to big.
max.partition.fetch.bytes (default value is 1048576 bytes)
The maximum amount of data per-partition the server will return. The
maximum total memory used for a request will be #partitions *
max.partition.fetch.bytes. This size must be at least as large as the
maximum message size the server allows or else it is possible for the
producer to send messages larger than the consumer can fetch. If that
happens, the consumer can get stuck trying to fetch a large message on
a certain partition.
What helped was upping the java heap size.