Kafka writing in disk logs time - apache-kafka

Running some perf tests on Kafka, we are seeing bad latency in the producer.
Checking the Kafka broker logs I can see this log line:
[Topic] Wrote producer snapshot at offset 331258 with 62 producer ids in 860 ms. (kafka.utils.Logging)
I don't know if this is the time it takes to write to disk or to the replicas before acking to the producer (acks=all), but those 860 ms seem like a lot to me.
Regards

This needs a detailed analysis. Here are a few things you can do:
Monitor Kafka broker/producer resources like CPU/memory to see if any particular resource is reaching near 100% usage.
If a Kafka broker resource is saturating, then for your load you might need to give the broker more resources. The same logic applies to your Kafka producer.
If resources are not saturating, you will need to experiment with your Kafka producer configuration. Calculate the rough throughput of your Kafka producers (in messages/sec as well as bytes/sec) and compare against the Kafka producer config defaults to see if you can find a probable cause. There are a lot of producer configs, such as batch.size, buffer.memory, linger.ms and max.request.size, any of which might be the reason your producer is not performing optimally; a tuning sketch follows.
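As a minimal, hypothetical tuning sketch (the broker address, topic name and every value below are illustrative assumptions, not recommendations):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TunedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);         // larger batches, fewer requests
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);             // wait briefly to fill batches
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 67108864L);  // 64 MiB of client-side buffering
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1048576); // 1 MiB per request
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("perf-test-topic", "key", "value"));
        }
    }
}

Measuring throughput and latency again after each single change makes it easier to see which setting actually matters for your workload.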

Related

Apache Kafka: how to configure message buffering properly

I run a system comprising an InfluxDB instance, a Kafka broker and data sources (sensors) producing time-series data. The purpose of the broker is to protect the database from inbound event overload and to act as a format-agnostic ingestion platform. The data is transferred from Kafka to InfluxDB via Apache Camel routes.
I would like to use Kafka as an intermediate message buffer in case a Camel route crashes or becomes unavailable, which is the most frequent error in the system. So far I haven't managed to configure Kafka so that inbound messages remain available for later consumption.
How do I configure it properly?
Messages are retained in Kafka topics according to the topics' retention policies (you can choose between time and byte-size limits), as described in the Topic Configurations. With
cleanup.policy=delete
retention.ms=-1
the messages in a Kafka topic will never be deleted.
Your Camel consumer will then be able to re-read all messages (offsets) if you select a new consumer group or reset the offsets of the existing consumer group. Otherwise, your Camel consumer might auto-commit the offsets (check the corresponding consumer configuration) and it will not be possible to re-read them for the same consumer group.
To limit the consumption rate of the Camel consumer you can adjust configurations like maxPollRecords or fetchMaxBytes, which are described in the docs.
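As a sketch, assuming a Java client new enough for incrementalAlterConfigs (2.3+) and with a made-up broker address and topic name, the retention settings above could be applied with the AdminClient like this:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionConfigExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // assumed address
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "sensor-data"); // assumed topic
            List<AlterConfigOp> ops = Arrays.asList(
                new AlterConfigOp(new ConfigEntry("cleanup.policy", "delete"), AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry("retention.ms", "-1"), AlterConfigOp.OpType.SET));
            // Apply both settings to the topic and wait for the brokers to acknowledge.
            admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops)).all().get();
        }
    }
}

The same settings can also be applied with the kafka-configs.sh tool or set when the topic is created.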

Handle kafka broker full disk space

We have set up a ZooKeeper quorum (3 nodes) and 3 Kafka brokers. The producers can't send records to Kafka, so we have data loss. During the investigation we could still SSH to the affected broker and observed that its disk was full. We deleted topic logs to clear some disk space and the broker functioned as expected again.
Given that we could still SSH to that broker (we can't see the logs right now), I assume that ZooKeeper could still hear the broker's heartbeat and didn't consider it down? What is the best practice to handle such events?
The best practice is to prevent this from happening in the first place!
You need to monitor the disk usage of your brokers and have alerts in place so you know in advance when available disk space runs low.
You need to put retention limits on your topics to ensure data is deleted regularly.
You can also use Topic Policies (see create.topic.policy.class.name) to control how much retention time/size is allowed when creating/updating topics, so that topics can't fill your disk; a policy sketch follows this answer.
The recovery steps you took are OK, but to keep your cluster availability high you really don't want to let the disks fill up.
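A hypothetical policy sketch (class name and the chosen rule are assumptions; it rejects topic creation unless a finite retention.ms is supplied, and would be enabled on the brokers via create.topic.policy.class.name):

import java.util.Map;
import org.apache.kafka.common.errors.PolicyViolationException;
import org.apache.kafka.server.policy.CreateTopicPolicy;

public class RequireRetentionPolicy implements CreateTopicPolicy {
    @Override
    public void configure(Map<String, ?> configs) {
        // No broker-side settings needed for this simple policy.
    }

    @Override
    public void validate(RequestMetadata requestMetadata) throws PolicyViolationException {
        // Reject topics that do not set a finite retention.ms.
        String retentionMs = requestMetadata.configs().get("retention.ms");
        if (retentionMs == null || Long.parseLong(retentionMs) < 0) {
            throw new PolicyViolationException(
                "Topic " + requestMetadata.topic() + " must set a finite retention.ms");
        }
    }

    @Override
    public void close() {
    }
}

A real policy would probably also want to cap retention.bytes, but the structure is the same.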

Why Apache Kafka doesn't support ephemeral topics?

Having a distributed log is great, but I see good use cases for ephemeral topics as well, so I wonder why Apache Kafka doesn't support ephemeral topics.
Actually, Kafka can keep messages in memory and flush them to disk after a specific time or number of messages.
See log.flush.interval.messages and log.flush.interval.ms in the Broker configs.
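For example (the values here are purely illustrative, not recommendations), a broker could be told to flush after every 10,000 messages or every second, whichever comes first:
log.flush.interval.messages=10000
log.flush.interval.ms=1000
Note that the Kafka documentation generally recommends relying on replication and the OS for flushing rather than forcing frequent flushes, so this is a trade-off rather than a true ephemeral-topic feature.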

Increase number of partitions in a Kafka topic from a Kafka client

I'm a new user of Apache Kafka and I'm still getting to know the internals.
In my use case, I need to increase the number of partitions of a topic dynamically from the Kafka Producer client.
I found other similar questions about increasing the number of partitions, but they use the ZooKeeper configuration. My KafkaProducer only has the Kafka broker config, not the ZooKeeper config.
Is there any way I can increase the number of partitions of a topic from the Producer side? I'm running Kafka version 0.10.0.0.
As of Kafka 0.10.0.1 (the latest release at the time of writing): as Manav said, it is not possible to increase the number of partitions from the Producer client.
Looking ahead (next releases): in an upcoming version of Kafka, clients will be able to perform some topic management actions, as outlined in KIP-4. A lot of the KIP-4 functionality is already completed and available in Kafka's trunk; the code in trunk as of today allows clients to create and delete topics. Unfortunately, for your use case, increasing the number of partitions is still not possible -- it is in scope for KIP-4 (see Alter Topics Request) but not completed yet.
TL;DR: The next versions of Kafka will allow you to increase the number of partitions of a Kafka topic, but this functionality is not yet available.
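For readers on Kafka releases newer than this answer covers, the admin API that eventually shipped does expose this; a sketch, assuming a 1.0+ client library and a made-up broker address and topic name:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreasePartitionsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // assumed address
        try (AdminClient admin = AdminClient.create(props)) {
            // Grow "my-topic" to 6 partitions in total; existing data is not rebalanced.
            admin.createPartitions(
                Collections.singletonMap("my-topic", NewPartitions.increaseTo(6))
            ).all().get();
        }
    }
}

On 0.10.0.0 itself this is not available, so the topic still has to be altered on the broker side (e.g. with kafka-topics.sh).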
It is not possible to increase the number of partitions from the Producer client.
Is there a specific use case why you cannot use the broker to achieve this?
"My KafkaProducer only has the Kafka broker config, not the ZooKeeper config."
I don't think any client will let you change the broker config. You can only access (read) the server-side config at most.
Your producer can provide different keys for its ProducerRecords, and the broker will place records with different keys in different (existing) partitions. For example, if you need records spread over two partitions, use keys "abc" and "xyz".
This can be done in version 0.9 as well.
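A minimal sketch of keyed sends (broker address, topic name and keys are just examples); note that keys only distribute records across partitions that already exist, they do not add partitions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key always go to the same partition; distinct keys are
            // spread across partitions by hash (two specific keys may still collide).
            producer.send(new ProducerRecord<>("my-topic", "abc", "payload-1"));
            producer.send(new ProducerRecord<>("my-topic", "xyz", "payload-2"));
        }
    }
}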

Kafka producer is not able to update metadata after some time

I have a Kafka environment with 3 brokers and 1 ZooKeeper node. I have pushed around 20K+ messages into my topic, and Apache Storm is processing the data that the producer adds to the topic.
A few hours later, when I try to produce messages to Kafka, I get the following exception:
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
After restarting the Kafka servers it works fine, but in production I can't restart my servers every time.
Can anyone help me figure out my issue?
My Kafka producer configuration is as follows:
prodProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"list of broker");
prodProperties.put(ProducerConfig.ACKS_CONFIG, "1");
prodProperties.put(ProducerConfig.RETRIES_CONFIG, "3");
prodProperties.put(ProducerConfig.LINGER_MS_CONFIG, 5);
Although Kafka producer tuning is quite a hard topic, I can imagine that your producer is trying to generate records faster than it can transfer them to your Kafka cluster.
There is a producer setting buffer.memory which defines how much memory the producer can use before blocking. The default value is 33554432 bytes (about 33 MB).
If you increase the producer memory, you will avoid blocking. Try different values, for example 100 MB, as sketched below.
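A sketch of that change, reusing the prodProperties object from the question (the 100 MB value mirrors the suggestion above and is an example, not a recommendation):

prodProperties.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 104857600L); // ~100 MB of buffering
// Depending on the client version, max.block.ms is also worth a look: it controls how long
// send() waits for metadata or buffer space before failing with the 60000 ms TimeoutException.
// prodProperties.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 120000);

If the metadata timeout persists even with a larger buffer, it is also worth checking broker health and connectivity, since the exception means the producer could not refresh cluster metadata at all.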