I'm creating a topic with a replication factor of 3, setting min.insync.replicas to 2, and producing with acks=all, and I'm running some tests to measure performance.
After reading some articles on Kafka performance, I observed that a replication factor of 3 combined with min.insync.replicas=2 and acks=all reduces throughput compared to acks=1. Now I'm trying to understand whether increasing num.replica.fetchers would help increase the throughput.
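For reference, here is a minimal sketch of the setup under test, assuming the Java clients (2.3+ for the Admin interface), a broker at localhost:9092, and a made-up topic name and partition count:

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class AcksAllPerfSetup {
        public static void main(String[] args) throws Exception {
            Properties conn = new Properties();
            conn.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address

            // Topic with replication factor 3 and min.insync.replicas=2, as described above.
            try (Admin admin = Admin.create(conn)) {
                NewTopic topic = new NewTopic("perf-test", 6, (short) 3)
                        .configs(Map.of("min.insync.replicas", "2"));
                admin.createTopics(List.of(topic)).all().get();
            }

            Properties producerProps = new Properties();
            producerProps.putAll(conn);
            producerProps.put("acks", "all"); // leader waits for the in-sync replicas before acknowledging
            producerProps.put("key.serializer", StringSerializer.class.getName());
            producerProps.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                producer.send(new ProducerRecord<>("perf-test", "key", "value")).get();
            }
        }
    }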
num.replica.fetchers is a cluster-wide configuration setting that controls how many fetcher threads each broker runs.
These threads are responsible for replicating messages from a source broker (that is, the broker where the partition leader resides).
Increasing this value gives you more I/O parallelism and higher fetcher throughput. Of course, there is a trade-off: the brokers use more CPU and network.
The default value is 1.
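As a sketch of how you might raise it: on recent broker versions num.replica.fetchers is listed as a dynamically updatable (cluster-wide) config, so it can be changed through the AdminClient; on older versions you have to edit server.properties and restart. Assuming Kafka clients 2.3+ and a hypothetical broker at localhost:9092:

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class RaiseReplicaFetchers {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // hypothetical

            try (Admin admin = Admin.create(props)) {
                // An empty resource name targets the cluster-wide default for all brokers.
                ConfigResource cluster = new ConfigResource(ConfigResource.Type.BROKER, "");
                AlterConfigOp moreFetchers = new AlterConfigOp(
                        new ConfigEntry("num.replica.fetchers", "4"), AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(cluster, List.of(moreFetchers))).all().get();
            }
        }
    }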
The following are some good articles on Kafka Performance.
https://www.slideshare.net/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600
https://azure.microsoft.com/en-au/blog/processing-trillions-of-events-per-day-with-apache-kafka-on-azure/
https://medium.com/@rinu.gour123/kafka-performance-tuning-ways-for-kafka-optimization-fdee5b19505b
Actually, increasing the number of fetchers could help increase throughput, since consumers only ever read fully acknowledged messages. But there seems to be an optimal number of fetchers, which depends on your specific scenario and other settings. Check this out (section 4): https://www.instaclustr.com/the-power-of-kafka-partitions-how-to-get-the-most-out-of-your-kafka-cluster/
Recently, we faced an issue in our Kafka cluster where we overrode the max.message.bytes value for a topic (which had a replication factor of 3) to a value larger than replica.fetch.max.bytes. We did not see issues immediately, but when a message with replica.fetch.max.bytes < message size < max.message.bytes was produced later, we started seeing the error below in our logs.
Replication is failing due to a message that is greater than replica.fetch.max.bytes for partition [<topic-name>,1]. This generally occurs when the max.message.bytes has been overridden to exceed this value and a suitably large message has also been sent. To fix this problem increase replica.fetch.max.bytes in your broker config to be equal or larger than your settings for max.message.bytes, both at a broker and topic level
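For illustration, a hypothetical sketch of the kind of topic-level override that produces this state (topic name and sizes are made up; replica.fetch.max.bytes is assumed to still be at its ~1 MiB default):

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class OverrideMaxMessageBytes {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // hypothetical

            try (Admin admin = Admin.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
                // 5 MiB topic limit, well above the 1 MiB replica.fetch.max.bytes default:
                // messages between the two sizes may be accepted by the leader but fail to
                // replicate on older brokers.
                AlterConfigOp raiseLimit = new AlterConfigOp(
                        new ConfigEntry("max.message.bytes", "5242880"), AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(topic, List.of(raiseLimit))).all().get();
            }
        }
    }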
Since we did not want to restart our Kafka brokers and perform a rolling upgrade of the cluster immediately, we temporarily decreased the replication factor to 1 (not highly available, I know).
So, are there any use cases where such settings might be useful? If yes, which ones? Also, are there any better solutions to mitigate this problem, instead of stopping replication altogether?
My guess is that since max.message.bytes is per topic (and thus stored in ZooKeeper and updated at any point in time), and replica.fetch.max.bytes is per broker, it cannot be checked or guaranteed that a topic's max.message.bytes is <= a replica's replica.fetch.max.bytes.
I also found an old ticket regarding this very problem:
https://issues.apache.org/jira/browse/KAFKA-1844
Kafka broker on startup checks that the configured replica.fetch.max.bytes >= message.max.bytes. But users can override message.max.bytes per topic, and no such validation happens for the per-topic message.max.bytes. If users configure message.max.bytes > replica.fetch.max.bytes, followers won't be able to fetch data.
Also from the documentation about replica.fetch.max.bytes, it seems that in some cases it would still work:
This is not an absolute maximum, if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that progress can be made. The maximum record batch size accepted by the broker is defined via message.max.bytes (broker config) or max.message.bytes (topic config).
So, all in all, this combination doesn't really make sense and is a known issue.
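Given that the broker does not validate the combination for you, one way to spot the mismatch yourself is to compare the two settings with the AdminClient. A minimal sketch, assuming Kafka clients 2.3+, a broker with id 0, and a hypothetical topic my-topic:

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.common.config.ConfigResource;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class CheckReplicationLimits {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // hypothetical

            try (Admin admin = Admin.create(props)) {
                ConfigResource topic  = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"); // hypothetical topic
                ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");       // broker id 0

                Map<ConfigResource, Config> configs =
                        admin.describeConfigs(List.of(topic, broker)).all().get();

                long maxMessageBytes = Long.parseLong(configs.get(topic).get("max.message.bytes").value());
                long replicaFetchMax = Long.parseLong(configs.get(broker).get("replica.fetch.max.bytes").value());

                if (maxMessageBytes > replicaFetchMax) {
                    System.out.println("Warning: max.message.bytes (" + maxMessageBytes
                            + ") exceeds replica.fetch.max.bytes (" + replicaFetchMax
                            + "); replication of large messages may stall on older brokers.");
                }
            }
        }
    }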
Brokers allocate a buffer size of replica.fetch.max.bytes for each partition they replicate. If replica.fetch.max.bytes is set to 1 MiB, and you have 1000 partitions, about 1 GiB of RAM is required.
When the value of message.max.bytes (or the topic-level max.message.bytes) is greater than replica.fetch.max.bytes, the batch may not fit into the allocated buffer. Hence it is important to keep replica.fetch.max.bytes at least as large as message.max.bytes; otherwise the broker will still accept messages but fail to replicate them, leading to potential data loss.
The value of max.message.bytes is usually increased to achieve higher throughput, or because individual messages are larger than 1 MB (the default).
Please ensure that the number of partitions multiplied by the size of the largest message does not exceed available memory.
As for the solution: replica.fetch.max.bytes is a read-only broker-level config, so a restart will be required.
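As a rough back-of-the-envelope check for that sizing advice (default fetch size and a hypothetical partition count; this is an upper bound, not an exact accounting of broker memory):

    public class ReplicationBufferEstimate {
        public static void main(String[] args) {
            long replicaFetchMaxBytes = 1_048_576L; // 1 MiB, the default
            int  followedPartitions   = 1000;       // partitions this broker fetches as a follower (hypothetical)

            // Worst-case fetch buffer memory if every followed partition fills its buffer.
            long worstCaseBytes = replicaFetchMaxBytes * followedPartitions;
            System.out.printf("Worst-case replication fetch buffers: ~%d MiB%n", worstCaseBytes / (1024 * 1024));
        }
    }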
I'm planning to run a Kafka cluster in a production environment. Before deploying the cluster, I'm trying to find the best configuration to ensure HA and data consistency.
I read in the official doc that it is not possible to reduce the partition replication factor, but what about min.insync.replicas? When I decrease the value in a test environment, I don't see any difference in the topic description: after changing the value from 3 to 2, I still have 3 ISR. Is that because it's a minimum value, or because the configuration change is not taken into account?
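For reference, a minimal sketch of how the change can be applied and verified with the Java AdminClient (Kafka 2.3+; broker address and topic name are made up):

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class ChangeMinInsyncReplicas {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // hypothetical

            try (Admin admin = Admin.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");

                // Lower min.insync.replicas from 3 to 2 on the topic.
                AlterConfigOp lower = new AlterConfigOp(
                        new ConfigEntry("min.insync.replicas", "2"), AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(topic, List.of(lower))).all().get();

                // Verify it in the topic configs; the ISR column of kafka-topics --describe
                // keeps showing all replicas that are in sync, regardless of this setting.
                Config config = admin.describeConfigs(List.of(topic)).all().get().get(topic);
                System.out.println("min.insync.replicas = " + config.get("min.insync.replicas").value());
            }
        }
    }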
Yes, it is possible to reduce (or in general change) the min.insync.replicas configuration of a topic.
However, as you were looking for the best configuration to ensure high availability and data consistency, it would be counter-intuitive to reduce the value.
There is a big difference between the ISR (in-sync replicas) and the setting min.insync.replicas. The ISR shown by kafka-topics --describe just tells you how healthy the data in your topic is and whether all the replicas keep up with the partition leader.
On the other hand, the min.insync.replicas works together with a KafkaProducer writing to the topic with that setting. It is described on the official Kafka docs on TopicConfigs as:
When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful.
To emphasize again: if you are looking for high availability and consistency, it is best to set acks=all in your producer while at the same time keeping min.insync.replicas high.
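To make the interplay concrete, here is a hedged sketch with the Java producer (broker address and topic are made up, and retries are disabled only so the error surfaces immediately instead of being retried): with acks=all, a write is rejected with a NotEnoughReplicasException when fewer than min.insync.replicas replicas are in sync, rather than being silently accepted by the leader alone.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.NotEnoughReplicasException;
    import org.apache.kafka.common.serialization.StringSerializer;
    import java.util.Properties;
    import java.util.concurrent.ExecutionException;

    public class AcksAllVsMinIsr {
        public static void main(String[] args) throws InterruptedException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // hypothetical
            props.put("acks", "all");
            props.put("enable.idempotence", "false"); // only so retries=0 is a valid combination
            props.put("retries", "0");                // demo only: surface the error instead of retrying
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("my-topic", "k", "v")).get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof NotEnoughReplicasException) {
                    System.out.println("Write rejected: in-sync replicas below min.insync.replicas");
                }
            }
        }
    }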
Our Kafka setup is as follows:
30 partitions per topic
1 consumer thread
We configured it this way to be able to scale up in the future.
We wanted to minimize how often we rebalance when we need to scale up by adding partitions, because latency is very important to us and during a rebalance messages can be stuck until the coordination phase is done.
Can having 1 consumer thread with many partitions on a single topic somehow affect the overall consumption latency?
More partitions in a Kafka cluster lead to higher throughput; however, you need to be aware that the number of partitions has an impact on availability and latency as well.
In general, more partitions:
Lead to Higher Throughput
Require More Open File Handles
May Increase Unavailability
May Increase End-to-end Latency
May Require More Memory In the Client
You need to study the trade-offs and make sure that you've picked the number of partitions that satisfies your requirements regarding throughput, latency and required resources.
For further details refer to this blog post from Confluent.
My opinion: run some tests and write down your findings. For example, try running a single consumer over a topic with 5, 10, 15, ... partitions, measure the impact, and pick the configuration that meets your requirements; a rough sketch of such a test follows below. Finally, ask yourself whether you will ever need that many partitions. At the end of the day, if you do need more partitions you should not worry too much about rebalancing etc.; Kafka was designed to be scalable.
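A rough sketch of such a measurement, assuming the Java consumer, producer-set timestamps, and reasonably synchronized clocks (topic name and group id are made up):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class ConsumerLatencyProbe {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // hypothetical
            props.put("group.id", "latency-probe");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("perf-test")); // repeat against topics with 5, 10, 15, ... partitions
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Approximate end-to-end latency from the record's (producer) timestamp.
                        long latencyMs = System.currentTimeMillis() - record.timestamp();
                        System.out.printf("partition=%d latency=%dms%n", record.partition(), latencyMs);
                    }
                }
            }
        }
    }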
In Kafka, with
replication-factor = 2
minimum ISR size = 1
unclean.leader.election.enable = false
is there a chance (for example, during a network partition) that two brokers both think they are the leader and both accept writes, so that some messages are eventually lost without the producer even noticing?
The producer uses acks = all.
A similar question has been answered here: How does Kafka handle network partitions?
In your case, I think there is no problem during a network partition. Since unclean.leader.election.enable is false, one of the two sides cannot elect a new leader, so only the other side can accept writes.
With minimum ISR set to 1, your cluster can be down to a single broker holding the data at any time, so if the disk of that broker were to fail, you risk losing data.
If you want stronger guarantees, you need to increase the minimum ISR size. For example, if you set it to 2, at any time at least 2 brokers will have all the data. So in order to lose data in this configuration, you would need to lose the disks of both brokers within the same time frame which is a lot less likely than just losing a single disk.
If you increase minimum ISR, to ease maintenance, you probably also want to bump up the number of replicas so you can have 1 broker down and still be able to produce with acks = all.
Since you have a replication factor of 2, having 1 in-sync replica out of two is sufficient: even if the leader goes down, you still have 1 replica to handle the traffic. Having more replicas leads to higher write overhead and might slow down throughput. You can have a higher number of replicas, trading some performance for reliability.
We have a 3 host Kafka cluster. We have 136 topics, each of which has 100 partitions, with a replication factor of 3. This makes for 13,600 partitions across our cluster.
Is this a sane configuration of our topics?
It's too many. You should ask yourself whether you have (or plan to have soon) enough consumer instances to need that many partitions. Then, if you really do plan to have 13k consumer instances, what sort of hardware are you running these brokers on such that they could serve that many consumers? That's even before you consider the additional impact of many partitions on pre-1.1 brokers: https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
To me this looks like 100 was a round number that seemed future-proof. I'd suggest starting at a much lower number per topic (say 2 or 10) and seeing whether you actually hit scale issues that demand more partitions before jumping to expert mode. You can always add more partitions later.
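For instance, growing a topic later is a one-liner with the Java AdminClient (topic name and target count are hypothetical; note the caveat in the comment for keyed data):

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.NewPartitions;
    import java.util.Map;
    import java.util.Properties;

    public class AddPartitionsLater {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // hypothetical

            try (Admin admin = Admin.create(props)) {
                // Grow "my-topic" to a total of 20 partitions.
                // Caveat: this changes the key-to-partition mapping for keyed messages.
                admin.createPartitions(Map.of("my-topic", NewPartitions.increaseTo(20))).all().get();
            }
        }
    }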
The short answer to your question is 'It depends'.
More partitions in a Kafka cluster lead to higher throughput; however, you need to be aware that the number of partitions has an impact on availability and latency.
In general, more partitions:
Lead to Higher Throughput
Require More Open File Handles
May Increase Unavailability
May Increase End-to-end Latency
May Require More Memory In the Client
You need to study the trade-offs and make sure that you've picked the number of partitions that satisfies your requirements regarding throughput, latency and required resources.
For further details refer to this blog post from Confluent.
A simple rule of thumb from that post for the number of partitions is:
Partitions = max(NP, NC)
where:
NP is the number of partitions required to meet the producer throughput, calculated as TT/TP.
NC is the number of partitions required to meet the consumer throughput, calculated as TT/TC.
TT is the total expected throughput for our system.
TP is the max throughput of a single producer to a single partition.
TC is the max throughput of a single consumer from a single partition.
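To make the formula concrete with purely hypothetical numbers: if the target throughput TT is 100 MB/s, a single producer can push about 10 MB/s into one partition (TP), and a single consumer can read about 20 MB/s from one partition (TC), then NP = 100/10 = 10, NC = 100/20 = 5, and you would provision max(10, 5) = 10 partitions.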