Maximum size for a production Kafka cluster deployment - apache-kafka

I am considering how to deploy our Kafka cluster: one big cluster with several broker groups, or several smaller clusters. If we go with one big cluster, I want to know how big a Kafka cluster can be. Kafka has a controller node, and I don't know how many brokers it can support. My other question is about the __consumer_offsets topic: how big can it get, and can we add more partitions to it?

I've personally worked with production Kafka clusters anywhere from 3 to 20 brokers. They've all worked fine; it just depends on what kind of workload you're throwing at them. With Kafka, my general recommendation is that it's better to have a smaller number of larger, more powerful brokers than a bunch of tiny servers.
For a standing cluster, each broker you add increases "crosstalk" between the nodes, since they have to move partitions around, replicate data, and keep metadata in sync. This additional network chatter can impact how much load each broker can handle. As a general rule, adding brokers adds overall capacity, but you then have to shift partitions around so that the load is balanced properly across the entire cluster. Because of that, it's much better to start with 10 nodes, so that topics and partitions are spread out evenly from the beginning, than to start with 6 nodes and add 4 more later.
Regardless of the size of the cluster, there is always only one controller node at a time. If that node happens to go down, another node will take over as controller, but only one can be active at a given time, assuming the cluster is not in an unstable state.
The __consumer_offsets topic can have as many partitions as you want, but it comes with 50 partitions by default. Since this is a compacted topic, the default settings should be enough for almost any scenario, assuming there is no excessive committing happening (I have seen that happen twice in production environments). You can look up the configuration settings for the consumer offsets topic by searching for broker properties that start with offsets. in the official Kafka documentation.
You can get more details at the official Kafka docs page: https://kafka.apache.org/documentation/
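As a quick sanity check, here is a minimal sketch (using the Java AdminClient with a placeholder broker address) that reports how many partitions __consumer_offsets actually has and dumps its effective topic configuration:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.config.ConfigResource;

public class OffsetsTopicInspector {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder address; point this at one of your own brokers.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // How many partitions does __consumer_offsets actually have?
            TopicDescription description = admin
                    .describeTopics(Collections.singleton("__consumer_offsets"))
                    .all().get()
                    .get("__consumer_offsets");
            System.out.println("Partitions: " + description.partitions().size());

            // Dump the effective topic-level configuration (cleanup.policy=compact, etc.).
            ConfigResource resource =
                    new ConfigResource(ConfigResource.Type.TOPIC, "__consumer_offsets");
            Config config = admin.describeConfigs(Collections.singleton(resource))
                    .all().get()
                    .get(resource);
            config.entries().forEach(e ->
                    System.out.println(e.name() + " = " + e.value()));
        }
    }
}
```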

The size of a cluster can be determined in the following ways.
The most accurate way to model your use case is to simulate the load you expect on your own hardware. You can use the Kafka load-generation tools kafka-producer-perf-test and kafka-consumer-perf-test.
Based on the producer and consumer metrics, we can decide the number of brokers for our cluster.
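If you prefer to drive the load from code rather than the bundled scripts, here is a minimal Java sketch of the same idea (broker address and topic name are placeholders, and the record size matches the example below):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ProducerLoadSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker address and topic name.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);

        int numRecords = 1_000_000;
        byte[] payload = new byte[500]; // 500-byte messages, as in the example below

        long start = System.currentTimeMillis();
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < numRecords; i++) {
                producer.send(new ProducerRecord<>("perf-test", payload));
            }
            producer.flush();
        }
        long elapsedMs = System.currentTimeMillis() - start;

        double mbWritten = (double) numRecords * payload.length / (1024 * 1024);
        System.out.printf("Wrote %.1f MB in %d ms (%.2f MB/s)%n",
                mbWritten, elapsedMs, mbWritten / (elapsedMs / 1000.0));
    }
}
```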
The other approach, which needs no simulation, is based on the estimated rate at which you receive data and the required data retention period.
We can also calculate the throughput and, based on that, decide the number of brokers in our cluster.
Example
If you have 800 messages per second of 500 bytes each, then your throughput is 800*500/(1024*1024) ≈ 0.4 MB/s. Now, if your topic is partitioned across 3 brokers with 3 replicas, each broker holds roughly a third of the partitions but writes every message 3 times, so the write load per broker is about 0.4/3*3 = 0.4 MB/s. Combining this with the retention-based approach above: with a 7-day retention period, the cluster as a whole stores roughly 0.4 MB/s * 604,800 s * 3 replicas ≈ 0.7 TB of data.
More details about the architecture are available in the Confluent documentation.
Within a Kafka Cluster, a single broker works as a controller. If you have a cluster of 100 brokers then one of them will act as the controller.
Internally, each broker tries to create an ephemeral node in ZooKeeper at /controller. The first one to succeed becomes the controller. The other brokers get a "node already exists" exception and set a watch on the controller node. When the controller dies, the ephemeral node is removed and the watching brokers are notified, which triggers a new controller election.
The functionality of the controller is described in the official Kafka documentation.
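If you just want to see which broker currently holds the controller role, a small sketch using the Java AdminClient (broker address is a placeholder) can ask the cluster directly:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class ControllerLookup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // Ask the cluster which broker currently holds the controller role.
            Node controller = admin.describeCluster().controller().get();
            System.out.println("Current controller: broker " + controller.id()
                    + " at " + controller.host() + ":" + controller.port());
        }
    }
}
```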
The __consumer_offsets topic is used to store the offsets committed by consumers. It defaults to 50 partitions, but it can be created with more. To change this, set the offsets.topic.num.partitions broker property; note that it only takes effect before the topic is first created.

Related

Uneven partition assignment in kafka streams

I am experiencing strange assignment behavior with Kafka Streams. I have a 3-node Kafka Streams cluster. My stream is pretty straightforward: one source topic (24 partitions; all Kafka brokers run on machines other than the Kafka Streams nodes), and our stream graph only takes messages, groups them by key, performs some filtering, and stores everything to a sink topic. Everything runs with 2 stream threads on each node.
However, whenever I do a rolling update of my Kafka Streams app (by always shutting down only one instance, so the other two nodes keep running), my Kafka Streams cluster ends up with an uneven number of partitions per "node" (usually 16-9-0). Only after I restart node01, and sometimes node02, does the cluster get back to a more even state.
Can somebody offer any hint as to how I can achieve a more equal distribution without additional restarts?
I assume all nodes running the Kafka Streams app have identical group ids for consumption.
I suggest you check whether the partition assignment strategy your consumers are using is org.apache.kafka.clients.consumer.RangeAssignor.
If it is, configure it to be org.apache.kafka.clients.consumer.RoundRobinAssignor instead. This way, when the group coordinator receives a JoinGroup request and hands the partitions over to the group leader, the group leader will ensure that the number of partitions assigned to each member differs by no more than 1.
Unless you're using an older version of Kafka Streams, the default is the RangeAssignor, which does not guarantee an even spread across consumers.
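For reference, this is roughly how the assignor is configured on a plain Java consumer (broker address, group id, and topic name are placeholders; note that Kafka Streams normally installs its own StreamsPartitionAssignor, so this setting applies to regular consumer groups):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.RoundRobinAssignor;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RoundRobinConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // Spread partitions as evenly as possible across the group's members.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                RoundRobinAssignor.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("source-topic")); // placeholder topic
            // Poll once to join the group and trigger the assignment.
            consumer.poll(Duration.ofSeconds(5));
            System.out.println("Assigned partitions: " + consumer.assignment());
        }
    }
}
```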
Is your Kafka Streams application stateful? If so, you can possibly thank this well-intentioned KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-441%3A+Smooth+Scaling+Out+for+Kafka+Streams
If you want to override this behaviour, you can set acceptable.recovery.lag=9223372036854775807 (Long.MAX_VALUE).
The definition of that config from https://docs.confluent.io/platform/current/streams/developer-guide/config-streams.html#acceptable-recovery-lag
The maximum acceptable lag (total number of offsets to catch up from the changelog) for an instance to be considered caught-up and able to receive an active task. Streams only assigns stateful active tasks to instances whose state stores are within the acceptable recovery lag, if any exist, and assigns warmup replicas to restore state in the background for instances that are not yet caught up. Should correspond to a recovery time of well under a minute for a given workload. Must be at least 0.
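A minimal sketch of that override in a Kafka Streams configuration, assuming a placeholder application id and broker address:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsRecoveryLagConfig {
    public static Properties buildConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");    // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // Effectively disable the KIP-441 warm-up behaviour: any instance is considered
        // "caught up" regardless of changelog lag, so active tasks move immediately.
        props.put(StreamsConfig.ACCEPTABLE_RECOVERY_LAG_CONFIG, Long.MAX_VALUE);
        return props;
    }
}
```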

Understanding Kafka Horizontally Scaling: When do I need a new Broker?

When we want to horizontally scale a Kafka cluster, we can do so by adding more Kafka brokers. I have been trying to understand how one decides that "now is the time to add a broker to the Kafka cluster".
For the sake of simplicity, assume that failover is not a requirement.
Are there:
any best practices in choosing the number of kafka brokers?
metrics that should be monitored? (number of partitions / consumers / producers / bytes transferred?)
any resources, we could use to monitor the health of the kafka brokers?
Apart from host machine storage, CPU, memory, and networking, since Kafka brokers run on the JVM, all the important JVM stats such as heap memory and threads should be monitored and considered when the need to scale arises. From the messaging perspective, since topics are implemented as partitions, i.e. commit logs which are appended to serially, it's important to monitor disk writes as well. Similarly, since Kafka stores messages persistently for a predefined retention period, the storage size of each partition and the overall storage size are also important.
You can use the performance scripts provided by Kafka (kafka-producer-perf-test and kafka-consumer-perf-test) to try out different configurations, monitor your cluster during the performance runs, and then decide on the right scaling for your use case.
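For the monitoring side, many of the broker and JVM stats mentioned above are exposed over JMX. A rough sketch of reading a couple of them from Java, assuming the broker exposes JMX on localhost:9999 (the metric names are standard Kafka MBeans):

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerMetricsSketch {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999.
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();

            // Broker-wide inbound byte rate (1-minute moving average).
            ObjectName bytesIn =
                    new ObjectName("kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec");
            System.out.println("BytesInPerSec (1m rate): "
                    + connection.getAttribute(bytesIn, "OneMinuteRate"));

            // JVM heap usage on the broker.
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            System.out.println("HeapMemoryUsage: "
                    + connection.getAttribute(memory, "HeapMemoryUsage"));
        }
    }
}
```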

Manually setting Kafka consumer offset

In our project there are Active Kafka servers (PR) and Passive Kafka servers (DR); both Kafka setups are configured with the same group name, topic name, and partitions. When switching from PR to DR, the __consumer_offsets topic is manually set on DR.
My question here is, would the Kafka consumer be able to seamlessly consume the messages from where it was last read?
When replicating messages across 2 clusters, it's not possible to ensure offsets stay in sync.
For example, if a topic exists for a little while on the Active cluster the log start offset for some partitions may not be 0 (some records have been deleted by the retention policies). Hence when replicating this topic, offsets between both clusters will not be the same. This can also happen when messages are lost or duplicated as you can't have exactly once semantics when replicating between 2 clusters.
So you can't just replicate the __consumer_offsets topic, this will not work. Consumer group positions have to be explicitly "translated" between both clusters. While it's possible to reset them "manually" by directly committing, it's not recommended as finding the new positions is not obvious.
Instead, you should use a replication tool that supports "offset translation" to ensure consumers can seamlessly switch from 1 cluster to the other.
For example, Mirror Maker 2, the official Kafka tool for mirroring clusters, supports offset translation via RemoteClusterUtils. You can find the details in the KIP.
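A rough sketch of what that translation looks like in code, assuming MirrorMaker 2 is replicating from a source cluster aliased "primary" and the consumer group is called "my-group" (both names are made up here):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class OffsetTranslationSketch {
    public static void main(String[] args) throws Exception {
        // Connection settings for the *destination* (DR) cluster.
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "dr-broker:9092"); // placeholder

        // "primary" is the alias of the source cluster in the MM2 configuration,
        // and "my-group" is the consumer group being failed over.
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(
                        props, "primary", "my-group", Duration.ofSeconds(30));

        // These offsets can then be committed on the DR cluster (e.g. via
        // KafkaConsumer#commitSync) before the consumers are switched over.
        translated.forEach((tp, offset) ->
                System.out.println(tp + " -> " + offset.offset()));
    }
}
```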
In itself, relying on the fact that both clusters will have the same offset is faulty.
An offset is a relative characteristic. It's not part of a message; it's literally a position inside a file. And those files, the Kafka log segments, also rotate and have retention settings. There's no guarantee that those log files are identical at any given point in time. Kafka doesn't claim to solve such an issue.
Besides, it's tricky to solve from a CAP point of view, and it's also pointless unless you want strict physical replication.
That's why Kafka multi-cluster tools are usually about logical replication. I have not used MirrorMaker (MM), but I've used Replicator (a more advanced commercial tool by Confluent), and it has a feature for that which is called, who would have guessed, just like the MM one: offset translation.
Replicator does the following:
1. Reads the consumer offset and timestamp information from the __consumer_timestamps topic in the origin cluster to understand a consumer group's progress.
2. Translates the committed offsets in the origin datacenter to the corresponding offsets in the destination datacenter.
3. Writes the translated offsets to the __consumer_offsets topic in the destination cluster, as long as no consumers in that group are connected to the destination cluster.
Note: You do need to add an interceptor to your Kafka Consumers.

How to change the number of brokers for a topic in a kafka cluster?

I have a problem with some Kafka topics and couldn't find an answer to it yet.
While adding more partitions to __confluent.support.metrics shouldn't be a problem (I know how to do that), I wonder if it is possible to tell it to use brokers which obviously can not be seen by this topic?
Also I'd love to understand why these topics only inherit some brokers instead of all available 5 brokers in their cluster.
I'd love to fix these topics. But I fear that if I tell it to add (or use) partitions on brokers the topic can't "see", that it might not work or even destroy the topic, which would be rather bad.
How can I instruct these topics, that there are 5 available brokers? Can I do it with one of the Kafka tools?
How could that have happened in the first place?
Why does the __consumer_offsets topic only "see" 4 brokers instead of 5 like all other topics in this cluster do?
FYI: I didn't set up any of this, but I have to clean up/revamp the running clusters and am stuck now; I've never come across this sort of problem before.
The reason this has happened is because you have only one partition and one replica for the __confluent.support.metrics topic. In a 5-node cluster, this means you will only be using 20% of the available brokers in the cluster, which corresponds with the image you've posted. A topic with replication-factor 1 and 1 partition will only ever hold data on one broker.
On the other hand, it is unusual that your __consumer_offsets topic would be using only 4 out of 5 brokers. My guess would be that your 5th broker was not online at the time of creation of __consumer_offsets (this is created when you consume from any topic for the first time) and thus no partitions were created on this broker.
However, this is probably nothing to worry about, as the spread of partitions across the cluster is generally handled by Kafka itself rather than being a user problem. There is no concept of a topic "seeing" a broker per se; rather, the brokers hold the data for the topics, and the topics will know which brokers they reside on. A topic doesn't generally need to concern itself with other brokers.
Both the consumer offsets and Confluent metrics topics have entries in the server properties file that determine what configurations those topics will be created with.
To improve the health of those topics, you can attempt to increase the replication factor, which will spread your topic over more brokers and provide fault tolerance. Also see Kafka Tools Wiki
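One way to spread an under-replicated topic like __confluent.support.metrics over more brokers is a partition reassignment. A minimal sketch using the Java AdminClient (broker ids and the bootstrap address are placeholders; the usual route is the kafka-reassign-partitions tool with a JSON plan):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.TopicPartition;

public class IncreaseReplicationSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // Move partition 0 of __confluent.support.metrics onto brokers 1, 2 and 3,
            // i.e. raise its replication factor from 1 to 3 (broker ids are made up).
            TopicPartition partition =
                    new TopicPartition("__confluent.support.metrics", 0);
            NewPartitionReassignment reassignment =
                    new NewPartitionReassignment(Arrays.asList(1, 2, 3));

            admin.alterPartitionReassignments(
                    Map.of(partition, Optional.of(reassignment))).all().get();
            System.out.println("Reassignment submitted");
        }
    }
}
```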

Simulate kafka broker failures in multi node kafka cluster and what operations and tools to use to mitigate data loss issues

Are there any tools or operations that can be used to mitigate data loss when a Kafka broker fails in a multi-node Kafka cluster?
Well, replication is an important feature of Kafka and a key element in avoiding data loss. In particular, should one of your brokers go down, the replicas on the other brokers will be used by the consumers as if nothing had happened (from the business side). Of course, this has consequences for connections, bandwidth, etc.
However, a message must have been properly produced to be replicated.
So basically, if you have replication set higher than 1, this should be safe, as long as your producers don't go down.
The default.replication.factor is 1, so set the replication factor (at the topic level or globally) to 2 or 3. Of course, you need at least 2 or 3 brokers.
http://kafka.apache.org/documentation.html#basic_ops_increase_replication_factor
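As a sketch, creating a topic with replication factor 3 via the Java AdminClient might look like this (topic name and broker address are placeholders). On the producer side, acks=all together with min.insync.replicas helps ensure a message is actually replicated before it is acknowledged:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class ReplicatedTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // 3 partitions, replication factor 3: every partition survives the loss
            // of up to two brokers (requires at least 3 brokers in the cluster).
            NewTopic topic = new NewTopic("orders", 3, (short) 3); // topic name is made up
            admin.createTopics(Collections.singleton(topic)).all().get();
            System.out.println("Created topic with replication factor 3");
        }
    }
}
```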