Kafka - is it possible to alter cluster size while keeping the change transparent to Producers and Consumers?

I am investigating Kafka to assess its suitability for our use case. Can you help me understand how flexible Kafka is about adding brokers to an existing cluster without tearing the cluster down? Is there anything to be taken care of when doing this?

Adding servers to a Kafka cluster is easy: just assign them a unique
broker id and start up Kafka on your new servers. However, these new
servers will not automatically be assigned any data partitions, so
unless partitions are moved to them they won't be doing any work until
new topics are created. So usually when you add machines to your
cluster you will want to migrate some existing data to these machines.
Refer to the official documentation on expanding your cluster.
Kafka supports:
Expanding your cluster
Automatically migrating data to new machines
Custom partition assignment and migration
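The migration step the docs describe can be sketched roughly as below; the topic name, broker ids, file names, and ZooKeeper address are all illustrative, and the commands assume a ZooKeeper-era (pre-KRaft) installation:

```shell
# Sketch of the data-migration step after adding brokers 3, 4 and 5.
# Topic name, broker ids, and ZooKeeper address are illustrative.

# 1. List the topics whose partitions should move.
cat > topics-to-move.json <<'EOF'
{"topics": [{"topic": "testtopic1"}], "version": 1}
EOF

# Sanity-check that the file is valid JSON before handing it to Kafka.
python3 -m json.tool topics-to-move.json

# 2. Ask Kafka for a candidate reassignment plan (not run here):
# bin/kafka-reassign-partitions.sh --zookeeper zk_host:port \
#   --topics-to-move-json-file topics-to-move.json \
#   --broker-list "3,4,5" --generate
# 3. Save the proposed plan to a file, apply it with --execute, then
#    poll with --verify until every partition reports completion.
```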

Yes, you can increase the number of partitions of a topic using the command line or the AdminClient, without restarting the cluster processes.
Example:
bin/kafka-topics.sh --zookeeper zk_host:port --alter --topic testtopic1 --partitions 20
Please be aware that increasing the partition count does not repartition existing data: messages already written stay in their original partitions, and the key-to-partition mapping changes only for new messages.
Kafka only lets you increase the number of partitions; you cannot decrease it. If you have to reduce the partition count, you need to delete and recreate the topic.
Regarding your question about how producers and consumers behave with newly added partitions:
Kafka has a property metadata.max.age.ms for producers and consumers which defaults to 300000.
metadata.max.age.ms : The period of time in milliseconds after which we force a refresh of metadata even if we haven't seen any partition leadership changes to proactively discover any new brokers or partitions.
Once that interval elapses, the metadata is refreshed and any newly added partitions are detected by the producers and consumers.
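For example, a producer that should notice new partitions quickly can lower this setting. A minimal sketch, where the file name, broker address, and the 30-second value are illustrative choices:

```shell
# Sketch: lower metadata.max.age.ms so a producer discovers new
# partitions sooner than the 5-minute default. Values are illustrative.
cat > producer.properties <<'EOF'
bootstrap.servers=localhost:9092
# Default is 300000 ms (5 minutes); refresh every 30 seconds instead.
metadata.max.age.ms=30000
EOF
grep '^metadata.max.age.ms=' producer.properties
# prints: metadata.max.age.ms=30000
```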

Related

Scale-up Kafka without downtime

Let's say we have services A and B and a Kafka broker, each in a different VM.
Service_A is the producer, Service_B is the consumer.
If Kafka can't handle the workload:
a) How can it scale up without downtime?
Also,
b) Can the number of partitions change in real time without downtime?
The answer to both is yes, but scaling out Kafka (adding brokers) will not automatically rebalance existing topics onto the new brokers; you'll need to manually invoke the kafka-reassign-partitions.sh command, which requires the brokers to be running. Increasing the partition count itself is a separate step, done with kafka-topics.sh --alter.
The main problem you may run into with that is if there are clients actively using some partition on a broker and you completely move it off to a different broker.
You cannot reduce partitions, ever.
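One way to limit the impact on clients actively using a partition being moved is to throttle the replication traffic during the reassignment. A sketch, assuming Kafka 0.10.1 or newer (where the --throttle flag was added); the JSON file, ZooKeeper address, and 50 MB/s limit are illustrative:

```shell
# Sketch: execute a reassignment with a replication throttle so that
# moving a partition off a busy broker does not starve live clients.
# reassignment.json is a plan previously produced with --generate.
bin/kafka-reassign-partitions.sh --zookeeper zk_host:port \
  --reassignment-json-file reassignment.json \
  --execute --throttle 50000000
# Afterwards, run the same command with --verify; once the move is
# complete, --verify also removes the throttle configuration.
```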

How to do data rebalance on kafka if data is stored persistently

I'm new to Kafka and preparing to use it in production.
What strategies can be used for rebalancing data storage if the brokers holding a topic's current partitions are running out of disk space, assuming more brokers can be added to the cluster?
As a simple example, say a topic starts with 3 partitions (1 replica, to simplify the problem), 3 brokers each store 1 partition of the topic, and each partition takes up 1 TB of disk space.
How can I add 3 more broker servers and alter the topic's partition count to 6, so that each of the 6 partitions ends up taking 500 GB of disk space on its broker?
I think this problem is critical for storing large amounts of data in a Kafka cluster indefinitely.
Thanks.
kafka-reassign-partitions and kafka-preferred-replica-election are the built-in commands to handle such relocation tasks, as Kafka does not perform them automatically on cluster expansion.
There are vendor alternatives, such as those from Confluent and Datadog.
How can I add 3 more new broker servers
See Docs - Expanding your cluster
alter topic's partition amount to 6
Use kafka-topics --alter and increase the partition count (note: this does not relocate existing data to the new partitions; in other words, it does not "re-key" the topic)
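For the 3-to-6 partition step in the question, a sketch (the topic name and ZooKeeper address are illustrative):

```shell
# Sketch: grow the topic from 3 to 6 partitions. Existing messages stay
# in partitions 0-2; only newly produced data lands in partitions 3-5.
bin/kafka-topics.sh --zookeeper zk_host:port --alter \
  --topic testtopic1 --partitions 6
# Then move partitions onto the 3 new brokers with
# kafka-reassign-partitions so that each of the 6 brokers ends up
# hosting one partition; the 500 GB-per-broker result only emerges as
# new data arrives, since old data is not re-keyed.
```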
Also, keep in mind that once you create a topic, its replicas and ISRs are defined. Where possible, choose a replication factor of 3 for resiliency and durability. A replication factor of 2 in a 3-node cluster does not help in certain sticky situations: if one of the 3 brokers goes down, none of the remaining online brokers will join the replica set (to satisfy the replication factor) and move into the ISR.
In a situation like this, you will end up with an incomplete ISR and, worse, a single point of failure.
Note that a broker being down is different from expanding or contracting the Kafka cluster.

Why is my kafka topic not consumable with a broker down?

My issue is that I have a three-broker Kafka cluster and an availability requirement to be able to consume from and produce to a topic when one or two of my three brokers are down.
I also have a reliability requirement to have a replication factor of 3. These seem to be conflicting requirements to me. Here is how my problem manifests:
I create a new topic with replication factor 3
I send several messages to that topic
I kill one of my brokers to simulate a broker issue
I attempt to consume the topic I created
My consumer hangs
I review my logs and see the error:
Number of alive brokers '2' does not meet the required replication factor '3' for the offsets topic
If I set all my broker's offsets.topic.replication.factor setting to 1, then I'm able to produce and consume my topics, even if I set the topic level replication factor to 3.
Is this an okay configuration? Or can you see any pitfalls in setting things up this way?
You only need as many brokers as your replication factor when creating the topic.
I'm guessing in your case, you start with a fresh cluster and no consumers have connected yet. In this case, the __consumer_offsets internal topic does not exist as it is only created when it's first needed. So first connect a consumer for a moment and then kill one of the brokers.
Apart from that, in order to consume you only need 1 broker up, the leader for the partition.
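The error in the question comes from the internal __consumer_offsets topic, whose replication factor is set per broker. A sketch of pinning it explicitly in server.properties; the file name and broker id are illustrative, and with a factor of 3 the offsets topic must be created (i.e. a consumer must connect) while all 3 brokers are still alive:

```shell
# Sketch: set the internal offsets topic's replication factor in each
# broker's server.properties. Connect a consumer once while all brokers
# are up so __consumer_offsets gets created with full replication.
cat > server.properties <<'EOF'
broker.id=0
offsets.topic.replication.factor=3
EOF
grep '^offsets.topic.replication.factor=' server.properties
# prints: offsets.topic.replication.factor=3
```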

Increase number of partitions in a Kafka topic from a Kafka client

I'm a new user of Apache Kafka and I'm still getting to know the internals.
In my use case, I need to increase the number of partitions of a topic dynamically from the Kafka Producer client.
I found other similar questions regarding increasing the partition count, but they use the ZooKeeper configuration. My kafkaProducer, however, has only the Kafka broker config, not the ZooKeeper config.
Is there any way I can increase the number of partitions of a topic from the Producer side? I'm running Kafka version 0.10.0.0.
As of Kafka 0.10.0.1 (the latest release at the time of writing): as Manav said, it is not possible to increase the number of partitions from the Producer client.
Looking ahead (next releases): in an upcoming version of Kafka, clients will be able to perform some topic management actions, as outlined in KIP-4. A lot of the KIP-4 functionality is already completed and available in Kafka's trunk; the code in trunk as of today allows clients to create and delete topics. Unfortunately, for your use case, increasing the number of partitions is still not possible -- it is in scope for KIP-4 (see the Alter Topics Request) but not completed yet.
TL;DR: The next versions of Kafka will allow you to increase the number of partitions of a Kafka topic, but this functionality is not yet available.
It is not possible to increase the number of partitions from the Producer client.
Is there any specific reason why you cannot use the broker-side tooling to achieve this?
But my kafkaProducer has only the Kafka broker config, but not the zookeeper config.
I don't think any client will let you change the broker config; at most you can read the server-side config.
Your producer can supply different keys for its ProducerRecords; the producer's default partitioner hashes the key, so records with different keys are spread across partitions. For example, if you need two partitions, use keys "abc" and "xyz".
This can be done in version 0.9 as well.
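As a sketch with the console producer (the broker address and topic name are illustrative, and this assumes a broker is running):

```shell
# Sketch: send two keyed records; the default partitioner hashes the
# key, so "abc" and "xyz" will usually land in different partitions
# (collisions modulo the partition count are still possible).
printf 'abc:first message\nxyz:second message\n' | \
  bin/kafka-console-producer.sh --broker-list localhost:9092 \
    --topic testtopic1 \
    --property parse.key=true --property key.separator=:
```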

How to scale single node Kafka to multiple node cluster?

I am going to install Kafka for company messaging. The plan is to first install Kafka on a single huge machine and scale it out to a 4-5 machine cluster later if needed.
I have little experience with Kafka. I want to ask whether it is possible to scale by just changing parameters in the broker configuration and installing ZooKeeper on the newly joined machines.
Or how can I roughly do this in the easiest way? More specifically, with Cloudera Kafka in CDH.
Thanks
To scale Kafka you will have to add more partitions to topics if needed, using kafka-topics.sh, and then reassign partitions to your new brokers using kafka-reassign-partitions.sh.
The reassignment utility will replicate and move your data automatically. You can do it for a whole topic or for a selective set of partitions.
The complete documentation is here. Just take a look at section 6.
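The per-broker side of the expansion can be sketched as below; the broker id, ZooKeeper hosts, and log directory are illustrative (in CDH these settings are managed through Cloudera Manager rather than edited by hand):

```shell
# Sketch: minimal settings for each machine joining the cluster. Every
# new broker needs a unique broker.id and must point at the same
# ZooKeeper ensemble as the existing node.
cat > server.properties <<'EOF'
broker.id=1
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
log.dirs=/var/lib/kafka/data
EOF
grep '^broker.id=' server.properties
# prints: broker.id=1
```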