Scale up Kafka without downtime

Let's say we have services A and B and a Kafka broker, each in a different VM.
Service_A is the producer, Service_B is the consumer.
If Kafka can't handle the workload:
a) How can it scale up without downtime?
Also,
b) Can the number of partitions change in real time without downtime?

The answer to both is yes, but scaling out Kafka (adding brokers) will not add partitions to existing topics; you'll need to manually invoke the kafka-reassign-partitions command, which requires the brokers to be running.
The main problem you may run into is when clients are actively using a partition on a broker and you move that partition entirely to a different broker.
You cannot reduce the number of partitions, ever.
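For illustration only: the CLI tool named above generates and executes a reassignment plan, and since Kafka 2.4 the AdminClient exposes the same operation programmatically. A minimal Java sketch of moving one partition onto a newly added broker; the topic name, broker ids and bootstrap address are placeholders:

import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.TopicPartition;

public class MovePartition {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // Move partition 0 of "testtopic1" so its replicas live on brokers 2 and 3,
            // e.g. after broker 3 has just been added to the cluster.
            Map<TopicPartition, Optional<NewPartitionReassignment>> move = Map.of(
                new TopicPartition("testtopic1", 0),
                Optional.of(new NewPartitionReassignment(List.of(2, 3))));
            admin.alterPartitionReassignments(move).all().get();
        }
    }
}

Replicas are copied to the target brokers while clients keep reading and writing; the brief disruption the answer mentions mainly comes from partition leadership moving to the new broker once the reassignment completes.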

Related

Is it possible / best practice to add different Kafka brokers to the same cluster?

Say I have a cluster, and in it a broker that consumes/produces event x from microservice MS-1.
Can I add an additional broker to the same cluster so that it consumes/produces event y from microservice MS-2, or does each broker type need a dedicated cluster?
Is that best practice, or even possible?
I am asking because I have seen brokers used as leader-follower in the same cluster, meaning all are replicas of the leader.
Your brokers are the nodes in the cluster that handle requests from your clients. Your clients are Consumers or Producers (or both) that interact with your cluster (Consumers and Producers are not Brokers).
While you can add brokers to a running cluster, the concept I think you're looking for is a Topic, which is a group of related event/message types. Your cluster can support many Topics, and yes, microservice1 could produce events to Topic1, and microservice2 could produce events to Topic2.
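To make that concrete, here is a minimal sketch, assuming both microservices point at the same cluster and simply use different topics; Topic1, Topic2 and the bootstrap address are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MicroserviceProducers {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092"); // same cluster for both services
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // MS-1 would publish event x to its own topic ...
            producer.send(new ProducerRecord<>("Topic1", "event-x"));
            // ... while MS-2 publishes event y to another topic, with no extra brokers or clusters needed.
            producer.send(new ProducerRecord<>("Topic2", "event-y"));
        }
    }
}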

Kafka - is it possible to alter cluster size while keeping the change transparent to Producers and Consumers?

I am investigating Kafka to assess its suitability for our use case. Can you please help me understand how flexible Kafka is with altering the size of an existing cluster, in particular adding brokers without tearing the cluster down? Is there anything to be taken care of when doing this?
Adding servers to a Kafka cluster is easy: just assign them a unique broker id and start up Kafka on your new servers. However, these new servers will not automatically be assigned any data partitions, so unless partitions are moved to them they won't be doing any work until new topics are created. So usually when you add machines to your cluster you will want to migrate some existing data to these machines.
Refer to the Kafka documentation section on expanding your cluster.
Kafka supports:
Expanding your cluster
Automatically migrating data to new machines
Custom partition assignment and migration
Yes, you can increase the number of partitions of a topic using command line or AdminClient without restarting the cluster processes.
Example:
bin/kafka-topics.sh --zookeeper zk_host:port --alter --topic testtopic1 --partitions 20
Please be aware that adding partitions does not re-partition existing data: records already in the topic stay in their original partitions.
Kafka only allows you to increase the number of partitions; you can't decrease it. If you have to reduce the number of partitions, you need to delete and recreate the topic.
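For the AdminClient route mentioned above, a minimal Java sketch; the topic name, target count and bootstrap address are placeholders:

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreasePartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // Grow "testtopic1" to 20 partitions in total. This only adds partitions:
            // existing records are not moved, and the count can never be lowered again.
            admin.createPartitions(Map.of("testtopic1", NewPartitions.increaseTo(20)))
                 .all().get();
        }
    }
}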
For your question about how producers and consumers behave with newly added partitions:
Kafka has a property metadata.max.age.ms for producers and consumers which defaults to 300000.
metadata.max.age.ms : The period of time in milliseconds after which we force a refresh of metadata even if we haven't seen any partition leadership changes to proactively discover any new brokers or partitions.
After that interval the metadata is refreshed, and any newly added partitions will be detected by the producers/consumers.
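As an illustration, a producer configured with a shorter metadata refresh so it notices new partitions sooner; the bootstrap address and the 60-second value are placeholders, and the same setting exists for consumers:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class FastMetadataRefreshProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Refresh metadata every 60 s instead of the default 300 s so partitions added
        // to a topic are discovered sooner.
        props.put(ProducerConfig.METADATA_MAX_AGE_CONFIG, "60000");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // ... send records as usual
        }
    }
}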

Kafka Producer, Consumer, Broker in same host?

Are there any downsides to running the same producer and consumer code on all nodes in the cluster? If there are 8 nodes in the cluster (8 consumers, 8 Kafka brokers, and 8 producers), would 8 producers be running at the same time in the cluster? Is there a way to modify the cluster so that only one producer runs at a time?
A Kafka cluster is nothing but Kafka brokers running under a distributed consensus. The cluster is agnostic about the number of producers and consumers around it; producers and consumers are clients of the cluster. Producers stream data into Kafka and consumers read that data back out. Within the cluster, data is organized into topics, and topics are sharded using partitions. If multiple consumers belong to the same consumer group, they share the partitions and rebalance among themselves in a self-healing fashion.
Is there a way to modify the cluster so that only one producer runs at a time?
If you intend to run only a single producer at any point in time, you don't need to make any change within the cluster.
Are there any downsides to running the same producer and consumer code for all nodes in the cluster?
The primary downsides here would be scalability and memory usage.
Producers and Consumers are not required to run on brokers. Producers should be deployed where data is being generated (or run on separate hosts, e.g. as Kafka Connect workers).
Consumers should be scaled out independently based on the throughput and ordering guarantees that you need in your downstream systems.
There is nothing that says 8 brokers require 8 producers and 8 consumers; the partition count is what matters more.
If you have N partitions in a topic, you can only scale to N active consumers in a group anyway, while you can run practically any number of producers (see the consumer sketch below).
8 brokers can hold lots of partitions for any given topic
Running a single producer is a matter of your own application code; the broker cannot enforce it.
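To make the N-partitions / N-consumers point concrete, a minimal consumer sketch: every copy of this process started with the same group.id shares the topic's partitions, and copies beyond the partition count simply sit idle. Topic, group id and bootstrap address are placeholders:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "service-b");                  // placeholder group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("testtopic1"));
            while (true) {
                // Each instance in the group is assigned a share of the partitions by the
                // group coordinator; no broker-side change is needed to add or remove instances.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s-%d %s%n", record.topic(), record.partition(), record.value());
                }
            }
        }
    }
}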

Kafka Producer, multi DC failover support

I have two distinct Kafka clusters located in different data centers, DC1 and DC2. How do I organize Kafka producer failover between the two DCs? If the primary Kafka cluster (DC1) becomes unavailable, I want the producer to switch to the failover Kafka cluster (DC2) and continue publishing to it. The producer should also be able to switch back to the primary cluster once it is available. Any good patterns, existing libs, approaches, code examples?
Each partition of the Kafka topic your producer is publishing to has a separate leader, often spread across multiple brokers in the cluster, so the producer is connected to many “primary” brokers simultaneously. Should any one of them fail, another in-sync replica (ISR) will be elected as leader and automatically take over. You do not need to do anything in your client app for it to reconnect to the new leader(s), retry any failed requests, and continue.
If this is for Multi-Data Center (MDC) failover, things get much more complicated depending on whether the client apps die as well or whether they keep running and only need their cluster connections to fail over. Offsets are not preserved across multiple Kafka clusters, so while producers are simpler, consumers need to call GetOffsetsForTimes() upon failover.
For a great write-up of the MDC failover modes and best practices, see the MDC whitepaper here: https://www.confluent.io/white-paper/disaster-recovery-for-multi-datacenter-apache-kafka-deployments/
Since you asked only about producers, your app can detect if the primary cluster is down (say, after a certain number of retries) and then, instead of attempting to reconnect, connect to another broker list from the secondary cluster. Alternatively, you can redirect the DNS name of the broker-list hosts to point to the secondary cluster.
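A rough sketch of that detect-and-switch idea; the DC bootstrap lists and the failure threshold are placeholders, and a real implementation would also need to handle in-flight records and switching back to DC1:

import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FailoverProducer {
    private static final String DC1 = "dc1-broker1:9092,dc1-broker2:9092"; // placeholder
    private static final String DC2 = "dc2-broker1:9092,dc2-broker2:9092"; // placeholder
    private static final int FAILURE_THRESHOLD = 5; // consecutive failed sends before switching

    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private volatile KafkaProducer<String, String> producer = create(DC1);

    private static KafkaProducer<String, String> create(String bootstrap) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "10000"); // surface failures quickly
        return new KafkaProducer<>(props);
    }

    public void send(String topic, String key, String value) {
        producer.send(new ProducerRecord<>(topic, key, value), (metadata, exception) -> {
            if (exception == null) {
                consecutiveFailures.set(0);
            } else if (consecutiveFailures.incrementAndGet() >= FAILURE_THRESHOLD) {
                switchTo(DC2); // assume DC1 is down; in real code do this off the callback thread
            }
        });
    }

    private synchronized void switchTo(String bootstrap) {
        producer.close();
        producer = create(bootstrap);
        consecutiveFailures.set(0);
    }
}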

How to mitigate the impact on connected producers and consumers when restarting a Kafka broker?

We've got a Kafka 0.9.0.1 cluster with 4 brokers, a few hundred topics, 20-50 consumers and 10-15 producers.
Often we have to make changes to the cluster config, and we do rolling restarts with controlled shutdowns. However, our consumers and producers see a big impact when:
The broker is shutting down
The broker joins the cluster again
The cluster does a rebalance
The types of exceptions the clients see are in the range of:
NotLeaderForPartitionException
NotEnoughReplicasException
NotEnoughReplicasAfterAppendException
TimeoutException
NetworkException
My question is: is there a way to mitigate the impact, or is this considered the normal way Kafka and the clients operate?
Is any change in the topology of the cluster (partition leaders and ISRs) supposed to cause these exceptions?
Thank you,