Kafka scalability if consuming from replica node

In a cluster with a replication factor greater than 1, why must we always consume from the master/leader of a partition instead of being able to consume from a replica/follower node that holds a copy of that partition?
I understand that Kafka will always route the request to the leader node (of that particular partition/topic), but doesn't this affect scalability, since all requests for that partition go to a single node? Wouldn't it be better if we could read from any node that holds a replica, and not necessarily the leader?

Partition leader replicas, which serve writes and reads, are evenly distributed among the available brokers, so the load of a multi-partition topic is not concentrated on a single node. That said, you may also want to leverage the "fetch from closest replica" functionality, which is described in KIP-392 and available since Kafka 2.4.0.
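To make the idea concrete, here is a minimal Python sketch of rack-aware replica selection. It is a simplified model, not Kafka's actual code: the `Replica` class and `pick_fetch_replica` function are hypothetical names made up for illustration. As far as I understand KIP-392, the real feature is enabled by setting `replica.selector.class` to `org.apache.kafka.common.replica.RackAwareReplicaSelector` and `broker.rack` on the brokers, plus `client.rack` on the consumer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    """Hypothetical model of one replica of a partition."""
    broker_id: int
    rack: Optional[str]
    is_leader: bool = False
    in_sync: bool = True


def pick_fetch_replica(replicas: List[Replica], client_rack: Optional[str]) -> Replica:
    """Prefer an in-sync replica in the client's rack; otherwise use the leader."""
    leader = next(r for r in replicas if r.is_leader)
    # Without a client rack, or when the leader is already local, read from the leader.
    if client_rack is None or leader.rack == client_rack:
        return leader
    same_rack = [r for r in replicas if r.in_sync and r.rack == client_rack]
    if not same_rack:
        return leader
    # Simplification: pick the lowest broker id among the local candidates.
    return min(same_rack, key=lambda r: r.broker_id)


replicas = [
    Replica(1, "dc1-a", is_leader=True),
    Replica(2, "dc1-b"),
    Replica(3, "dc2-a"),
]
print(pick_fetch_replica(replicas, "dc2-a").broker_id)  # 3: local follower
print(pick_fetch_replica(replicas, "dc3-a").broker_id)  # 1: no local replica, use leader
```

Writes still go to the leader in this model; only fetches are redirected to a nearby replica.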

Related

Consume directly from ActiveMQ Artemis replica

In a cluster scenario using HA/Data replication feature is there a way for consumers to consume/fetch data from a slave node instead of always reaching out to the master node (master of that particular queue)?
If you think about scalability, having all consumers call a single node responsible to be the master of a specific queue means all traffic goes to a single node.
Kafka allows consumers to fetch data from the closest node if that node holds a replica of the partition. Is there something similar in ActiveMQ?

In short, no. Consumers can only consume from an active broker and slave brokers are not active, they are passive.
If you want to increase scalability you can add additional brokers (or HA broker pairs) to the cluster. That said, I would recommend careful benchmarking to confirm that you actually need additional capacity before increasing your cluster size. A single ActiveMQ Artemis broker can handle millions of messages per second depending on the use-case.
As I understand it, Kafka's semantics are quite different from a "traditional" message broker like ActiveMQ Artemis so the comparison isn't particularly apt.

Kafka Producer, multi DC failover support

I have two distinct Kafka clusters located in different data centers, DC1 and DC2. How can I organize Kafka producer failover between the two DCs? If the primary Kafka cluster (DC1) becomes unavailable, I want the producer to switch to the failover Kafka cluster (DC2) and continue publishing to it. The producer should also be able to switch back to the primary cluster once it is available again. Are there any good patterns, existing libs, approaches, or code examples?

Each partition of the Kafka topic your producer is publishing to has a separate leader, often spread across multiple brokers in the cluster, so the producer is connected to many "primary" brokers simultaneously. Should any one of them fail, another in-sync replica (ISR) will be elected as leader and automatically take over. You do not need to do anything in your client app for it to reconnect to the new leader(s), retry any failed requests, and continue.
If this is for Multi-Data Center (MDC) failover, things get much more complicated, depending on whether the client apps die as well or keep running and only need their cluster connections to fail over. Offsets are not preserved across separate Kafka clusters, so while producers are simpler, consumers need to look up their resume position by timestamp upon failover, for example with offsetsForTimes() in the Java consumer.
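As a rough illustration of why a timestamp lookup is needed, the sketch below models each cluster's partition log as a list of (offset, timestamp) pairs and finds the earliest offset whose timestamp is at or after a target. The function `offset_for_time` and the sample logs are hypothetical; the sketch assumes timestamps are non-decreasing within a partition and that mirrored records keep their original timestamps.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def offset_for_time(log: List[Tuple[int, int]], target_ts: int) -> Optional[int]:
    """Return the earliest offset whose timestamp is >= target_ts, or None if there is none."""
    timestamps = [ts for _, ts in log]
    i = bisect_left(timestamps, target_ts)
    return log[i][0] if i < len(log) else None


# The same three records, but with unrelated offsets in each cluster.
primary = [(100, 1000), (101, 1010), (102, 1020)]
secondary = [(7, 1000), (8, 1010), (9, 1020)]

# The consumer last processed primary offset 101, i.e. timestamp 1010.
last_processed_ts = 1010
resume_offset = offset_for_time(secondary, last_processed_ts)
print(resume_offset)  # 8: the record with timestamp 1010 is read again
```

Resuming at the last processed timestamp rather than after it re-reads one record, which trades a possible duplicate for not skipping data.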
For a great write-up of the MDC failover modes and best practices, see the MDC whitepaper here: https://www.confluent.io/white-paper/disaster-recovery-for-multi-datacenter-apache-kafka-deployments/
Since you asked only about producers, your app can detect that the primary cluster is down (say, after a certain number of failed retries) and then, instead of attempting to reconnect, connect to a broker list from the secondary cluster. Alternatively, you can redirect the DNS names of the broker list hosts to point to the secondary cluster.
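The retry-count approach could look roughly like the Python sketch below. It does not use a real Kafka client: `FailoverSender`, the injected `send_primary`/`send_secondary` callables, and the `primary_healthy` probe are all hypothetical stand-ins for whatever producer objects and health check your app has.

```python
from typing import Callable


class FailoverSender:
    """Send to the primary cluster; switch to the secondary after repeated failures."""

    def __init__(self, send_primary: Callable[[str], None],
                 send_secondary: Callable[[str], None],
                 primary_healthy: Callable[[], bool],
                 max_failures: int = 3) -> None:
        self._send_primary = send_primary
        self._send_secondary = send_secondary
        self._primary_healthy = primary_healthy
        self._max_failures = max_failures
        self._failures = 0
        self.active = "primary"

    def send(self, record: str) -> str:
        # Fail back as soon as the primary reports healthy again.
        if self.active == "secondary" and self._primary_healthy():
            self.active = "primary"
            self._failures = 0
        if self.active == "primary":
            try:
                self._send_primary(record)
                self._failures = 0
                return "primary"
            except ConnectionError:
                self._failures += 1
                if self._failures < self._max_failures:
                    raise  # below the threshold: let the caller retry
                self.active = "secondary"
        self._send_secondary(record)
        return "secondary"


# Demo with fake clusters; the primary starts out unreachable.
primary_up = {"value": False}
sent = []


def send_primary(rec: str) -> None:
    if not primary_up["value"]:
        raise ConnectionError("primary cluster unreachable")
    sent.append(("primary", rec))


def send_secondary(rec: str) -> None:
    sent.append(("secondary", rec))


sender = FailoverSender(send_primary, send_secondary,
                        lambda: primary_up["value"], max_failures=2)


def send_with_retry(rec: str, attempts: int = 5) -> str:
    for _ in range(attempts):
        try:
            return sender.send(rec)
        except ConnectionError:
            continue
    raise ConnectionError("all attempts failed")


send_with_retry("a")        # two failed attempts, then failover
send_with_retry("b")        # stays on the secondary
primary_up["value"] = True
send_with_retry("c")        # fails back to the primary
print(sent)
```

Note that records published to the secondary while the primary is down stay there, so downstream consumers have to read from both clusters or rely on cross-cluster replication.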

Kafka leader election in multi-dc with an arbiter/witness/observer

I would like to deploy a Kafka cluster in two datacenters with the same number of nodes on each DC. The first DC is used in active mode while the second is in passive mode.
For example, let's say that both datacenters have 3 nodes, with 2 in-sync replicas (ISR) in the first DC and one ISR in the second DC.
Is it possible to have a third DC containing an arbiter/witness/observer node such that, if one DC fails, a leader election can still succeed with the correct outcome in terms of consistency? MongoDB has such a feature, named Replica Set Arbiter.
What about deploying ZooKeeper on the three datacenters? From my understanding ZooKeeper does not hold the Kafka data and it should not be contacted for each new record in the Kafka topic, i.e. you do not pay the latency to the third DC for each new record.

There is a presentation from Kafka Summit 2017, "One Data Center is Not Enough: Scaling Apache Kafka Across Multiple Data Centers", that talks about this setup. There is also some interesting information in the Confluent whitepaper "Disaster Recovery for Multi-Datacenter Apache Kafka® Deployments".
It says this could work, and calls the extra node an observer node, but it also says no one has ever tried it.
ZooKeeper keeps track of the following metadata for Kafka (0.9.0+):
Electing a controller - The controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions. When a node shuts down, it is the controller that tells other replicas to become partition leaders to replace the partition leaders on the node that is going away. ZooKeeper is used to elect a controller, make sure there is only one, and elect a new one if it crashes.
Cluster membership - which brokers are alive and part of the cluster? This is also managed through ZooKeeper.
Topic configuration - what overrides are there for that topic, where are the partitions located etc.
Quotas - how much data is each client allowed to read and write
ACLs - who is allowed to read and write to which topic
More detail on the dependency between Kafka and ZooKeeper can be found in the Kafka FAQ and in an answer on Quora from a Kafka committer working at Confluent.
From the resources I have read, a setup with two DCs (Kafka plus ZooKeeper) and an arbiter/witness/observer ZooKeeper node in a third DC with high latency could work, but I haven't found any resource that has actually tested it.

How to recover Kafka from complete zookeeper loss and new start?

I have a simple Kafka cluster of 3 brokers and 3 zk nodes.
If I wipe out 2/3 zk nodes and bring them back (even new "clean" ones), everything recovers as zk re-syncs.
If I wipe out all 3 zk nodes and restart them "clean" (think docker containers or AWS auto-scaling group instances), the brokers are confused. All of the data structures in zk (basic paths, brokers, topics, etc.) are gone, since I have a blank zk.
How can I recover from this scenario? I am (potentially) willing to live with lost topics (since we automate topic creation), but the brokers (unlike with startup) do not "know" that zk is blank and so do not reinitialize (set up structures, register brokers, etc.). Conversely, I could back up zk and restore it, as long as I know what to backup/restore.
The key requirement is that this be fully automated, though. In a cloud-native setup, I cannot rely on a human doing the restore or checking.

I'm not sure that managing Zookeeper nodes (or Kafka brokers for that matter) with autoscaling is such a good idea.
For one, ZooKeeper maintains the topic information (and if you are not using the latest Kafka builds or are still using the old consumer API, it also maintains the consumer offsets).
In addition to that, topic partitions are statically assigned to brokers, so if you bring down the current Kafka brokers and spawn new nodes, you have to be very careful to start the brokers with the same broker.id and data; otherwise Kafka might get confused.
Third, regarding ZooKeeper, you have to be careful not to create an ensemble with an even number of nodes: it tolerates no more failures than the next smaller odd-sized ensemble, and if it splits evenly, neither half has the majority needed to elect a leader.
Having said all that, I think that doing a backup and restore of one of the ZooKeeper nodes should work. It would be even easier if you set things up so that at least one of the nodes cannot be turned off (or, alternatively, you use persistent storage for that one).
This way you ensure that one of the Zookeeper nodes will always have the latest data and it will take care of replicating it to the other nodes.

Running zookeeper on a cluster of 2 nodes

I am currently working on trying to use zookeeper in a two node cluster. I have my own cluster formation algorithm running on the nodes based on configuration. We only need Zookeeper's distributed DB functionality.
Is it possible to use ZooKeeper in a two-node cluster? Do you know of any solutions where this has been done?
Can we still retain ZooKeeper's DB functionality without forming a quorum?
Note: Fault tolerance is not the main concern in this project. If one of the nodes goes down, we have enough code logic to run without the ZooKeeper service. We use ZooKeeper to share data when both nodes are alive.
Would greatly appreciate any help.

ZooKeeper is a coordination system, used to coordinate among nodes. When writes are made to such a distributed system, all of them go through the master (aka leader) so that the nodes can agree on the values being stored. Reads can be served by any node. ZooKeeper requires a master/leader to be elected by a quorum in order to serve write requests consistently. ZooKeeper uses the ZAB protocol as its consensus algorithm.
Electing a leader requires a strict majority of the ensemble, which is why an ensemble should ideally have an odd number of nodes. In your case, with two nodes, the majority is two, so both nodes must be up and able to reach each other. If either node fails or the network partitions, the remaining node cannot form a quorum and the ensemble stops serving requests.
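The majority arithmetic can be checked with a few lines of Python. This is just the general majority-quorum calculation, not ZooKeeper code, and the helper names are made up for illustration.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can fail while a majority remains."""
    return ensemble_size - majority(ensemble_size)


for n in range(1, 6):
    print(f"nodes={n} majority={majority(n)} tolerated_failures={tolerated_failures(n)}")
```

A two-node ensemble needs both nodes for a majority and so tolerates zero failures, the same as a single node, while three nodes tolerate one failure.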
ZooKeeper is not a distributed storage system. If you need to run it in a distributed manner (on more than one node), it needs to form a quorum.
As I see it, what you need is a distributed database, not a distributed coordination system.
