Who are clients of ZooKeeper? - apache-zookeeper

Just started reading the documentation of ZooKeeper. Read that zk has servers (followers + leader) and clients. Who actually are the clients of zk? The nodes of the distributed system that it coordinates?
Also read that
ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.
Does this mean that there are thousands of znodes? And what kind of reads and writes do we want on zk?

Who actually are the clients of zk?
A client is any process that connects to the ZooKeeper ensemble using the ZooKeeper client API. Apache ZooKeeper ships with API bindings for Java and C. More information about the Java API is available in the JavaDocs and examples and recipes.
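To make that concrete, here is a minimal sketch of a process acting as a client through the Java API; the connect string "zk1:2181,zk2:2181,zk3:2181" and the session timeout are placeholders for this example.

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooKeeper;

public class SimpleZkClient {
    public static void main(String[] args) throws Exception {
        // Any process that opens a session like this is a "client" of the ensemble,
        // regardless of what the process itself does (a Kafka broker, an HBase master, ...).
        ZooKeeper zk = new ZooKeeper(
                "zk1:2181,zk2:2181,zk3:2181",   // placeholder ensemble address
                15000,                           // session timeout in ms
                (WatchedEvent event) -> System.out.println("Session event: " + event));

        System.out.println("Connected with session id 0x" + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}
```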
ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.
Does this mean that there are thousands of znodes?
The "thousands" here refers to the number of machines running ZooKeeper, not the number of znodes stored in the ZooKeeper ensemble. A znode refers to a node stored within the ZooKeeper cluster's hierarchy of data, similar to the concept of an inode in a tradtional file system.
And what kind of reads and writes do we want on zk?
Reads refer to operations that get data from znodes or set watches to be informed when changes are applied to znodes. Writes refer to operations that create new znodes, delete existing znodes, or change data attached to znodes.
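To make the distinction concrete, here is a minimal sketch using the Java client API; the host name and the /demo path are placeholders.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ReadsAndWrites {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181", 15000, event -> {});

        // Write: create a znode and attach data to it.
        zk.create("/demo", "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Read: fetch the data back and set a watch that fires when /demo changes.
        Stat stat = new Stat();
        byte[] data = zk.getData("/demo", event -> System.out.println("znode changed: " + event), stat);
        System.out.println(new String(data) + " @ version " + stat.getVersion());

        // Writes: change the attached data, then delete the znode (-1 = any version).
        zk.setData("/demo", "v2".getBytes(), stat.getVersion());
        zk.delete("/demo", -1);
        zk.close();
    }
}
```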
Reading through the API docs, examples and recipes should shed more light on all of this.

Related

Do you need multiple zookeeper instances to run a multiple-broker kafka?

I'm new to Kafka.
Kafka is supposed to be used as a distributed service. But the tutorials and blog posts I found online never mention whether there is one ZooKeeper node or several.
The tutorials just spin up one ZooKeeper instance and then multiple Kafka brokers.
Is that how it is supposed to be done?
ZooKeeper is a centralized coordination service for distributed systems; clusters use it to maintain the distributed system's shared state. The distributed synchronization is achieved via metadata such as configuration information, naming, and so on.
In typical architectures, a Kafka cluster is served by 3 ZooKeeper nodes. If the deployment is very large, this can be ramped up to 5 ZooKeeper nodes, but that in turn adds load, since all nodes have to stay in sync and all metadata-related activities are handled by ZooKeeper.
It should also be noted that, as an improvement, newer Kafka releases reduce the dependency on ZooKeeper in order to make metadata more scalable, remove the complexity of maintaining metadata in an external component, and improve recovery from unexpected shutdowns. With the new approach, controller failover is almost instantaneous. This is achieved by the Kafka Raft Metadata mode, termed 'KRaft', which runs Kafka without ZooKeeper by merging all the responsibilities handled by ZooKeeper into a service inside the Kafka cluster itself, operating on the event-based mechanism used in the KRaft protocol.
Tutorials generally keep things nice and simple, so one ZooKeeper (often one Kafka broker too). Useful for getting started; useless for any kind of resilience :)
In practice, you are going to need three ZooKeeper nodes minimum.
If it helps, here is an enterprise reference architecture whitepaper for the deployment of Apache Kafka
Disclaimer: I work for Confluent, who publish the above whitepaper.

Building a Kafka Cluster using two servers only

I'm planning to build a Kafka Cluster using two servers, and host Zookeeper on these two servers as well.
The question is, since Kafka requires ZooKeeper to run, what is the best ZooKeeper cluster layout for implementing a Kafka cluster on two servers?
For example, I'm currently running two ZooKeepers, one on each server, and one Kafka broker on each server, and in the Kafka configuration they point to all ZooKeepers.
Is there a better way to do this?
First of all, you don't have to set up ZooKeeper and Kafka on the same servers. One of the roles of ZooKeeper is electing the controller (the broker which is responsible for maintaining the leader/follower relationship for all the partitions). For the election, a majority of ZooKeeper nodes must be alive. In your case, if even one ZooKeeper instance is down, you cannot elect a controller, so there is no difference between having one ZooKeeper node or two. That's why it is recommended to have at least 3 nodes in a ZooKeeper cluster; that way you can handle the failure of one ZooKeeper node.
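To spell out the majority arithmetic behind that recommendation, here is a toy illustration (not part of ZooKeeper itself): a quorum is a strict majority of the ensemble, so one node and two nodes both tolerate zero failures, while three nodes tolerate one.

```java
public class QuorumMath {
    public static void main(String[] args) {
        for (int ensembleSize = 1; ensembleSize <= 5; ensembleSize++) {
            int quorum = ensembleSize / 2 + 1;        // majority needed to keep serving
            int tolerated = ensembleSize - quorum;    // nodes you can afford to lose
            System.out.printf("%d ZooKeeper node(s): quorum = %d, tolerated failures = %d%n",
                    ensembleSize, quorum, tolerated);
        }
    }
}
```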
In addition to this, it is highly recommended to have at least three brokers in your Kafka cluster to maintain both consistency and high availability. (link1, link2)
UPDATE:
As long as you are limited to only two servers, you can consider sacrificing some high availability by setting min.insync.replicas=2 on your brokers and creating topics with replication.factor=2. If HA matters more to you than avoiding data loss, you can instead use the min.insync.replicas=1 (default) broker config, again with topic replication.factor=2. In this situation, those are your options IMHO. (Having one or two ZooKeepers is not important, as I mentioned above.)
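As a rough sketch of what that looks like with the Java Admin client (the bootstrap servers, topic name, and partition count are placeholders, and whether you set min.insync.replicas to 2 or leave the default of 1 is the consistency-vs-availability trade-off described above):

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class TwoBrokerTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "server1:9092,server2:9092");

        try (Admin admin = Admin.create(props)) {
            // replication.factor=2 so both servers hold a copy of every partition;
            // min.insync.replicas=2 favours consistency (acks=all writes fail while a
            // broker is down); dropping it to 1 favours availability at the risk of data loss.
            NewTopic topic = new NewTopic("events", 6, (short) 2)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Set.of(topic)).all().get();
        }
    }
}
```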
I am often faced with the same problem as you, #frisky5, where I would like to achieve a "suboptimal" HA system using only 2 nodes, and thus workarounds are always needed with cloud-native frameworks that rely on the assumption that clusters will have lots of nodes available.
That ain't always the case in real life, is it ;) ?
That being said, I see you as essentially having 2 options:
Externalize zookeeper configuration on a replicated storage system using 2 nodes (e.g. DRBD)
Replicate the Kafka data volumes entirely onto the second node and use 2 one-node Kafka clusters that you switch on and off depending on which node is the current master.
I would go for the first option. In that case you would have 2 Kafka servers and one ZooKeeper server whose IP needs to be static (a virtual IP). When the ZooKeeper node goes down, it is restarted on the second node with the same VIP, but it needs to access the synchronized data folder.
I am not too familiar with ZooKeeper's internals and I can't tell you whether it will run into conflicts when starting up on a data store that "wasn't its own", but I would guess it makes sense for you to test this with a simple rsync setup.
Another way to achieve consensus, if you are using a k3s-based Kubernetes cluster, would be to rely on the internal k8s distributed consensus mechanics to "tell Kafka" which node is the leader. This works for the postgres operator by Crunchy Data because Patroni is cool ( https://patroni.readthedocs.io/en/latest/kubernetes.html ) 😎, but I am not sure whether Kafka/ZooKeeper are that flexible and can communicate with a REST API to set their locks ...
Once you have achieved this intermediate step, you can use a PostgreSQL db as the external source of truth for k3s, and then it is as simple as syncing the postgres data folder between the machines (easily done with rsync). The beauty of this approach is that it is far more generic and could be used for other systems too.
Let me know what you think about these two approaches and whether you manage to set up a test environment. If you do on GitHub, I can help you out with the implementation.

Running a single kafka s3 sink connector in standalone vs distributed mode

I have a kafka topic "mytopic" with 10 partitions and want to use S3 sink connector to sink records to an S3 bucket. For scaling purposes it should be running on multiple nodes to write partitions data in parallel to the same S3 bucket.
In Kafka connect user guide and actually many other blogs/tutorials it's recommended to run workers in distributed mode instead of standalone to achieve better scalability and fault tolerance:
... distributed mode is more flexible in terms of scalability and offers the added advantage of a highly available service to minimize downtime.
I want to figure out which mode to choose for my use case: having one logical connector running on multiple nodes in parallel. My understanding is following:
If I run in distributed mode, I will end up having only 1 worker processing all the partitions, since it's considered one connector task.
Instead I should run in standalone mode in multiple nodes. In that case I will have a consumer group and achieve parallel processing of partitions.
In above described standalone scenario I will actually have fault tolerance: if one instance dies, the consumer group will rebalance and other standalone workers will handle the freed partitions.
Is my understanding correct or am I missing something?
Unfortunately I couldn't find much information on this topic other than this google groups discussion, where the author came to the same conclusion as I did.
In theory, that might work, but you'll end up ssh-ing to multiple machines, maintaining basically the same config files everywhere, and just using the connect-standalone command instead of connect-distributed.
You're missing the part about Connect server task rebalancing, though, which communicates over the Connect server REST ports.
The underlying task code is all the same, only the entrypoint and offset storage are different. So, why not just use distributed if you have multiple machines?
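For illustration, here is a hedged sketch of submitting the S3 sink once to a distributed Connect cluster over its REST port; the workers then spread up to tasks.max tasks (and therefore the 10 partitions) across the machines for you. The worker URL, bucket, region, and connector name are placeholders, and the exact set of required connector properties depends on your connector version.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SubmitS3Sink {
    public static void main(String[] args) throws Exception {
        // Connector config as JSON; tasks.max lets the distributed workers run up to
        // 10 tasks in parallel, each handling one or more of the topic's partitions.
        String body = """
            {
              "name": "mytopic-s3-sink",
              "config": {
                "connector.class": "io.confluent.connect.s3.S3SinkConnector",
                "topics": "mytopic",
                "tasks.max": "10",
                "s3.bucket.name": "my-bucket",
                "s3.region": "us-east-1",
                "storage.class": "io.confluent.connect.s3.storage.S3Storage",
                "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
                "flush.size": "1000"
              }
            }
            """;

        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://connect-worker:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```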
You don't need to run multiple instances of standalone processes; in distributed mode, the Kafka Connect workers take care of distributing the tasks, rebalancing, and offset management. You just need to specify the same group id ...

How to recover Kafka from complete zookeeper loss and new start?

I have a simple Kafka cluster of 3 brokers and 3 zk nodes.
If I wipe out 2/3 zk nodes and bring them back (even new "clean" ones), everything recovers as zk re-syncs.
If I wipe out all 3 zk nodes and restart them "clean" (think docker containers or AWS auto-scaling group instances), the brokers are confused. All of the data structures in zk (basic paths, brokers, topics, etc.) are gone, since I have a blank zk.
How can I recover from this scenario? I am (potentially) willing to live with lost topics (since we automate topic creation), but the brokers (unlike with startup) do not "know" that zk is blank and so do not reinitialize (set up structures, register brokers, etc.). Conversely, I could back up zk and restore it, as long as I know what to backup/restore.
The key requirement, though, is that this be fully automated. In a cloud-native setup, I cannot rely on a human doing the restore or checking.
I'm not sure that managing Zookeeper nodes (or Kafka brokers for that matter) with autoscaling is such a good idea.
For one, ZooKeeper maintains the topic information (and if you are not using the latest Kafka builds, or are still using the old consumer API, it also maintains the consumer offsets).
In addition to that, topic partitions are statically assigned to brokers, so if you bring down the current Kafka brokers and spawn new nodes, you have to be very careful to start the brokers with the same broker.id and data; otherwise Kafka might get confused.
Third, regarding ZooKeeper, you have to be careful not to create a cluster with an even number of nodes; otherwise the consensus algorithm will not be able to elect a leader due to a missing majority in the voting phase.
Having said all that, I think that doing a backup and restore of one of the ZooKeeper nodes should work. It would be even easier if you set things up so that at least one of the nodes cannot be turned off (or, alternatively, you use persistent storage for that one).
This way you ensure that one of the Zookeeper nodes will always have the latest data and it will take care of replicating it to the other nodes.

How to scale single node Kafka to multiple node cluster?

I am going to install Kafka for company messaging. The plan is to first install Kafka on a single huge machine and scale it out to 4-5 machines (a cluster) later if needed.
I have little experience with Kafka. I want to ask whether it is possible to scale by just changing parameters in the broker configuration and installing ZooKeeper on the newly joined machines.
Or roughly, how can I do this in the easiest way? More specifically, Cloudera Kafka in CDH.
Thanks
To scale Kafka you will have to add more partitions to topics, if needed, using kafka-topics.sh, and then reassign partitions to your new brokers using kafka-reassign-partitions.sh.
The reassign utility will replicate and dispatch your data automatically. You can do it for a whole topic or for a selective set of partitions.
The complete documentation is here. Just take a look at section 6.
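If you prefer to drive those same two steps from code rather than the shell scripts, here is a rough sketch with the Java Admin client; the bootstrap server, topic name, target partition count, and broker ids are placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.clients.admin.NewPartitions;
import org.apache.kafka.common.TopicPartition;

public class ScaleTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (Admin admin = Admin.create(props)) {
            // Step 1: grow the topic to 12 partitions (what kafka-topics.sh --alter does).
            admin.createPartitions(Map.of("my-topic", NewPartitions.increaseTo(12))).all().get();

            // Step 2: move partition 0 of the topic onto the new brokers 3 and 4
            // (what kafka-reassign-partitions.sh does with a reassignment JSON file).
            admin.alterPartitionReassignments(Map.of(
                    new TopicPartition("my-topic", 0),
                    Optional.of(new NewPartitionReassignment(List.of(3, 4)))
            )).all().get();
        }
    }
}
```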