ZooKeeper ZNode connection limit - apache-zookeeper

I am using the Togglz ZooKeeper integration to manage feature flags. The flags are stored in ZooKeeper as child znodes under /mycompany/features.
Internally, this integration uses a Tree Cache to stay eventually consistent with ZooKeeper changes.
This class on GitHub adds more clarity on the implementation specifics.
We run on a 5 node ZooKeeper ensemble.
We have around 100 microservices, and each can have 5 instances.
Since every microservice instance uses the tree cache, we are essentially looking at 500 instances all targeting the znode /mycompany/features and its child znodes.
I am trying to find out the following:
Would this setup cause performance bottlenecks?
If yes, what can or should be done to avoid them?
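To put rough numbers on the question, here is a back-of-the-envelope sketch. It assumes one ZooKeeper session per microservice instance, clients spread evenly across the ensemble, and one watch notification per client for each changed flag znode; none of this is measured, and the function name `fan_out` is made up for illustration.

```python
def fan_out(services: int, instances_per_service: int, ensemble_size: int) -> dict:
    """Estimate client count and per-change fan-out (assumptions, not measurements)."""
    clients = services * instances_per_service
    return {
        "clients": clients,
        # Assumes sessions are spread evenly over the ensemble members.
        "avg_connections_per_server": clients / ensemble_size,
        # Assumes every client watches every flag znode, so one change
        # notifies every client once.
        "watch_events_per_flag_change": clients,
    }


load = fan_out(services=100, instances_per_service=5, ensemble_size=5)
print(load)
```

Under these assumptions each server holds about 100 sessions, and a single flag change fans out to about 500 notifications. If many instances share one IP address, ZooKeeper's per-IP `maxClientCnxns` setting may also be worth checking.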

Related

Do you need multiple zookeeper instances to run a multiple-broker kafka?

I'm new to Kafka.
Kafka is supposed to be used as a distributed service, but the tutorials and blog posts I found online never mention whether there should be one or several ZooKeeper nodes.
The tutorials just start one ZooKeeper instance and then multiple Kafka brokers.
Is that how it is supposed to be done?
ZooKeeper is a centralized coordination service for distributed systems; clusters use it to maintain shared state. It provides distributed synchronization through metadata such as configuration information and naming.
In a typical architecture, a Kafka cluster is served by 3 ZooKeeper nodes. For very large deployments this can be increased to 5, but that adds load, because all nodes must stay in sync and all metadata-related activity is handled by ZooKeeper.
Note also that newer releases of Kafka reduce the dependency on ZooKeeper in order to improve metadata scalability, remove the complexity of maintaining metadata in an external component, and speed up recovery from unexpected shutdowns. With the new approach, controller failover is almost instantaneous. This is achieved by Kafka Raft metadata mode, known as 'KRaft', which runs Kafka without ZooKeeper by moving all the responsibilities previously handled by ZooKeeper into a service inside the Kafka cluster itself, built on the event-based KRaft protocol.
Tutorials generally keep things nice and simple, so one ZooKeeper (often one Kafka broker too). Useful for getting started; useless for any kind of resilience :)
In practice, you are going to need three ZooKeeper nodes minimum.
If it helps, here is an enterprise reference architecture whitepaper for the deployment of Apache Kafka
Disclaimer: I work for Confluent, who publish the above whitepaper.

How to handle failure scenario for kafka and zookeeper in kubernetes

I have a ZooKeeper setup running on server1, server2 and server3, and Kafka is likewise running on server1, server2 and server3.
Both run in Kubernetes.
Problem statement:
1. If one ZooKeeper server goes down, the entire setup will go down, because Kafka depends on ZooKeeper. Am I right?
2. If Q1 is correct, is there any way to set things up so that Kafka keeps running when one ZooKeeper server goes down?
3. How do I expose the Kafka port in a Kubernetes setup?
4. What is the recommended way to persist data in Kubernetes for a production server?
I fail to see how Zookeeper questions are related to k8s... But you definitely should set affinity rules such that Zookeeper and Kafka are not on the same physical servers or sharing same disks
If one ZooKeeper out of three goes down, the remaining two still form a majority, so the ensemble keeps serving requests. If two go down, quorum is lost and ZooKeeper stops serving, which effectively takes Kafka down with it.
To mitigate that risk, you can choose to run 5 ZooKeepers, which tolerates the loss of 2 servers; losing a third puts you in the same state. The Definitive Guide book covers these concepts in the first few chapters.
Regarding the other questions - NodePorts and PVCs, generally speaking.
Use one of the popular Kafka operators on GitHub and you won't need to think too hard about setting those properties.
You still must manually perform Kafka admin tasks in any installation... You can use extra services like Cruise Control if you want to reduce that workload, though

Building a Kafka Cluster using two servers only

I'm planning to build a Kafka Cluster using two servers, and host Zookeeper on these two servers as well.
The question is: since Kafka requires ZooKeeper to run, what is the best ZooKeeper cluster layout for a Kafka cluster on two servers?
For example, I'm currently running two ZooKeepers on both servers and one Kafka broker on each server, and the Kafka configuration points to all ZooKeepers.
Is there a better way to do this?
First of all, you don't have to set up ZooKeeper and Kafka on the same server. One of ZooKeeper's roles is electing the controller (the broker responsible for maintaining the leader/follower relationship for all partitions). For an election, a majority of ZooKeeper nodes must be alive. In your case, if even one ZooKeeper instance is down, no controller can be elected, so there is no difference between having one ZooKeeper or two. That's why at least 3 nodes are recommended in a ZooKeeper cluster: that way you can handle the failure of one ZooKeeper node.
In addition to this, it is highly recommended to have at least three brokers in your Kafka cluster to maintain both consistency and high availability. (link1, link2)
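The majority rule described here can be written down as simple arithmetic. This is a minimal sketch, not ZooKeeper code; the helper names `quorum_size` and `tolerated_failures` are made up for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can fail while a majority is still alive."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 5):
    print(f"{n} node(s): quorum={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
```

Two nodes tolerate zero failures, the same as one node, which is why an ensemble of two buys nothing over a single instance. Three nodes tolerate one failure and five tolerate two.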
UPDATE:
As long as you are limited to only two servers, you can consider sacrificing high availability by setting min.insync.replicas=2 on the brokers and creating topics with replication.factor=2. If HA is more important than avoiding data loss, you can use the broker config min.insync.replicas=1 (the default), again with topic replication.factor=2. Under this constraint, these are your options IMHO. (Whether you have one ZooKeeper or two does not matter, as I mentioned above.)
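To make the trade-off concrete, here is a small sketch of how the two settings behave with two brokers. It assumes producers use acks=all and that every live replica is in sync; the function `write_outcome` is a simplified model written for this answer, not a Kafka API.

```python
def write_outcome(live_replicas: int, min_insync_replicas: int) -> str:
    """Simplified model of an acks=all write to a partition."""
    if live_replicas == 0:
        return "unavailable"  # no broker left to lead the partition
    if live_replicas < min_insync_replicas:
        return "rejected"     # too few in-sync replicas to acknowledge
    return "accepted"


# replication.factor=2 on two brokers: either both are up or one is down.
for min_isr in (2, 1):
    for live in (2, 1):
        print(f"min.insync.replicas={min_isr}, live replicas={live}: "
              f"{write_outcome(live, min_isr)}")
```

With min.insync.replicas=2, losing one broker blocks writes but every acknowledged message exists on both servers. With min.insync.replicas=1, writes continue on the surviving broker, but messages acknowledged in that window exist in only one copy.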
I often face the same problem as you, @frisky5: I would like to achieve a "suboptimal" HA system using only 2 nodes, so workarounds are always needed with cloud-native frameworks that assume clusters will have lots of nodes available.
That ain't always the case in real life, is it ;) ?
That being said, I see you essentially having 2 options:
Externalize zookeeper configuration on a replicated storage system using 2 nodes (e.g. DRBD)
Replicate Kafka data volumes entirely on the second node and use 2 one-node Kafka clusters that you switch on and off depending on which is the current master node.
I would go for the first option. In that case you would have 2 Kafka servers and one ZooKeeper server whose IP needs to be static (a virtual IP). When the ZooKeeper node goes down, it is restarted on the second node with the same VIP, but it needs access to the synchronized data folder.
I am not too familiar with ZooKeeper's internals, and I can't tell you whether it will run into conflicts when starting up on a data store that "wasn't its own", but I would guess it makes sense for you to test it using a simple rsync setup.
Another way to achieve consensus, if you are using a k3s-based Kubernetes cluster, would be to rely on internal k8s distributed consensus mechanics to "tell Kafka" which node is the leader. This works for the Postgres operator by Crunchy Data because Patroni is cool ( https://patroni.readthedocs.io/en/latest/kubernetes.html ) 😎 but I am not sure whether Kafka/ZooKeeper are that flexible and can communicate with a REST API to set their locks ...
Once you have achieved this intermediate step, you can use a PostgreSQL db as the external source of truth for k3s, and then it is as simple as syncing the Postgres data folder between the machines (easily done with rsync). The beauty of this approach is that it is far more generic and could be used for other systems too.
Let me know what you think about these two approaches and whether you manage to set up a test environment. If you do so on GitHub, I can help you out with the implementation.

KSQL Server Elastic Scaling in Kubernetes

In the context of Kubernetes or otherwise, does it make sense to have one KSQL Server per application? When I read the capacity planning for KSQL Server, it seems the basic settings are for running multiple queries on one server.
However, I feel that to have better control over scaling up and down with Kubernetes, it would make more sense to fix the number of threads per query and launch a server configured in Kubernetes with, let's say, 1 CPU, where only one application would run. However, I am not sure how heavy KSQL Servers are, or whether that actually makes sense.
Any recommendations?
First of all, what you have mentioned is clearly doable. You can run KSQL Server with Docker, so you could have a container orchestrator such as Kubernetes or Swarm maintaining and scheduling those KSQL Server instances.
So you know how this would play out:
Each KSQL Instance will join a group of other KSQL Instances with the same KSQL_SERVICE_ID that use the same Kafka Cluster defined by KSQL_KSQL_STREAMS_BOOTSTRAP_SERVERS
You can create several KSQL Server Clusters, i.e. for different applications, just use different KSQL_SERVICE_ID while using the same Kafka Cluster.
As a result, you now have:
Multiple containerized KSQL Server Instances managed by a container orchestrator such as Kubernetes.
All of the KSQL Instances are connected to the same Kafka Cluster (you can also have different Kafka Clusters for different KSQL_SERVICE_ID)
The KSQL Server Instances can be grouped in different applications (different KSQL_SERVICE_ID) in order to achieve separation of concerns so that scalability, security and availability can be better maintained.
Regarding the coexistence of several KSQL Server Instances (maybe with different KSQL_SERVICE_ID) on the same server, you should know that the available machine resources can be monopolized by a greedy instance, causing problems for less greedy instances. With Kubernetes you could set resource limits on your Pods to avoid this, but the greedy instances will then be throttled and slowed down.
Confluent advice regarding multi-tenancy:
We recommend against using KSQL in a multi-tenant fashion. For example, if you have two KSQL applications running on the same node, and one is greedy, you're likely to encounter resource issues related to multi-tenancy. We recommend using a single pool of KSQL Server instances per use case. You should deploy separate applications onto separate KSQL nodes, because it becomes easier to reason about scaling and resource utilization. Also, deploying per use case makes it easier to reason about failovers and replication.
A possible drawback is the overhead you'll have if you run multiple KSQL Server Instances (each with a Java application footprint) in the same pool while having no work for them to do (i.e. no schedulable tasks due to a lack of partitions on your topic(s)), or simply because you have very little workload. You might be able to do the same job with fewer instances, avoiding idle or nearly idle instances.
Of course, stuffing all stream processing, maybe for completely different use cases or projects, onto a single KSQL Server or pool of KSQL Servers may bring its own internal concurrency issues, development cycle complexities, management overhead, etc.
I guess something in the middle will work fine. Use a pool of KSQL Server instances for a single project or use case, which in turn might translate to a pipeline consisting of a topology of several sources, processors and sinks, implemented by a number of KSQL queries.
Also, don't forget about the scaling mechanisms of Kafka, Kafka Streams and KSQL (built on top of Kafka Streams) discussed in the previous question you've posted.
All of these mechanisms are described here:
https://docs.confluent.io/current/ksql/docs/capacity-planning.html
https://docs.confluent.io/current/ksql/docs/concepts/ksql-architecture.html
https://docs.confluent.io/current/ksql/docs/installation/install-ksql-with-docker.html

Who are clients of ZooKeeper?

I just started reading the ZooKeeper documentation. I read that zk has servers (followers + leader) and clients. Who actually are the clients of zk? The nodes of the distributed system that it coordinates?
I also read that
ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.
Does this mean that znodes number in the thousands? And what kind of reads and writes do we want on zk?
Who actually are the clients of zk ?
A client is any process that connects to the ZooKeeper ensemble using the ZooKeeper client API. Apache ZooKeeper ships with API bindings for Java and C. More information about the Java API is available in the JavaDocs and examples and recipes.
ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.
Does this means that znodes are thousands in numbers ?
The "thousands" here refers to the number of machines running ZooKeeper client applications, not the number of znodes stored in the ZooKeeper ensemble. A znode refers to a node stored within the ZooKeeper cluster's hierarchy of data, similar to the concept of an inode in a traditional file system.
And what kind of read and write do we want on zk ?
Reads refer to operations that get data from znodes or set watches to be informed when changes are applied to znodes. Writes refer to operations that create new znodes, delete existing znodes, or change data attached to znodes.
Reading through the API docs, examples and recipes should shed more light on all of this.
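As a rough illustration of the read/write split described above, here is a toy in-memory model. It is not the ZooKeeper client API: the class `ToyZNodeStore`, its methods, and the path `/mycompany/features/new-ui` are made up for this sketch, and it assumes the classic one-shot watch behavior, where a watch fires once and must be set again by another read.

```python
from typing import Callable, Dict, List, Optional

Watch = Callable[[str, str], None]


class ToyZNodeStore:
    """Toy model: reads may set one-shot watches, writes fire them."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watches: Dict[str, List[Watch]] = {}
        self.reads = 0
        self.writes = 0

    def get(self, path: str, watch: Optional[Watch] = None) -> bytes:
        """Read: return the znode's data and optionally register a watch."""
        value = self._data[path]
        self.reads += 1
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return value

    def create(self, path: str, data: bytes) -> None:
        """Write: add a new znode."""
        if path in self._data:
            raise KeyError(f"{path} already exists")
        self._data[path] = data
        self.writes += 1

    def set(self, path: str, data: bytes) -> None:
        """Write: change the data of an existing znode and notify watchers."""
        if path not in self._data:
            raise KeyError(f"{path} does not exist")
        self._data[path] = data
        self.writes += 1
        self._fire(path, "changed")

    def _fire(self, path: str, event: str) -> None:
        # One-shot semantics: watches are removed before they are invoked.
        for callback in self._watches.pop(path, []):
            callback(path, event)


store = ToyZNodeStore()
events = []
store.create("/mycompany/features/new-ui", b"false")
store.get("/mycompany/features/new-ui", watch=lambda p, e: events.append((p, e)))
store.set("/mycompany/features/new-ui", b"true")   # fires the watch
store.set("/mycompany/features/new-ui", b"false")  # no watch left to fire
print(events, store.reads, store.writes)
```

The second `set` produces no event because the watch was consumed by the first one; a client that wants to keep following the znode has to read it again and set a new watch.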