How to deploy Kafka Streams applications on Kubernetes?

My application has some aggregation/window operations, so it has state stores that are persisted under state.dir. AFAIK, it also writes the changelog of each state store to the broker,
so is it OK to consider the Kafka Streams application as a stateless pod?

Stateless pod and data safety (= no data loss): Yes, you can consider the application a stateless pod as far as data safety is concerned; i.e. regardless of what happens to the pod, Kafka and Kafka Streams guarantee that you will not lose data (and if you have enabled exactly-once processing, they will also guarantee exactly-once semantics).
That's because, as you already said, state changes in your application are always continuously backed up to Kafka (brokers) via changelogs of the respective state stores -- unless you explicitly disabled this changelog functionality (it is enabled by default).
Note: The above is true even when you are not using Kafka Streams' default storage engine (RocksDB) but the alternative in-memory storage engine. Many people don't realize this because they read "in-memory" and (falsely) conclude "data will be lost when a machine crashes, restarts, etc.".
Stateless pod and application restoration/recovery time: The above being said, you should understand how having vs. not-having local state available after pod restarts will impact restoration/recovery time of your application (or rather: application instance) until it is fully operational again.
Imagine that one instance of your stateful application runs on a machine. It will store its local state under state.dir, and it will also continuously backup any changes to its local state to the remote Kafka cluster (brokers).
If the app instance is restarted and does not have access to its previous state.dir (probably because it was restarted on a different machine), it will fully reconstruct its state by restoring from the associated changelog(s) in Kafka. Depending on the size of your state this may take milliseconds, seconds, minutes, or more. Only once its state is fully restored will it begin processing new data.
If the app instance is restarted and does have access to its previous state.dir (probably because it was restarted on the same, original machine), it can recover much more quickly because it can re-use all or most of the existing local state, so only a small delta needs to be restored from the associated changelog(s). Only once its state is fully restored will it begin processing new data.
In other words, if your application is able to re-use existing local state then this is good because it will minimize application recovery time.
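For illustration, here is a minimal sketch of pointing state.dir at a path backed by a mounted volume, so that a pod restarted on the same node can re-use its local state; the application id, broker address, and mount path below are placeholders, not anything prescribed by Kafka Streams:
```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class StateDirConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-aggregation-app"); // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");      // placeholder address
        // Point state.dir at a persistent-volume mount so a restarted pod on the
        // same node re-uses its local RocksDB state and only replays a small
        // delta from the changelog topics instead of the full state.
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/data/kafka-streams");      // placeholder mount path
        return props;
    }
}
```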
Standby replicas to the rescue in stateless environments: But even if you are running stateless pods you have options to minimize application recovery times by configuring your application to use standby replicas via the num.standby.replicas setting:
num.standby.replicas
The number of standby replicas. Standby replicas are shadow copies of local state stores. Kafka Streams attempts to create the specified number of replicas and keep them up to date as long as there are enough instances running. Standby replicas are used to minimize the latency of task failover. A task that was previously running on a failed instance is preferred to restart on an instance that has standby replicas so that the local state store restoration process from its changelog can be minimized.
See also the documentation section State restoration during workload rebalance
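As a rough sketch (application id and broker address are placeholders again), enabling standby replicas is a one-line configuration change:
```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class StandbyReplicaConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-aggregation-app"); // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");      // placeholder address
        // Keep one warm shadow copy of each local state store on another instance,
        // so a failed task can be moved there without rebuilding the full store
        // from its changelog.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        return props;
    }
}
```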
Update 2018-08-29: Arguably the most convenient option to run Kafka/Kafka Streams/KSQL on Kubernetes is to use Confluent Operator or the Helm Charts provided by Confluent, see https://www.confluent.io/confluent-operator/. (Disclaimer: I work for Confluent.)
Update 2019-01-10: There's also a YouTube video that demos how to scale Kafka Streams with Kubernetes.

I think so. RocksDB is there to save state locally so that operations that need the state can be executed quickly. As you already mentioned, the state changes are stored in a Kafka topic as well, so that if the current Streams application instance fails, another instance (on another node) can use that topic to rebuild the local state and continue processing the stream where the previous one left off.

KStreams uses the underlying state.dir for local storage. If the pod gets restarted on the same machine, and the volume is mounted, it will pick up from where it was, immediately.
If the pod starts up on another machine where the local state is not available, KStreams will rebuild the state by re-reading the backing Kafka changelog topics.
A short video at https://www.youtube.com/watch?v=oikZg7_vy6A shows Lenses - for Apache Kafka - deploying and scaling KStream applications on Kubernetes

Related

ZooKeeper ZNode connection limit

I am basically making use of the togglz zookeeper integration for managing feature flags. The feature flags are stored in zookeeper as child znodes under /mycompany/features.
This integration basically makes use of Tree Cache internally to stay eventually consistent with ZooKeeper changes.
This class on github can add more clarity on the implementation specifics.
We run on a 5 node ZooKeeper ensemble.
We have around 100 microservices, and each can have 5 instances.
Since every microservice instance leverages the tree cache, we are essentially looking at 500 instances all targeting the znode /mycompany/features and its child znodes.
I am trying to find out the following:
Would this setup cause performance bottlenecks?
If yes, then what can/should be done to circumvent them?
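For context, each instance essentially keeps a local mirror of the feature znodes via a Curator TreeCache along these lines (a minimal sketch; togglz wires this up internally, and the connection string and listener logic here are placeholders):
```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.TreeCache;
import org.apache.curator.framework.recipes.cache.TreeCacheEvent;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class FeatureFlagWatcher {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", // placeholder connection string
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Each service instance keeps a local, eventually consistent copy of
        // /mycompany/features and its children; after the initial load, ZooKeeper
        // only sends watch notifications on changes, so steady-state reads are local.
        TreeCache cache = TreeCache.newBuilder(client, "/mycompany/features").build();
        cache.getListenable().addListener((c, event) -> {
            if (event.getType() == TreeCacheEvent.Type.NODE_UPDATED) {
                System.out.println("Feature flag changed: " + event.getData().getPath());
            }
        });
        cache.start();
    }
}
```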

Prevent data loss while upgrading Kafka with a single broker

I have a Kafka server which runs on a single node. There is only one node because it's a test server. But even for a test server, I need to be sure that no data loss will occur while the upgrade is in progress.
I upgrade Kafka as:
Stop Kafka, Zookeeper, Kafka Connect and Schema Registry.
Upgrade all the components.
Start upgraded services.
Data loss may occur in the first step, while Kafka is not running. I guess you can do a rolling upgrade (?) with multiple brokers to prevent data loss, but in my case that is not possible. How can I do something similar with a single broker? Is it possible? If not, what is the best approach for upgrading?
I have to say, obviously, you are always vulnerable to data loss if you are using only one node.
If you can't have more nodes you have the only choice:
Stop producing;
Stop consuming;
Enable the parameter controlled.shutdown.enable - this ensures the broker performs a controlled shutdown, syncing its logs and checkpoints to disk before it stops.
I guess the first 2 steps are quite tricky.
Unfortunately, there is not much to play with - Kafka was not designed to be fault-tolerant with only one node.
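If you want to verify that setting before the upgrade, a minimal sketch using AdminClient could look like the following (the broker address and broker id are placeholders):
```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class BrokerConfigCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Describe the configuration of broker "0" (placeholder id) and read
            // the controlled.shutdown.enable value before stopping the broker.
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Config config = admin.describeConfigs(Collections.singleton(broker))
                                 .all().get()
                                 .get(broker);
            System.out.println("controlled.shutdown.enable = "
                    + config.get("controlled.shutdown.enable").value());
        }
    }
}
```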
The process of a rolling upgrade is still the same for a single broker.
Existing data during the upgrade shouldn't be lost.
Obviously, if producers are still running, all their requests will be denied while the broker is down, which is why you not only need multiple brokers to prevent data loss, but also a balanced cluster (with unclean leader election disabled) where your restart cycles don't take a whole set of topics completely offline.

How to add health check for topics in KafkaStreams api

I have a critical Kafka application that needs to be up and running all the time. The source topics are created by Debezium Kafka Connect for the MySQL binlog. Unfortunately, many things can go wrong with this setup. A lot of the time the Debezium connectors fail and need to be restarted, and so do my apps (because without throwing any exception they just hang and stop consuming). My manual way of testing and discovering the failure is to check the Kibana logs and then consume the suspicious topic through a terminal. I can mimic this in code, but that is obviously not the best practice. I wonder if there is an ability in the Kafka Streams API that allows me to do such a health check, and to check other parts of the Kafka cluster?
Another point that bothers me is whether I can keep the stream alive and rejoin the topics when the connectors are up again.
You can check the Kafka Streams state to see if it is rebalancing/running, which would indicate healthy operations. Although, if no data is getting into the topology, I would assume there would be no errors happening, so you then need to look up the health of your upstream dependencies.
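For example, a minimal sketch that exposes the Streams state as a health flag could look like this (the application id, broker address, and topic names are placeholders, and the pass-through topology is purely illustrative):
```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsHealth {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-critical-app");    // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");      // placeholder address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("debezium.source.topic").to("processed.topic");         // placeholder topics

        AtomicBoolean healthy = new AtomicBoolean(true);
        KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // Must be registered before start(); flips a flag that an HTTP health
        // endpoint (or a Kubernetes probe script) could expose.
        streams.setStateListener((newState, oldState) ->
                healthy.set(newState == KafkaStreams.State.RUNNING
                        || newState == KafkaStreams.State.REBALANCING));
        streams.start();

        // A readiness probe can also poll the current state directly:
        boolean ready = streams.state() == KafkaStreams.State.RUNNING;
        System.out.println("ready=" + ready + ", healthy=" + healthy.get());
    }
}
```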
Overall, it sounds like you might want to invest some time into using monitoring tools like Consul or Sensu, which can run local service health checks and send out alerts when services go down. Or, at the very least, Elasticsearch alerting.
As far as Kafka health checking goes, you can do that in several ways
Are the broker and ZooKeeper processes running? (SSH to the node, check the processes)
Are the broker and ZooKeeper ports open? (use a socket connection)
Are there important JMX metrics you can track? (Metricbeat)
Can you find an active controller broker? (use AdminClient#describeCluster)
Is the required minimum number of brokers responding as part of the controller metadata? (which can be obtained from AdminClient)
Do the topics that you use have the proper configuration (retention, min.insync.replicas, replication factor, partition count, etc.)? (again, use AdminClient; see the sketch below)
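As a rough sketch of the AdminClient-based checks above (the broker address and topic name are placeholders):
```java
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;

public class ClusterHealthCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, "5000");

        try (AdminClient admin = AdminClient.create(props)) {
            // Controller and broker-count checks.
            DescribeClusterResult cluster = admin.describeCluster();
            System.out.println("controller   = " + cluster.controller().get(5, TimeUnit.SECONDS));
            System.out.println("broker count = " + cluster.nodes().get(5, TimeUnit.SECONDS).size());

            // Topic-level checks: partition count and replication factor.
            admin.describeTopics(Collections.singletonList("my.source.topic")) // placeholder topic
                 .all().get(5, TimeUnit.SECONDS)
                 .forEach((name, desc) -> System.out.println(
                         name + ": partitions=" + desc.partitions().size()
                              + ", rf=" + desc.partitions().get(0).replicas().size()));
        }
    }
}
```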

During rolling upgrade/restart, how to detect when a kafka broker is "done"?

I need to automate a rolling restart of a kafka cluster (3 kafka brokers). I can easily do it manually - restart one after the other, while checking the log to see when it's fine (e.g., when the new process has joined the cluster).
What is a good way to automate this check? How can I ask the broker whether it's up and running, connected to its peers, all topics up-to-date and such? In my restart script, I have access to the metrics, but to be frank, I did not really see one there which gives me a clear picture.
Another way would be to ask what a good "readiness" probe would be that does not simply check some TCP/IP port, but looks at the actual server...
I would suggest exposing JMX metrics and tracking the following for cluster health
the controller count (must be 1 over the whole cluster)
under replicated partitions (should be zero for healthy cluster)
unclean leader elections (if you don't disable this in server.properties make sure there are none in the metric counts)
ISR shrinks within a reasonable time period, like a 10-minute window (should be none)
Also, Yelp has tooling for rolling restarts implemented in Python, which requires Jolokia JMX agents installed on the brokers; it polls the metrics to make sure some of the above conditions are true.
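For example, a minimal JMX polling sketch along those lines (assuming each broker exposes JMX, e.g. started with JMX_PORT=9999; the hostname and port are placeholders) could look like this:
```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerReadinessCheck {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi"); // placeholder host/port
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            Object urp = mbs.getAttribute(new ObjectName(
                    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"), "Value");
            Object controllers = mbs.getAttribute(new ObjectName(
                    "kafka.controller:type=KafkaController,name=ActiveControllerCount"), "Value");

            System.out.println("UnderReplicatedPartitions=" + urp
                    + ", ActiveControllerCount=" + controllers);
            // Proceed to the next broker only once URP is back to 0 and exactly
            // one active controller exists across the whole cluster.
        }
    }
}
```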
Assuming your cluster was healthy at the beginning of the restart operation, at a minimum, after each broker restart, you should ensure that the under-replicated partition count returns to zero before restarting the next broker.
As the previous responders mentioned, there is existing code out there to automate this. I don’t use Jolokia myself, but my solution (which I’m working on now) also uses JMX metrics.
Kafka Utils by Yelp is one of the best tools that can be used to detect when a Kafka broker is "done". Specifically, kafka_rolling_restart is the tool that gets broker details from ZooKeeper and URP (Under Replicated Partitions) metrics from each broker. When a broker is restarted, the total URP count across the Kafka cluster is periodically collected, and once it drops to zero, the next broker is restarted. The controller broker is restarted last.

Scaling Kafka stream application across multiple users

I have a setup where I'm pushing events to Kafka and then running a Kafka Streams application on the same cluster. Is it fair to say that the only way to scale the Kafka Streams application is to scale the Kafka cluster itself by adding nodes or increasing partitions?
In that case, how do I ensure that my consumers will not bring down the cluster and ensure that the critical pipelines are always "on". Is there any concept of Topology Priority which can avoid a possible downtime? I want to be able to expose the streams for anyone to build applications on without compromising the core pipelines. If the solution is to setup another kafka cluster, does it make more sense to use Apache storm instead, for all the adhoc queries? (I understand that a lot of consumers could still cause issues with the kafka cluster, but at least the topology processing is isolated now)
It is not recommended to run your Streams application on the same servers as your brokers (even if this is technically possible). Kafka's Streams API offers an application-based approach -- not a cluster-based approach -- because it's a library and not a framework.
It is not required to scale your Kafka cluster to scale your Streams application. In general, the parallelism of a Streams application is limited by the number of partitions of your app's input topics. It is recommended to over-partition your topic (the overhead for this is rather small) to guard against scaling limitations.
Thus, it is even simpler to "let anyone build applications", as everyone owns their own application. There is no need to submit apps to a cluster. They can be executed anywhere you like (thus, each team can deploy their Streams application the same way they deploy any other application they have). Thus, you have many deployment options, ranging from a WAR file, through YARN/Mesos, to containers (like Kubernetes). Whatever works best for you.
While frameworks like Flink, Storm, or Samza offer cluster management, you can only use the deployment tools that are integrated with those frameworks (for example, Samza requires YARN -- no other options available). Let's say you already have a Mesos setup: you can reuse it for your Kafka Streams applications -- there is no need for a dedicated "Kafka Streams cluster" (because there is no such thing).
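To make this concrete, here is a minimal sketch (application id, broker address, and topics are placeholders): every instance started with the same application.id joins the same group, Kafka Streams spreads the input partitions (i.e. tasks) across them, and you scale out simply by starting more copies of the process:
```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ScalableApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // All instances sharing this application.id form one consumer group;
        // input partitions (and hence tasks) are balanced across them.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-pipeline");   // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");     // placeholder address
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4); // scale up within one instance
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("orders").to("orders-processed");                      // placeholder topology

        // Scale out by starting more copies of this process (bare metal,
        // YARN/Mesos, Kubernetes, ...) -- nothing is submitted to any cluster.
        new KafkaStreams(builder.build(), props).start();
    }
}
```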
An application’s processor topology is scaled by breaking it into multiple tasks.
More specifically, Kafka Streams creates a fixed number of tasks based on the input stream partitions for the application, with each task assigned a list of partitions from the input streams (i.e., Kafka topics).
The assignment of partitions to tasks never changes, so that each task is a fixed unit of parallelism of the application. Tasks can then instantiate their own processor topology based on the assigned partitions; they also maintain a buffer for each of their assigned partitions and process messages one-at-a-time from these record buffers.
As a result, stream tasks can be processed independently and in parallel without manual intervention.
It is important to understand that Kafka Streams is not a resource manager, but a library that “runs” anywhere its stream processing application runs. Multiple instances of the application are executed either on the same machine, or spread across multiple machines, and tasks can be distributed automatically by the library to those running application instances.
The assignment of partitions to tasks never changes; if an application instance fails, all its assigned tasks will be restarted on other instances and continue to consume from the same stream partitions.
The processing of the stream happens in the machines where the application is running.
I recommend you have a look at this guide; it can help you better understand the way Kafka Streams works.