Is it possible to use Litmus to test kafka?
basically some test to figure out what happens in various scenarios. Initial thoughts are:
deploying confluent helm chart for kafka and checking brokers gets Storage bound successfully
Kill broker and ensure it comes back
kill zookeeper pod and ensure it comes back
kill consumer pods (my own apps) and ensure they come a back and doesn't miss message
kill producer pod and ensure it comes back and ensure all messages get sent.
I assume that you mean a Litmus test as in a mini chaos engineering test and I'm also assuming that you mean to do in Kubernetes. Yes, you should be able to do as long as you have all your pods defined in a higher level abstraction like a Deployment, DaemonSet, and StatefulSet that inherently creates a ReplicaSet.
In the Kubernetes world to force this test you can just delete the pod where your Kafka/Zookeeper workload is running and they should be brought up by the ReplicaSet on to of your pods.
Related
I have a Kafka cluster hosted in GKE. Google updates GKE nodes in weekly basis and whenever this happens Kafka becomes temporarily unavailable and this causes massive error/rebalance to get it backup to healthy state. Currently we rely on K8 retry to eventually succeed once upgrade completes and cluster becomes available. Is there a way to gracefully handle this type of situation in Kafka or avoid it if possible?
In order to be able to inform you better, you would have to give us a little more information, what is your setup? Versions of Kube and Kafka? How many Kafka & ZK pods? How are you deploying your Kafka cluster (via a simple helm chart or an operator?) What are the exact symptoms you see when you upgrade your kube cluster? What errors do you get? What is the state of the Kafka cluster etc.? How do you monitor it?
But here are some points worth investigating.
Are you spreading the Kafka/ZK pods correctly across the nodes/zones?
Do you set PDBs to a reasonable maxUnavailable setting?
What are your readiness/liveness probes for your Kafka/ZK pods?
Are your topics correctly replciated?
I would strongly encourage you to use take a look at https://strimzi.io/ which can be very helpful if you want to operate Kafka on Kube. It is open source operator and very well documented.
You have control over the GKE Node's auto upgrade through "upgrade maintenance window" to decide when upgrades should occur. Based on your business criticality you can configure this option along with K8 retry feature.
Kubernetes tends to assume apps are small/lightweight/stateless microservices which can be stopped on one node and restarted on another node with no downtime.
We have a slow starting (20min) legacy (stateful) application which, once run as a set of pod should not be rescheduled without due cause. The reason being all user sessions will be killed and the users will have to login again. There is NO way to serialize the sessions and externalize them. We want 3 instances of the pod.
Can we tell k8s not to move a pod unless absolutely necessary (i.e. it dies)?
Additional information:
The app is a tomcat/java monolith
Assume for the sake of argument we would like to run it in Kubernetes
We do have a liveness test endpoint available
There is no benefit, if you tell k8s to use only one pod. That is not the "spirit" of k8s. In this case, it might be better to use a dedicated machine for your app.
But you can assign a pod to a special node - Assigning Pods to Nodes. The should be necessary only, when special hardware requirements are needed (e.g. the AI-microservice needs a GPU, which is only on node xy).
k8s don't restart your pod for fun. It will restart it, when there is a reason (node died, app died, ...) and I never noticed a "random reschedule" in a cluster. It is hard to say, without any further information (like deployment, logs, cluster) what exactly happened to you.
And for your comment: There are different types of recreation, one of them starts a fresh instance and will kill the old one, when the startup was successfully. Look here: Kubernetes deployment strategies
All points together:
Don't enforce a node to your app - k8s will "smart" select the node.
There are normally no planned reschedules in k8s.
k8s will recreate pods only, if there is a reason. Maybe your app didn't answer on the liveness-endpoint? Or someone/something deleting your pod?
I'm planning to deploy a WebRTC custom videoconference software (based on NodeJS, using websockets) with Kubernetes, but I have some doubts about scaling down this environment.
Actually, I'm planning to use cloud hosted Kubernetes (GKE, EKS, AKS or any) to be able to auto-scale nodes in the cluster to attend the demand increase and decrease. But, scaling up is not the problem, but it's about scaling down.
The cluster will scale down based on some CPU average usage metrics across the cluster, as I understand, and if it tries to remove some node, it will start to drain connections and stop receiving new connections, right? But now, imagine that there's a videoconference still running in this "pending deletion" node. There are two problems:
1 - Stopping the node before the videoconference finishes (it will drop the meeting)
2 - With the draining behaviour when it starts to scale down, it will stop receiving new connections, so if someone tries to join in this running video conference, it will receive a timeout, right?
So, which is the best strategy to scale down nodes for a video conference solution? Any ideas?
Thanks
I would say this is not a matter of resolving it on kubernetes level by some specific scaling strategy but rather application ability to handle such situations. It isn't even specific to kubernetes. Imagine that you deploy it directly on compute instances which are also subject to autoscale and you'll end up in exactly the same situation when the load decreases and one of the instances is removed from the set.
You should rather ask yourself if such application is suitable to be deployed as kubernetes workload. I can imagine that such videoconference session doesn't have to rely on the backend deployed on a single node only. You can even define some affinity or anti-affinity rules to prevent your Pods from being scheduled on the same node. So if the whole application cluster is still up and running (it's Pods are running on different nodes), eviction of a limited subset of Pods should not have a big impact.
You can actually face the same issue with any other application as vast majority of them base on some session which needs to be established between the client software and the server part. I would say it's application responsibility to be able to handle such scenarios. If some of the users unexpectedly loses the connection it should be possible to immediately redirect them to the running instance e.g. different Pod which is still able to accept new requests.
So basically if the application is designed to be highly available, scaling in (when we talk about horizontal scaling we actually talk about scaling in and scaling out) the underyling VMs, or more specifically kubernetes nodes, shouldn't affect it's high availability capabilities. From the other hand if it is not designed to be highly available, solution such as kubernetes probably won't help much.
There is no best strategy at your use case. When a cloud provider scales down, it is going to get one node randomly and kill it. It's not going to check whether this node has less resource consumption, so let's kill this one. It might end up killing the node with most pods running on it.
I would focus on how you want to schedule your pods. I would try to schedule them, if possible, on a node with running pods already (Pod inter-affinity), and would set up a Pod Disruption Budget to all Deployments/StatefulSets/etc (depending on how you want to run the pods). As a result it would only scale down when there are no pods running on a specific node, and it would kill that node, because on the other nodes there are pods; protected by a PDB.
I am trying to achieve the exactly-once delivery on Kafka using Spring-Kafka on Kubernetes.
As far as I understood, the transactional-ID must be set on the producer and it should be the same across restarts, as stated here https://stackoverflow.com/a/52304789/3467733.
The problem arises using this semantic on Kubernetes. How can you get a consistent ID?
To solve this problem I implementend a Spring boot application, let's call it "Replicas counter" that checks, through the Kubernetes API, how many pods there are with the same name as the caller, so I have a counter for every pod replica.
For example, suppose I want to deploy a Pod, let's call it APP-1.
This app does the following:
It perfoms a GET to the Replicas-Counter passing the pod-name as parameter.
The replicas-counter calls the Kubernetes API in order to check how many pods there are with that pod name. So it does a a +1 and returns it to the caller. I also need to count not-ready pods (think about a first deploy, they couldn't get the ID if I wasn't checking for not-ready pods).
The APP-1 gets the id and will use it as the transactional-id
But, as you can see a problem could arise when performing rolling updates, for example:
Suppose we have 3 pods:
At the beginning we have:
app-1: transactional-id-1
app-2: transactional-id-2
app-3: transactional-id-3
So, during a rolling update we would have:
old-app-1: transactional-id-1
old-app-2: transactional-id-2
old-app-3: transactional-id-3
new-app-3: transactional-id-4 (Not ready, waiting to be ready)
New-app-3 goes ready, so Kubernetes brings down the Old-app-3. So time to continue the rolling update.
old-app-1: transactional-id-1
old-app-2: transactional-id-2
new-app-3: transactional-id-4
new-app-2: transactional-id-4 (Not ready, waiting to be ready)
As you can see now I have 2 pods with the same transactional-id.
As far as I understood, these IDs have to be the same across restarts and unique.
How can I implement something that gives me consistent IDs? Is there someone that have dealt with this problem?
The problem with these IDs are only for the Kubernetes Deployments, not for the Stateful-Sets, as they have a stable identifier as name. I don't want to convert all deployment to stateful sets to solve this problem as I think it is not the correct way to handle this scenario.
The only way to guarantee the uniqueness of Pods is to use StatefulSet.
StatefulSets will allow you to keep the number of replicas alive but everytime pod dies it will be replaced with the same host and configuration. That will prevent data loss that is required.
Service in Statefulset must be headless because since each pod is going to be unique, so you are going to need certain traffic to reach certain pods.
Every pod require a PVC (in order to store data and recreate whenever pod is deleted from that data).
Here is a great article describing why StatefulSet should be used in similar case.
I've been trying to figure out what happens when the Kubernetes master fails in a cluster that only has one master. Do web requests still get routed to pods if this happens, or does the entire system just shut down?
According to the OpenShift 3 documentation, which is built on top of Kubernetes, (https://docs.openshift.com/enterprise/3.2/architecture/infrastructure_components/kubernetes_infrastructure.html), if a master fails, nodes continue to function properly, but the system looses its ability to manage pods. Is this the same for vanilla Kubernetes?
In typical setups, the master nodes run both the API and etcd and are either largely or fully responsible for managing the underlying cloud infrastructure. When they are offline or degraded, the API will be offline or degraded.
In the event that they, etcd, or the API are fully offline, the cluster ceases to be a cluster and is instead a bunch of ad-hoc nodes for this period. The cluster will not be able to respond to node failures, create new resources, move pods to new nodes, etc. Until both:
Enough etcd instances are back online to form a quorum and make progress (for a visual explanation of how this works and what these terms mean, see this page).
At least one API server can service requests
In a partially degraded state, the API server may be able to respond to requests that only read data.
However, in any case, life for applications will continue as normal unless nodes are rebooted, or there is a dramatic failure of some sort during this time, because TCP/ UDP services, load balancers, DNS, the dashboard, etc. Should all continue to function for at least some time. Eventually, these things will all fail on different timescales. In single master setups or complete API failure, DNS failure will probably happen first as caches expire (on the order of minutes, though the exact timing is configurable, see the coredns cache plugin documentation). This is a good reason to consider a multi-master setup–DNS and service routing can continue to function indefinitely in a degraded state, even if etcd can no longer make progress.
There are actions that you could take as an operator which would accelerate failures, especially in a fully degraded state. For instance, rebooting a node would cause DNS queries and in fact probably all pod and service networking functionality until at least one master comes back online. Restarting DNS pods or kube-proxy would also be bad.
If you'd like to test this out yourself, I recommend kubeadm-dind-cluster, kind or, for more exotic setups, kubeadm on VMs or bare metal. Note: kubectl proxy will not work during API failure, as that routes traffic through the master(s).
Kubernetes cluster without a master is like a company running without a Manager.
No one else can instruct the workers(k8s components) other than the Manager(master node)(even you, the owner of the cluster, can only instruct the Manager)
Everything works as usual. Until the work is finished or something stopped them.(because the master node died after assigning the works)
As there is no Manager to re-assign any work for them, the workers will wait and wait until the Manager comes back.
The best practice is to assign multiple managers(master) to your cluster.
Although your data plane and running applications does not immediately starts breaking but there are several scenarios where cluster admins will wish they had multi-master setup. Key to understanding the impact would be understanding which all components talk to master for what and how and more importantly when will they fail if master fails.
Although your application pods running on data plane will not get immediately impacted but imagine a very possible scenario - your traffic suddenly surged and your horizontal pod autoscaler kicked in. The autoscaling would not work as Metrics Server collects resource metrics from Kubelets and exposes them in Kubernetes apiserver through Metrics API for use by Horizontal Pod Autoscaler and vertical pod autoscaler ( but your API server is already dead).If your pod memory shoots up because of high load then it will eventually lead to getting killed by k8s OOM killer. If any of the pods die, then since controller manager and scheduler talks to API Server to watch for current state of pods so they too will fail. In short a new pod will not be scheduled and your application may stop responding.
One thing to highlight is that Kubernetes system components communicate only with the API server. They don’t
talk to each other directly and so their functionality themselves could fail I guess. Unavailable master plane can mean several things - failure of any or all of these components - API server,etcd, kube scheduler, controller manager or worst the entire node had crashed.
If API server is unavailable - no one can use kubectl as generally all commands talk to API server ( meaning you cannot connect to cluster, cannot login into any pods to check anything on container file system. You will not be able to see application logs unless you have any additional centralized log management system).
If etcd database failed or got corrupted - your entire cluster state data is gone and the admins would want to restore it from backups as early as possible.
In short - a failed single master control plane although may not immediately impact traffic serving capability but cannot be relied on for serving your traffic.