Smooth load rebalancing for Kubernetes HPA

I have configured my ingress controller with nginx-ingress hashing, and I define an HPA for my deployments. During load testing we hit a problem with newly created pods that aren't warmed up yet: the load balancer immediately shifts their target share of the traffic onto them, latency spikes, and the service chokes. Is there a way to define a smoother load rebalancing that moves traffic over gradually and so warms up the service in a more natural way?
Here is an example of the effect we see now (image not included):

At a glance I see two possible reasons for that behaviour:
I think there is a chance that you are facing the same problem as in this question: Some requests fails during autoscaling in kubernetes. In that case, Nginx was sending requests to Pods that were not completely ready. To solve this you can configure a Readiness Probe. Personally, I configure my Readiness Probes to send an HTTP request to a /health endpoint of my services.
There is also a chance, however, that your application is naturally slow on its first requests, usually because of caching or some other operation that has to happen at the beginning of its life. I encountered this problem in a Django+Gunicorn app where Gunicorn only started my app after the first request. To solve this I used a PostStart Container Hook which sends a request to my app right after the container is created. Here is an example of its use. You may also have a look at this question: Kubernetes Pod warm-up for load balancing.
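A minimal sketch of the kind of /health endpoint such a Readiness Probe could call, written in Python with only the standard library. The names health_status, warm_up and serve, and port 8080, are assumptions for illustration and are not taken from the question. The idea is simply to answer 503 until the warm-up work has finished and 200 afterwards, so the pod is only marked ready once its slow first-time work is done.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once the (hypothetical) warm-up work has finished.
warmed_up = threading.Event()


def health_status() -> int:
    """Return 200 once warm-up is complete, 503 before that."""
    return 200 if warmed_up.is_set() else 503


def warm_up() -> None:
    """Placeholder for real warm-up work (priming caches, opening connections)."""
    # ... do the expensive first-request work here ...
    warmed_up.set()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /health is served; everything else is a 404 in this sketch.
        if self.path == "/health":
            self.send_response(health_status())
        else:
            self.send_response(404)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start warming up in the background and serve /health (blocks forever)."""
    threading.Thread(target=warm_up, daemon=True).start()
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling serve() starts the server; the Readiness Probe would then be pointed at /health on that port.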

Related

k8s container initialization and load balancing

I have a deployment with one pod running my custom image. After executing kubectl create -f deployment.yaml, the pod starts running. Everything looks fine and kubectl's output shows it in the "running" state. However, I have an initialization script that starts Apache Tomcat, and it takes around 40-45 seconds to execute and bring the server up inside the container.
I also have a load balancer deployment with nginx. Nginx forwards incoming requests to Apache Tomcat via proxy_pass. When I scale my deployment to 2 replicas and shut one of them down, the application sometimes gets stuck and freezes.
I suspect that load balancing by k8s is not working correctly: k8s is trying to use a pod that is still being initialized by the script.
How can I tell k8s that a pod in the deployment hasn't been initialized yet, and not to use it until it is completely up?
If I understand correctly, your problem is mostly that the application is not ready to accept requests because your initialization script hasn't finished.
For that situation, you can easily set up different types of probes, such as liveness and readiness probes. A readiness probe is the useful one here: your application won't be considered ready to accept requests until the pod has started up and the probe succeeds.
Here you can read more about it: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
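As a rough illustration of what a probe check could do, the sketch below tests whether a TCP port accepts connections. The function name port_is_open and the use of port 8080 for Tomcat are assumptions, and an open port only suggests that the server is listening, not that the application has finished initializing, so a real probe may need a stronger check.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as "not ready".
        return False
```

A script used as an exec-style probe could exit with status 0 when port_is_open("127.0.0.1", 8080) is true and with a non-zero status otherwise.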

Load balancing onto replicas of pods

We have an AKS cluster and we want to achieve the following two points in our architecture:
We have replicas of pods and we want each pod to serve only one request at a time. Basically a one pod, one request design.
When all pods are busy, the next incoming request should not be queued at the pod level. Instead it should be queued at the service level, and only once a busy pod becomes idle or available should the queued request be dispatched to that pod.
How can we achieve this?
Generally, this could be achieved by creating a custom proxy that creates pods on demand, but in practice it will be very difficult and performance will be poor. This was very well explained by David Maze in his comment:
You need to write a custom proxy with access to the Kubernetes API that can create new pods on demand; this is not a standard Kubernetes setup. This is also an extremely heavy-weight setup (if it takes tens of seconds to pull and deploy a new pod you can hit HTTP request timeouts very easily) and every Web framework supports handling multiple requests per process.
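To make the requested behaviour concrete, here is a toy in-memory model of "one request per pod, with queueing at the service level". It is only a sketch of the dispatching logic such a custom proxy would need. The class name OneRequestPerPodDispatcher and the pod names are hypothetical, and it does not talk to the Kubernetes API or create pods.

```python
from collections import deque


class OneRequestPerPodDispatcher:
    """Toy model: each pod serves at most one request; extras wait in a shared queue."""

    def __init__(self, pods):
        self.idle = deque(pods)   # pods with nothing to do
        self.busy = {}            # pod -> request it is currently serving
        self.pending = deque()    # requests waiting at the "service level"

    def submit(self, request):
        """Give the request to an idle pod, or queue it. Returns the pod or None."""
        if self.idle:
            pod = self.idle.popleft()
            self.busy[pod] = request
            return pod
        self.pending.append(request)
        return None

    def complete(self, pod):
        """Mark the pod's request as done and hand it the next queued request, if any."""
        del self.busy[pod]
        if self.pending:
            self.busy[pod] = self.pending.popleft()
        else:
            self.idle.append(pod)
```

With two pods, a third request submitted while both are busy stays in pending and is handed to whichever pod calls complete first.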

How to signal "bad" but not "fatal" health check from spring boot to Kubernetes?

What we're looking for is a way for an actuator health check to signal some intention like "I am limping but not dead. If there are X number of other pods claiming to be healthy, then you should restart me, otherwise, let me limp."
We have a REST service hosted in clustered Kubernetes containers that periodically calls out to fetch fresh data from an external resource. Occasionally we have failures reaching those external resources, and sometimes, but not every time, a restart of the pod will resolve the issue.
The services can operate just fine on possibly stale data. Although we wouldn't want to continue operating on stale data, that's preferable to just going down entirely.
In the interim, we're planning on having a node unilaterally decide not to report any problems through actuator until X amount of time has passed since the last successful sync, but that really only delays the point at which all nodes would still report failure.
In Kubernetes you can use a LivenessProbe and a ReadinessProbe to let a controller heal your service, but some situations are better handled with HTTP response codes or an alternative degraded service.
LivenessProbe
Use a LivenessProbe to resolve a deadlock situation. When your pod does not respond to a LivenessProbe, its container will be killed and restarted.
ReadinessProbe
Use a ReadinessProbe when your pod is not prepared to serve requests, e.g. if your pod needs to read some files or connect to an external service before serving requests.
Fault affecting all replicas
If you have a problem that all your replicas depend on, e.g. an external service is down, then you cannot solve it by restarting your pods. You may use an OpsToggle or a circuit breaker in this situation, and notify other services that you are degraded or show a message about a temporary error.
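A circuit breaker can be sketched in a few lines. The version below is a simplified illustration, not a production library: the class name, the max_failures and reset_after parameters, and the fallback callback (for example, returning stale data) are all assumptions made for this example.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a while and use a fallback instead."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to stay open before retrying
        self.clock = clock
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the dependency until the cool-down has elapsed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None       # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0               # a success closes the circuit again
        return result
```

In the scenario from the question, func would be the call to the external resource and fallback would return the possibly stale data the service already holds.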
For your situation
If there are X number of other pods claiming to be healthy, then you should restart me, otherwise, let me limp.
You cannot delegate that logic to Kubernetes. Your application needs to understand each fault situation, e.g. whether an error was a transient network error or whether it will affect all replicas.

Kubernetes HPA and Scaling Down

I have a kubernetes HPA set up in my cluster, and it works as expected scaling up and down instances of pods as the cpu/memory increases and decreases.
The only thing is that my pods handle web requests, so it occasionally scales down a pod that's in the process of handling a web request. The web server never gets a response back from the pod that was scaled down and thus the caller of the web api gets an error back.
This all makes sense theoretically. My question is does anyone know of a best practice way to handle this? Is there some way I can wait until all requests are processed before scaling down? Or some other way to ensure that requests complete before HPA scales down the pod?
I can think of a few solutions, none of which I like:
Add retry mechanism to the caller and just leave the cluster as is.
Don't use HPA for web request pods (seems like it defeats the purpose).
Try to create some sort of custom metric and see if I can get that metric into Kubernetes (e.g. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-custom-metrics)
Any suggestions would be appreciated. Thanks in advance!
Graceful shutdown of pods
You must design your apps to support graceful shutdown. First your pod will receive a SIGTERM signal, and after 30 seconds (the default, which can be configured) your pod will receive a SIGKILL signal and be removed. See Termination of pods.
SIGTERM: When your app receives the termination signal, your pod will no longer receive new requests, but you should still try to send responses for requests that were already received.
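The sketch below shows one way an application could react to SIGTERM: refuse new work, then wait for in-flight requests to finish before exiting. The GracefulShutdown class and its method names are hypothetical. A real web server or framework usually provides its own shutdown hook, which would be preferable where it exists.

```python
import signal
import threading


class GracefulShutdown:
    """Tracks in-flight requests and refuses new ones once SIGTERM has arrived."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self.terminating = threading.Event()

    def handle_sigterm(self, signum=None, frame=None):
        # Keep the signal handler tiny: just flip the flag.
        self.terminating.set()

    def start_request(self) -> bool:
        """Return False if the request should be rejected because we are shutting down."""
        with self._cond:
            if self.terminating.is_set():
                return False
            self._in_flight += 1
            return True

    def finish_request(self) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def wait_until_drained(self, timeout: float) -> bool:
        """Block until no requests are in flight; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


def install(shutdown: GracefulShutdown) -> None:
    """Register the handler for SIGTERM (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, shutdown.handle_sigterm)
```

The main loop would wait on terminating, then call wait_until_drained with a timeout shorter than the pod's termination grace period before exiting.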
Design for idempotency
Your apps should also be designed for idempotency so you can safely retry failed requests.
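One common way to make retries safe is an idempotency key supplied by the caller. The sketch below is a minimal in-memory illustration; IdempotentHandler and the key format are assumptions. In a real deployment the results would have to live in a store shared by all replicas, since the pod holding this dictionary may be the one that gets scaled down.

```python
class IdempotentHandler:
    """Run the wrapped operation at most once per idempotency key."""

    def __init__(self, func):
        self._func = func
        self._results = {}  # idempotency key -> stored result

    def handle(self, key, payload):
        # A retried request with the same key gets the stored result back
        # instead of triggering the side effect a second time.
        if key not in self._results:
            self._results[key] = self._func(payload)
        return self._results[key]
```

A caller that never received a response can resend the request with the same key without causing the operation to run twice.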

What happens when the Kubernetes master fails?

I've been trying to figure out what happens when the Kubernetes master fails in a cluster that only has one master. Do web requests still get routed to pods if this happens, or does the entire system just shut down?
According to the OpenShift 3 documentation, which is built on top of Kubernetes, (https://docs.openshift.com/enterprise/3.2/architecture/infrastructure_components/kubernetes_infrastructure.html), if a master fails, nodes continue to function properly, but the system loses its ability to manage pods. Is this the same for vanilla Kubernetes?
In typical setups, the master nodes run both the API and etcd and are either largely or fully responsible for managing the underlying cloud infrastructure. When they are offline or degraded, the API will be offline or degraded.
In the event that they, etcd, or the API are fully offline, the cluster ceases to be a cluster and is instead a bunch of ad-hoc nodes for this period. The cluster will not be able to respond to node failures, create new resources, move pods to new nodes, etc., until both of the following hold:
Enough etcd instances are back online to form a quorum and make progress (for a visual explanation of how this works and what these terms mean, see this page).
At least one API server can service requests.
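For reference, the quorum mentioned above is a simple majority of the etcd members. The helper names in this small sketch are made up for illustration; the arithmetic is just the majority rule.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum_size(members)


def has_quorum(members: int, healthy: int) -> bool:
    """True if enough members are healthy for the cluster to make progress."""
    return healthy >= quorum_size(members)
```

A single-member etcd cluster tolerates no failures, while a three-member cluster still has a quorum after losing one member.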
In a partially degraded state, the API server may be able to respond to requests that only read data.
However, in any case, life for applications will continue as normal unless nodes are rebooted or there is a dramatic failure of some sort during this time, because TCP/UDP services, load balancers, DNS, the dashboard, etc. should all continue to function for at least some time. Eventually, these things will all fail on different timescales. In single master setups or complete API failure, DNS failure will probably happen first as caches expire (on the order of minutes, though the exact timing is configurable, see the coredns cache plugin documentation). This is a good reason to consider a multi-master setup: DNS and service routing can continue to function indefinitely in a degraded state, even if etcd can no longer make progress.
There are actions that you could take as an operator which would accelerate failures, especially in a fully degraded state. For instance, rebooting a node would cause DNS queries, and in fact probably all pod and service networking functionality, to fail until at least one master comes back online. Restarting DNS pods or kube-proxy would also be bad.
If you'd like to test this out yourself, I recommend kubeadm-dind-cluster, kind or, for more exotic setups, kubeadm on VMs or bare metal. Note: kubectl proxy will not work during API failure, as that routes traffic through the master(s).
A Kubernetes cluster without a master is like a company running without a manager.
No one other than the manager (the master node) can instruct the workers (the k8s components). Even you, the owner of the cluster, can only instruct the manager.
Everything works as usual until the work is finished or something stops the workers (because the master node died after assigning the work).
As there is no manager to re-assign any work to them, the workers will wait and wait until the manager comes back.
The best practice is to assign multiple managers (masters) to your cluster.
Although your data plane and running applications do not immediately start breaking, there are several scenarios where cluster admins will wish they had a multi-master setup. The key to understanding the impact is understanding which components talk to the master, for what, how, and, more importantly, when they will fail if the master fails.
Your application pods running on the data plane will not be impacted immediately, but imagine a very possible scenario: your traffic suddenly surges and your horizontal pod autoscaler kicks in. The autoscaling would not work, because Metrics Server collects resource metrics from the kubelets and exposes them in the Kubernetes API server through the Metrics API for use by the Horizontal Pod Autoscaler and Vertical Pod Autoscaler (but your API server is already dead). If your pod's memory shoots up because of high load, the pod will eventually be killed by the OOM killer. If any of the pods die, the controller manager and scheduler, which talk to the API server to watch the current state of pods, will fail too. In short, a new pod will not be scheduled and your application may stop responding.
One thing to highlight is that Kubernetes system components communicate only with the API server. They don't talk to each other directly, so their own functionality could fail as well, I guess. An unavailable master plane can mean several things: failure of any or all of these components (API server, etcd, kube-scheduler, controller manager) or, worse, the entire node having crashed.
If the API server is unavailable, no one can use kubectl, as generally all commands talk to the API server. This means you cannot connect to the cluster or log in to any pods to check anything on the container file system, and you will not be able to see application logs unless you have an additional centralized log management system.
If the etcd database failed or got corrupted, your entire cluster state data is gone, and the admins would want to restore it from backups as early as possible.
In short, a failed single-master control plane may not immediately impact traffic serving capability, but it cannot be relied on for serving your traffic.