Are Kubernetes liveness probe failures voluntary or involuntary disruptions? - kubernetes

I have an application deployed to Kubernetes that depends on an outside application. Sometimes the connection between the two goes into an invalid state, and that can only be fixed by restarting my application.
To do automatic restarts, I have configured a liveness probe that will verify the connection.
This has been working great, however, I'm afraid that if that outside application goes down (such that the connection error isn't just due to an invalid pod state), all of my pods will immediately restart, and my application will become completely unavailable. I want it to remain running so that functionality not depending on the bad service can continue.
I'm wondering if a pod disruption budget would prevent this scenario, as it limits the number of pods down due to a "voluntary" disruption. However, the K8s docs don't state whether liveness probe failures are a voluntary disruption. Are they?

I would say, according to the documentation:
Voluntary and involuntary disruptions
Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.
We call these unavoidable cases involuntary disruptions to an application. Examples are:
a hardware failure of the physical machine backing the node
cluster administrator deletes VM (instance) by mistake
cloud provider or hypervisor failure makes VM disappear
a kernel panic
the node disappears from the cluster due to cluster network partition
eviction of a pod due to the node being out-of-resources.
Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.
We call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator. Typical application owner actions include:
deleting the deployment or other controller that manages the pod
updating a deployment's pod template causing a restart
directly deleting a pod (e.g. by accident)
Cluster administrator actions include:
Draining a node for repair or upgrade.
Draining a node from a cluster to scale the cluster down (learn about Cluster Autoscaling ).
Removing a pod from a node to permit something else to fit on that node.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Disruptions
So your example is quite different, and to my knowledge it's neither a voluntary nor an involuntary disruption.
Also taking a look on another Kubernetes documentation:
Pod lifetime
Like individual application containers, Pods are considered to be relatively ephemeral (rather than durable) entities. Pods are created, assigned a unique ID (UID), and scheduled to nodes where they remain until termination (according to restart policy) or deletion. If a Node dies, the Pods scheduled to that node are scheduled for deletion after a timeout period.
Pods do not, by themselves, self-heal. If a Pod is scheduled to a node that then fails, the Pod is deleted; likewise, a Pod won't survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a higher-level abstraction, called a controller, that handles the work of managing the relatively disposable Pod instances.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: Pod lifetime
Container probes
The kubelet can optionally perform and react to three kinds of probes on running containers (focusing on a livenessProbe):
livenessProbe: Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: Container probes
When should you use a liveness probe?
If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy.
If you'd like your container to be killed and restarted if a probe fails, then specify a liveness probe, and specify a restartPolicy of Always or OnFailure.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: When should you use a liveness probe
Based on this information, it would be better to create a custom liveness probe that distinguishes internal process health checks from the external dependency (liveness) health check. In the first scenario your container should stop/terminate the process, unlike the second case with an external dependency.
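As a sketch of that separation (the container name, image, port, and paths here are illustrative, not from the question), the livenessProbe would check only the internal process state, while the external dependency is surfaced through a readinessProbe, so a dependency outage takes the Pod out of rotation without restarting it:

```yaml
containers:
  - name: my-app                  # hypothetical container name
    image: my-app:1.0             # hypothetical image
    livenessProbe:
      httpGet:
        path: /healthz            # checks internal process health only
        port: 8080
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /readyz             # also verifies the external dependency
        port: 8080
      periodSeconds: 10
```

With this split, a failing external dependency only flips readiness, and the kubelet never kills the container because of it.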
Answering the following question:
I'm wondering if a pod disruption budget would prevent this scenario.
In this particular scenario a PDB will not help.
I'd reckon giving more visibility to the comment I've made, with additional resources on the matter, could prove useful to other community members:
Blog.risingstack.com: Designing microservices architecture for failure
Loft.sh: Blog: Kubernetes readiness probes examples common pitfalls: External dependencies
Cloud.google.com: Architecture: Scalable and resilient apps: Resilience designing to withstand failures

Testing with a PodDisruptionBudget: the Pods will still restart at the same time.
Example:
https://github.com/AlphaWong/PodDisruptionBudgetAndPodProbe
So yes, like @Dawid Kruk said, you should create a customized script like the following:
# something like this
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # generate a random number for sleep
      - 'SLEEP_TIME=$(shuf -i 2-40 -n 1);sleep $SLEEP_TIME; curl -L --max-time 5 -f nginx2.default.svc.cluster.local'
  initialDelaySeconds: 10
  # think about the gap between each call
  periodSeconds: 30
  # it is required after k8s v1.12
  timeoutSeconds: 90

I'm wondering if a pod disruption budget would prevent this scenario.
Yes, it will prevent it.
As you stated, when a pod goes down (or a node fails), nothing can stop those pods from becoming unavailable. However, certain services require that a minimum number of pods is always kept running.
There could be another way (a StatefulSet), but a PodDisruptionBudget is one of the simplest Kubernetes resources available.
Note: You can also use a percentage instead of an absolute number in the minAvailable field. For example, you could state that 60% of all pods with the app=run-always label need to be running at all times.
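A minimal sketch of such a budget using the percentage form and the app=run-always label mentioned above (the resource name is illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: run-always-pdb            # hypothetical name
spec:
  minAvailable: 60%
  selector:
    matchLabels:
      app: run-always
```

The selector must match the labels of the Pods the budget is meant to protect.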

Related

Kubernetes StatefulSets and livenessProbes

Liveness probes are supposed to trigger a restart of failed containers. Do they respect the default StatefulSet deployment and scaling guarantees? E.g. if the liveness probe fails at the same time for multiple Pods within one and the same StatefulSet, would K8s attempt to restart one container at a time, or all in parallel?
According to https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/ the liveness probes are a feature implemented in the kubelet:
The kubelet uses liveness probes to know when to restart a container.
This means any decision about scheduling that requires knowledge of multiple pods is not taken into account.
Therefore, if all your StatefulSet's pods have failing liveness probes at the same time, they will be restarted at about the same time, not respecting any deployment-level guarantees.

Controlling pods kubelet vs. controller in control plane

I'm a little confused, I've been ramping up on Kubernetes and I've been reading about all the different objects ReplicaSet, Deployment, Service, Pods etc.
In the documentation it mentions that the kubelet manages liveness and readiness checks which are defined in our ReplicaSet manifests.
Reference: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
If this is the case does the kubelet also manage the replicas? Or does that stay with the controller?
Or do I have it all wrong and it's the kubelet that is creating and managing all these resources on a pod?
Thanks in advance.
Basically, the kubelet is the "node agent" that runs on each node. It gets notified through the kube-apiserver, then starts containers through the container runtime; it works in terms of Pod specs. It ensures the containers described in those PodSpecs are running and healthy.
The flow of kubelet tasks is like: kube apiserver <--> kubelet <--> CRI
To check whether a pod is running healthily, it uses the liveness probe; if the probe fails, the kubelet restarts the container.
The kubelet does not maintain replicas; replicas are maintained by a ReplicaSet. As the k8s docs say: A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
See more of ReplicaSet
For more info you can see: kubelet
When starting your journey with Kubernetes it is important to understand its main components for both Control Planes and Worker Nodes.
Based on your question we will focus on two of them:
kube-controller-manager:
Logically, each controller is a separate process, but to reduce
complexity, they are all compiled into a single binary and run in a
single process.
Some types of these controllers are:
Node controller: Responsible for noticing and responding when nodes go down.
Job controller: Watches for Job objects that represent one-off tasks, then creates Pods to run those tasks to completion.
Endpoints controller: Populates the Endpoints object (that is, joins Services & Pods).
Service Account & Token controllers: Create default accounts and API access tokens for new namespaces.
kubelet:
An agent that runs on each node in the cluster. It makes sure that
containers are running in a Pod.
The kubelet takes a set of PodSpecs that are provided through various
mechanisms and ensures that the containers described in those PodSpecs
are running and healthy. The kubelet doesn't manage containers which
were not created by Kubernetes.
So answering your question:
If this is the case does the kubelet also manage the replicas? Or does
that stay with the controller?
No, replication can be managed by the Replication Controller, a ReplicaSet or, preferably, a Deployment. The kubelet runs on Nodes and makes sure that the Pods are running according to their PodSpecs.
You can find synopsis for kubelet and kube-controller-manager in the linked docs.
EDIT:
There is one exception however in a form of Static Pods:
Static Pods are managed directly by the kubelet daemon on a specific
node, without the API server observing them. Unlike Pods that are
managed by the control plane (for example, a Deployment); instead, the
kubelet watches each static Pod (and restarts it if it fails).
Note that it does not apply to multiple replicas.

Kubernetes scaling pods by number of active connections

I have a Kubernetes cluster that runs some legacy containers (Windows containers).
To simplify, let's say that the container can handle at most 5 requests at a time, something like:
handleRequest(){
    requestLock(semaphore_Of_5)
    sleep(2s)
    return "result"
}
So the CPU is not spiked. I need to scale based on the number of active connections.
I can see from the documentation https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables
You can use Pod readiness probes to verify that backend Pods are working OK, so that kube-proxy in iptables mode only sees backends that test out as healthy. Doing this means you avoid having traffic sent via kube-proxy to a Pod that’s known to have failed.
So there is a mechanism to make pods available for routing new requests, but it is the livenessProbe that actually marks the pod as unhealthy and subject to the restart policy. But my pods are just busy; they don't need restarting.
How can I increase the number of pods in this case?
You can enable an HPA for the deployment and autoscale on a number-of-requests metric.
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects
I would also recommend configuring the liveness probe's failureThreshold and timeoutSeconds; check if that helps.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes
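As a hedged sketch of that approach (the Deployment name and the active_connections metric are assumptions; exposing such a per-pod metric requires a custom metrics adapter, e.g. prometheus-adapter), an HPA scaling on active connections could look like:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: legacy-app-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: legacy-app                # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: active_connections  # assumed custom metric
        target:
          type: AverageValue
          averageValue: "4"         # scale out before hitting the 5-connection limit
```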

How to determine a failed kubernetes deployment?

I create a Deployment with a replica count of, say, 2, which runs an application (a simple web server); basically it's always running a command. However, due to misconfiguration, sometimes the command exits and the Pod is then terminated.
Due to default restartPolicy of Always the pod (and hence the container) is restarted and eventually the Pod status is CrashLoopBackOff.
If I do a kubectl describe deployment, it shows Condition as Progressing=True and Available=False.
This looks fine - the question is - how do I mark my deployment as 'failed' in the above case?
Adding spec.progressDeadlineSeconds doesn't seem to be having an effect.
Will simply setting restartPolicy to Never be enough in the Pod specification?
A related question, is there a way of getting this information as a trigger/webhook, without doing a rollout status watch?
A bit of theory
Regarding your question:
How do I mark my deployment as 'failed' in the above case?
Kubernetes gives you two types of health checks:
1 ) Readiness
Readiness probes are designed to let Kubernetes know when your app is ready to serve traffic.
Kubernetes makes sure the readiness probe passes before allowing a service to send traffic to the pod.
If a readiness probe starts to fail, Kubernetes stops sending traffic to the pod until it passes.
2 ) Liveness
Liveness probes let Kubernetes know if your app is alive or dead.
If your app is alive, then Kubernetes leaves it alone. If your app is dead, Kubernetes removes the Pod and starts a new one to replace it.
At the moment (v1.19.0), Kubernetes supports three mechanisms for implementing liveness and readiness probes:
A ) ExecAction: Executes a specified command inside the container. The diagnostic is considered successful if the command exits with a status code of 0.
B ) TCPSocketAction: Performs a TCP check against the Pod's IP address on a specified port. The diagnostic is considered successful if the port is open.
C ) HTTPGetAction: Performs an HTTP GET request against the Pod's IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
In your case:
If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy.
I think that in your case (the need to refer to a deployment as succeed / failed and take the proper action) you should:
Step 1:
Setup a HTTP/TCP readiness Probe - for example:
readinessProbe:
  httpGet:
    path: /health-check
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2
Where:
initialDelaySeconds — The number of seconds after the container has started before the readiness probe is initiated.
periodSeconds — How often to perform the readiness probe.
failureThreshold — The number of consecutive probe failures after which the container is marked as not ready.
Step 2:
Choose the relevant rolling update strategy and how you should handle cases of failures of new pods (consider reading this thread for examples).
A few references you can follow:
Container probes
Kubernetes Liveness and Readiness Probes
Kubernetes : Configure Liveness and Readiness Probes
Kubernetes and Containers Best Practices - Health Probes
Creating Liveness Probes for your Node.js application in Kubernetes
A Failed Deployment
A deployment (or the rollout process) will be considered as Failed
if it tries to deploy its newest ReplicaSet without ever completing over and over again until the progressDeadlineSeconds interval has exceeded.
Then K8s updates the status with:
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing False ProgressDeadlineExceeded
ReplicaFailure True FailedCreate
Read more in here.
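As a sketch, the deadline is set directly in the Deployment spec (the name and image here are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server                # hypothetical name
spec:
  progressDeadlineSeconds: 600    # mark Progressing=False with reason
                                  # ProgressDeadlineExceeded after 10 minutes
  replicas: 2
  selector:
    matchLabels:
      app: web-server
  template:
    metadata:
      labels:
        app: web-server
    spec:
      containers:
        - name: web-server
          image: nginx:1.21       # hypothetical image
```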
There is no Kubernetes concept for a "failed" deployment. Editing a deployment registers your intent that the new ReplicaSet is to be created, and k8s will repeatedly try to make that intent happen. Any errors that are hit along the way will cause the rollout to block, but they will not cause k8s to abort the deployment.
AFAIK, the best you can do (as of 1.9) is to apply a deadline to the Deployment, which will add a Condition that you can detect when a deployment gets stuck; see https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment and https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#progress-deadline-seconds.
It's possible to overlay your own definitions of failure on top of the statuses that k8s provides, but this is quite difficult to do in a generic way; see this issue for a (long!) discussion on the current status of this: https://github.com/kubernetes/kubernetes/issues/1899
Here's some Python code (using pykube) that I wrote a while ago that implements my own definition of ready; I abort my deploy script if this condition does not obtain after 5 minutes.
import logging

from pykube import Pod

_log = logging.getLogger(__name__)

def _is_deployment_ready(d, deployment):
    if not deployment.ready:
        _log.debug('Deployment not completed.')
        return False
    if deployment.obj["status"]["replicas"] > deployment.replicas:
        _log.debug('Old replicas not terminated.')
        return False
    selector = deployment.obj['spec']['selector']['matchLabels']
    pods = Pod.objects(d.api).filter(namespace=d.namespace, selector=selector)
    if not pods:
        _log.info('No pods found.')
        return False
    for pod in pods:
        _log.info('Is pod %s ready? %s.', pod.name, pod.ready)
        if not pod.ready:
            _log.debug('Pod status: %s', pod.obj['status'])
            return False
    _log.info('All pods ready.')
    return True
Note the individual pod check, which is required because a deployment seems to be deemed 'ready' when the rollout completes (i.e. all pods are created), not when all of the pods are ready.

How to kickoff the dead replicas of Kubernetes Deployment

Now we have deployed services as Kubernetes Deployments with multiple replicas. Once a server crashes, Kubernetes will migrate its containers to another available server, which takes about 3–5 minutes.
While migrating, the client can access the Deployment service because we still have other running replicas. But sometimes the requests fail because the load balancer redirects to the dead or migrating containers.
It would be great if Kubernetes could kick off the dead replicas automatically and add them back once they run on other servers. Otherwise, we need to set up an LB like HAProxy to do the same job with multiple Deployment instances.
You need to configure health checking to have properly working load balancing for a Service. Please have a read of:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
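A minimal sketch of such a readiness probe (the port and thresholds are illustrative):

```yaml
readinessProbe:
  tcpSocket:
    port: 8080            # illustrative port
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3     # removed from Service endpoints after 3 consecutive failures
```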
1. kubelet
--node-status-update-frequency duration
Specifies how often the kubelet posts node status to the master. Note: be cautious when changing this value; it must work with nodeMonitorGracePeriod in the node controller. (default 10s)
2. controller-manager
--node-monitor-grace-period duration
Amount of time which we allow a running Node to be unresponsive before marking it unhealthy. Must be N times more than the kubelet's nodeStatusUpdateFrequency, where N is the number of retries allowed for the kubelet to post node status. (default 40s)
--pod-eviction-timeout duration
The grace period for deleting pods on failed nodes. (default 5m0s)