How does a Liveness/Readiness probe communicate with a pod? - kubernetes

I am very new to k8s so apologies if the question doesn't make sense or is incorrect/stupid.
I have a liveness probe configured for my pod definition which just hits a health API and checks it's response status to test for the liveness of the pod.
My question is, while I understand the purpose of the liveness/readiness probes...what exactly are they? Are they just another type of pods which are spun up to try and communicate with our pod via the configured API? Or are they some kind of a lightweight process which runs inside the pod itself and attempts the API call?
Also, how does a probe communicate with a pod? Do we require a service to be configured for the pod so that the probe is able to access the API or is it an internal process with no additional config required?

Short answer: kubelet handle this checks to ensure your service is running, and if not it will be replaced by another container. Kubelet runs in every node of your cluster, you don't need to make any addional configurations.
You don't need to configure a service account to have the probes working, it is a internal process handled by kubernetes.
From Kubernetes documentation:
A Probe is a diagnostic performed periodically by the kubelet on a Container. To perform a diagnostic, the kubelet calls a Handler implemented by the Container. There are three types of handlers:
ExecAction: Executes a specified command inside the Container. The diagnostic is considered successful if the command exits with a status code of 0.
TCPSocketAction: Performs a TCP check against the Container’s IP address on a specified port. The diagnostic is considered successful if the port is open.
HTTPGetAction: Performs an HTTP Get request against the Container’s IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
Each probe has one of three results:
Success: The Container passed the diagnostic.
Failure: The Container failed the diagnostic.
Unknown: The diagnostic failed, so no action should be taken.
The kubelet can optionally perform and react to three kinds of probes on running Containers:
livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.
startupProbe: Indicates whether the application within the Container is started. All other probes are disabled if a startup probe is provided, until it succeeds. If the startup probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a startup probe, the default state is Success.

For network probes, they are run from the kubelet on the node where the pod is running. Exec probes are run via the same mechanism as kubectl exec.

Related

In Kubernetes, why does readinessProbe have the option periodSeconds?

I understand what a readinessProbe does, but I don't see why it should have a periodSeconds. Once it's determined that the pod is ready, it should stop checking. Wouldn't checking periodically then be up to the livenessProbe? Or am I missing something?
ReadinesProbe and livenessProbe serve for different purposes.
ReadinessProbe checks if a service is ready to serve requests. If the readinessProbe fails, the container will be taken out of service for as long as the probe fails. If the readinessProbe reports up again, the container will be taken back into service again and receive requests.
In contrast, if the livenessProbe fails, it will be considered as not recoverable and the container will be terminated and restarted.
For both, periodSeconds makes sense. Even for the livenessProbe, when failure is considered only after X consecutive failed checks.
Readiness probes determine whether or not a container is ready to serve requests. If the readiness probe returns a failed state, then Kubernetes removes the IP address for the container from the endpoints of all Services.
We use readiness probes to instruct Kubernetes that a running container should not receive any traffic. This is useful when waiting for an application to perform time-consuming initial tasks, such as establishing network connections, loading files, and warming caches.
The readiness probe is configured in the spec.containers.readinessprobe attribute of the pod configuration.
This periodSeconds field specifies that the kubelet should perform a readiness probe for every “x” seconds that is mentioned in the yaml. This Specifies the frequency of the checks to the readiness probe to check. Default to 10 seconds. Minimum value is 1.

Hazelcast failes liveness probe in OpenShift when loading a large map

We have Hazelcast 4.2 on OpenShift deployed as a standalone cluster and stateful set. We use Mongo as a backing data store (it shouldn't matter) and the docker image is created with a copy of dockerfile from Github Hazelcast project with all the package repositories replaced with our internal company servers.
We have a MapLoader which takes a long time (30 minutes) to load all the data. During this load time the cluster fails to respond to liveness and readiness probes:
Readiness probe failed: Get http://xxxx:5701/hazelcast/health/node-state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
and so OpenShift kills the nodes that are loading the data.
Is there any way to fix it, so Hazelcast responds with "alive" and "not ready" instead of timing out the connection?
You can increase the time for POD to check the health of POD if you are loading data initially.
https://github.com/hazelcast/charts/blob/d3b8d118da400255effc81e67a13c1863fee5f41/stable/hazelcast/values.yaml#L120
Above is helm example line showing the readiness and liveness configuration.
You can change the initialDelaySeconds to wait and after that seconds only Hazelcast will start checking the health.
Accordingly you can also adjust the other probes configuration like failureThreshold: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
if initialDelaySeconds is not helpful, still you can increase the configuration for the readiness & liveness accordingly and increase the timeout whenever map loader loading the data.
Also only in liveness your POD or container get restart if Readiness failing K8s will mark your POD as not ready to accept traffic. So while loading data you can increase the liveness configuration so it never failed and set to max possible.
The kubelet uses liveness probes to know when to restart a container.
For example, liveness probes could catch a deadlock, where an
application is running, but unable to make progress. Restarting a
container in such a state can help to make the application more
available despite bugs.
The kubelet uses readiness probes to know when a container is ready to
start accepting traffic. A Pod is considered ready when all of its
containers are ready. One use of this signal is to control which Pods
are used as backends for Services. When a Pod is not ready, it is
removed from Service load balancers.
You can also set the restartPolicy to Never so the container will not restart.

Readiness probe failure, Kubernetes expected behavior

According to the Kubernetes documentation,
If the readiness probe fails, the endpoints controller removes the Pod's IP address from the endpoints of all Services that match the Pod.
So I understand that Kubernetes won't redirect requests to the pod when the readiness probe fails.
In addition, does Kubernetes kill the pod? Or does it keep calling the readiness probe until the response is successful?
What are readiness probes for?
Containers can use readiness probes to know whether the container being probed is ready to start receiving network traffic. If your container enters a state where it is still alive but cannot handle incoming network traffic (a common scenario during startup), you want the readiness probe to fail. That way, Kubernetes will not send network traffic to a container that isn't ready for it. If Kubernetes did prematurely send network traffic to the container, it could cause the load balancer (or router) to return a 502 error to the client and terminate the request; either that or the client would get a "connection refused" error message.
The name of the readiness probe conveys a semantic meaning. In effect, this probe answers the true-or-false question: "Is this container ready to receive network traffic?"
A readiness probe failing will not kill or restart the container.
What are liveness probes for?
A liveness probe sends a signal to Kubernetes that the container is either alive (passing) or dead (failing). If the container is alive, then Kubernetes does nothing because the current state is good. If the container is dead, then Kubernetes attempts to heal the application by restarting it.
The name liveness probe also expresses a semantic meaning. In effect, the probe answers the true-or-false question: "Is this container alive?"
A liveness probe failing will kill/restart a failed container.
What are startup probes for?
Kubernetes has a newer probe called startup probes. This probe is useful for applications that are slow to start. It is a better alternative to increasing initialDelaySeconds on readiness or liveness probes. A startup probe allows an application to become ready, joined with readiness and liveness probes, it can increase the applications' availability.
Once the startup probe has succeeded once, the liveness probe takes over to provide a fast response to container deadlocks. If the startup probe never succeeds, the container is killed after failureThreshold * periodSeconds (the total startup timeout) and will be killed and restarted, subject to the pod's restartPolicy.

How to determine a failed kubernetes deployment?

I create a Pod with Replica count of say 2, which runs an application ( a simple web-server), basically it's always running command - However due to mis-configuration, sometimes the command exits and the pod is then Terminated.
Due to default restartPolicy of Always the pod (and hence the container) is restarted and eventually the Pod status is CrashLoopBackOff.
If I do a kubectl describe deployment, it shows Condition as Progressing=True and Available=False.
This looks fine - the question is - how do I mark my deployment as 'failed' in the above case?
Adding a spec.ProgressDeadlineSeconds doesn't seem to be having an effect.
Will simply saying restartPolicy as Never be enough in the Pod specification?
A related question, is there a way of getting this information as a trigger/webhook, without doing a rollout status watch?
A bit of theory
Regarding your question:
How do I mark my deployment as 'failed' in the above case?
Kubernetes gives you two types of health checks:
1 ) Readiness
Readiness probes are designed to let Kubernetes know when your app is ready to serve traffic.
Kubernetes makes sure the readiness probe passes before allowing a service to send traffic to the pod.
If a readiness probe starts to fail, Kubernetes stops sending traffic to the pod until it passes.
2 ) Liveness
Liveness probes let Kubernetes know if your app is alive or dead.
If you app is alive, then Kubernetes leaves it alone. If your app is dead, Kubernetes removes the Pod and starts a new one to replace it.
At the moment (v1.19.0) , Kubernetes has support for 3 types mechanisms for implementing liveness and readiness probes:
A ) ExecAction: Executes a specified command inside the container. The diagnostic is considered successful if the command exits with a status code of 0.
B ) TCPSocketAction: Performs a TCP check against the Pod's IP address on a specified port. The diagnostic is considered successful if the port is open.
C ) HTTPGetAction: Performs an HTTP GET request against the Pod's IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
In your case:
If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy.
I think that in your case (the need to refer to a deployment as succeed / failed and take the proper action) you should:
Step 1:
Setup a HTTP/TCP readiness Probe - for example:
readinessProbe:
httpGet:
path: /health-check
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 2
Where:
initialDelaySeconds — The number of seconds since the container has started before the readiness probe can be initiated.
periodSeconds — How often to perform the readiness probe.
failureThreshold — The number of tries to perform the readiness probe if the probe fails on pod start.
Step 2:
Choose the relevant rolling update strategy and how you should handle cases of failures of new pods (consider reading this thread for examples).
A few references you can follow:
Container probes
Kubernetes Liveness and Readiness Probes
Kubernetes : Configure Liveness and Readiness Probes
Kubernetes and Containers Best Practices - Health Probes
Creating Liveness Probes for your Node.js application in Kubernetes
A Failed Deployment
A deployment (or the rollout process) will be considered as Failed
if it tries to deploy its newest ReplicaSet without ever completing over and over again until the progressDeadlineSeconds interval has exceeded.
Then K8S you update the status with:
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing False ProgressDeadlineExceeded
ReplicaFailure True FailedCreate
Read more in here.
There is no Kubernetes concept for a "failed" deployment. Editing a deployment registers your intent that the new ReplicaSet is to be created, and k8s will repeatedly try to make that intent happen. Any errors that are hit along the way will cause the rollout to block, but they will not cause k8s to abort the deployment.
AFAIK, the best you can do (as of 1.9) is to apply a deadline to the Deployment, which will add a Condition that you can detect when a deployment gets stuck; see https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment and https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#progress-deadline-seconds.
It's possible to overlay your own definitions of failure on top of the statuses that k8s provides, but this is quite difficult to do in a generic way; see this issue for a (long!) discussion on the current status of this: https://github.com/kubernetes/kubernetes/issues/1899
Here's some Python code (using pykube) that I wrote a while ago that implements my own definition of ready; I abort my deploy script if this condition does not obtain after 5 minutes.
def _is_deployment_ready(d, deployment):
if not deployment.ready:
_log.debug('Deployment not completed.')
return False
if deployment.obj["status"]["replicas"] > deployment.replicas:
_log.debug('Old replicas not terminated.')
return False
selector = deployment.obj['spec']['selector']['matchLabels']
pods = Pod.objects(d.api).filter(namespace=d.namespace, selector=selector)
if not pods:
_log.info('No pods found.')
return False
for pod in pods:
_log.info('Is pod %s ready? %s.', pod.name, pod.ready)
if not pod.ready:
_log.debug('Pod status: %s', pod.obj['status'])
return False
_log.info('All pods ready.')
return True
Note the individual pod check, which is required because a deployment seems to be deemed 'ready' when the rollout completes (i.e. all pods are created), not when all of the pods are ready.

How to prevent a pod to be added to a kube-service unitl initialization complete

Sometimes a pod should take some times to "warmup"(like load some data to cache). At that time it should not be exposed.
How to prevent a pod to be added to a kube-service unitl initialization complete?
You should use health checks. More specifically in Kubernetes, you need a ReadinessProbe
ReadinessProbe: indicates whether the container is ready to service requests. If the ReadinessProbe fails, the endpoints controller will remove the pod’s IP address from the endpoints of all services that match the pod. The default state of Readiness before the initial delay is Failure. The state of Readiness for a container when no probe is provided is assumed to be Success.
Also, difference from LivenessProbe:
If you’d like to start sending traffic to a pod only when a probe succeeds, specify a ReadinessProbe. In this case, the ReadinessProbe may be the same as the LivenessProbe, but the existence of the ReadinessProbe in the spec means that the pod will start without receiving any traffic and only start receiving traffic once the probe starts succeeding.
http://kubernetes.io/docs/user-guide/pod-states/