I want to understand what happens behind the scenes when a liveness probe fails in Kubernetes.
Here is the context:
We are using a Helm chart to deploy our application in a Kubernetes cluster.
We have a StatefulSet and a headless Service. To initialize mTLS, we have created a Job, and in its 'command' we pass shell and Python scripts as arguments.
We have written a 'docker-entrypoint.sh' inside the Docker image for some initialization work.
Inside the StatefulSet, we pass a shell script as the 'livenessProbe' command, which runs every 30 seconds.
I want to know, if my livenessProbe fails for any reason:
1. Does the Helm chart monitor this probe and restart the container, or is that Kubernetes' responsibility?
2. Will my 'docker-entrypoint.sh' execute if the container is restarted?
3. Will the Job execute when the container restarts?
How does Kubernetes handle a livenessProbe failure, and what steps does it take?
It's not Helm's responsibility. It's Kubernetes' responsibility: the kubelet restarts the container (not the whole Pod) when a liveness probe fails.
Yes, 'docker-entrypoint.sh' is executed at container startup, so it runs again after every restart.
The Job needs to be applied to the cluster again for it to execute; a container restart does not re-run it. Alternatively, you could use an init container, which is guaranteed to run before the main containers whenever the Pod is created. Note, though, that init containers do not re-run when a single container is restarted in place by a liveness failure; they run again only when the Pod itself restarts.
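If the mTLS initialization must happen before the app container starts on every Pod creation, an init container inside the StatefulSet is one way to do it. A minimal sketch, where all names, the image, and the script path are placeholders:

```yaml
# Hypothetical sketch: mTLS initialization as an init container, so it runs
# before the app container every time the Pod is created. Names, image, and
# script path are placeholders, not taken from the question.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app
spec:
  serviceName: my-app-headless
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      initContainers:
        - name: mtls-init
          image: my-registry/my-app:latest        # placeholder image
          command: ["/bin/sh", "-c", "python /scripts/init_mtls.py"]
      containers:
        - name: my-app
          image: my-registry/my-app:latest        # placeholder image
```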
The kubelet kills the container and restarts it (subject to the Pod's restartPolicy) if the liveness probe fails.
To answer your question: liveness and readiness probes are checks (HTTP GET requests, TCP connections, or exec commands) that the kubelet runs against your container to verify it is healthy.
This is not related to helm charts.
When a liveness probe fails, the container is restarted; when a readiness probe fails, the Pod is only removed from Service endpoints, with no restart.
These probe failures can affect your app's uptime, so use a rolling deployment strategy and autoscale your pod count to maintain availability.
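For reference, an exec-style liveness probe matching the setup described in the question might be sketched like this (the script path and thresholds are illustrative):

```yaml
# Illustrative exec liveness probe; the script path is a placeholder.
livenessProbe:
  exec:
    command: ["/bin/sh", "/scripts/health-check.sh"]
  initialDelaySeconds: 10
  periodSeconds: 30      # probe runs every 30 seconds, as in the question
  failureThreshold: 3    # kubelet restarts the container after 3 consecutive failures
```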
We have Hazelcast 4.2 on OpenShift deployed as a standalone cluster in a StatefulSet. We use Mongo as a backing data store (it shouldn't matter), and the Docker image is built from a copy of the Dockerfile from the GitHub Hazelcast project, with all the package repositories replaced with our internal company servers.
We have a MapLoader which takes a long time (30 minutes) to load all the data. During this load time the cluster fails to respond to liveness and readiness probes:
Readiness probe failed: Get http://xxxx:5701/hazelcast/health/node-state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
and so OpenShift kills the nodes that are loading the data.
Is there any way to fix it, so Hazelcast responds with "alive" and "not ready" instead of timing out the connection?
You can increase the delay before Kubernetes starts checking the Pod's health if you are loading data at startup.
https://github.com/hazelcast/charts/blob/d3b8d118da400255effc81e67a13c1863fee5f41/stable/hazelcast/values.yaml#L120
The Helm values line above shows the readiness and liveness configuration.
You can raise initialDelaySeconds so that Hazelcast's health is only checked after that many seconds have passed.
Accordingly you can also adjust the other probes configuration like failureThreshold: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
If initialDelaySeconds is not enough on its own, you can also increase the other readiness and liveness settings, e.g. raise the timeout to cover the period while the MapLoader is loading data.
Also, only a liveness probe failure restarts your container; if the readiness probe fails, Kubernetes just marks the Pod as not ready to accept traffic. So while loading data, you can relax the liveness configuration (raise its thresholds as high as practical) so that it never fails.
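Putting that together, a probe configuration that tolerates a long MapLoader run might look like this. All values are illustrative; the endpoint and port come from the error message in the question:

```yaml
# Illustrative tuning for a ~30-minute data load; adjust values to your case.
livenessProbe:
  httpGet:
    path: /hazelcast/health/node-state
    port: 5701
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 60    # 60 failures x 30s period = ~30 min before a restart
readinessProbe:
  httpGet:
    path: /hazelcast/health/node-state
    port: 5701
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 3     # the Pod just stays out of Service endpoints while loading
```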
The kubelet uses liveness probes to know when to restart a container.
For example, liveness probes could catch a deadlock, where an
application is running, but unable to make progress. Restarting a
container in such a state can help to make the application more
available despite bugs.
The kubelet uses readiness probes to know when a container is ready to
start accepting traffic. A Pod is considered ready when all of its
containers are ready. One use of this signal is to control which Pods
are used as backends for Services. When a Pod is not ready, it is
removed from Service load balancers.
You can also set the restartPolicy to Never so the container will not restart.
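Note that restartPolicy is set at the Pod level and applies to all of its containers; Deployments and StatefulSets require Always, so Never only works for bare Pods or Jobs. A minimal sketch with a placeholder image:

```yaml
# Bare Pod whose container is NOT restarted after a liveness failure.
apiVersion: v1
kind: Pod
metadata:
  name: no-restart-demo
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: nginx          # placeholder image
      livenessProbe:
        httpGet:
          path: /
          port: 80
```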
Liveness probes are supposed to trigger a restart of failed containers. Do they respect the StatefulSet's ordered deployment and scaling guarantees? E.g., if the liveness probe fails at the same time for multiple pods within one and the same StatefulSet, would Kubernetes restart one container at a time or all in parallel?
According to https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/ the liveness probes are a feature implemented in the kubelet:
The kubelet uses liveness probes to know when to restart a container.
This means any decision that requires knowledge of multiple pods is not taken into account.
Therefore, if all of your StatefulSet's pods have failing liveness probes at the same time, their containers will be restarted in place at about the same time, without respecting the StatefulSet's ordered deployment guarantees (which apply to pod creation and deletion, not to container restarts).
I am very new to k8s so apologies if the question doesn't make sense or is incorrect/stupid.
I have a liveness probe configured for my pod definition which just hits a health API and checks its response status to test for the liveness of the pod.
My question is, while I understand the purpose of the liveness/readiness probes...what exactly are they? Are they just another type of pods which are spun up to try and communicate with our pod via the configured API? Or are they some kind of a lightweight process which runs inside the pod itself and attempts the API call?
Also, how does a probe communicate with a pod? Do we require a service to be configured for the pod so that the probe is able to access the API or is it an internal process with no additional config required?
Short answer: the kubelet handles these checks to ensure your service is running, and restarts the container when they fail. The kubelet runs on every node of your cluster, so you don't need to make any additional configuration.
You don't need to configure a Service for the probes to work; the kubelet reaches the container directly, so it is an internal process handled by Kubernetes.
From Kubernetes documentation:
A Probe is a diagnostic performed periodically by the kubelet on a Container. To perform a diagnostic, the kubelet calls a Handler implemented by the Container. There are three types of handlers:
ExecAction: Executes a specified command inside the Container. The diagnostic is considered successful if the command exits with a status code of 0.
TCPSocketAction: Performs a TCP check against the Container’s IP address on a specified port. The diagnostic is considered successful if the port is open.
HTTPGetAction: Performs an HTTP Get request against the Container’s IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
Each probe has one of three results:
Success: The Container passed the diagnostic.
Failure: The Container failed the diagnostic.
Unknown: The diagnostic failed, so no action should be taken.
The kubelet can optionally perform and react to three kinds of probes on running Containers:
livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.
startupProbe: Indicates whether the application within the Container is started. All other probes are disabled if a startup probe is provided, until it succeeds. If the startup probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a startup probe, the default state is Success.
For network probes, they are run from the kubelet on the node where the pod is running. Exec probes are run via the same mechanism as kubectl exec.
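The three handler types from the list above can be sketched side by side; all names, ports, and values here are illustrative, not from the question:

```yaml
containers:
  - name: app
    image: my-app:latest           # placeholder image
    startupProbe:
      exec:                        # ExecAction: success = exit status 0
        command: ["cat", "/tmp/started"]
      failureThreshold: 30
      periodSeconds: 10
    readinessProbe:
      tcpSocket:                   # TCPSocketAction: success = port is open
        port: 8080
    livenessProbe:
      httpGet:                     # HTTPGetAction: success = 200 <= status < 400
        path: /healthz
        port: 8080
```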
Is there a way to make a Kubernetes cluster deploy the StatefulSet first and then all the other Deployments?
I'm working in GKE and I have a Redis pod which I want to get up and ready first because the other deployments depend on the connection to it.
You can use an init container in the other Deployments. Because init containers run to completion before any app containers start, they offer a mechanism to block or delay app container startup until a set of preconditions are met.
The init container can run a script that performs a readiness check against the Redis pods.
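A common sketch of such an init container loops until the Redis port accepts connections; the Service name and port here are assumptions:

```yaml
initContainers:
  - name: wait-for-redis
    image: busybox:1.36
    command:
      - sh
      - -c
      - |
        # block until the Redis TCP port accepts connections
        until nc -z redis 6379; do
          echo "waiting for redis..."
          sleep 2
        done
```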
I had read in the documentation that liveness probes create a new pod and stop the old one. But the Kubernetes dashboard only shows me restarts with my TCP liveness probe. I was wondering what Kubernetes does during a liveness probe. Can I control it?
The kubelet uses liveness probes to know when to restart a container; it does not recreate the Pod.
Probes have a number of fields that you can use to more precisely control the behavior of the checks (initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, and failureThreshold). You can find details about them here.
For a container restart, SIGTERM is sent first, Kubernetes waits for a configurable grace period, and then sends SIGKILL. You can control some of this behavior by tweaking the terminationGracePeriodSeconds value and/or attaching handlers to container lifecycle events.
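For example, the grace period and a preStop hook can be set like this (values and the image are illustrative):

```yaml
# Illustrative: widen the SIGTERM -> SIGKILL window and run a preStop hook.
spec:
  terminationGracePeriodSeconds: 60    # default is 30
  containers:
    - name: app
      image: my-app:latest             # placeholder image
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]   # runs before SIGTERM is sent
```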