Turn on/off liveness probe using configMap UI on Rancher - kubernetes

I'm looking for an approach where we can turn the liveness probe on/off using the configMap UI on Rancher, and at the same time it does not require redeployment of the pod.
In detail: assume the liveness probe has already been configured on the pod. Next I have a boolean flag; if it is set to false, the liveness probe stops probing the pod, and if it is later set back to true, the liveness probe resumes probing. All of this needs to work without pod redeployment. Below is my draft idea:
There's a configMap UI (a configMap listed on Rancher) which holds the boolean flag to turn the liveness probe on/off, something like this:
app.liveness.probe.mode=false
Next I want to consume the above boolean flag in deployment.yaml with a conditional check as below:
{{ if app.liveness.probe.mode }}
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 70
  periodSeconds: 10
{{ end }}
I'm not sure how to reference the configMap boolean flag on Rancher from the deployment.yaml file.
Or is there any other approach to toggle the liveness probe on/off without redeploying the pod?

Sounds to me as if you want to temporarily switch off your liveness probe in order to keep your Pod running, even if the Pod is technically not healthy.
Have you considered using a Readiness probe for this?
So, you remove your Liveness probe, which basically keeps your Pod running as long as your application is running.
You define a Readiness probe that executes HTTP checks on an application endpoint. While that endpoint returns 200, your Pod receives traffic. If you want to take traffic off the Pod while keeping the Pod running, your readiness probe will have to return an error code (>=400).
To accomplish this dynamically with ConfigMaps, you could mount a ConfigMap as a volume into your application container, putting a config file onto your container file system. The config file contains a flag like readinessProbe.active=true. This file is read by your Readiness probe endpoint, which returns a 200 or 500 depending on the value of the boolean.
Naturally, a similar approach could be used to control your Liveness probe endpoint, but the Kubernetes mechanism for keeping Pods running while taking traffic off them is the Readiness probe.
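A minimal sketch of that ConfigMap-as-volume idea (the ConfigMap name app-flags, the mount path /etc/app-flags, the key flags.properties and the /ready endpoint are placeholders I made up, not anything Rancher-specific):
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-flags               # edited via the Rancher ConfigMap UI
data:
  flags.properties: |
    readinessProbe.active=true
---
# relevant parts of the Deployment's pod template
spec:
  containers:
    - name: app
      volumeMounts:
        - name: flags
          mountPath: /etc/app-flags   # the app re-reads the flag here on each probe call
      readinessProbe:
        httpGet:
          path: /ready                # endpoint that returns 200/500 based on the flag
          port: 8080
        periodSeconds: 10
  volumes:
    - name: flags
      configMap:
        name: app-flags
A ConfigMap mounted as a volume is refreshed inside the running container after a short delay (this does not work if it is mounted via subPath), so the flag can be flipped from the ConfigMap UI without redeploying the pod.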


How to set basic auth to Kubernetes' readinessProbe correctly?
If I set this config for Kubernetes' readinessProbe in a Deployment kind:
readinessProbe:
  httpGet:
    path: /healthcheck
    port: 8080
    httpHeaders:
      - name: Authorization
        value: Basic <real base64 encoded data>
After deploying it to GKE, GCP's health check can't get past the basic authentication to reach the application inside.
But from here, it seems this is the syntax to use. Why doesn't it pass?
The server side returns a JSON response at the /healthcheck endpoint. Is it also necessary to set Accept or Content-Type in the httpHeaders?
And is it better to set this health check as a livenessProbe or a readinessProbe?
According to the Kubernetes docs:
If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy. If you'd like your container to be killed and restarted if a probe fails, then specify a liveness probe, and specify a restartPolicy of Always or OnFailure.
Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-a-liveness-probe
If you'd like to start sending traffic to a Pod only when a probe succeeds, specify a readiness probe. In this case, the readiness probe might be the same as the liveness probe, but the existence of the readiness probe in the spec means that the Pod will start without receiving any traffic and only start receiving traffic after the probe starts succeeding. If your container needs to work on loading large data, configuration files, or migrations during startup, specify a readiness probe.
Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-a-readiness-probe
So, you can use the health check according to your need. But in the Kubernetes docs, they give an example of a health check as a liveness probe.
Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request
And it is best practice to set Content-Type when the request comes from something other than a browser or another typical client. I believe adding Content-Type: application/json will solve the issue if other things are right on the server side.
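A sketch of what that suggestion would look like, added next to the existing Authorization header (whether the server actually keys off Content-Type, or rather Accept, for a GET health check is an assumption to verify on the server side):
readinessProbe:
  httpGet:
    path: /healthcheck
    port: 8080
    httpHeaders:
      - name: Authorization
        value: Basic <real base64 encoded data>
      - name: Content-Type        # some servers expect Accept instead for GET requests
        value: application/json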

I am using 8888 for Kubernetes health probes and 8887 for normal HTTP requests. So if the readiness probe fails, should I still expect traffic on 8887?

I am using 8888 for liveness & readiness probes and 8887 for normal HTTP requests. The readiness probe is failing and the pods are in a 0/1, not-ready state, but I still see normal POST requests being served by the pod. Is this expected? Should health probes and normal requests be received on the same port?
Liveness and readiness probes have different purposes. In short, the liveness probe controls whether Kubernetes will restart the pod, while the readiness probe controls whether a pod is included in the endpoints of a Service. Unless a pod has indicated it's ready through the readiness probe, it should not receive traffic through a Service. That doesn't mean it can't be sent requests, it just means it won't be sent traffic through the Service. So in your case the question is: where are those POST requests coming from?
#pst and #Harsh are right but I would like to expand on it a bit.
As the official docs say:
If you'd like to start sending traffic to a Pod only when a probe succeeds, specify a readiness probe. In this case, the readiness probe might be the same as the liveness probe, but the existence of the readiness probe in the spec means that the Pod will start without receiving any traffic and only start receiving traffic after the probe starts succeeding.
and:
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
Answering your question:
So if readiness probe fails, should i still expect traffic on 8887?
No, the pod should not start receiving traffic if the readiness probe fails.
It can also depend on your app. By using a readiness probe, Kubernetes waits until the app is fully started before it allows the service to send traffic to the new copy.
Also, it is very important to make sure your probes are configured properly:
Probes have a number of fields that you can use to more precisely control the behavior of liveness and readiness checks:
initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
periodSeconds: How often (in seconds) to perform the probe. Default to 10 seconds. Minimum value is 1.
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.
failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of liveness probe means restarting the container. In case of readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
If you wish to expand your knowledge regarding liveness, readiness and startup probes, please refer to the official docs. You will find some examples there that can be compared with your setup in order to check that you understood and configured it right.
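A rough sketch of the setup being discussed, with the probes on 8888 and application traffic on 8887 (container name, paths and the Service port are made up): the Service only stops sending traffic to the Pod when readiness fails; requests that reach the Pod IP directly, or via something that bypasses this Service, are unaffected.
# pod template (excerpt)
containers:
  - name: app
    ports:
      - containerPort: 8887     # normal HTTP traffic
      - containerPort: 8888     # health endpoints only
    readinessProbe:
      httpGet:
        path: /ready
        port: 8888
    livenessProbe:
      httpGet:
        path: /live
        port: 8888
---
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  selector:
    app: app
  ports:
    - port: 80
      targetPort: 8887          # traffic via this Service stops when readiness fails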

k8s - livenessProbe vs readinessProbe

Consider a pod which has a health check set up via an HTTP endpoint /health on port 80, and which takes almost 60 seconds to actually be ready and serve traffic.
readinessProbe:
  httpGet:
    path: /health
    port: 80
  initialDelaySeconds: 60
livenessProbe:
  httpGet:
    path: /health
    port: 80
Questions:
Is my above config correct for the given requirement?
Does the liveness probe start working only after the pod becomes ready? In other words, I assume the readiness probe's job is complete once the pod is ready, and after that the livenessProbe takes care of the health check; in that case, I can omit initialDelaySeconds for the livenessProbe. If they are independent, what is the point of doing a livenessProbe check when the pod itself is not ready?
Check this documentation. What do they mean by
If you want your Container to be able to take itself down for maintenance, you can specify a readiness probe that checks an endpoint specific to readiness that is different from the liveness probe.
I was assuming the running pod will take itself down only if the livenessProbe fails, not the readinessProbe. The doc says otherwise.
Please clarify!
I'm starting with the second question:
Does the liveness probe start working only after the pod becomes ready? In other words, I assume the readiness probe's job is complete once the POD is ready. After that the livenessProbe takes care of the health check.
Our initial understanding is that the liveness probe starts checking only after the readiness probe has succeeded, but it turns out that is not the case. An issue was opened about this; you can look it up here. The problem was eventually solved by adding startup probes.
To sum up:
livenessProbe
livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
readinessProbe
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.
startupProbe
startupProbe: Indicates whether the application within the Container is started. All other probes are disabled if a startup probe is provided, until it succeeds. If the startup probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a startup probe, the default state is Success
look up here.
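A sketch of how a startup probe covers a slow-starting app like the one in the question (the numbers are illustrative): the liveness and readiness probes are held back until the startup probe succeeds, so the liveness probe no longer needs a large initialDelaySeconds.
startupProbe:
  httpGet:
    path: /health
    port: 80
  failureThreshold: 30    # allows up to 30 * 10s = 300s for the app to start
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health
    port: 80
  periodSeconds: 10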
Liveness probes check whether the container has started and is alive. If that isn't the case, Kubernetes will eventually restart the container.
Readiness probes in turn also check dependencies like database connections or other services your container depends on to fulfill its work. As a developer you have to invest more time in the implementation here than for the liveness probes: you have to expose an endpoint which also checks the mentioned dependencies when queried.
Your current configuration uses a health endpoint of the kind usually used by liveness probes. It probably doesn't check whether your service is really ready to take traffic.
Kubernetes relies on the readiness probes. During a rolling update, it will keep the old container up and running until the new service declares that it is ready to take traffic. Therefore the readiness probes have to be implemented correctly.
I will show the difference between them in a couple of simple points:
livenessProbe
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 3
It is used to indicate if the container has started and is alive or not i.e. proof of being available.
In the given example, if the request fails, it will restart the container.
If not provided the default state is Success.
readinessProbe
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 3
It is used to indicate whether the container is ready to serve traffic or not, i.e. proof of being ready to use.
It checks dependencies like database connections or other services your container depends on to fulfill its work.
In the given example, until the request returns Success, the pod won't be served any traffic (the Pod's IP address is removed from the endpoints of all Services that match the Pod).
Kubernetes relies on the readiness probes during rolling updates, it keeps the old container up and running until the new service declares that it is ready to take traffic.
If not provided the default state is Success.
Summary
Liveness Probes: Used to check if the container is available and alive.
Readiness Probes: Used to check if the application is ready to be used and serve the traffic.
Both the readiness probe and the liveness probe seem to have the same behavior: they do the same type of checks. But the action they take in case of failure is different.
The readiness probe shuts off traffic from the Service, so that the Service always sends requests to healthy pods, whereas the liveness probe restarts the pod in case of failure. It does not do anything for the Service; the Service continues to send requests to the pod as usual as long as it is in 'available' status.
It is recommended to use both probes!!
Check here for detailed explanation with code samples.
The Kubernetes platform has capabilities for validating container applications, called health checks. Liveness is proof that the container is up and alive, and readiness is proof that the pod is ready to be used.
These features are designed to prevent service downtime and inconsistent images by enabling restarts when needed. Kubernetes uses liveness to know when to restart the container, which can resolve most problems. Kubernetes uses readiness to know when the container is available to accept requests. The pod is considered ready when all of its containers are ready. Therefore, when the pod takes too long to initialize (cache mounts, DB schema, etc.), it is recommended to increase initialDelaySeconds.
I'd post it as a comment but it's too long, so let's make it a full answer.
Is my above config correct for the given requirement?
IMHO no: you are missing initialDelaySeconds on the liveness probe, and the liveness and readiness probes probably should not call the same endpoint. I'd use the suggestions from #fgul.
Does the liveness probe start working only after the pod becomes ready? In other words, I assume the readiness probe's job is complete once the pod is ready. After that the livenessProbe takes care of the health check. In this case, I can ignore the initialDelaySeconds for the livenessProbe. If they are independent, what is the point of doing a livenessProbe check when the pod itself is not ready?
I think you were thinking about a startupProbe; again, #fgul described what does what, so there is no point in me repeating it.
I was assuming the running pod will take itself down only if the livenessProbe fails, not the readinessProbe. The doc says otherwise.
The pod can be restarted only based on the livenessProbe, not the readinessProbe.
I'd think twice before binding a readiness probe to external services (as #randy advised), especially in high-load services:
Let's assume you have defined a deployment with lots of pods that connect to a database and process lots of requests.
Now the database goes down.
The readiness probe also checks the DB connection, so it marks all of the pods as "out of service".
Now the DB comes back up.
The pods' readiness probes will start to pass, but not instantly on all pods at once - the pods will be marked as "Ready" one after another.
But it might be too slow - the second the first pod is marked as ready, ALL of the traffic will be sent to this one pod alone. It might end in a situation where the "waking up" pods are killed by the traffic one after another.
For that kind of situation I'd say the readiness probe should check only pod-internal things and not care about external services. The Kubernetes endpoint will return an error, and either the clients can tolerate a failing service (it's called "designed for failure") or the load balancer/ingress can cover it.
Liveness probes are a relatively specialized tool, and you probably don't want one at all. However they run totally independently AFAIK.

Kubernetes livenessProbe: restarting vs destroying of the pod

Is there a way to tell Kubernetes to just destroy a pod and create a new one if the liveness probe fails? What I see from the logs now is that my Node.js application is just restarted and keeps running in the same pod.
The liveness probe is defined in my YAML specification as follows:
livenessProbe:
  httpGet:
    path: /app/check/status
    port: 3000
    httpHeaders:
      - name: Accept
        value: application/x-www-form-urlencoded
  initialDelaySeconds: 60
  periodSeconds: 60
Disclaimer:
I am fully aware that recreating a pod when a liveness probe fails is probably not the best idea, and the right way would be to get a notification that something is going on.
Liveness and readiness probes are defined on containers, not pods. So if you have one container in your pod and you set restartPolicy to Never, your pod will go into a Failed state when the probe fails and will be garbage-collected at some point based on the terminated-pod-gc-threshold value.
If you have more than one container in your pod it becomes trickier, because the other container(s) keep running and keep the pod in Running status. You can build your own automation or try Pod Readiness, which is still in alpha as of this writing.
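A sketch of the single-container case the answer describes (pod name and image are placeholders; note that Pods created by a Deployment only allow restartPolicy: Always, so this applies to bare Pods or Job-style workloads):
apiVersion: v1
kind: Pod
metadata:
  name: app                 # hypothetical pod name
spec:
  # With restartPolicy: Never, a failed liveness probe kills the container
  # but does not restart it, so the Pod ends up in the Failed phase.
  restartPolicy: Never
  containers:
    - name: app
      image: my-app:latest  # placeholder image
      livenessProbe:
        httpGet:
          path: /app/check/status
          port: 3000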

How to determine a failed kubernetes deployment?

I create a Pod with a replica count of, say, 2, which runs an application (a simple web server), basically an always-running command. However, due to misconfiguration, sometimes the command exits and the pod is then Terminated.
Due to the default restartPolicy of Always, the pod (and hence the container) is restarted and eventually the Pod status becomes CrashLoopBackOff.
If I do a kubectl describe deployment, it shows Condition as Progressing=True and Available=False.
This looks fine - the question is - how do I mark my deployment as 'failed' in the above case?
Adding a spec.ProgressDeadlineSeconds doesn't seem to be having an effect.
Will simply setting restartPolicy to Never in the Pod specification be enough?
A related question, is there a way of getting this information as a trigger/webhook, without doing a rollout status watch?
A bit of theory
Regarding your question:
How do I mark my deployment as 'failed' in the above case?
Kubernetes gives you two types of health checks:
1 ) Readiness
Readiness probes are designed to let Kubernetes know when your app is ready to serve traffic.
Kubernetes makes sure the readiness probe passes before allowing a service to send traffic to the pod.
If a readiness probe starts to fail, Kubernetes stops sending traffic to the pod until it passes.
2 ) Liveness
Liveness probes let Kubernetes know if your app is alive or dead.
If your app is alive, then Kubernetes leaves it alone. If your app is dead, Kubernetes removes the Pod and starts a new one to replace it.
At the moment (v1.19.0), Kubernetes supports three mechanisms for implementing liveness and readiness probes:
A ) ExecAction: Executes a specified command inside the container. The diagnostic is considered successful if the command exits with a status code of 0.
B ) TCPSocketAction: Performs a TCP check against the Pod's IP address on a specified port. The diagnostic is considered successful if the port is open.
C ) HTTPGetAction: Performs an HTTP GET request against the Pod's IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
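For reference, minimal sketches of the first two mechanisms (the command and port are placeholders; the question's config already shows the HTTP GET variant):
# ExecAction example
livenessProbe:
  exec:
    command:
      - cat
      - /tmp/healthy        # succeeds (exit code 0) while the file exists
  periodSeconds: 5

# TCPSocketAction example
readinessProbe:
  tcpSocket:
    port: 8080              # succeeds if a TCP connection can be opened
  periodSeconds: 10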
In your case:
If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy.
I think that in your case (the need to mark a deployment as succeeded/failed and take the proper action) you should:
Step 1:
Setup a HTTP/TCP readiness Probe - for example:
readinessProbe:
  httpGet:
    path: /health-check
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2
Where:
initialDelaySeconds: the number of seconds after the container has started before the readiness probe is initiated.
periodSeconds: how often to perform the readiness probe.
failureThreshold: how many consecutive probe failures are tolerated before the probe is considered failed (for a readiness probe, the Pod is then marked not ready).
Step 2:
Choose the relevant rolling update strategy and how you should handle cases of failures of new pods (consider reading this thread for examples).
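For example, a deliberately conservative rolling update strategy might look like the sketch below (the values are illustrative, not something prescribed in this thread). Combined with the readiness probe from Step 1, a new pod that never becomes ready blocks the rollout rather than replacing pods that work:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # at most one extra pod during the rollout
      maxUnavailable: 0     # never take a ready pod down before its replacement is ready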
A few references you can follow:
Container probes
Kubernetes Liveness and Readiness Probes
Kubernetes : Configure Liveness and Readiness Probes
Kubernetes and Containers Best Practices - Health Probes
Creating Liveness Probes for your Node.js application in Kubernetes
A Failed Deployment
A deployment (or the rollout process) will be considered Failed if it keeps trying to deploy its newest ReplicaSet without ever completing, until the progressDeadlineSeconds interval has been exceeded.
Then Kubernetes updates the status with:
Conditions:
  Type            Status  Reason
  ----            ------  ------
  Available       True    MinimumReplicasAvailable
  Progressing     False   ProgressDeadlineExceeded
  ReplicaFailure  True    FailedCreate
Read more here.
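A sketch of where that deadline is configured (the name, image and the 600-second value are just examples; 600 also happens to be the default):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver                # hypothetical deployment name
spec:
  progressDeadlineSeconds: 600   # if the rollout makes no progress for 600s, the
  replicas: 2                    # Progressing condition becomes False / ProgressDeadlineExceeded
  selector:
    matchLabels:
      app: webserver
  template:
    metadata:
      labels:
        app: webserver
    spec:
      containers:
        - name: webserver
          image: my-webserver:latest   # placeholder image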
There is no Kubernetes concept for a "failed" deployment. Editing a deployment registers your intent that the new ReplicaSet is to be created, and k8s will repeatedly try to make that intent happen. Any errors that are hit along the way will cause the rollout to block, but they will not cause k8s to abort the deployment.
AFAIK, the best you can do (as of 1.9) is to apply a deadline to the Deployment, which will add a Condition that you can detect when a deployment gets stuck; see https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment and https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#progress-deadline-seconds.
It's possible to overlay your own definitions of failure on top of the statuses that k8s provides, but this is quite difficult to do in a generic way; see this issue for a (long!) discussion on the current status of this: https://github.com/kubernetes/kubernetes/issues/1899
Here's some Python code (using pykube) that I wrote a while ago that implements my own definition of ready; I abort my deploy script if this condition does not obtain after 5 minutes.
import logging

from pykube.objects import Pod

_log = logging.getLogger(__name__)

def _is_deployment_ready(d, deployment):
    # Deployment-level checks: the rollout has completed and old replicas are gone.
    if not deployment.ready:
        _log.debug('Deployment not completed.')
        return False
    if deployment.obj["status"]["replicas"] > deployment.replicas:
        _log.debug('Old replicas not terminated.')
        return False
    # Pod-level check: every pod matched by the deployment's selector must be Ready.
    selector = deployment.obj['spec']['selector']['matchLabels']
    pods = Pod.objects(d.api).filter(namespace=d.namespace, selector=selector)
    if not pods:
        _log.info('No pods found.')
        return False
    for pod in pods:
        _log.info('Is pod %s ready? %s.', pod.name, pod.ready)
        if not pod.ready:
            _log.debug('Pod status: %s', pod.obj['status'])
            return False
    _log.info('All pods ready.')
    return True
Note the individual pod check, which is required because a deployment seems to be deemed 'ready' when the rollout completes (i.e. all pods are created), not when all of the pods are ready.