Multiple liveness probes in Kubernetes

I have a program which has multiple independent¹ components.
It is trivial to add a liveness probe in all of the components, however it's not easy to have a single liveness probe which would determine the health of all of the program's components.
How can I make kubernetes look at multiple liveness probes and restart the container when any of those are defunct?
I know it can be achieved by adding more software, for example an additional bash script which does the liveness checks, but I am looking for a native way to do this.
¹ By independent I mean that failure of one component does not make the other components fail.

The Kubernetes API allows one liveness probe and one readiness probe per container (and therefore effectively one per Deployment/Pod template). I recommend creating a centralized health-check service in your application that exposes a single REST endpoint:
livenessProbe:
  httpGet:
    path: /monitoring/alive
    port: 3401
    httpHeaders:
    - name: X-Custom-Header
      value: Awesome
  initialDelaySeconds: 15
  timeoutSeconds: 1
  periodSeconds: 15
Or use an exec probe that runs a shell script for the same task, for example:
livenessProbe:
  exec:
    command:
    - /bin/bash
    - -c
    - ./liveness.sh
  initialDelaySeconds: 220
  timeoutSeconds: 5
liveness.sh
#!/bin/sh
# Healthy (exit 0) if at least one java process is running; otherwise fail the probe.
# The [j]ava pattern keeps grep from matching its own command line.
if [ "$(ps -ef | grep '[j]ava' | wc -l)" -ge 1 ]; then
  exit 0
else
  echo "Nothing happens!" 1>&2
  exit 1
fi
When the probe fails, you can see the resulting message in the Pod's events:
"Warning Unhealthy Pod Liveness probe failed: Nothing happens!"
Hope this helps

It doesn't do that. The model is pretty simple: one liveness probe per container, and the restart policy is followed on failure.
Understood about the designed-for-containers problem with legacy apps, but there really are a lot of ways to arrange for resources to be shared for legacy compatibility. If the components of this system are already different processes, then there should be a way to partition them into containers.
If the components are threads or some other intra-application modularization technique, then the liveness determination really has to come from inside the app.
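If each component can already report its own status (a script, file or local endpoint per component), one pragmatic workaround within the one-probe-per-container model is to chain those checks inside a single exec probe, so the probe fails as soon as any component is defunct. A minimal sketch, assuming hypothetical per-component scripts check_component_a.sh and check_component_b.sh that exit non-zero on failure:
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # the probe fails (and the container is restarted) if any single check fails
    - ./check_component_a.sh && ./check_component_b.sh
  initialDelaySeconds: 15
  periodSeconds: 20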

Related

How to define k8s liveness probe and readiness probe for worker pod

I have a k8s cluster. Our service is queue based. Our pods subscribe to an event queue, fetch events and do tasks. For this kind of service, how should the k8s liveness probe and readiness probe be defined?
Following is a very brief introduction to these probes:
The liveness probe is how Kubernetes knows whether a workload is healthy. It can be a shell command executed in your container or a simple TCP/HTTP request that should respond positively.
If a liveness check keeps failing beyond the timeout and threshold specified in the pod config, Kubernetes will restart the workload.
So if your workload runs time-consuming processes, you might need to give your liveness probe enough time to make sure your pod is not restarted unduly.
The readiness probe is how the Kubernetes proxy decides whether your workload is ready to receive traffic. Traffic is sent to your pod only while the readiness probe responds positively. So if your workload needs more time to process a single request and other requests should be diverted to other replicas during that time, you might want to give the workload a slightly higher readiness interval.
These probe parameters, combined with the number of replicas, can ensure fast and healthy functioning of your application. It is very important to understand the area each probe covers and the parameters you can tune.
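As a rough illustration of where those knobs live (the endpoint paths and numbers below are placeholders, not recommendations):
livenessProbe:
  httpGet:
    path: /health            # hypothetical endpoint
    port: 8080
  initialDelaySeconds: 30    # wait before the first check so a slow-starting workload is not killed
  periodSeconds: 10          # how often the kubelet probes
  timeoutSeconds: 5          # how long one probe may take before it counts as a failure
  failureThreshold: 3        # consecutive failures before the container is restarted
readinessProbe:
  httpGet:
    path: /ready             # hypothetical endpoint
    port: 8080
  periodSeconds: 10          # failing this only removes the pod from Service endpoints, no restart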
Here are some reads:
https://blog.colinbreck.com/kubernetes-liveness-and-readiness-probes-how-to-avoid-shooting-yourself-in-the-foot/
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
I am assuming your problem is that, because this is a worker consuming queue messages, it doesn't expose any port to check.
In that case, you can define a custom command for livenessProbe and readinessProbe; the following is an example from the documentation:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
Also, keep in mind how long your process takes to become live and ready, and adjust initialDelaySeconds and periodSeconds accordingly so the pod is not killed before it is fully loaded.
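For a real queue worker (rather than the docs' sleep demo), one common pattern (an assumption here, not part of the answer above) is to have the worker touch a heartbeat file after every processed message and let the liveness probe check that the file is still fresh:
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # succeed only if /tmp/heartbeat was modified within the last minute;
    # the worker is expected to touch this file after each processed event
    - test -n "$(find /tmp/heartbeat -mmin -1 2>/dev/null)"
  initialDelaySeconds: 30
  periodSeconds: 30
The worker should also touch the file on idle polls, so an empty queue is not mistaken for a hang.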

Kubernetes external probes

Is it possible to define an external path, for example another webserver, as the path for the web probes? Or a TCP probe with a different IP?
livenessProbe:
  httpGet:
    path: external.de/test
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
I know that's not how probes are meant to be used, but I need it for testing.
Does someone know how to define probes that are not applied on the pods directly?
You can use the following command in your liveness probe:
livenessProbe:
  exec:
    command:
    - curl
    - --fail          # exit non-zero on HTTP 4xx/5xx responses
    - http://external.de:8080/test
  initialDelaySeconds: 10
  periodSeconds: 10
In this case, if the curl command returns with an exit code of 0 the container is assumed healthy; any other exit code is deemed unhealthy (--fail makes curl return a non-zero exit code for HTTP error responses, not just connection failures).
Also keep in mind that once the probe fails, the pod running this probe is restarted, not the one running the external.de web server.
More details on how to use a command within a liveness probe are described here.
If you want an HTTP check against another host like that, you cannot use the http probe directly.
You have to use an exec probe pointing to a simple shell script that runs cURL on your behalf; you can mount that script via a ConfigMap (or a hostPath volume) to perform your testing.
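A minimal sketch of that ConfigMap approach; the names probe-scripts and /probes, the image, and the assumption that the image ships curl are all made up for illustration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: probe-scripts
data:
  check-external.sh: |
    #!/bin/sh
    # --fail makes curl exit non-zero on HTTP 4xx/5xx, which fails the probe
    curl --fail -sS http://external.de:8080/test > /dev/null
---
apiVersion: v1
kind: Pod
metadata:
  name: external-probe-demo
spec:
  containers:
  - name: app
    image: your-app-image:latest   # placeholder
    volumeMounts:
    - name: probe-scripts
      mountPath: /probes
    livenessProbe:
      exec:
        command: ["/bin/sh", "/probes/check-external.sh"]
      initialDelaySeconds: 10
      periodSeconds: 10
  volumes:
  - name: probe-scripts
    configMap:
      name: probe-scripts
      defaultMode: 0755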

exec probe in GKE

I'm trying to use exec probes for readiness and liveness in GKE. This is because it is part of Kubernetes' recommended way to do health checks on gRPC back ends. However when I put the exec probe config into my deployment yaml and apply it, it doesn't take effect in GCP. This is my container yaml:
- name: rev79-uac-sandbox
  image: gcr.io/rev79-232812/uac:latest
  imagePullPolicy: Always
  ports:
  - containerPort: 3011
  readinessProbe:
    exec:
      command: ["bin/grpc_health_probe", "-addr=:3011"]
    initialDelaySeconds: 5
  livenessProbe:
    exec:
      command: ["bin/grpc_health_probe", "-addr=:3011"]
    initialDelaySeconds: 10
But still the health checks fail and when I look at the health check configuration in the GCP console I see a plain HTTP health check directed at '/'
When I edit a health check in GCP console there doesn't seem to be any way to choose an exec type. Also I can't see any mention of liveness checks as contrasted to readiness checks even though these are separate Kubernetes things.
Does Google cloud support using exec for health checks?
If so, how do I do it?
If not, how can I health check a gRPC server?
TCP probes are useful when you are running gRPC services, where HTTP probes don't apply:
ports:
- containerPort: 3011
readinessProbe:
  tcpSocket:
    port: 3011
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  tcpSocket:
    port: 3011
  initialDelaySeconds: 15
  periodSeconds: 20
The kubelet will attempt to open a socket to your container on the specified port. If it can establish a connection, the container is considered healthy; if it can't, it is considered a failure.
define-a-tcp-liveness-probe
Exec probes work in GKE just the same way they work everywhere. You can view the liveness probe result in kubectl describe pod, or you can simply log in to the pod, execute the command and check its return code.
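For example, to reproduce the probe by hand (the pod name is a placeholder):
# run the same command the probe runs and inspect its exit status
kubectl exec -it <pod-name> -- bin/grpc_health_probe -addr=:3011
echo $?   # 0 means the probe would pass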
The server has to implement the gRPC health checking protocol, as indicated in this article.
Both answers from Vasily Angapov and Suresh Vishnoi should in theory work; however, in practice they didn't (at least in my case).
So my solution was to start another server on my backend container - an HTTP server that simply has the job of executing the health check whenever it gets a request and returning a 200 status if it passes and a 503 if it fails.
I also had to open a second port on my container for that server to listen on.
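As an aside, not part of the original answers: newer Kubernetes versions support gRPC probes natively (beta and enabled by default since 1.24, stable in 1.27), so if your cluster is recent enough and the server implements grpc.health.v1.Health you can drop the grpc_health_probe binary entirely:
readinessProbe:
  grpc:
    port: 3011
  initialDelaySeconds: 5
livenessProbe:
  grpc:
    port: 3011
  initialDelaySeconds: 10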

OpenShift readiness probe not executed

I am running a Spring Boot application inside an OpenShift Pod. To execute the readiness and liveness probes, I created an appropriate YAML configuration. However, the Pod fails and reports that it was not able to pass the readiness check (after approximately 5 minutes).
My goal is to execute the readiness probe every 20 minutes. But I assume it is failing because the initialDelaySeconds and the periodSeconds are added up, so I guess the first check after the pod has started would only be executed after 22 minutes.
Below is the related configuration of the readiness probe.
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /actuator/health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 120
  periodSeconds: 1200
  successThreshold: 1
  timeoutSeconds: 60
Is my assumption right? How can I avoid this (maybe by increasing a timeout on the kubelet side)?
Your configuration is correct, and initialDelaySeconds and periodSeconds do not add up. The first readinessProbe HTTP call will happen exactly 2 minutes after you start your Pod.
I would look for an issue in the app itself. The first thing that comes to mind is the path: /actuator/health is the default for Spring Boot 2.x Actuator, while on Spring Boot 1.x it is just /health, so make sure the path matches your Spring Boot version.
If that doesn't help, the best option is to debug it: exec into your container and use curl to check whether the health endpoint works correctly (it should return HTTP code 200).
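For example (the pod name is a placeholder; oc rsh is the OpenShift equivalent of kubectl exec):
# open a shell in the pod and hit the same endpoint the probe uses
oc rsh <pod-name>
curl -i http://localhost:8080/actuator/health   # expect HTTP/1.1 200 and "status":"UP"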

Helm chart variable definitions

I am creating a Helm chart that should install 2 services.
It has a dependency: the PostgreSQL service must be installed first.
Then the other service should use the database user, password, hostname and port of that PostgreSQL service.
Since I need these details at runtime, i.e. once the PostgreSQL service is installed: the user credentials I will pass as env variables, but the hostname and port are only known once PostgreSQL is deployed.
I tried some template functions and subchart concepts that I found on different sites, but nothing solves the requirement.
Are there any examples that match the above requirement?
There are a couple of ways you could do this, for example using an initContainer to check whether the DB is up (a sketch of that is at the end of this answer), but I will show you a sample from the charts. I am using the WordPress chart as an example:
livenessProbe:
  httpGet:
    path: /wp-login.php
    port: http
  initialDelaySeconds: 120
  timeoutSeconds: 5
  failureThreshold: 6
readinessProbe:
  httpGet:
    path: /wp-login.php
    port: http
  initialDelaySeconds: 30
  timeoutSeconds: 3
  periodSeconds: 5
I have removed some lines for brevity.
The readiness probe starts acting after an initialDelaySeconds of 30 seconds and then checks every periodSeconds, i.e. every 5 seconds, whether the page responds. Until the readiness probe succeeds, no traffic is sent to this pod. Once the probe succeeds, we are good.
The second check, the liveness probe, does a bit more. It starts 120 seconds after the pod is deployed, and if the check fails failureThreshold times in a row, i.e. 6 times, the pod is restarted.
Coming back to your question and how to solve this:
Use liveness and readiness probes in the applications which are dependent on the database
Use some defaults based on your experience and optimize them as you go.
More information about the readiness and liveness probes can be found here
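A minimal sketch of the initContainer alternative mentioned at the top of this answer; the Service name my-release-postgresql, the port and the image tag are assumptions about how the PostgreSQL subchart exposes the database:
spec:
  initContainers:
  - name: wait-for-postgres
    image: postgres:15-alpine          # only used for its pg_isready client
    command:
    - /bin/sh
    - -c
    # block the main containers from starting until PostgreSQL accepts connections
    - until pg_isready -h my-release-postgresql -p 5432; do echo waiting for db; sleep 2; done
  containers:
  - name: my-service                   # the dependent service
    image: my-service:latest           # placeholder
    env:
    - name: DB_HOST
      value: my-release-postgresql     # the subchart's Service DNS name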