Do Kubernetes liveness and readiness probes check whether Python is running?

I've got a simple question but I can't find the right answer.
I have a couple of pods running my Python applications in Kubernetes. I haven't implemented liveness and readiness probes yet. When I talked to my lead, he told me I had to create liveness and readiness probes to check and restart my pods when necessary, and that I had to find a way to make them verify that the Python process is actually running, because it could get stuck while the container still reports that everything is fine.
I got confused about whether my liveness and readiness probes would already do this. I would have to create them using a command, as these microservices are just workers and don't expose an endpoint or healthcheck.
Any clue how I can do that? Or a good answer explaining whether liveness and readiness probes check if the Python process is running?
Thanks a lot!

Readiness won't restart your pod; it just makes your worker unreachable through a load balancer/Service. Liveness will trigger a restart if its condition fails.
You don't need to run the liveness probe against an HTTP endpoint; you can simply check that a port is reachable:
livenessProbe:
  failureThreshold: 3
  initialDelaySeconds: 30
  periodSeconds: 20
  successThreshold: 1
  tcpSocket:
    port: <port-number>
  timeoutSeconds: 5
You can expose a port on the running Python worker and just make sure that it's reachable. Otherwise, think carefully about when you actually want the pod restarted: what does "it could get stuck" mean in practice for your worker?
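If the worker doesn't listen on any port at all, an exec probe is another option. Here is a minimal sketch, assuming the Python worker touches a heartbeat file on every loop iteration (the /tmp/worker-heartbeat path and the 60-second staleness budget are made up for this example, and stat -c %Y assumes GNU or busybox coreutils in the image):

livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # Fail if the heartbeat file is missing or older than 60 seconds,
      # i.e. the Python worker loop has stopped making progress.
      - 'test -f /tmp/worker-heartbeat && test $(( $(date +%s) - $(stat -c %Y /tmp/worker-heartbeat) )) -lt 60'
  initialDelaySeconds: 30
  periodSeconds: 20
  failureThreshold: 3

On the worker side, the loop just needs to touch that file on every iteration; if the loop hangs, the file goes stale, the probe fails, and the kubelet restarts the container.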

Related

Why do I need 3 different kinds of probes in Kubernetes: startupProbe, readinessProbe, livenessProbe

Why do I need 3 different kinds of probes in Kubernetes:
startupProbe
readinessProbe
livenessProbe
There are some questions (k8s - livenessProbe vs readinessProbe, Setting up a readiness, liveness or startup probe) and articles about this topic, but the following is still not clear:
Why do I need 3 different kinds of probes?
What are the use cases?
What are the best practices?
These 3 kinds of probes have 3 different use cases; that's why we need all three.
Liveness Probe
If the Liveness Probe fails, the pod will be restarted (read more about failureThreshold).
Use case: Restart the pod if it is dead.
Best practices: Only include basic checks in the liveness probe. Never include checks on connections to other services (e.g. the database). The check shouldn't take too long to complete.
Always specify a light Liveness Probe to make sure that the pod will be restarted if it is really dead.
Startup Probe
Startup Probes check whether the pod is available after startup.
Use case: Send traffic to the pod as soon as it is available after startup. Startup probes may take longer to complete, because they are only run during initialization. They might call a warmup task (but also consider init containers for initialization). After the Startup Probe succeeds, the liveness probe is called.
Best practices: Specify a Startup Probe if the pod takes a long time to start. The Startup and Liveness Probe can use the same endpoint, but the Startup Probe can have a less strict failure threshold, which prevents a failure on startup (see Kubernetes in Action).
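A rough sketch of that pattern, assuming the application exposes an HTTP /healthz endpoint on port 8080 (a hypothetical endpoint, not prescribed by Kubernetes): both probes share the endpoint, but the startup probe gets a far more generous failure budget.

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # tolerates up to ~300 s of startup before the container is restarted
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3    # once started, react quickly if the app really hangs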
Readiness Probe
In contrast to Startup Probes, Readiness Probes check whether the pod is available during its complete lifecycle.
In contrast to Liveness Probes, if the Readiness Probe fails only the traffic to the pod is stopped; there is no restart.
Use case: Stop sending traffic to the pod if it temporarily cannot serve because a connection to another service (e.g. the database) fails and the pod will recover later.
Best practices: Include all necessary checks, including connections to vital services. Nevertheless, the check shouldn't take too long to complete.
Always specify a Readiness Probe to make sure that the pod only gets traffic, if the pod can properly handle incoming requests.
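A rough sketch of that liveness/readiness split, assuming hypothetical /healthz and /ready handlers on port 8080, where only /ready also verifies vital dependencies such as the database:

livenessProbe:
  httpGet:
    path: /healthz   # cheap check: the process is alive and can answer HTTP
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready     # deeper check: the handler also verifies e.g. the database connection
    port: 8080
  periodSeconds: 10
  failureThreshold: 3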
Documentation
This article explains very well the differences between the 3 kinds of probes.
The official Kubernetes documentation gives a good overview of all configuration options.
Best practices for probes.
The book Kubernetes in Action gives the most detailed insights into the best practices.
The difference between livenessProbe, readinessProbe, and startupProbe
livenessProbe:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 3
It is used to indicate whether the container has started and is alive, i.e. proof of it being available.
In the given example, if the request fails, it will restart the container.
If not provided the default state is Success.
readinessProbe:
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 3
It is used to indicate whether the container is ready to serve traffic, i.e. proof of it being ready to use.
It checks dependencies like database connections or other services your container depends on to fulfill its work.
In the given example, until the request returns Success, the pod won't serve any traffic (the Pod's IP address is removed from the endpoints of all Services that match the Pod).
Kubernetes relies on readiness probes during rolling updates: it keeps the old container up and running until the new one reports that it is ready to take traffic (see the Deployment sketch below).
If not provided the default state is Success.
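To make the rolling-update point concrete, here is a hedged sketch (the name, labels, and image are placeholders): with maxUnavailable: 0, the Deployment only removes an old pod once a replacement pod has passed its readiness probe.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # hypothetical name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0     # never take an old pod away before a new one is Ready
      maxSurge: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example/web:2.0   # placeholder image
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 3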
startupProbe:
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 3
It is used to indicate if the application inside the Container has started.
If a startup probe is provided, all other probes are disabled until it succeeds.
In the given example, if the request fails, it will restart the container.
Once the startup probe has succeeded once, the liveness probe takes over to provide a fast response to container deadlocks.
If not provided the default state is Success.
Check the K8s documentation for more.
I think the below table describes the use-cases for each.
| Feature | Readiness Probe | Liveness Probe | Startup Probe |
| --- | --- | --- | --- |
| Examines | Indicates whether the container is ready to service requests. | Indicates whether the container is running. | Indicates whether the application within the container has started. |
| On failure | The endpoints controller removes the pod's IP address from the endpoints of all Services that match the pod. | The kubelet kills the container, and the container is subjected to its restart policy. | The kubelet kills the container, and the container is subjected to its restart policy. |
| Default case | The default state of readiness before the initial delay is Failure. If a container does not provide a readiness probe, the default state is Success. | If a container does not provide a liveness probe, the default state is Success. | If a container does not provide a startup probe, the default state is Success. |
Sources:
Kubernetes in Action
Here's a concrete example of one we're using in our app. It has a single crude HTTP healthcheck, accessible on http://hostname:8080/management/health.
ports:
  - containerPort: 8080
    name: web-traffic   # named port referenced by the probes below
App Initialization (startup)
Spring app that is slow to start - anywhere between 30-120 seconds.
Don't want other probes to run until app is started.
Check it every 10 seconds for up to 180s before k8s gets into a crash loop.
startupProbe:
  successThreshold: 1
  failureThreshold: 18
  periodSeconds: 10
  timeoutSeconds: 5
  httpGet:
    path: /management/health
    port: web-traffic
Healthcheck (readiness)
Ping the app every 10 seconds to make sure it's healthy (i.e. accepting HTTP requests).
If it fails two consecutive pings, cordon it off (this prevents cascading failures).
It must pass two consecutive health checks before it can accept traffic again.
readinessProbe:
  successThreshold: 2
  failureThreshold: 2
  periodSeconds: 10
  timeoutSeconds: 5
  httpGet:
    path: /management/health
    port: web-traffic
App has died (liveness)
If the app fails 3 consecutive health checks, 30 seconds apart, restart the container. Maybe the app got into an unrecoverable state, like Java running out of heap memory.
livenessProbe:
  successThreshold: 1
  failureThreshold: 3
  periodSeconds: 30
  timeoutSeconds: 5
  httpGet:
    path: /management/health
    port: web-traffic

When a container encounters a device error, what is the best way for it to tell Kubernetes?

Given a running container that has been assigned one or more SR-IOV devices by the scheduler on the cluster master at launch, how should the application using the device(s) report an error, say a device timeout, to Kubernetes?
This is almost like an HA event of sorts, so maybe there's a best way to do this from an application perspective?
Kubernetes Liveness and Readiness Probes can be used to do this:
livenessProbe:
  exec:
    command:
      - <command or HTTP GET to check SRIOV device timeout>
  initialDelaySeconds: 5
  periodSeconds: 5
readinessProbe:
  exec:
    command:
      - <command or HTTP GET to check SRIOV device timeout>
  initialDelaySeconds: 5
  periodSeconds: 5
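As one possible (hypothetical) way to wire this up, the application can record the device timeout somewhere a probe can observe, for example by creating a marker file such as /var/run/device-error; this convention is invented for the sketch and is not something Kubernetes or SR-IOV provides.

livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # The app creates /var/run/device-error when it detects an SR-IOV device timeout;
      # the probe then fails and the kubelet restarts the container.
      - '! test -f /var/run/device-error'
  initialDelaySeconds: 5
  periodSeconds: 5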
Here are more links to check pod health:
Container probes
HTTP probes
Probe
The question is a bit ambiguous, as it is not clear what "report to Kubernetes" implies exactly.
If your main concern is to surface the information about the error inside Kubernetes, you could generate a custom Kubernetes event, an approach implemented e.g. by Xing in their oom-event-generator. This is a way to trigger custom logic inside a custom operator that watches these events.
If you want Kubernetes itself to act upon this information, the liveness and readiness checks are what you are looking for. A liveness failure tells Kubernetes to restart the container according to the pod's restart policy, while a readiness failure tells Kubernetes not to route any traffic through load balancers (Services) to the container.

How can we restart a Kubernetes pod if its readiness probe fails

A quick question: I know that if the Kubernetes liveness probe fails, Kubernetes will restart the pod and try again. But what about when the readiness probe fails? How can I also ask Kubernetes to restart the pod in that case?
api-group-0 0/1 Running 0 6h35m
Restarting this pod makes it work again. Thanks all!
There's no way to trigger pod restart within a readiness probe.
As recommended in the comments, you should rely on a liveness probe instead.
livenessProbe:
  exec:
    command:
      - /opt/fissile/readiness-probe.sh
  initialDelaySeconds: 20
  periodSeconds: 10
  failureThreshold: 3
If you are concerned that readiness-probe.sh may fail intermittently and shouldn't trigger a restart straight after the first failure, tune the failureThreshold setting. The kubelet allows that many consecutive failures before restarting the container.

Kubernetes: readinessProbe failing but the livenessProbe is succeeding with the same settings

I have a livenessProbe configured for my pod which does an HTTP GET on a path on the same pod and a particular port. It works perfectly. But if I use the same settings to configure a readinessProbe, it fails with the error below.
Readiness probe failed: wsarecv: read tcp :50578->:80: An existing connection was forcibly closed by the remote host
Actually, after a certain point I even see the liveness probes failing, and I'm not sure why. The liveness probe succeeding should indicate that kube-dns is working fine and that we're able to reach the pod from the node. Here's the readinessProbe from my pod's spec:
readinessProbe:
  httpGet:
    path: /<path>   # the same path works for the livenessProbe
    port: 80
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 10
Does anyone have an idea what might be going on here?
I don't think it has anything to do with kube-dns or CoreDNS. The most likely cause here is that your pod/container/application is crashing or has stopped serving requests.
Seems like this timeline:
Pod/container comes up.
Liveness probe passes OK.
Some time passes.
The app probably crashes or errors.
Readiness fails.
Liveness probe fails too.
More information about what that error means here:
An existing connection was forcibly closed by the remote host

Kubernetes livenessProbe: restarting vs destroying of the pod

Is there a way to tell Kubernetes to just destroy a pod and create a new one if the liveness probe fails? What I see in the logs now is that my Node.js application is simply restarted and keeps running in the same pod.
The liveness probe is defined in my YAML specification as follows:
livenessProbe:
  httpGet:
    path: /app/check/status
    port: 3000
    httpHeaders:
      - name: Accept
        value: application/x-www-form-urlencoded
  initialDelaySeconds: 60
  periodSeconds: 60
Disclaimer:
I am fully aware that recreating a pod when a liveness probe fails is probably not the best idea, and that the right way would be to get a notification that something is going on.
Liveness and readiness probes are defined on containers, not pods. If you have one container in your pod and set its restartPolicy to Never, the pod will go into a Failed state when the liveness probe fails and will eventually be garbage-collected based on the terminated-pod-gc-threshold value (see the sketch below).
If you have more than one container in your pod it becomes trickier, because the other container(s) keep running and the pod stays in Running status. You can build your own automation or try Pod Readiness, which is still in alpha as of this writing.
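A minimal sketch of the single-container case above, as a bare Pod (a Deployment's pod template only allows restartPolicy: Always), reusing the probe from the question; the name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: status-check-demo        # hypothetical name
spec:
  # On liveness failure the container is killed but, with restartPolicy: Never,
  # not restarted, so this single-container pod ends up in the Failed phase.
  restartPolicy: Never
  containers:
    - name: app
      image: example/node-app:latest   # placeholder image
      ports:
        - containerPort: 3000
      livenessProbe:
        httpGet:
          path: /app/check/status
          port: 3000
        initialDelaySeconds: 60
        periodSeconds: 60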