How to avoid starting kubernetes pod liveness checks until all containers are running - kubernetes

I'm having trouble with health checks starting too early on a kubernetes pod with multiple containers. My pod is set up like this:
main-container (nodejs)
sidecar container (http proxy)
Currently the health checks are configured on the sidecar container, and end up hitting both containers (proxy, then main container).
If the main container starts quickly, then everything is fine. But if the sidecar starts quickly and the main container starts slowly (e.g. if the image needs to be pulled) then the initial liveness checks start on the sidecar before the other container has even started.
Is there a way of telling kubernetes: don't start running any probes (liveness or readiness checks) until all the containers in the pod have started?
I know I can use a startupProbe to allow a more generous startup window, but ideally, and to avoid other monitoring warnings, I'd prefer to suppress the health/liveness probes completely until all the containers have started.

Answering your question: yes, there is a way of doing so, using a startupProbe on your sidecar container that points at the port your main application opens. As per the documentation, all other probes (per container) are disabled if a startup probe is provided, until it succeeds. For more information about how to set up a startup probe, visit here.
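A minimal sketch of that setup follows. The container names, images, ports, and the /healthz path are assumptions for illustration; the key point is that containers in a pod share a network namespace, so the sidecar's startup probe can target the main application's port directly:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-proxy
spec:
  containers:
    - name: main-container            # Node.js app, assumed to listen on 3000
      image: my-node-app:latest
      ports:
        - containerPort: 3000
    - name: sidecar-proxy             # HTTP proxy, assumed to listen on 8080
      image: my-proxy:latest
      ports:
        - containerPort: 8080
      # Probes are checked against the pod IP, and containers in a pod share
      # the network namespace, so this startup probe on the sidecar succeeds
      # only once the main application's port is open. Until then, the
      # sidecar's liveness/readiness probes are disabled.
      startupProbe:
        tcpSocket:
          port: 3000
        periodSeconds: 5
        failureThreshold: 60          # allow up to 5 * 60 = 300s for startup
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
```

With something like this in place, a slow image pull on the main container simply delays the startup probe; no liveness failures are reported in the meantime.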

Related

Openshift asynchronous wait for container to be running

I'm creating multiple pods at the same time in Openshift, and I also want to check the containers inside the pods are working correctly.
Some of these containers can take a while to start-up, and I don't want to wait for one pod to be fully running before starting up the other one.
Are there any Openshift / Kubernetes checks I can do to ensure a container has booted up, while also going ahead with other deployments?
Please configure the Liveness and Readiness Probes
Liveness: Under what circumstances is it appropriate to restart the pod?
Readiness: Under what circumstances should we take the pod out of the list of service endpoints so that it no longer responds to requests?
...Some of these containers can take a while to start-up
A liveness probe is not a good option for containers that require an extended startup time, mainly because you would have to set a long delay to cater for startup; that delay is irrelevant afterwards and leaves you unable to detect problems in time during normal execution. Instead, use a startup probe to detect problems during startup and hand over to the liveness probe once it succeeds; if the startup probe fails, the container is restarted according to its restartPolicy.
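A rough sketch of that handover, modelled on the pattern in the Kubernetes docs (the /healthz path, port, and timings are placeholder assumptions):

```yaml
# Illustrative values only; size failureThreshold * periodSeconds to cover
# your worst-case startup time.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30     # up to 30 * 10 = 300s allowed for startup
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3      # once started, act on failure within ~30s
```

Once the startup probe has succeeded, it does not run again; the tighter liveness settings take over for the rest of the container's life.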

Reconcile dependent objects k8s controller runtime

I am trying to implement a k8s controller that should watch two custom resources and then reconcile them by creating a deployment, service, etc. The problem is that custom resource A and custom resource B should be created first (of course we don't know the order) and only then should I perform reconciliation. Inside the Reconcile function I have access to only one of them at a time. What is the best approach to make sure both have been created before proceeding to reconcile?
As I mentioned in the comments, you could try to use either:
Container probes
A Probe is a diagnostic performed periodically by the kubelet on a Container. To perform a diagnostic, the kubelet calls a Handler implemented by the Container. There are three types of handlers:
ExecAction: Executes a specified command inside the Container. The diagnostic is considered successful if the command exits with a status code of 0.
TCPSocketAction: Performs a TCP check against the Container’s IP address on a specified port. The diagnostic is considered successful if the port is open.
HTTPGetAction: Performs an HTTP Get request against the Container’s IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
Each probe has one of three results:
Success: The Container passed the diagnostic.
Failure: The Container failed the diagnostic.
Unknown: The diagnostic failed, so no action should be taken.
The kubelet can optionally perform and react to three kinds of probes on running Containers:
livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.
startupProbe: Indicates whether the application within the Container is started. All other probes are disabled if a startup probe is provided, until it succeeds. If the startup probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a startup probe, the default state is Success.
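To make the three handler types above concrete, here is a hedged sketch of how each one looks in a pod spec (images, ports, paths, and commands are placeholders, not recommendations):

```yaml
containers:
  - name: exec-check
    image: registry.k8s.io/busybox
    livenessProbe:
      exec:
        command: ["cat", "/tmp/healthy"]   # success = command exits with 0
  - name: tcp-check
    image: my-tcp-service:latest           # placeholder image
    livenessProbe:
      tcpSocket:
        port: 8080                         # success = port is open
  - name: http-check
    image: my-web-service:latest           # placeholder image
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080                         # success = 200 <= status < 400
```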
Additional links:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes
https://www.magalix.com/blog/kubernetes-and-containers-best-practices-health-probes
Helm
Helm is a tool for managing Charts. Charts are packages of pre-configured Kubernetes resources.
Use Helm to:
Find and use popular software packaged as Helm Charts to run in Kubernetes
Share your own applications as Helm Charts
Create reproducible builds of your Kubernetes applications
Intelligently manage your Kubernetes manifest files
Manage releases of Helm packages
In your situation you could use Helm hooks, so the release waits until those resources are created and only then creates the deployment, service, etc.
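One possible shape for such a hook, purely as a sketch: a pre-install Job that polls for both custom resources before Helm installs the rest of the chart. The resource kinds and names here are hypothetical, and the Job's service account would need RBAC permission to read them.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: wait-for-custom-resources
  annotations:
    "helm.sh/hook": pre-install              # run before the chart's other resources
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: wait
          image: bitnami/kubectl:latest
          command:
            - /bin/sh
            - -c
            - |
              # Hypothetical custom resources; replace with your real kinds/names.
              until kubectl get customresourcea my-a && kubectl get customresourceb my-b; do
                echo "waiting for custom resources..."
                sleep 5
              done
```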
Additional links:
https://github.com/helm/helm
https://github.com/jainishshah17/Chart-post-install-hook

Is the container where a liveness or readiness probe's config is set a "pod check" container?

I'm following this task Configure Liveness, Readiness and Startup Probes
and it's unclear to me whether the container where the check is made is a container used only to check the availability of a pod. It would make sense that if the pod-check container fails, the API won't let any traffic into the pod.
So must the health check signal come from the container where some image or app runs? (Sorry, another question.)
From the link you provided it seems like they are speaking about containers and not pods, so the probes are meant to be per container. When all containers are ready, the pod is considered ready too, as written in the doc you provided:
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
So yes, every container that is running some image or app is supposed to expose those checks.
Liveness and readiness probes, as described by Ko2r, are additional checks inside your containers, verified by the kubelet according to the settings for the particular probe:
If the command (defined by health-check) succeeds, it returns 0, and the kubelet considers the Container to be alive and healthy. If the command returns a non-zero value, the kubelet kills the Container and restarts it.
In addition:
The kubelet uses liveness probes to know when to restart a Container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a Container in such a state can help to make the application more available despite bugs.
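This is roughly the exec-probe example from the task linked in the question; the busybox container deliberately removes the file after 30 seconds so the probe starts failing and the kubelet restarts the container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
spec:
  containers:
    - name: liveness
      image: registry.k8s.io/busybox
      args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
      livenessProbe:
        exec:
          command: ["cat", "/tmp/healthy"]  # healthy while the file exists (exit 0)
        initialDelaySeconds: 5
        periodSeconds: 5
```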
From another point of view:
Pod is a top-level resource in the Kubernetes REST API.
As per docs:
Pods are ephemeral. They are not designed to run forever, and when a Pod is terminated it cannot be brought back. In general, Pods do not disappear until they are deleted by a user or by a controller.
Information about controllers can be found here.
So the best practice is to use controllers like those described above. You’ll rarely create individual Pods directly in Kubernetes–even singleton Pods. This is because Pods are designed as relatively ephemeral, disposable entities. When a Pod gets created (directly by you, or indirectly by a Controller), it is scheduled to run on a Node in your cluster. The Pod remains on that Node until the process is terminated, the pod object is deleted, the Pod is evicted for lack of resources, or the Node fails.
Note:
Restarting a container in a Pod should not be confused with restarting the Pod. The Pod itself does not run, but is an environment the containers run in and persists until it is deleted
Because Pods represent running processes on nodes in the cluster, it is important to allow those processes to gracefully terminate when they are no longer needed (vs being violently killed with a KILL signal and having no chance to clean up). Users should be able to request deletion and know when processes terminate, but also be able to ensure that deletes eventually complete. When a user requests deletion of a Pod, the system records the intended grace period before the Pod is allowed to be forcefully killed, and a TERM signal is sent to the main process in each container. Once the grace period has expired, the KILL signal is sent to those processes, and the Pod is then deleted from the API server. If the Kubelet or the container manager is restarted while waiting for processes to terminate, the termination will be retried with the full grace period.
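If the default 30-second grace period is too short for your workload to drain, that window can be extended; a hedged sketch follows (the 60s value and the preStop command are arbitrary examples, not recommendations):

```yaml
spec:
  terminationGracePeriodSeconds: 60    # default is 30s; TERM -> wait -> KILL
  containers:
    - name: main-container
      image: my-node-app:latest        # placeholder image
      lifecycle:
        preStop:
          exec:
            # Runs before TERM is delivered; it shares the grace period budget,
            # so keep it well under terminationGracePeriodSeconds.
            command: ["/bin/sh", "-c", "sleep 10"]
```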
The Kubernetes API server validates and configures data for the api objects which include pods, services, replicationcontrollers, and others. The API Server services REST operations and provides the frontend to the cluster’s shared state through which all other components interact.
For example, when you use the Kubernetes API to create a Deployment, you provide a new desired state for the system. The Kubernetes Control Plane records that object creation, and carries out your instructions by starting the required applications and scheduling them to cluster nodes–thus making the cluster’s actual state match the desired state.
Here you can find information about processing pod termination.
There are different kinds of probes.
For example, for an HTTP probe:
even if your app isn't an HTTP server, you can create a lightweight HTTP server inside your app to respond to the liveness probe.
Command probe:
For command probes, Kubernetes runs a command inside your container. If the command returns with exit code 0, then the container is marked as healthy.
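For instance, an HTTP liveness probe might look like the following, where /healthz is a hypothetical endpoint served by that lightweight in-app HTTP server and 8080 is an assumed port:

```yaml
livenessProbe:
  httpGet:
    path: /healthz     # hypothetical health endpoint inside the app
    port: 8080         # assumed application port
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 1
```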
More about probes and best practices.
Hope this helps.

How to leverage Kubernetes readinessProbe to self heal the pod without restarting it?

According to this documentation, I see that readinessProbe can be used to temporarily halt requests to a pod without having to restart it in order to recover gracefully.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-readiness-probes
When I see pod events it looks like the pod is restarted upon Readiness probe failure. Here are the events:
1. Readiness probe failed
2. Created container
3. Started container
4. Killing container with id {}
I tried to modify the container restartPolicy to OnFailure, hoping this configuration would decide the pod's action upon readinessProbe failure, but I see the following error:
The Deployment {} is invalid: spec.template.spec.restartPolicy: Unsupported value: "OnFailure": supported values: "Always"
What is the right way to stop requests to a pod without having to restart it, letting the application recover gracefully?
There are two types of probes.
Restarts happen due to failing liveness probes.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
liveness probe
The kubelet uses liveness probes to know when to restart a Container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a Container in such a state can help to make the application more available despite bugs.
readiness probe
Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don’t want to kill the application, but you don’t want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.
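A failing readiness probe only removes the pod from Service endpoints; it never restarts the container, so the restart in your events is coming from something else (typically a failing liveness probe or a rollout), not from the readiness probe. A minimal sketch, assuming a hypothetical /ready endpoint the application can flip while it recovers:

```yaml
readinessProbe:
  httpGet:
    path: /ready        # hypothetical endpoint; return non-2xx while recovering
    port: 8080
  periodSeconds: 5
  failureThreshold: 3   # removed from endpoints after ~15s of failures, added back once it passes
# Only define a livenessProbe if a restart is genuinely the right remedy.
```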
Today I found a very good essay about probes https://blog.colinbreck.com/kubernetes-liveness-and-readiness-probes-how-to-avoid-shooting-yourself-in-the-foot/

Kubernetes default liveness and readiness probe

I wanted to know what kubernetes does to check liveness and readiness of a pod and container by default.
I could find the document which mentions how I can add my custom probe and change the probe parameters like initial delay etc. However, could not find the default probe method used by k8s.
By default, Kubernetes starts to send traffic to a pod when all the containers inside the pod start, and restarts containers when they crash. While this can be good enough when you are starting out, you can make your deployment more robust by creating custom health checks.
By default, Kubernetes just checks that the container inside the pod is up and starts sending traffic. There is no default readiness or liveness check provided by Kubernetes.
Readiness Probe
Let’s imagine that your app takes a minute to warm up and start. Your service won’t work until it is up and running, even though the process has started. You will also have issues if you want to scale up this deployment to have multiple copies. A new copy shouldn’t receive traffic until it is fully ready, but by default Kubernetes starts sending it traffic as soon as the process inside the container starts. By using a readiness probe, Kubernetes waits until the app is fully started before it allows the service to send traffic to the new copy.
Liveness Probe
Let’s imagine another scenario where your app has a nasty case of deadlock, causing it to hang indefinitely and stop serving requests. Because the process continues to run, by default Kubernetes thinks that everything is fine and continues to send requests to the broken pod. By using a liveness probe, Kubernetes detects that the app is no longer serving requests and restarts the offending pod.
TL/DR: there is no default readiness probe ("should I send this pod traffic?") and the default liveness probe ("should I kill this pod?") is just whether the container is still running.
Kubernetes will not do anything on its own. You will have to decide what liveness and readiness mean to you. There are multiple things you can do, for example, an HTTP get request, issue a command, or connect on a port. It is up to you to decide how you want to make sure your users are happy and everything is running correctly.
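As a closing sketch of such custom checks in a Deployment (paths, port, and timings are assumptions to adapt to your own application): a readiness probe gates traffic until warm-up is done, and a liveness probe restarts the container if it deadlocks.

```yaml
spec:
  template:
    spec:
      containers:
        - name: web
          image: my-web-app:latest                  # placeholder image
          readinessProbe:                           # gate traffic until warmed up
            httpGet: { path: /ready, port: 8080 }
            initialDelaySeconds: 15
            periodSeconds: 5
          livenessProbe:                            # restart on deadlock
            httpGet: { path: /healthz, port: 8080 }
            initialDelaySeconds: 30
            periodSeconds: 10
```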