OpenShift: trigger pod restarts sequentially - kubernetes

My application loads data during startup, so I need to restart the application to change data.
The data is loaded from an Oracle schema and can be changed by another application.
If the data changes, the application becomes only partially functional and needs to be restarted.
Requirement: the restart should be done automatically and without downtime (an old pod should only be killed once the new one passes its readiness check).
How can this requirement be fulfilled?
Notes:
I would really like to use the liveness probe to check some URL with a health check. Issue: AFAIK the liveness probe kills the pod as soon as the check fails, so all pods would be killed simultaneously, which leads to downtime during startup.
The desired behavior can be reached with a rolling deployment. However, I don't want to perform it manually.
For simplicity, I don't want to implement data reloading while the pod is running: the application loads data only during startup. If the pod is not fully functional, it is killed and recreated.

Two ways I can think of:
- Use StatefulSets: the pods will be restarted in order and killed in reverse order.
- Use a Deployment with .spec.strategy.type = RollingUpdate, and pair it with .spec.strategy.rollingUpdate.maxSurge of at least 1 and .spec.strategy.rollingUpdate.maxUnavailable of 0, so an old pod is only killed once its replacement passes the readiness check (see the sketch after this list).
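A minimal sketch of the second option, assuming hypothetical names, ports, and a /health endpoint; with maxUnavailable: 0 an old pod is only terminated once its replacement reports ready:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-loader            # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: data-loader
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1              # start one extra pod first
      maxUnavailable: 0        # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: data-loader
    spec:
      containers:
      - name: app
        image: registry.example.com/app:1.0   # hypothetical image
        readinessProbe:
          httpGet:
            path: /health      # assumed health endpoint
            port: 8080
```

With kubectl 1.15 or later, such a rollout can then be triggered without changing the image via kubectl rollout restart deployment/data-loader.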

Related

Does it make sense to have a longer `initialDelaySeconds` in case of readiness probes?

Based on the documentation, initialDelaySeconds adds a delay before the first readiness probe checks the pod. But the only effect of a readiness probe failure is that the pod is marked unready: it does not get traffic from Services, and the failure also affects the deployment state.
So giving readiness checks a delay is not really effective for the majority of applications: we want the pod ready as soon as we can, meaning we need to check whether it's ready as soon as possible.
Things to consider:
- There is no reason to set it earlier than the earliest possible startup time.
- If you set it late, you are wasting resources (pods not receiving traffic): a 1-minute delay for 60 pods adds up to an hour of wasted pod-time.
- How much resource does the readiness probe consume? Does it make external calls (database, REST, etc.)? IMHO those should be avoided.
- Can the pod serve traffic? Are all the necessary connections (DB, REST) established and caches populated? There is no reason to accept traffic if the pod cannot reach the database/backend service.
To summarize:
- You want to minimize startup time.
- You want to minimize readiness calls.
- Measure it, and set it to the earliest possible time a pod can start.
- If your pods regularly fail their readiness checks, increase it.
So giving readiness checks a delay is not really effective for the majority of applications: we want the pod ready as soon as we can, meaning we need to check whether it's ready as soon as possible.
It depends on what application you use. As the documentation explains what a readiness probe is:
Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don't want to kill the application, but you don't want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.
If your application requires loading large data or configuration files at startup and this always takes, say, 30 seconds, it makes no sense to start checking immediately after startup whether the application is ready, because it will not be.
Therefore the initialDelaySeconds option is suitable here, and we can set the checking to start e.g. after 20 seconds instead of immediately:
initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
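As a minimal sketch of the above (path and port are assumptions), a readiness probe whose first check is delayed to match a known ~30-second startup might look like:

```yaml
readinessProbe:
  httpGet:
    path: /health            # assumed health endpoint
    port: 8080               # assumed application port
  initialDelaySeconds: 20    # skip checks we know would fail during startup
  periodSeconds: 5
```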

What are the conditions for a pod to be treated as failed

I am aware of how a ReplicaSet works and how it reconciles the state from its specification.
However, I am not completely aware of all the criteria a ReplicaSet uses to reconcile the state.
I took a look at the documentation to understand the scenarios.
https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/
One scenario is when the pod is down for any reason, such as an application issue.
Another is when the node is down.
What are all the other scenarios? If the pod is stuck making progress, will the ReplicaSet take care of that? Or does it just check whether the pod is alive or not?
If the pod is stuck making progress, will the ReplicaSet take care of that?
As long as the main process inside a container is running, the container is considered healthy by default and will be treated as such. If there is an application issue which prevents your application from working correctly but the main process is still running, you will be stuck with an "unhealthy" pod.
That is the reason you want to implement a livenessProbe for your containers and specify what "behavior" represents a healthy state of the container. In such a scenario, failing the health check multiple times in a row (the count is configurable) results in the container being treated as failed and restarted.
An example might be a simple HTTP GET request to some predefined path if you are running a web application in your pod (e.g. /api/health). Even if the main process is running, your application needs to respond to this periodic health-check query, otherwise the container will be replaced.
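A minimal sketch of such a probe, reusing the /api/health path from the answer above (port and timings are assumptions):

```yaml
livenessProbe:
  httpGet:
    path: /api/health        # health endpoint from the answer above
    port: 8080               # assumed application port
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3        # restart after three consecutive failures
```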
If neither the Pod nor the Node is down, the Pod will only be considered failed and replaced if you have a liveness probe defined.
If you don't have one implemented, k8s will never know that your Pod is not up and running.
Take a look at this doc page for more info.
- OOMKilled: exceeding the memory limit kills the container, and the pod is restarted.
- CPU limit: hitting the CPU limit throttles the container and causes request errors (the 404s mentioned here), but does not restart the pod.
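To make those two cases concrete, here is a sketch of the container resources involved (all values are assumptions):

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"    # exceeding this gets the container OOMKilled and restarted
    cpu: "500m"        # hitting this only throttles; the pod keeps running
```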

Pod is still running after I delete the parent job

I created a job in my Kubernetes cluster; the job takes a long time to finish. I decided to cancel it, so I deleted the job, but I noticed the associated pod is NOT automatically deleted. Is this the expected behavior? Why is it not consistent with deployment deletion? Is there a way to make the pod automatically deleted?
If you're deleting a deployment, chances are you don't want any of the underlying pods, so it most likely forcefully deletes the pods by default. Also, the desired state of the pods would be unknown.
On the other hand, if you're deleting a pod, Kubernetes doesn't know what kind of replication controller may be attached to it or what that controller will do next. So it signals a shutdown to the container so that it can perhaps clean up gracefully. There may be processes still using the pod, like a web request, and it would not be good to kill the request if it only needs another second to complete. This is what happens when you scale up your pods or roll out a new deployment and don't want users to experience downtime. It is in fact one of the benefits of Kubernetes, as opposed to a traditional application server which requires you to shut down the system to upgrade (or to play with load balancers to redirect traffic), which may negatively affect users.
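For completeness, a sketch of the kubectl side (the job name is hypothetical; the background/foreground/orphan values of --cascade apply to kubectl 1.20+):

```sh
# Recent kubectl versions cascade by default, deleting the Job's pods with it:
kubectl delete job my-long-job

# The propagation policy can also be made explicit:
kubectl delete job my-long-job --cascade=foreground
```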

Is the container where the liveness or readiness probe is configured a "pod check" container?

I'm following the task Configure Liveness, Readiness and Startup Probes,
and it's unclear to me whether the container where the check is made is a container used only to check the availability of a pod. It would make sense: if the pod-check container fails, the API won't let any traffic into the pod.
So must the health-check signal come from the container where some image or app runs? (Sorry, another question.)
From the link you provided, it seems they are speaking about containers and not pods, so the probes are meant to be per container. When all containers are ready, the pod is described as ready too, as written in the doc you provided:
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
So yes, every container that runs some image or app is supposed to expose those checks.
Liveness and readiness probes, as described by Ko2r, are additional checks inside your containers, verified by the kubelet according to the settings for the particular probe:
If the command (defined by health-check) succeeds, it returns 0, and the kubelet considers the Container to be alive and healthy. If the command returns a non-zero value, the kubelet kills the Container and restarts it.
In addition:
The kubelet uses liveness probes to know when to restart a Container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a Container in such a state can help to make the application more available despite bugs.
From another point of view:
Pod is a top-level resource in the Kubernetes REST API.
As per docs:
Pods are ephemeral. They are not designed to run forever, and when a Pod is terminated it cannot be brought back. In general, Pods do not disappear until they are deleted by a user or by a controller.
Information about controllers can be found here.
So the best practice is to use controllers as described above. You'll rarely create individual Pods directly in Kubernetes, even singleton Pods. This is because Pods are designed as relatively ephemeral, disposable entities. When a Pod gets created (directly by you, or indirectly by a Controller), it is scheduled to run on a Node in your cluster. The Pod remains on that Node until the process is terminated, the Pod object is deleted, the Pod is evicted for lack of resources, or the Node fails.
Note:
Restarting a container in a Pod should not be confused with restarting the Pod. The Pod itself does not run, but is an environment the containers run in, and it persists until it is deleted.
Because Pods represent running processes on nodes in the cluster, it is important to allow those processes to gracefully terminate when they are no longer needed (vs being violently killed with a KILL signal and having no chance to clean up). Users should be able to request deletion and know when processes terminate, but also be able to ensure that deletes eventually complete. When a user requests deletion of a Pod, the system records the intended grace period before the Pod is allowed to be forcefully killed, and a TERM signal is sent to the main process in each container. Once the grace period has expired, the KILL signal is sent to those processes, and the Pod is then deleted from the API server. If the Kubelet or the container manager is restarted while waiting for processes to terminate, the termination will be retried with the full grace period.
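To make the grace period concrete, a minimal sketch (the 60-second value is an assumption; the default is 30 seconds):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                             # hypothetical name
spec:
  terminationGracePeriodSeconds: 60     # time between the TERM and KILL signals
  containers:
  - name: app
    image: registry.example.com/app:1.0 # hypothetical image
```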
The Kubernetes API server validates and configures data for the api objects which include pods, services, replicationcontrollers, and others. The API Server services REST operations and provides the frontend to the cluster’s shared state through which all other components interact.
For example, when you use the Kubernetes API to create a Deployment, you provide a new desired state for the system. The Kubernetes Control Plane records that object creation, and carries out your instructions by starting the required applications and scheduling them to cluster nodes–thus making the cluster’s actual state match the desired state.
Here you can find information about processing pod termination.
There are different kinds of probes. For example, for an HTTP probe:
even if your app isn’t an HTTP server, you can create a lightweight HTTP server inside your app to respond to the liveness probe.
And for command probes, Kubernetes runs a command inside your container; if the command returns with exit code 0, the container is marked as healthy.
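A minimal sketch of a command probe, borrowing the /tmp/healthy file check used in the Kubernetes docs:

```yaml
livenessProbe:
  exec:
    command:               # exit code 0 marks the container healthy
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
```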
More about probes and best practices.
Hope this helps.

How to reduce the "unhealthy" delay during pod startup?

I am using Kubernetes to start Java pods. The pod startup delay varies between 10 seconds and about a minute, depending on the load of the node, the time Flyway takes to migrate the tables, and so on.
To avoid Kubernetes killing pods that are still starting, we set the liveness probe with an initial delay of two minutes.
That saves us from pods being eternally killed because they start too slowly. But when scaling up or recovering from a crash, we lose a couple of seconds or minutes before the freshly started pod joins the service.
Is there any way to optimize that?
A way to tell Kubernetes "we are live, you can start using the liveness probe" before the initial delay?
For starters, this will not happen at all. The liveness probe does not control how pods are joined to the service: as you stated, it will restart the container if the probe fails, but the Service will not wait for a successful liveness probe before adding the pod as an endpoint. For that there is a separate readiness probe. So this should be a non-issue for you (by the way, you might want to use both readiness and liveness probes for an optimal process).
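One common pattern (a sketch; endpoint, port, and thresholds are assumptions) is to keep the initial delay small and let failureThreshold × periodSeconds cover the slowest observed startup, so fast pods join the service early while slow ones are not killed:

```yaml
readinessProbe:
  httpGet:
    path: /health          # assumed endpoint
    port: 8080
  initialDelaySeconds: 5   # fast pods can join well before the two-minute mark
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 12     # restart only after ~2 minutes of consecutive failures
```

Newer Kubernetes versions also provide a dedicated startupProbe for exactly this slow-startup case.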
I think you need to reduce the work the container is doing.
You mention the database migrations. It may be better to run them as a one-time Job in Kubernetes rather than at every start. Effectively, for a certain version of your software you only need to run them once, whereas each subsequent start still has to do the work of checking whether the database schema is already up to date.
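A hedged sketch of running the migrations as a one-time Job (image, tag, and name are assumptions; credentials and the JDBC URL would come from env vars or a Secret):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: flyway-migrate           # hypothetical name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: flyway/flyway:9   # assumed image and tag
        args: ["migrate"]        # connection settings omitted; supply via env/Secret
```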