Hazelcast fails liveness probe in OpenShift when loading a large map - kubernetes

We have Hazelcast 4.2 on OpenShift, deployed as a standalone cluster via a StatefulSet. We use Mongo as the backing data store (it shouldn't matter), and the Docker image is built from a copy of the Dockerfile in the Hazelcast GitHub project, with all the package repositories replaced with our internal company servers.
We have a MapLoader which takes a long time (30 minutes) to load all the data. During this load time the cluster fails to respond to liveness and readiness probes:
Readiness probe failed: Get http://xxxx:5701/hazelcast/health/node-state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
and so OpenShift kills the nodes that are loading the data.
Is there any way to fix it, so Hazelcast responds with "alive" and "not ready" instead of timing out the connection?

You can increase the time Kubernetes waits before it starts checking the health of the pod if you are loading data at startup.
https://github.com/hazelcast/charts/blob/d3b8d118da400255effc81e67a13c1863fee5f41/stable/hazelcast/values.yaml#L120
The line above, from the Helm chart, shows the readiness and liveness probe configuration.
You can increase initialDelaySeconds so that the kubelet waits that many seconds before it starts probing Hazelcast's health.
You can also adjust the other probe settings, such as failureThreshold: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
If initialDelaySeconds alone does not help, you can still raise the readiness and liveness settings and increase the timeouts to cover the period in which the MapLoader is loading the data.
Also, only the liveness probe causes your pod or container to be restarted; if the readiness probe fails, Kubernetes just marks your pod as not ready to accept traffic. So while the data is loading, you can relax the liveness configuration, setting its thresholds to the maximum practical values, so that it never fails.
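For example, a minimal sketch of the probe section of the container spec, using the Hazelcast health endpoint from the error message in the question; all timing values are illustrative and should be sized to your actual MapLoader load time:
livenessProbe:
  httpGet:
    path: /hazelcast/health/node-state
    port: 5701
  initialDelaySeconds: 60    # illustrative: give the member time to start
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 120      # illustrative: 120 x 30s = 60 minutes before a restart
readinessProbe:
  httpGet:
    path: /hazelcast/health/node-state
    port: 5701
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 3        # the pod just stays "not ready" while the MapLoader runs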
The kubelet uses liveness probes to know when to restart a container.
For example, liveness probes could catch a deadlock, where an
application is running, but unable to make progress. Restarting a
container in such a state can help to make the application more
available despite bugs.
The kubelet uses readiness probes to know when a container is ready to
start accepting traffic. A Pod is considered ready when all of its
containers are ready. One use of this signal is to control which Pods
are used as backends for Services. When a Pod is not ready, it is
removed from Service load balancers.
You can also set the restartPolicy to Never so the container will not restart.
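As a sketch, restartPolicy is set at the pod spec level, not per container; note that workload controllers such as Deployments and StatefulSets only allow Always, so Never applies to bare Pods (and Jobs). The pod name and image below are placeholders:
apiVersion: v1
kind: Pod
metadata:
  name: hazelcast-standalone   # hypothetical name
spec:
  restartPolicy: Never         # the container is not restarted when it exits or fails its liveness probe
  containers:
  - name: hazelcast
    image: hazelcast/hazelcast:4.2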

Related

In Kubernetes, why does readinessProbe have the option periodSeconds?

I understand what a readinessProbe does, but I don't see why it should have a periodSeconds. Once it's determined that the pod is ready, it should stop checking. Wouldn't checking periodically then be up to the livenessProbe? Or am I missing something?
The readinessProbe and livenessProbe serve different purposes.
ReadinessProbe checks if a service is ready to serve requests. If the readinessProbe fails, the container will be taken out of service for as long as the probe fails. If the readinessProbe reports up again, the container will be taken back into service again and receive requests.
In contrast, if the livenessProbe fails, it will be considered as not recoverable and the container will be terminated and restarted.
For both, periodSeconds makes sense; even for the livenessProbe, failure is only declared after X consecutive failed checks (failureThreshold).
Readiness probes determine whether or not a container is ready to serve requests. If the readiness probe returns a failed state, then Kubernetes removes the IP address for the container from the endpoints of all Services.
We use readiness probes to instruct Kubernetes that a running container should not receive any traffic. This is useful when waiting for an application to perform time-consuming initial tasks, such as establishing network connections, loading files, and warming caches.
The readiness probe is configured in the spec.containers[].readinessProbe attribute of the pod configuration.
The periodSeconds field specifies how often, in seconds, the kubelet should perform the readiness probe. It defaults to 10 seconds; the minimum value is 1.
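For instance, a minimal sketch (the endpoint, port, and timings are placeholders):
readinessProbe:
  httpGet:
    path: /ready          # hypothetical readiness endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10       # the kubelet re-checks readiness every 10 seconds, even after the pod becomes ready
  failureThreshold: 3     # marked not ready after 3 consecutive failures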

Kubernetes: how to block new traffic to one replica?

I've got an application with 10 pods, and traffic is load balanced between all of them. There was an issue that caused transactions to queue up, and a few pods could not recover properly or took a long time to process the queue once the issue was fixed. The new traffic was still too much for some of the pods.
I'm wondering if I can block new traffic to particular pod(s) in a ReplicaSet, let them process the queue, and once the queue is processed let the new traffic come in again?
You can use a probe to handle this scenario.
A Readiness probe is one way to do it.
What a probe does is continuously check, at a configured interval, whether the process inside the container or pod is up.
Example
readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
You can also create an endpoint in the application that Kubernetes checks automatically; if Kubernetes gets a 200, it marks the pod as Ready to handle traffic, otherwise it marks it as not ready and stops sending it traffic.
Note:
Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.
The readiness probe won't restart your pod if it's failing, while the liveness probe will restart your pod or container if it keeps failing (for example, if the health endpoint returns an error status).
In your scenario, it's better to use the readiness probe, so the process keeps running and never gets restarted. Once the application is ready to handle traffic again, Kubernetes will get 200 responses from the endpoint and re-add the pod.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
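A minimal sketch, assuming your application exposes an endpoint (the path and port are placeholders) that returns 200 only once the queue has been processed and an error status, e.g. 503, while it is still draining:
readinessProbe:
  httpGet:
    path: /queue/ready    # hypothetical endpoint: 200 when the queue is drained, 503 otherwise
    port: 8080
  periodSeconds: 5
  failureThreshold: 1     # take the pod out of the Service endpoints after one failed check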

Is the readiness probe in Kubernetes run even after the pod is in the RUNNING state?

Is the readiness probe run even after the pod is ready?
Will it run even after the pod is in the RUNNING state?
Is the readiness probe run even after the pod is ready?
Yes.
Will it run even after the pod is in the RUNNING state?
Yes.
As the official documentation says, there are some cases when:
applications are temporarily unable to serve traffic... an application might depend on external services ... In such cases, you don't want to kill the application, but you don’t want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.
So, the liveness probe is there to detect and remedy situations when the app cannot recover except by being restarted.
The readiness probe is used to detect situations when traffic should not be sent to the app.
Both probes have the same set of settings, such as initialDelaySeconds, periodSeconds, etc.
The readiness probe checks whether the container is available for incoming traffic, and it keeps being executed periodically even after the container becomes ready.
Here are the docs:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

How does a Liveness/Readiness probe communicate with a pod?

I am very new to k8s so apologies if the question doesn't make sense or is incorrect/stupid.
I have a liveness probe configured for my pod definition which just hits a health API and checks its response status to test for the liveness of the pod.
My question is, while I understand the purpose of the liveness/readiness probes...what exactly are they? Are they just another type of pods which are spun up to try and communicate with our pod via the configured API? Or are they some kind of a lightweight process which runs inside the pod itself and attempts the API call?
Also, how does a probe communicate with a pod? Do we require a service to be configured for the pod so that the probe is able to access the API or is it an internal process with no additional config required?
Short answer: the kubelet handles these checks to ensure your service is running, and if it is not, the container is restarted. The kubelet runs on every node of your cluster; you don't need to make any additional configuration.
You don't need to configure a Service (or anything else) for the probes to work; it is an internal process handled by Kubernetes.
From Kubernetes documentation:
A Probe is a diagnostic performed periodically by the kubelet on a Container. To perform a diagnostic, the kubelet calls a Handler implemented by the Container. There are three types of handlers:
ExecAction: Executes a specified command inside the Container. The diagnostic is considered successful if the command exits with a status code of 0.
TCPSocketAction: Performs a TCP check against the Container’s IP address on a specified port. The diagnostic is considered successful if the port is open.
HTTPGetAction: Performs an HTTP Get request against the Container’s IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
Each probe has one of three results:
Success: The Container passed the diagnostic.
Failure: The Container failed the diagnostic.
Unknown: The diagnostic failed, so no action should be taken.
The kubelet can optionally perform and react to three kinds of probes on running Containers:
livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.
startupProbe: Indicates whether the application within the Container is started. All other probes are disabled if a startup probe is provided, until it succeeds. If the startup probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a startup probe, the default state is Success.
For network probes, they are run from the kubelet on the node where the pod is running. Exec probes are run via the same mechanism as kubectl exec.
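For illustration, a sketch of how each handler type would look in a container spec; the paths, ports, and command are placeholders:
# HTTPGetAction: the kubelet sends an HTTP GET to the container's IP
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
# TCPSocketAction: the kubelet checks that the port is open
readinessProbe:
  tcpSocket:
    port: 8080
# ExecAction: the kubelet runs a command inside the container
startupProbe:
  exec:
    command:
    - cat
    - /tmp/started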

Why is a liveness probe needed along with a readiness probe?

When doing health checks for Kubernetes pods, why is a liveness probe needed even though we already maintain a readiness probe?
The readiness probe already keeps checking whether the application within the pod is ready to serve requests, which implies the pod is live. So why is the liveness probe still needed?
The probes have different meaning with different results:
failing liveness probes -> restart container
failing readiness probes -> do not send traffic to that pod
You cannot determine liveness from readiness and vice versa. Just because a pod cannot accept traffic right now doesn't mean a restart is needed; it can mean that it just needs time to finish some work.
If you are deploying e.g. a PHP app, those two will probably be the same, but k8s is a generic system that supports many types of workloads.
From: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
The kubelet uses liveness probes to know when to restart a Container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a Container in such a state can help to make the application more available despite bugs.
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
Sidenote: actually, readiness should be a subset of liveness, meaning readiness implies liveness (and failing liveness implies failing readiness). But that doesn't change the explanation above, because if you only have readiness, you can only infer when a restart is NOT needed, which is the same as not having any probe for restarting at all. Also, because the probes are defined separately, Kubernetes has no guarantee that one is a subset of the other.
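A minimal sketch of both probes on the same container, assuming the application exposes separate (hypothetical) /healthz and /ready endpoints:
livenessProbe:
  httpGet:
    path: /healthz   # hypothetical: fails only when the process is wedged -> container is restarted
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready     # hypothetical: fails while warming up or busy -> pod receives no traffic, no restart
    port: 8080
  periodSeconds: 5
  failureThreshold: 1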
The readiness probe checks whether your application is ready to serve requests, and the pod will not be added to the pool of ready pods until it satisfies the readiness checks. The main difference is that it doesn't restart the pod if the pod is not ready.
The liveness probe checks whether the pod is healthy; if it is not (i.e. it does not satisfy the configured condition), it restarts the pod in the hope that it will recover and become ready.
Kubernetes allows you to define a few things to make the app available:
1: Liveness probes for your container.
2: Readiness probes for your pod.
1: Liveness probes: they help keep your apps healthy by ensuring unhealthy containers are restarted automatically.
2: Readiness probes: they are invoked periodically and determine whether the specific pod should receive client requests or not.
Operation of Readiness Probes
When a container is started, Kubernetes can be configured to wait for a configurable amount of time to pass before performing the first readiness check. After that, it invokes the probe periodically and acts based on the result of the readiness probe.
If a pod reports that it's not ready, it's removed from the Service. The effect is the same as when the pod doesn't match the Service's label selector at all.
If the pod then becomes ready again, it’s re-added.
Important difference between liveness and readiness probes
1: Unlike with liveness probes, if a container fails the readiness check it won't be killed or restarted; as mentioned above, it is only removed from the Service.
2: Liveness probes keep pods healthy by killing off unhealthy containers and replacing them with new, healthy ones, whereas readiness probes make sure that only pods that are ready to serve requests receive them
I hope this helps.