OpenShift + Readiness Check - kubernetes

On OpenShift/Kubernetes, when a readiness check is configured as an HTTP GET with a path, for example on a Spring Boot app with a service and route, is the HTTP GET request calling the OpenShift service, the route, or something else, and expecting a 200-399 response?
Thanks,
B.

The kubernetes documentation on readiness and liveness probes states that
For an HTTP probe, the kubelet sends an HTTP request to the specified path and port to perform the check. The kubelet sends the probe to the pod’s IP address, unless the address is overridden by the optional host field in httpGet. [...] In most scenarios, you do not want to set the host field. Here’s one scenario where you would set it. Suppose the Container listens on 127.0.0.1 and the Pod’s hostNetwork field is true. Then host, under httpGet, should be set to 127.0.0.1. If your pod relies on virtual hosts, which is probably the more common case, you should not use host, but rather set the Host header in httpHeaders.
So it uses the Pod's IP unless you set the host field on the probe. Neither the service nor the route is used here, because the readiness and liveness probes are what decide whether the service or route should send traffic to the Pod in the first place.
The HTTP request comes from the Kubelet. Each kubernetes node runs the Kubelet process, which is responsible for node registration, and management of pods. The Kubelet is also responsible for watching the set of Pods that are bound to its node and making sure those Pods are running. It then reports back status as things change with respect to those Pods.
When talking about the HTTP probe, the docs say that
Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.
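As a concrete illustration, here is a minimal sketch of such a readiness probe, assuming a Spring Boot app exposing the Actuator health endpoint /actuator/health on container port 8080 (adjust the path and port to your app):

# The kubelet will GET http://<podIP>:8080/actuator/health and treat
# any 200-399 response as "ready"; the service and route are not involved.
readinessProbe:
  httpGet:
    path: /actuator/health   # assumed Spring Boot Actuator endpoint
    port: 8080               # assumed container port
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3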

Correct, the kubelet makes an HTTP GET request to determine whether the container is ready to serve requests or not. By default the request is made to the Pod IP directly; when the probe fails, that IP is removed from the endpoints of all services that select the Pod. This can be overridden by the host field in the probe definition.
Any response code from 200-399 is considered a success as you have mentioned.
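If the app relies on virtual hosts, the documentation quoted above suggests setting the Host header in httpHeaders rather than the host field. A hedged sketch (the hostname is an assumption):

readinessProbe:
  httpGet:
    path: /health                # hypothetical path
    port: 8080
    httpHeaders:
    - name: Host
      value: myapp.example.com   # assumed virtual host expected by the app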

Related

Kubernetes: Readiness Check with httpGet

I am quite confused about the readiness probe. Suppose I use httpGet with /health as the probing endpoint. Once the readiness check returns 500, the server will stop serving traffic. Then how can the /health endpoint work? In other words, once a readiness check fails, how can it ever work again, since it can no longer respond to future /health checks?
I guess one valid explanation is that the path is invoked locally? (i.e. not through http(s)://${ip}:${port}/health)
You have a typo. You said:
Once the readiness check returns 500, the server will stop serving traffic.
However, it should be:
Once the readiness check returns 500, the k8s Service will stop serving traffic.
A k8s Service behaves like a load balancer in front of multiple Pods.
If a Pod is ready, an endpoint is created for it and it will receive traffic.
If a Pod is not ready, its endpoint is removed and it will no longer receive traffic.
While the readiness probe decides whether to forward traffic or not, the liveness probe decides whether to restart the Pod or not.
If you want to get rid of an unhealthy Pod, you also have to specify a liveness probe.
So let's summarize:
To get a fully HA deployment you need 3 things together:
Pods are managed by a Deployment, which will maintain the desired number of replicas.
The liveness probe will help to remove/restart the unhealthy pod. After some time (e.g. 6 restarts), the Pod will be considered unhealthy and the Deployment will take care of bringing up a new one.
The readiness probe will help to forward traffic only to ready pods, either at the beginning of a run or at the end of a run (graceful shutdown). See the sketch below.
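A minimal sketch of these three pieces together; names, image, port, and paths are assumptions rather than anything from the question:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                      # hypothetical name
spec:
  replicas: 3                      # the Deployment maintains the replica count
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:1.0           # hypothetical image
        ports:
        - containerPort: 8080
        readinessProbe:            # gates Service traffic to this pod
          httpGet:
            path: /health          # assumed endpoint
            port: 8080
          periodSeconds: 5
        livenessProbe:             # restarts the container on repeated failure
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          failureThreshold: 3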

How does a Liveness/Readiness probe communicate with a pod?

I am very new to k8s so apologies if the question doesn't make sense or is incorrect/stupid.
I have a liveness probe configured in my pod definition which just hits a health API and checks its response status to test the liveness of the pod.
My question is, while I understand the purpose of the liveness/readiness probes... what exactly are they? Are they just another type of pod which is spun up to try to communicate with our pod via the configured API? Or are they some kind of lightweight process which runs inside the pod itself and attempts the API call?
Also, how does a probe communicate with a pod? Do we require a service to be configured for the pod so that the probe is able to access the API or is it an internal process with no additional config required?
Short answer: the kubelet handles these checks to ensure your service is running, and if it is not, the container will be restarted/replaced. The kubelet runs on every node of your cluster; you don't need to make any additional configuration.
You don't need to configure a Service (or a service account) for the probes to work; it is an internal process handled by Kubernetes.
From Kubernetes documentation:
A Probe is a diagnostic performed periodically by the kubelet on a Container. To perform a diagnostic, the kubelet calls a Handler implemented by the Container. There are three types of handlers:
ExecAction: Executes a specified command inside the Container. The diagnostic is considered successful if the command exits with a status code of 0.
TCPSocketAction: Performs a TCP check against the Container’s IP address on a specified port. The diagnostic is considered successful if the port is open.
HTTPGetAction: Performs an HTTP Get request against the Container’s IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
Each probe has one of three results:
Success: The Container passed the diagnostic.
Failure: The Container failed the diagnostic.
Unknown: The diagnostic failed, so no action should be taken.
The kubelet can optionally perform and react to three kinds of probes on running Containers:
livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.
startupProbe: Indicates whether the application within the Container is started. All other probes are disabled if a startup probe is provided, until it succeeds. If the startup probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a startup probe, the default state is Success.
For network probes, they are run from the kubelet on the node where the pod is running. Exec probes are run via the same mechanism as kubectl exec.
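For reference, the three handler types look roughly like this in a container spec; the commands, ports, and paths below are placeholders, not values from the question:

containers:
- name: myapp
  image: myapp:1.0                       # hypothetical image
  startupProbe:                          # ExecAction: runs a command in the container
    exec:
      command: ["cat", "/tmp/started"]   # placeholder command; exit code 0 = success
    failureThreshold: 30
    periodSeconds: 10
  readinessProbe:                        # TCPSocketAction: succeeds if the port is open
    tcpSocket:
      port: 8080
    periodSeconds: 5
  livenessProbe:                         # HTTPGetAction: expects a 200-399 response
    httpGet:
      path: /healthz                     # placeholder path
      port: 8080
    initialDelaySeconds: 10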

Pod got CrashLoopBackOff in Kubernetes because of GCP service account

After deploying with Helm charts, I got a CrashLoopBackOff error.
NAME READY STATUS RESTARTS AGE
myproject-myproject-54ff57477d-h5fng 0/1 CrashLoopBackOff 10 24m
Then I described the pod to see its events, and I saw something like below:
Liveness probe failed: Get http://10.16.26.26:8080/status:
dial tcp 10.16.26.26:8080: connect: connection refused
Readiness probe failed: Get http://10.16.26.26:8080/status:
dial tcp 10.16.26.26:8080: connect: connection refused
Lastly, I saw an invalid grant error for access to my GCP cloud proxy in the logs, as below:
time="2020-01-15T15:30:46Z" level=fatal msg=application_main error="Post https://www.googleapis.com/{....blabla.....}: oauth2: cannot fetch token: 400 Bad Request\nResponse: {\n \"error\": \"invalid_grant\",\n \"error_description\": \"Not a valid email or user ID.\"\n}"
However, I checked my service account in IAM and it has access to the cloud proxy. Furthermore, I tested with the same credentials locally, and the endpoint for the readiness probe was working successfully.
Does anyone have any suggestions about my problem?
You can disable the liveness probe to stop the CrashLoopBackOff, exec into the container, and test from there.
Ideally you should not keep the same config for the liveness and readiness probes. It is not advisable for the liveness probe to depend on anything external; it should just check whether the pod is alive or not.
Referring to the problem with granting access on GCP: fix this by using the email address (the string that ends with ...@developer.gserviceaccount.com) instead of the Client ID for the client_id parameter value. The naming chosen by Google is confusing.
You can find more information and troubleshooting here: google-oauth-grant.
Referring to the problem with probes:
Check if the URL is healthy. Your probes may be too sensitive - your application may take a while to start or respond.
Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.
The liveness probe checks whether your application is in a healthy state in your already running pod.
The readiness probe checks whether your pod is ready to receive traffic. Thus, if the probed path does not exist, the pod will never appear as Running.
e.g.:
livenessProbe:
  httpGet:
    path: /your-path
    port: 5000
  failureThreshold: 1
  periodSeconds: 2
  initialDelaySeconds: 2
ports:
- name: http
  containerPort: 5000
If the endpoint for the configured path (here /your-path) does not exist, the pod will never appear as Running.
Make sure that you properly set up the liveness and readiness probes.
For an HTTP probe, the kubelet sends an HTTP request to the specified path and port to perform the check. The kubelet sends the probe to the pod's IP address, unless the address is overridden by the optional host field in httpGet. If the scheme field is set to HTTPS, the kubelet sends an HTTPS request, skipping the certificate verification. In most scenarios, you do not want to set the host field. Here's one scenario where you would set it. Suppose the Container listens on 127.0.0.1 and the Pod's hostNetwork field is true. Then host, under httpGet, should be set to 127.0.0.1. Make sure you did it. If your pod relies on virtual hosts, which is probably the more common case, you should not use host, but rather set the Host header in httpHeaders.

For a TCP probe, the kubelet makes the probe connection at the node, not in the pod, which means that you cannot use a service name in the host parameter since the kubelet is unable to resolve it.
The most important thing you need to configure when using liveness probes is the initialDelaySeconds setting.
Make sure that the port your probe targets is actually open on the container.
A liveness probe failure causes the pod to restart. You need to make sure the probe doesn't start until the app is ready; otherwise, the app will constantly restart and never become ready!
I recommend using the p99 startup time for initialDelaySeconds.
Take a look here: probes-kubernetes, most-common-fails-kubernetes-deployments.
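Applied to the probes from the question (the /status endpoint on port 8080), a hedged sketch with a more forgiving startup delay could look like this; the exact values should come from your app's measured startup time:

readinessProbe:
  httpGet:
    path: /status
    port: 8080
  initialDelaySeconds: 30     # assumed p99 startup time; measure your own
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /status
    port: 8080
  initialDelaySeconds: 60     # give the app time to start before restarting it
  periodSeconds: 10
  failureThreshold: 3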

Traefik health checks via kubernetes annotation

I want to set up Traefik backend health checks via a Kubernetes annotation, but it looks like Kubernetes Ingress does not support that functionality according to the official documentation.
Is there any particular reason why Traefik does not support that functionality for Kubernetes Ingress? I'm wondering because Mesos supports health checks for a backend.
I know that in Kubernetes you can configure readiness/liveness probes for the pods, but I have a leader/follower style service, so Traefik should route the traffic only to the leader.
UPD:
Only the leader can accept connections from Traefik; a follower will refuse the connection.
I have two readiness checks in mind:
Service is up and running, and ready to be elected as a leader (kubernetes readiness probe)
Service is up and running and promoted as a leader (traefik health check)
Traefik relies on Kubernetes to provide an indication of the health of the underlying pods to ascertain whether they are ready to provide service. Kubernetes exposes two mechanisms in a pod to communicate information to the orchestration layer:
Liveness checks to provide an indication to Kubernetes when the process(es) running in the pod have transitioned to a broken state. A failing liveness check will cause Kubernetes to destroy the pod and recreate it.
Readiness checks to determine when a pod is ready to provide service. A failing readiness check will cause the Endpoint Controller to remove the pod from the list of endpoints of any services it provides. However, it will remain running.
In this instance, you would expose information to Traefik via a readiness check. Configure your pods with a readiness check which fails if they are in a state in which they should not receive any traffic. When the readiness state changes, Kubernetes will update the list of endpoints against any services which route traffic to the pod to add or remove the pod. Traefik will accordingly update its view of the world to add or remove the pod from the list of endpoints backing the Ingress.
There is no reason for this model to be incompatible with your master/follower architecture, provided each pod can ascertain whether it is the master or a follower and provide an appropriate indication in its readiness check. However, without taking special care, there will be races between the master/follower state changing and Kubernetes updating its endpoints, as readiness probes are only made periodically. I recommend assuming this will be the case and building in logic to reject requests received by non-master pods.
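For the leader/follower case in the question, a hedged sketch of such a readiness check; the /healthz/leader path is an assumption, and the endpoint would have to return a non-2xx/3xx status on followers:

readinessProbe:
  httpGet:
    path: /healthz/leader     # hypothetical endpoint: 200 on the leader, 503 on followers
    port: 8080                # assumed port
  periodSeconds: 5            # a short period narrows (but does not remove) the failover race
  failureThreshold: 1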
As a future consideration to increase robustness, you might split the ingress layer of your service from the business logic implementing the master/follower system, allowing all instances to communicate with Traefik and enqueue work for consideration by whatever is the "master" node at this point.

How to prevent a pod from being added to a kube-service until initialization is complete

Sometimes a pod takes some time to "warm up" (for example, to load some data into a cache). During that time it should not be exposed.
How can I prevent a pod from being added to a kube-service until initialization is complete?
You should use health checks. More specifically in Kubernetes, you need a ReadinessProbe
ReadinessProbe: indicates whether the container is ready to service requests. If the ReadinessProbe fails, the endpoints controller will remove the pod’s IP address from the endpoints of all services that match the pod. The default state of Readiness before the initial delay is Failure. The state of Readiness for a container when no probe is provided is assumed to be Success.
Also, difference from LivenessProbe:
If you’d like to start sending traffic to a pod only when a probe succeeds, specify a ReadinessProbe. In this case, the ReadinessProbe may be the same as the LivenessProbe, but the existence of the ReadinessProbe in the spec means that the pod will start without receiving any traffic and only start receiving traffic once the probe starts succeeding.
http://kubernetes.io/docs/user-guide/pod-states/
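A minimal sketch of such a readiness probe for the warm-up case; the /ready path and port are assumptions, and the endpoint should only start returning 200 once the cache is loaded:

readinessProbe:
  httpGet:
    path: /ready              # hypothetical endpoint that succeeds only after warm-up
    port: 8080                # assumed container port
  initialDelaySeconds: 20     # rough guess at warm-up time
  periodSeconds: 5
  failureThreshold: 3
# Until this probe succeeds, the pod's IP is not added to the Service endpoints,
# so no traffic is routed to it.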