Pod got CrashLoopBackOff in Kubernetes because of GCP service account

Pod got CrashLoopBackOff in Kubernetes because of GCP service account - kubernetes

After deployment with using helm carts, I got CrashLoopBackOff error.
NAME READY STATUS RESTARTS AGE
myproject-myproject-54ff57477d-h5fng 0/1 CrashLoopBackOff 10 24m
Then, I describe the pod to see events and I saw smth like below
Liveness probe failed: Get http://10.16.26.26:8080/status:
dial tcp 10.16.26.26:8080: connect: connection refused
Readiness probe failed: Get http://10.16.26.26:8080/status:
dial tcp 10.16.26.26:8080: connect: connection refused
Lastly, I saw invalid grant access to my GCP cloud proxy in logs as below
time="2020-01-15T15:30:46Z" level=fatal msg=application_main error="Post https://www.googleapis.com/{....blabla.....}: oauth2: cannot fetch token: 400 Bad Request\nResponse: {\n \"error\": \"invalid_grant\",\n \"error_description\": \"Not a valid email or user ID.\"\n}"
However, I checked my service account in IAM, it has access to cloud proxy. Furthermore, I tested with using same credentials in my local, and endpoint for readiness probe was working successfully.
Does anyone has any suggestion about my problem?

You can disable liveness probe to stop CrashLoopBackoff, exec into container and test from there.
Ideally you should not keep save config for liveness and readiness probe.It is not advisable for liveness probe to depend on anything external, it should just check if pod is live or not.

Referring to problem with granting access on GCP - fix this by using Email Address (the string that ends with ...#developer.gserviceaccount.com) instead of Client ID for client_id parameter value. The naming set by Google is confusing.
More information and troubleshooting you can find here: google-oautgh-grant.
Referring to problem with probes:
Check if URL is health. Your Probes may be too sensitive - your application take a while to start or respond.
Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.
Liveness probe checks if your application is in a healthy state in your already running pod.
Readiness probe will actually check if your pod is ready to receive traffic. Thus, if there is no /path endpoint, it will never appear as Running
egg:
livenessProbe:
httpGet:
path: /your-path
port: 5000
failureThreshold: 1
periodSeconds: 2
initialDelaySeconds: 2
ports:
- name: http
containerPort: 5000
If endpoint /index2 will not exist pod will never appear as Running.
Make sure that you properly set up liveness and readiness probe.
For an HTTP probe, the kubelet sends an HTTP request to the specified
path and port to perform the check. The kubelet sends the probe to the
pod’s IP address, unless the address is overridden by the optional
host field in httpGet. If scheme field is set to HTTPS, the kubelet
sends an HTTPS request skipping the certificate verification. In most
scenarios, you do not want to set the host field. Here’s one scenario
where you would set it. Suppose the Container listens on 127.0.0.1
and the Pod’s hostNetwork field is true. Then host, under httpGet,
should be set to 127.0.0.1. Make sure you did it. If your pod relies
on virtual hosts, which is probably the more common case, you should
not use host, but rather set the Host header in httpHeaders.
For a TCP probe, the kubelet makes the probe connection at the node,
not in the pod, which means that you can not use a service name in the
host parameter since the kubelet is unable to resolve it.
Most important thing you need to configure when using liveness probes. This is the initialDelaySeconds setting.
Make sure that you do have port 80 open on the container.
Liveness probe failure causes the pod to restart. You need to make sure the probe doesn’t start until the app is ready. Otherwise, the app will constantly restart and never be ready!
I recommend to use p99 startup time for the initialDelaySeconds.
Take a look here: probes-kubernetes, most-common-fails-kubernetes-deployments.

Related

How to set basic auth to Kubernetes' readinessProbe correctly?

How to set basic auth to Kubernetes' readinessProbe correctly?
If set this config for Kubernetes' readinessProbe in deployment kind.
readinessProbe:
httpGet:
path: /healthcheck
port: 8080
httpHeaders:
- name: Authorization
value: Basic <real base64 encoded data>
Deploy it to GKE, GCP's health check can't pass to access the inside application with using basic authentication.
But from here, it seems it should use this syntax. Why can't pass?
The server side is using JSON response at the /healthcheck point. Is it also necessary to set Accept or Content-Type to the httpHeaders?
And, is it good to set this health check to livenessProbe or readinessProbe?

According to kubernetes doc:
If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy. If you'd like your container to be killed and restarted if a probe fails, then specify a liveness probe, and specify a restartPolicy of Always or OnFailure.
Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-a-liveness-probe
If you'd like to start sending traffic to a Pod only when a probe succeeds, specify a readiness probe. In this case, the readiness probe might be the same as the liveness probe, but the existence of the readiness probe in the spec means that the Pod will start without receiving any traffic and only start receiving traffic after the probe starts succeeding. If your container needs to work on loading large data, configuration files, or migrations during startup, specify a readiness probe.
Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-a-readiness-probe
So, you can use health check according to your need. But in kubernetes doc, they give an example of health check as a liveliness probe.
Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request
And it is the best practice to use Content-Type when you send request other then browser or other typical client. I believe using Content-Type: application/json will solve the issue if other things go right in server side.

Liveness-Probe of one pod via another

On my Kubernetes Setup, I have 2 pods - A (via deployment) and B(via DS).
Pod B is somehow dependent on Pod A being fully started through. I would now like to set an HTTP Liveness-Probe in Pods B, to restart POD B if health check via POD A fails. Restarting works fine if I put the External IP of my POD A's service in the host. The issue is in resolving DNS name in the host.
It works if I set it like this:
livenessProbe:
httpGet:
host: <POD_A_SERVICE_EXTERNAL_IP_HERE>
path: /health
port: 8000
Fails if I set it like this:
livenessProbe:
httpGet:
host: auth
path: /health
port: 8000
Failed with following error message:
Liveness probe failed: Get http://auth:8000/health: dial tcp: lookup auth on 8.8.8.8:53: no such host
ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
Is the following line on the above page true for HTTP Probes as well?
"you can not use a service name in the host parameter since the kubelet is unable to resolve it."

Correct 👍, DNS doesn't work for liveness probes, the kubelet network space cannot basically resolve any in-cluster DNS.
You can consider putting both of your services in a single pod as sidecars. This way they would share the same address space if one container fails then the whole pod is restarted.
Another option is to create an operator 🔧 for your pods/application and basically have it check the liveness through the in-cluster DNS for both pods separately and restart the pods through the Kubernetes API.
You can also just create your own script in a pod that just calls curl to check for a 200 OK and kubectl to restart your pod if you get something else.
Note that for the 2 options above you need to make sure that Coredns is stable and solid otherwise your health checks might fail to make your services have potential downtime.
✌️☮️

How does a Liveness/Readiness probe communicate with a pod?

I am very new to k8s so apologies if the question doesn't make sense or is incorrect/stupid.
I have a liveness probe configured for my pod definition which just hits a health API and checks it's response status to test for the liveness of the pod.
My question is, while I understand the purpose of the liveness/readiness probes...what exactly are they? Are they just another type of pods which are spun up to try and communicate with our pod via the configured API? Or are they some kind of a lightweight process which runs inside the pod itself and attempts the API call?
Also, how does a probe communicate with a pod? Do we require a service to be configured for the pod so that the probe is able to access the API or is it an internal process with no additional config required?

Short answer: kubelet handle this checks to ensure your service is running, and if not it will be replaced by another container. Kubelet runs in every node of your cluster, you don't need to make any addional configurations.
You don't need to configure a service account to have the probes working, it is a internal process handled by kubernetes.
From Kubernetes documentation:
A Probe is a diagnostic performed periodically by the kubelet on a Container. To perform a diagnostic, the kubelet calls a Handler implemented by the Container. There are three types of handlers:
ExecAction: Executes a specified command inside the Container. The diagnostic is considered successful if the command exits with a status code of 0.
TCPSocketAction: Performs a TCP check against the Container’s IP address on a specified port. The diagnostic is considered successful if the port is open.
HTTPGetAction: Performs an HTTP Get request against the Container’s IP address on a specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
Each probe has one of three results:
Success: The Container passed the diagnostic.
Failure: The Container failed the diagnostic.
Unknown: The diagnostic failed, so no action should be taken.
The kubelet can optionally perform and react to three kinds of probes on running Containers:
livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.
startupProbe: Indicates whether the application within the Container is started. All other probes are disabled if a startup probe is provided, until it succeeds. If the startup probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a startup probe, the default state is Success.

For network probes, they are run from the kubelet on the node where the pod is running. Exec probes are run via the same mechanism as kubectl exec.

Readiness-Probe another Service on boot-up of Pod

On my Kubernetes Setup, i have 2 Services - A and B.
Service B is dependent on Service A being fully started through.
I would now like to set a TCP Readiness-Probe in Pods of Service B, so they test if any Pod of Service A is fully operating.
the ReadinessProbe section of the deployment in Service B looks like:
readinessProbe:
tcpSocket:
host: serviceA.mynamespace.svc.cluster.local
port: 1101 # same port of Service A Readiness Check
I can apply these changes, but the Readiness Probe fails with:
Readiness probe failed: dial tcp: lookup serviceB.mynamespace.svc.cluster.local: no such host
I use the same hostname on other places (e.g. i pass it as ENV to the container) and it works and gets resolved.
Does anyone have an idea to get the readiness working for another service or to do some other kind of dependency-checking between services?
Thanks :)

Due to the fact that Readiness and Liveness probes are fully managed by kubelet node agent and kubelet inherits DNS discovery service from the particular Node configuration, you are not able to resolve K8s internal nameserver DNS records:
For a probe, the kubelet makes the probe connection at the node, not
in the pod, which means that you can not use a service name in the
host parameter since the kubelet is unable to resolve it.
You can consider scenario when your source Pod A consumes Node IP Address by propagating hostNetwork: true parameter, thus kubelet can reach and success Readiness probe from within Pod B, as described in the official k8s documentation:
tcpSocket:
host: Node Hostname or IP address where Pod A residing
port: 1101
However, I've found Stack thread, where you can get more efficient solution how to achieve the same result through Init Containers.

In addition to Nick_Kh's answer, another workaround is to use probe by command, which is executed in a container.
To perform a probe, the kubelet executes the command cat /tmp/healthy in the target container. If the command succeeds, it returns 0, and the kubelet considers the container to be alive and healthy.
An example:
readinessProbe:
exec:
command:
- sh
- -c
- wget -T2 -O- http://service

k8s - livenessProbe vs readinessProbe

Consider a pod which has a healthcheck setup via a http endpoint /health at port 80 and it takes almost 60 seconds to be actually ready & serve the traffic.
readinessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 60
livenessProbe:
httpGet:
path: /health
port: 80
Questions:
Is my above config correct for the given requirement?
Does liveness probe start working only after the pod becomes ready ? In other words, I assume readiness probe job is complete once the POD is ready. After that livenessProbe takes care of health check. In this case, I can ignore the initialDelaySeconds for livenessProbe. If they are independent, what is the point of doing livenessProbe check when the pod itself is not ready! ?
Check this documentation. What do they mean by
If you want your Container to be able to take itself down for
maintenance, you can specify a readiness probe that checks an endpoint
specific to readiness that is different from the liveness probe.
I was assuming, the running pod will take itself down only if the livenessProbe fails. not the readinessProbe. The doc says other way.
Clarify!

I'm starting from the second problem to answer. The second question is:
Does liveness probe start working only after the pod becomes ready?
In other words, I assume readiness probe job is complete once the POD
is ready. After that livenessProbe takes care of health check.
Our initial understanding is that liveness probe will start to check after readiness probe was succeeded but it turn out not to be like that. It has opened an issue for this challenge.Yon can look up to here. Then It was solved this problem by adding startup probes.
To sum up:
livenessProbe
livenessProbe: Indicates whether the Container is running. If the
liveness probe fails, the kubelet kills the Container, and the
Container is subjected to its restart policy. If a Container does not
provide a liveness probe, the default state is Success.
readinessProbe
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.
startupProbe
startupProbe: Indicates whether the application within the Container is started. All other probes are disabled if a startup probe is provided, until it succeeds. If the startup probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a startup probe, the default state is Success
look up here.

The liveness probes are to check if the container is started and alive. If this isn’t the case, kubernetes will eventually restart the container.
The readiness probes in turn also check dependencies like database connections or other services your container is depending on to fulfill it’s work. As a developer you have to invest here more time into the implementation than just for the liveness probes. You have to expose an endpoint which is also checking the mentioned dependencies when queried.
Your current configuration uses a health endpoint which are usually used by liveness probes. It probably doesn’t check if your services is really ready to take traffic.
Kubernetes relies on the readiness probes. During a rolling update, it will keep the old container up and running until the new service declares that it is ready to take traffic. Therefore the readiness probes have to be implemented correctly.

I will show the difference between them in a couple of simple points:
livenessProbe
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 3
periodSeconds: 3
It is used to indicate if the container has started and is alive or not i.e. proof of being available.
In the given example, if the request fails, it will restart the container.
If not provided the default state is Success.
readinessProbe
readinessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 3
periodSeconds: 3
It is used to indicate if the container is ready to serve traffic or not i.e.proof of being ready to use.
It checks dependencies like database connections or other services your container is depending on to fulfill its work.
In the given example, until the request returns Success, it won't serve any traffic(by removing the Pod’s IP address from the endpoints of all Services that match the Pod).
Kubernetes relies on the readiness probes during rolling updates, it keeps the old container up and running until the new service declares that it is ready to take traffic.
If not provided the default state is Success.
Summary
Liveness Probes: Used to check if the container is available and alive.
Readiness Probes: Used to check if the application is ready to be used and serve the traffic.

Both readiness probe and liveness probe seem to have same behavior. They do same type of checks. But the action they take in case of failures is different.
Readiness Probe shuts the traffic from service down. so that service can always the send the request to healthy pod whereas the liveness probe restarts the pod in case of failure. It does not do anything for the service. Service continues to send the request to the pods as usual if it is in ‘available’ status.
It is recommended to use both probes!!
Check here for detailed explanation with code samples.

The Kubernetes platform has capabilities for validating container applications, called healthchecks. Liveness is proof of availability and readness is proof of pod readiness is ready to use.
The features are designed to prevent service downtime and inconsistent images by enabling restarts when needed. Kubernetes uses liveness to know when to restart the container, so it can solve most problems. Kubernetes uses readness to know when the container is available to accept requests. The pod is considered ready when all containers are ready. Therefore, when the pod takes too long to initialize (by cache mount, DB schema, etc.) it is recommended to increase initialDelaySeconds.

I'd post it as a comment but it's too long, So let's make it a full answer.
Is my above config correct for the given requirement?
IMHO no, you are missing initialDelaySeconds for both probes and liveness and rediness probably should not call the same endpoint. I'd use the suggestionss form #fgul
Does liveness probe start working only after the pod becomes ready ?
In other words, I assume readiness probe job is complete once the POD
is ready. After that livenessProbe takes care of health check. In this
case, I can ignore the initialDelaySeconds for livenessProbe. If they
are independent, what is the point of doing livenessProbe check when
the pod itself is not ready! ?
I think you were thinking about startupProbe, again #fgul described what does what so there is no point in me repeating.
I was assuming, the running pod will take itself down only if the
livenessProbe fails. not the readinessProbe. The doc says other way.
The pod can be restarted only based on livenessProbe, not the redinessProbe.
I'd think twice before binding a rediness probe with external services (being alive as #randy advised), especially in high load services:
Let's assume you have define a deployment with lots of pods, that are connecting to a database and are processing lots of requests.
Now the database goes down.
The rediness probe is checking also db connection and it marks all of the pods as "out of service".
Now the db goes up.
Pods rediness probe will start to pass but not instantly and on all pods right away - the pods will be marked as "Ready" one after an other.
But it might be too slow - the second the first pod will be marked as ready, ALL of the traffic will be sent to this one pod alone. It might end in a situation that the "waking up" pods will be killed by the traffic one after an other.
For that kind of situation I'd say the rediness pod should check only pod internal stuff and don't care about the externall services. The kubernetes endpoint will return an error and either the clients might support failing service (it's called "designed for failure") or the loadbalancer/ingress can cover it.

I think the below image describes the use-cases for each.

Liveness probes are a relatively specialized tool, and you probably don't want one at all. However they run totally independently AFAIK.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse