On my Kubernetes setup, I have 2 pods - A (via a Deployment) and B (via a DaemonSet).
Pod B depends on Pod A being fully started first. I would now like to set an HTTP liveness probe in Pod B, to restart Pod B if the health check against Pod A fails. Restarting works fine if I put the external IP of Pod A's service in the host field. The issue is in resolving a DNS name in the host field.
It works if I set it like this:
livenessProbe:
  httpGet:
    host: <POD_A_SERVICE_EXTERNAL_IP_HERE>
    path: /health
    port: 8000
It fails if I set it like this:
livenessProbe:
  httpGet:
    host: auth
    path: /health
    port: 8000
It fails with the following error message:
Liveness probe failed: Get http://auth:8000/health: dial tcp: lookup auth on 8.8.8.8:53: no such host
ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
Is the following line on the above page true for HTTP Probes as well?
"you can not use a service name in the host parameter since the kubelet is unable to resolve it."
Correct 👍, DNS doesn't work for liveness probes; the kubelet's network namespace basically cannot resolve any in-cluster DNS names.
You can consider putting both of your services in a single pod as sidecars. This way they share the same network namespace, so the probe can reach the other container through the pod IP without any DNS lookup, and the kubelet restarts the failing container in place.
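A minimal sketch of that sidecar layout, assuming placeholder image names and the port/path from the question; because both containers share the pod's network namespace, B's probe reaches A's container over the pod IP:

apiVersion: v1
kind: Pod
metadata:
  name: a-and-b
spec:
  containers:
    - name: a
      image: my-a-image        # placeholder image for the former Pod A
      ports:
        - containerPort: 8000
    - name: b
      image: my-b-image        # placeholder image for the former Pod B
      livenessProbe:
        httpGet:
          path: /health
          port: 8000           # served by container "a" via the shared pod IP
        initialDelaySeconds: 10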
Another option is to create an operator 🔧 for your pods/application and basically have it check liveness through the in-cluster DNS for both pods separately, restarting the pods through the Kubernetes API.
You can also just create your own script in a pod that calls curl to check for a 200 OK and calls kubectl to restart your pod if you get anything else. A rough sketch of that is below.
Note that for the two options above you need to make sure that CoreDNS is stable and solid; otherwise your health checks might fail and cause potential downtime for your services.
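A rough sketch of that script idea, assuming the auth service/port from the question, a made-up app=b label on Pod B, and a ServiceAccount in the watchdog pod that is allowed to delete pods:

#!/bin/sh
# Poll Pod A's health endpoint; delete Pod B so its controller recreates it
# whenever the check does not return 200.
while true; do
  status=$(curl -s -o /dev/null -w '%{http_code}' http://auth:8000/health || echo 000)
  if [ "$status" != "200" ]; then
    kubectl delete pod -l app=b      # label is an assumption, adjust to yours
  fi
  sleep 10
done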
✌️☮️
Related
In my Kubernetes cluster, I have a single pod (i.e. one replica) with two containers: server and cache.
I also have a Kubernetes Service that matches my pod.
If cache is crashing, when I try to send an HTTP request to server via my Service, I get a "503 Service Temporarily Unavailable".
The HTTP request is going into the cluster via Nginx Ingress, and I suspect that the problem is that when cache is crashing, Kubernetes removes my one pod from the Service load balancers, as promised in the Kubernetes documentation:
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
I don't want this behavior; I still want server to be able to respond to requests even if cache has failed. Is there any way to get this desired behavior?
A pod is brought to the "Failed" state if one of the following conditions occurs:
- One of its containers exits with a non-zero status
- Kubernetes terminates a container because its health check fails
So, if you need one of your containers to keep responding when another one fails:
- Make sure your liveness probe points at the container that needs to keep running. The health check will then always get a success code and will not mark the pod as "Failed".
- Make sure the readiness probe also points at the container that needs to keep running. This makes sure that the load balancer keeps sending traffic to your pod.
- Make sure that you handle container errors gracefully and make the containers exit with a zero status code.
In the following example readiness and liveness probes, make sure that port 8080 is handled by the service container and that it has the /healthz and /ready routes active.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  timeoutSeconds: 1
The behavior I am looking for is configurable on the Service itself via the publishNotReadyAddresses option:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.21/#servicespec-v1-core
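A minimal sketch of what that looks like on the Service, with placeholder names and ports; the relevant line is publishNotReadyAddresses, which keeps the pod's address in the Service endpoints even while not all of its containers are ready:

apiVersion: v1
kind: Service
metadata:
  name: server
spec:
  publishNotReadyAddresses: true   # keep not-ready pods in the endpoints
  selector:
    app: server                    # placeholder selector
  ports:
    - port: 80
      targetPort: 8080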
I have a web server on port 8080. I have an Ingress object, created by Google, routing external traffic to a service that points to the pod.
My logs are littered with requests from 10.4.0.1:(some rotating high port) at "/", and they are eating CPU cycles as my web server generates the HTML to respond. It looks like a health check probe.
My pod has the following probe configs on my deployment:
readinessProbe:
  httpGet:
    path: "/status"
    port: 8080
  initialDelaySeconds: 10
livenessProbe:
  httpGet:
    path: "/status"
    port: 8080
  initialDelaySeconds:
Though it looks like I may have missed something in the configuration.
I've used tcpdump -X port 8080 to examine the traffic. It looks as though the same source (10.4.0.1) is conducting the status check (at "/status") and a mysterious check at root ("/") back to back. It seems as though it is the kubelet, but I haven't found proof. The pod IP range is 10.4.0.0/14. It also seems as though the new configuration worked but the default probe config wasn't removed.
After applying changes to the deployment, do I need to purge and restart the service? Ingress? Node? I'm new to Kubernetes and am lost.
Help of any kind is greatly appreciated!
To resolve the issue I had to change the configuration of a VM instance health check that is part of Google's Compute Engine API. Setting the path to "/status" seemed to do the trick. So, in short, there is a health check from both Kubernetes and GCE.
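For reference, on GKE this GCE health check can also be set declaratively; as far as I know a BackendConfig attached to the Service does it (the name here is a placeholder, and the exact fields may differ by GKE version):

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: status-check          # placeholder name
spec:
  healthCheck:
    type: HTTP
    requestPath: /status
    port: 8080

The Service would then reference it via the annotation cloud.google.com/backend-config: '{"default": "status-check"}'.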
After a deployment using Helm charts, I got a CrashLoopBackOff error.
NAME READY STATUS RESTARTS AGE
myproject-myproject-54ff57477d-h5fng 0/1 CrashLoopBackOff 10 24m
Then I described the pod to see its events, and I saw something like the below:
Liveness probe failed: Get http://10.16.26.26:8080/status:
dial tcp 10.16.26.26:8080: connect: connection refused
Readiness probe failed: Get http://10.16.26.26:8080/status:
dial tcp 10.16.26.26:8080: connect: connection refused
Lastly, I saw an invalid_grant error for access to my GCP cloud proxy in the logs, as below:
time="2020-01-15T15:30:46Z" level=fatal msg=application_main error="Post https://www.googleapis.com/{....blabla.....}: oauth2: cannot fetch token: 400 Bad Request\nResponse: {\n \"error\": \"invalid_grant\",\n \"error_description\": \"Not a valid email or user ID.\"\n}"
However, I checked my service account in IAM and it has access to the cloud proxy. Furthermore, I tested with the same credentials locally, and the endpoint for the readiness probe was working successfully.
Does anyone have any suggestions about my problem?
You can disable the liveness probe to stop the CrashLoopBackOff, exec into the container, and test from there.
Ideally you should not keep the same config for the liveness and readiness probes. It is not advisable for a liveness probe to depend on anything external; it should just check whether the pod is live or not.
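For example, a quick way to test the endpoint from inside the container (pod name and port taken from the question; this assumes wget is available in the image):

kubectl exec -it myproject-myproject-54ff57477d-h5fng -- \
  wget -qO- http://localhost:8080/status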
Referring to the problem with granting access on GCP: fix this by using the email address (the string that ends with ...@developer.gserviceaccount.com) instead of the Client ID for the client_id parameter value. The naming chosen by Google is confusing.
You can find more information and troubleshooting steps here: google-oauth-grant.
Referring to the problem with the probes:
Check whether the URL is healthy. Your probes may be too sensitive: your application may take a while to start or respond.
Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.
Liveness probe checks if your application is in a healthy state in your already running pod.
A readiness probe actually checks whether your pod is ready to receive traffic. Thus, if the probe's endpoint does not exist, the pod will never be reported as ready.
e.g.:
livenessProbe:
  httpGet:
    path: /your-path
    port: 5000
  failureThreshold: 1
  periodSeconds: 2
  initialDelaySeconds: 2
ports:
  - name: http
    containerPort: 5000
If the endpoint /your-path does not exist, the pod will never be reported as ready.
Make sure that you properly set up liveness and readiness probe.
For an HTTP probe, the kubelet sends an HTTP request to the specified
path and port to perform the check. The kubelet sends the probe to the
pod’s IP address, unless the address is overridden by the optional
host field in httpGet. If scheme field is set to HTTPS, the kubelet
sends an HTTPS request skipping the certificate verification. In most
scenarios, you do not want to set the host field. Here’s one scenario
where you would set it. Suppose the Container listens on 127.0.0.1
and the Pod’s hostNetwork field is true. Then host, under httpGet,
should be set to 127.0.0.1. Make sure you did it. If your pod relies
on virtual hosts, which is probably the more common case, you should
not use host, but rather set the Host header in httpHeaders.
For a TCP probe, the kubelet makes the probe connection at the node,
not in the pod, which means that you can not use a service name in the
host parameter since the kubelet is unable to resolve it.
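A sketch of the virtual-host case mentioned in that quote, with a made-up host value; the request still goes to the pod IP, only the Host header changes:

livenessProbe:
  httpGet:
    path: /health
    port: 8000
    httpHeaders:
      - name: Host
        value: auth.example.com   # hypothetical virtual host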
The most important thing you need to configure when using liveness probes is the initialDelaySeconds setting.
Make sure that the port your probe targets (8080 here) is actually open on the container.
A liveness probe failure causes the container to be restarted. You need to make sure the probe doesn't start until the app is ready; otherwise, the app will constantly restart and never be ready!
I recommend using the p99 startup time for initialDelaySeconds.
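For example, if your p99 startup time were around 30 seconds (a made-up number, measure your own app), the probe could look like this:

livenessProbe:
  httpGet:
    path: /status
    port: 8080
  initialDelaySeconds: 30   # roughly the p99 startup time of the app
  periodSeconds: 10
  failureThreshold: 3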
Take a look here: probes-kubernetes, most-common-fails-kubernetes-deployments.
On my Kubernetes setup, I have 2 services - A and B.
Service B is dependent on Service A being fully started first.
I would now like to set a TCP readiness probe in the Pods of Service B, so they test whether any Pod of Service A is fully operating.
The readinessProbe section of the deployment for Service B looks like:
readinessProbe:
  tcpSocket:
    host: serviceA.mynamespace.svc.cluster.local
    port: 1101 # same port of Service A Readiness Check
I can apply these changes, but the Readiness Probe fails with:
Readiness probe failed: dial tcp: lookup serviceB.mynamespace.svc.cluster.local: no such host
I use the same hostname in other places (e.g. I pass it as an ENV to the container) and it works and gets resolved.
Does anyone have an idea how to get the readiness check working against another service, or how to do some other kind of dependency-checking between services?
Thanks :)
Readiness and liveness probes are fully managed by the kubelet node agent, and the kubelet inherits its DNS configuration from the particular node's configuration, so you are not able to resolve the cluster's internal DNS records:
For a probe, the kubelet makes the probe connection at the node, not
in the pod, which means that you can not use a service name in the
host parameter since the kubelet is unable to resolve it.
You can consider a scenario where your source Pod A uses the node's IP address by setting the hostNetwork: true parameter; the kubelet can then reach it and the readiness probe in Pod B succeeds, as described in the official k8s documentation:
tcpSocket:
  host: <node hostname or IP address where Pod A is running>
  port: 1101
However, I've found a Stack Overflow thread with a more elegant solution that achieves the same result through init containers.
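A sketch of that init-container approach, using the service name and port from the question (the busybox tag is an assumption); Pod B's main containers only start once Service A answers on its port, and DNS works here because the init container runs inside the pod:

initContainers:
  - name: wait-for-service-a
    image: busybox:1.36           # image tag is an assumption
    command:
      - sh
      - -c
      - until nc -z serviceA.mynamespace.svc.cluster.local 1101; do echo waiting for serviceA; sleep 2; done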
In addition to Nick_Kh's answer, another workaround is to use a probe by command, which is executed in the container.
To perform a probe, the kubelet executes the command cat /tmp/healthy in the target container. If the command succeeds, it returns 0, and the kubelet considers the container to be alive and healthy.
An example:
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - wget -T2 -O- http://service
Currently, I am trying to deploy my microservice endpoint's Docker image on a Kubernetes cluster by creating a Helm chart. For this, I created the chart and changed the parameters in values.yaml and deployment.yaml for the port change. I also want to access it from my Angular front end, so I added service type NodePort. When I described the service, it gave me the port 30983 to access.
And I accessed it like http://node-ip:30983/endpoint
But I am only getting the "site can't be reached" message. Let me add the details of what I did here:
My values.yaml file contains the following to set the service type:
And my templates/service.yaml file contains the following:
And my templates/deployment.yaml file contains the following:
And I tried to access it like the following:
http://192.168.16.177:30983/
And I am only getting "site can't be reached".
NB: when I describe the service, I get the following:
The output of kubectl get pod --show-labels is in the following screenshot:
Updated
And when I use the kubectl describe pod command, I get the following:
Updated Error
Readiness probe failed: HTTP probe failed with statuscode: 404
Liveness probe failed: HTTP probe failed with statuscode: 404
How can I access my endpoint from deployment?
Try this for healthcheck probes:
livenessProbe:
  tcpSocket:
    port: 8085
readinessProbe:
  tcpSocket:
    port: 8085
Try the following command: docker ps -a, and find the container associated with the pod. The container name should be pretty much the same as the pod name, with some prefix/suffix.
Then look at the logs using docker logs <container_id>. Maybe that will give you clues as to why it is restarting.
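If you can't reach the node's Docker daemon directly, the same information is available through kubectl (the pod name is a placeholder):

kubectl logs <pod-name>               # logs of the current container
kubectl logs <pod-name> --previous    # logs of the last crashed instance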