debug gke loadbalancing error - some backend services are in UNHEALTHY state - kubernetes

Started seeing this error for the first time in a year and unsure how to debug (not very familiar with k8s)
{@type: type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry, statusDetails: failed_to_pick_backend}

failed_to_pick_backend - The load balancer failed to pick a healthy backend to handle the request.
Debug tips:
List the pods the load balancer is pointing to (a command sketch follows these tips).
Make sure that probes (i.e. readiness, liveness) are configured.
Describe the pods (kubectl describe pod <pod_name> -n <namespace>) to see why the health check is failing.
Fix the health check problem. Once the pods are healthy, give the load balancer some time (sometimes it takes hours) to update the status.
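For reference, a minimal version of that flow with kubectl might look like the following; the namespace, pod, and ingress names are placeholders for your own:

kubectl get pods -n my-namespace -o wide              # are all pods Running and READY n/n?
kubectl describe pod my-app-pod -n my-namespace       # check Events for failing readiness/liveness probes
kubectl describe ingress my-ingress -n my-namespace   # on GKE, the ingress.kubernetes.io/backends annotation shows HEALTHY/UNHEALTHY per backend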

Related

How to solve "Ingress Error: Some backend services are in UNHEALTHY state"?

I am working on deploying a certain pod to GKE, but my backend services are in an unhealthy state.
The deployment went through via the helm install process, but the ingress reports a warning that says Some backend services are in UNHEALTHY state. I have tried to access the logs but do not know exactly what to look for. Also, I already have liveness and readiness probes running.
What can I do to make the ingress come back to a healthy state? Thanks
Picture of warning error on GKE UI
Without more details it is hard to determine the exact cause.
As a first point, I want to mention that your error message is Some backend services are in UNHEALTHY state, not All backend services are in UNHEALTHY state. This indicates that only some of your backends are affected.
There can be many reasons: whether you are using GCP Ingress or Nginx Ingress, your externalTrafficPolicy configuration, whether you are using preemptible nodes, your livenessProbe and readinessProbe settings, your health checks, etc.
Since in your scenario only some backends are affected, the best I can do with the current information is suggest some debugging options.
Using $ kubectl get po -n <namespace>, check that all your pods are working correctly: all containers within the pods are Ready and the pod status is Running. If needed, check the logs of a suspicious pod with $ kubectl logs <podname> -c <containerName>. In general, you should check all the pods the load balancer is pointing to.
Confirm that your livenessProbe and readinessProbe are configured properly and respond with 200 (a sample probe snippet is sketched after these steps).
Describe your ingress with $ kubectl describe ingress <yourIngressName> and check the backends.
Check whether you've configured your health checks properly according to the GKE Ingress for HTTP(S) Load Balancing - Health Checks guide.
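As a point of reference, a probe configuration that answers with 200 might look like the container snippet below (taken from a Deployment's pod template). The container name, image, /healthz path and port 8080 are assumptions to replace with whatever your app actually serves; with GKE Ingress the load balancer's health check must also hit a path and port that return 200, so it is worth keeping them aligned with the probe.

containers:
- name: my-app
  image: gcr.io/my-project/my-app:1.0    # placeholder image
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /healthz                     # must return HTTP 200 when the app can serve traffic
      port: 8080
    periodSeconds: 10
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 15
    periodSeconds: 20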
If you still can't solve the issue with the above debugging options, please provide more details about your environment, including logs (without private information).
Useful links:
kubernetes unhealthy ingress backend
GKE Ingress shows unhealthy backend services
In GKE you can define a BackendConfig to set up custom health checks. You can configure this using the link below to bring the ingress backend to a HEALTHY state.
https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#direct_health
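For illustration, a BackendConfig with a custom health check, plus the Service annotation that attaches it, might look roughly like this; the resource names, request path and port are placeholders, and the linked guide is authoritative for the exact fields:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backendconfig
spec:
  healthCheck:
    checkIntervalSec: 15
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
    type: HTTP
    requestPath: /healthz          # path your app answers with 200
    port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    cloud.google.com/backend-config: '{"default": "my-backendconfig"}'   # attach the BackendConfig
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080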
If you have kubectl access to your pods, you can run kubectl get pod, and then kubectl logs -f <pod-name>.
Review the logs and find the error(s).

Kubernetes scaling pods by number of active connections

I have a Kubernetes cluster that runs some legacy containers (Windows containers).
To simplify, let's say that a container can handle a maximum of 5 requests at a time, something like:
handleRequest() {
    requestLock(semaphore_Of_5)   // at most 5 requests are handled concurrently
    sleep(2s)                     // simulate the slow, non-CPU-bound work
    return "result"
}
So the CPU is not spiking. I need to scale based on the number of active connections.
I can see from the documentation https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables that:
You can use Pod readiness probes to verify that backend Pods are working OK, so that kube-proxy in iptables mode only sees backends that test out as healthy. Doing this means you avoid having traffic sent via kube-proxy to a Pod that’s known to have failed.
So there is a mechanism to make pods available (or not) for routing new requests, but it is the livenessProbe that actually marks the pod as unhealthy and subject to the restart policy. But my pods are just busy. They don't need restarting.
How can I increase the number of pods in this case?
You can enable HPA (Horizontal Pod Autoscaler) for the deployment.
You can use a number-of-requests (or active-connections) metric and perform autoscaling on that metric (a sketch follows the link below).
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects
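As a rough sketch, assuming you already export a per-pod custom metric (for example via Prometheus Adapter) called something like active_connections, an HPA on that metric could look like this. The API version is autoscaling/v2 (older clusters use v2beta2), and the metric name, deployment name and target value are assumptions:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: legacy-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: legacy-app               # the Deployment running the Windows containers
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: active_connections   # custom metric exposed through your metrics adapter
      target:
        type: AverageValue
        averageValue: "4"          # scale out before hitting the 5-connection limit per pod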
I would also recommend configuring the liveness probe failureThreshold and timeoutSeconds and checking whether that helps (an example follows the link below).
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes
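For example, thresholds along these lines let a merely busy pod be taken out of the Service endpoints by the readinessProbe without being restarted by the livenessProbe; the path, port and numbers are placeholders to tune:

readinessProbe:
  httpGet:
    path: /healthz
    port: 80
  timeoutSeconds: 3        # a slow, busy pod goes NotReady and stops receiving new requests
  failureThreshold: 2
livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  timeoutSeconds: 5
  failureThreshold: 6      # restart only after a long run of consecutive failures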

Load Balancing between PODS

Is there a way to do active/passive load balancing between 2 pods of a microservice? Say I have 2 instances (pods) of a microservice running, exposed using a K8s Service object. Is there a way to configure the load balancing in such a way that one pod always gets the requests, and when that pod is down, the other pod starts receiving the requests?
I also have an ingress object on top of that service.
This is what the Kubernetes Service object does, which you already mentioned you are using. Make sure you set up a readiness probe in your pod template so that the system can tell when your app is healthy.
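To watch this behaviour, you can check which pods currently back the Service; only Ready pods show up as endpoints, so a failed pod drops out and the remaining one keeps receiving traffic. The service name and label here are placeholders:

kubectl get endpoints my-service --watch    # lists the Ready pod IPs behind the Service as they change
kubectl get pods -l app=my-app -o wide      # map those IPs back to pod names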

How to kickoff the dead replicas of Kubernetes Deployment

Now we have deployed services as Kubernetes Deployments with multiple replicas. Once a server crashes, Kubernetes migrates its containers to another available server, which takes about 3~5 minutes.
While migrating, clients can still access the Deployment's service because we still have other running replicas. But sometimes requests fail because the load balancer redirects them to the dead or still-migrating containers.
It would be great if Kubernetes could remove the dead replicas automatically and add them back once they are running on other servers. Otherwise, we need to set up an LB like HAProxy to do the same job with multiple Deployment instances.
You need to configure health checking to have properly working load balancing for a Service. Please have a read of:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
1. kubelet
--node-status-update-frequency duration
Specifies how often kubelet posts node status to master. Note: be cautious when changing the constant, it must work with nodeMonitorGracePeriod in nodecontroller. (default 10s)
2. controller-manager
--node-monitor-grace-period duration
Amount of time which we allow running Node to be unresponsive before marking it unhealthy. Must be N times more than kubelet's nodeStatusUpdateFrequency, where N means number of retries allowed for kubelet to post node status. (default 40s)
--pod-eviction-timeout duration
The grace period for deleting pods on failed nodes. (default 5m0s)
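As a sketch of where these settings live: on a kubeadm-style cluster they are passed as flags to the kubelet and to kube-controller-manager. The values and file locations below are assumptions for illustration only.

# kubelet, e.g. via KUBELET_EXTRA_ARGS in its systemd drop-in
--node-status-update-frequency=5s

# kube-controller-manager, e.g. in /etc/kubernetes/manifests/kube-controller-manager.yaml
--node-monitor-grace-period=20s   # several times the kubelet's update frequency
--pod-eviction-timeout=1m0s       # evict pods from a failed node sooner than the 5m default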

Kubernetes - which pod receives request from load balancer?

I have a load balancer service for a deployment with 3 pods. When I do a rolling update (changing the image) with the following command:
kubectl set image deployment/<deployment-name> contname=<image-name>
and hit the service continuously, I get a few connection refused errors in between. I want to check which pods they are related to. In other words, is it possible to see which request is served by which pod (without going inside the pods and checking their logs)? Also, is this because of a race condition, as in a pod might have been sent a request and terminated just before handling it (almost simultaneously, resulting in no response)?
Have you configured liveness and readiness probes for your Pods? The service will not serve traffic to a Pod unless it thinks it is healthy, but without health checks it won't know for certain whether it is ready.
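One way to see which pod served each request, without going inside the pods, is to expose the pod name to the application through the Downward API and have the app include it in its responses or access logs. This is just a sketch; the MY_POD_NAME variable and the container details are assumptions:

containers:
- name: my-app
  image: gcr.io/my-project/my-app:2.0   # placeholder image
  env:
  - name: MY_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name        # the pod's own name, which the app can log per request

For the connection refused errors during the rolling update, a readinessProbe (as suggested above) keeps new pods out of the Service endpoints until they can actually serve, which typically reduces those gaps.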