Kubernetes - which pod receives request from load balancer?

I have a load balancer service for a deployment with 3 pods. When I do a rolling update (changing the image) with the following command:
kubectl set image deployment/< deployment name > contname=< image-name >
and hit the service continuously, I get a few "connection refused" errors in between. I want to check which pods they are related to. In other words, is it possible to see which request is served by which pod (without going inside the pods and checking their logs)? Also, is this because of a race condition, where a pod is sent a request just as it is being terminated (almost simultaneously), resulting in no response?

Have you configured liveness and readiness probes for your Pods? The Service will not send traffic to a Pod unless it considers the Pod healthy, but without health checks it cannot know for certain whether the Pod is ready.
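One way to see which pod answers each request without reading pod logs is to have the application report its own pod name and tally the answers on the client side. The sketch below assumes the application adds a response header carrying its pod name; the header name X-Pod-Name and the helpers fetch_pod_name and tally are hypothetical, not something Kubernetes provides by itself.

```python
from collections import Counter
import urllib.request


def fetch_pod_name(url, header="X-Pod-Name", timeout=2.0):
    """Request `url` and return the pod name from the (assumed) response header."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.headers.get(header, "unknown")


def tally(fetch, attempts):
    """Call fetch() `attempts` times; count answers per pod and failures per error type."""
    counts = Counter()
    for _ in range(attempts):
        try:
            counts[fetch()] += 1
        except OSError as exc:
            # Connection refused, timeouts and urllib's URLError are all OSError subclasses.
            counts["error: " + type(exc).__name__] += 1
    return counts


# Example use during a rolling update (the URL is a placeholder):
# print(tally(lambda: fetch_pod_name("http://<service-address>/"), 200))
```

Running this while the rolling update is in progress should show how requests are spread across pods and how many fail, which helps tell whether the failures line up with pods that are starting or terminating.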

Related

How to solve "Ingress Error: Some backend services are in UNHEALTHY state"?

I am working on deploying a certain pod to GKE but I am having an unhealthy state for my backend services.
The deployment went through via the helm install process, but the ingress reports a warning that says "Some backend services are in UNHEALTHY state". I have tried to access the logs but do not know exactly what to look for. Also, I already have liveness and readiness probes running.
What could I do to make the ingress come back to a healthy state? Thanks
Picture of warning error on GKE UI
Without more details it is hard to determine the exact cause.
First, note that your error message is "Some backend services are in UNHEALTHY state", not "All backend services are in UNHEALTHY state". This indicates that only some of your backends are affected.
There are many possible causes: whether you are using GCP Ingress or Nginx Ingress, your externalTrafficPolicy configuration, whether you are using preemptible nodes, your livenessProbe and readinessProbe, health checks, etc.
Since only some backends are affected, with the current information I can only suggest some debugging options:
Using $ kubectl get po -n <namespace>, check that all your pods are working correctly: all containers within the pods are Ready and the pod status is Running. If needed, check the logs of a suspicious pod with $ kubectl logs <podname> -c <containerName>. In general, you should check all pods the load balancer is pointing to.
Confirm that livenessProbe and readinessProbe are configured properly and that the response is 200.
Describe your ingress with $ kubectl describe ingress <yourIngressName> and check the backends.
Check that you've configured your health checks properly according to the GKE Ingress for HTTP(S) Load Balancing - Health Checks guide.
If you still cannot solve this issue with the debug options above, please provide more details about your environment, with logs (without private information).
Useful links:
kubernetes unhealthy ingress backend
GKE Ingress shows unhealthy backend services
In GKE you can define a BackendConfig to set up custom health checks. You can configure this using the link below to bring the ingress backend to a HEALTHY state.
https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#direct_health
If you have kubectl access to your pods, you can run kubectl get pod, and then kubectl logs -f <pod-name>.
Review the logs and find the error(s).

debug gke loadbalancing error - some backend services are in UNHEALTHY state

Started seeing this error for the first time in a year and unsure how to debug (not very familiar with k8s)
{@type: type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry, statusDetails: failed_to_pick_backend}
failed_to_pick_backend - The load balancer failed to pick a healthy backend to handle the request.
Debug tips:
List the pods the load balancer is pointing to.
Make sure that probes (i.e. readiness, liveness) are configured.
Describe the pods (kubectl describe pods <pod_name> -n <namespace>) to see why the health check is failing.
Fix the health check problem. Once the pods are healthy, give the load balancer some time (sometimes it takes hours) to update the status.
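The first steps of the debug tips above can be partly automated. As a rough sketch, the helper below (the function name not_ready_pods and the sample pod names are made up for illustration) takes the parsed output of kubectl get pods -o json and lists the pods whose Ready condition is not "True", along with the reason reported in that condition. It only reads the status.conditions field of each pod and does not talk to the cluster itself.

```python
import json


def not_ready_pods(pod_list):
    """Return (name, reason) pairs for pods whose Ready condition is not "True"."""
    result = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        conditions = pod.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None:
            result.append((name, "no Ready condition"))
        elif ready.get("status") != "True":
            result.append((name, ready.get("reason", "unknown")))
    return result


# Illustrative sample shaped like `kubectl get pods -o json` output (names are made up).
sample = json.loads("""
{"items": [
  {"metadata": {"name": "web-1"},
   "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
  {"metadata": {"name": "web-2"},
   "status": {"conditions": [{"type": "Ready", "status": "False",
                              "reason": "ContainersNotReady"}]}}
]}
""")

print(not_ready_pods(sample))
```

Against a real cluster, the input could come from something like kubectl get pods -n <namespace> -o json piped into json.load(sys.stdin). The pods it reports are the ones to inspect further with kubectl describe.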

HaProxy Ingress Controller - what is the process of adding a pod?

On a Kubernetes cluster that uses HaProxy as an ingress controller, how does HaProxy add a new pod when the old pod has died?
Can it make sure that the pod is ready to receive traffic?
Right now I am using a readiness probe and a liveness probe. I know that the order in which Kubernetes brings a new pod into use is: Liveness probe --> Readiness probe --> 6/6 --> pod is ready.
So will the HaProxy Ingress Controller use the same Kubernetes mechanism?
Short answer: yes, it will!
From documentation:
The most demanding part is syncing the status of pods, since the environment is highly dynamic and pods can be created or destroyed at any time. The controller feeds those changes directly to HAProxy via the HAProxy Data Plane API, which reloads HAProxy as needed.
HAProxy ingress does not take care of pod health; it is responsible for receiving external traffic and forwarding it to the correct Kubernetes services.
The kubelet uses liveness probes to know when to restart a container, which means that you must define liveness and readiness probes in the pod definition.
See more about container probes in pod lifecycle documentation.
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

Load Balancing between PODS

Is there a way to do active/passive load balancing between 2 PODs of a micro-service? Say I have 2 instances (PODs) of a micro-service running, exposed using a K8s Service object. Is there a way to configure the load balancing in such a way that one pod always gets the requests and, when that pod is down, the other pod starts receiving them?
I also have an ingress object on top of that service.
This is what the Kubernetes Service object does, which you already mentioned you are using. Make sure you set up a readiness probe in your pod template so that the system can tell when your app is healthy.

How to kickoff the dead replicas of Kubernetes Deployment

We have deployed our services as Kubernetes Deployments with multiple replicas. When a server crashes, Kubernetes migrates its containers to another available server, which takes about 3~5 minutes.
While migrating, the client can still access the Deployment service because we have other running replicas. But sometimes the requests fail because the load balancer redirects to the dead or migrating containers.
It would be great if Kubernetes could kick out the dead replicas automatically and add them back once they are running on other servers. Otherwise, we need to set up a load balancer such as HAProxy to do the same job with multiple Deployment instances.
You need to configure health checking to have properly working load balancing for a Service. Please have a read of:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
1. kubelet
--node-status-update-frequency duration
Specifies how often kubelet posts node status to master. Note: be cautious when changing the constant, it must work with nodeMonitorGracePeriod in nodecontroller. (default 10s)
2. controller-manager
--node-monitor-grace-period duration
Amount of time which we allow running Node to be unresponsive before marking it unhealthy. Must be N times more than kubelet's nodeStatusUpdateFrequency, where N means number of retries allowed for kubelet to post node status. (default 40s)
--pod-eviction-timeout duration
The grace period for deleting pods on failed nodes. (default 5m0s)
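To see how these settings relate to the delay described in the question, the following back-of-the-envelope sketch simply adds up the default values quoted above. It assumes that a crashed node is first marked unhealthy after the node monitor grace period and that its pods are deleted after the pod eviction timeout on top of that; real clusters may behave differently depending on version and configuration, and the function names are made up for illustration.

```python
from datetime import timedelta

# Defaults as quoted in the flag descriptions above.
NODE_STATUS_UPDATE_FREQUENCY = timedelta(seconds=10)
NODE_MONITOR_GRACE_PERIOD = timedelta(seconds=40)
POD_EVICTION_TIMEOUT = timedelta(minutes=5)


def status_retries(grace=NODE_MONITOR_GRACE_PERIOD,
                   frequency=NODE_STATUS_UPDATE_FREQUENCY):
    """N from the flag description: status posts that fit in the grace period."""
    return grace // frequency


def approx_eviction_delay(grace=NODE_MONITOR_GRACE_PERIOD,
                          eviction=POD_EVICTION_TIMEOUT):
    """Rough time from node failure to pod deletion under the stated assumption."""
    return grace + eviction


print(status_retries())         # number of missed status updates tolerated
print(approx_eviction_delay())  # approximate delay as a timedelta
```

With the defaults this comes to 4 status updates and 5 minutes 40 seconds, which is in the same range as the migration time reported in the question. Lowering these values shortens the delay but makes the cluster more sensitive to brief node or network hiccups.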