Kubernetes Service not distributing the traffic evenly among pods

I am using a Kubernetes v1.20.10 bare-metal installation with one master node and 3 worker nodes. The application simply serves HTTP requests.
I am scaling the deployment with the Horizontal Pod Autoscaler (HPA) and I noticed that the load is not spread evenly across pods: the first pod receives about 95% of the load while the other pods receive very little.
I tried the answer mentioned here, but it did not work: Kubernetes service does not distribute requests between pods

Based on the information provided, I assume that you are using HTTP keep-alive, which keeps a persistent TCP connection open between requests.
A Kubernetes Service distributes load per (new) TCP connection. With persistent connections, only the additional connections get distributed across pods, which is the effect you observe.
Try disabling HTTP keep-alive, or set the maximum keep-alive time to something like 15 seconds and the maximum requests per connection to 50.
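For example, if the pods serve traffic through nginx (an assumption; adapt the idea to whatever HTTP server you actually run), the suggested limits could be delivered as a ConfigMap included in the server configuration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-keepalive            # hypothetical name
data:
  keepalive.conf: |
    # Close idle client connections after 15s and after 50 requests,
    # so clients reconnect and the Service can balance the new TCP
    # connections across all pods.
    keepalive_timeout 15s;
    keepalive_requests 50;
```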

Related

Cover for unhealthy pods

I am running multiple pods with python/gunicorn serving web requests. At times, requests get really slow (up to 60 s), which blocks all workers and makes the livenessProbe fail.
In some instances, all pods are blocked in this state and are restarted at the same time (graceful shutdown takes up to 60 s). This means that no pod is available to take new requests.
Is there a way of telling k8s to cover for pods that it is restarting? For example, starting a new pod when other pods are unhealthy.
You can put an Ingress or an L7 load balancer in front of a Kubernetes Service that has multiple backend pods (selected by the pods' labels and the Service's label selector), spread across different deployments running on different nodes. The Ingress controller or load balancer can health-check the backends and stop routing traffic to unhealthy pods. This topology increases the overall availability and resiliency of the application.
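As a minimal sketch under those assumptions (name, image, path and thresholds are placeholders), a readinessProbe takes a blocked pod out of the Service/Ingress backends without restarting it, while a more lenient livenessProbe avoids all pods being restarted at the same time:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                          # hypothetical name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: gunicorn
        image: example/web:latest    # hypothetical image
        ports:
        - containerPort: 8000
        readinessProbe:              # failure only removes the pod from Service endpoints
          httpGet:
            path: /healthz           # hypothetical health endpoint
            port: 8000
          periodSeconds: 5
          failureThreshold: 2
        livenessProbe:               # deliberately more lenient than readiness
          httpGet:
            path: /healthz
            port: 8000
          periodSeconds: 10
          failureThreshold: 6        # ~60s of failures before a restart
```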

latency based routing for service endpoints in kubernetes cluster

We have a single Kubernetes cluster whose worker nodes sit in multiple data centres in different geographic areas.
We have a Service endpoint that connects to application pods in these different data centres. Let's say application A has 2 pods running in data centre Y, 2 pods in data centre Z and 2 pods in data centre X. When a request lands on the Service endpoint, it can be routed to any of these 6 pods across the data centres.
We want to implement latency-based routing for Service endpoints: when a request lands on a worker node, it should be routed to the nearest pods, or to the pod with the lowest network latency.
Any suggestions or guidance are much appreciated.
Use kube-proxy in IPVS mode with the sed (shortest expected delay) scheduler.
Refer: https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-ipvs
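A sketch of the relevant part of the kube-proxy configuration (how it is delivered, e.g. via the kube-proxy ConfigMap or a config file on each node, depends on your setup):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs            # switch from iptables to IPVS
ipvs:
  scheduler: sed      # shortest expected delay
```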

kube-apiserver high CPU and requests

We have a Kubernetes 1.7.8 cluster deployed with Kops 1.7 in HA with three masters. The cluster has 10 nodes and around 400 pods.
The cluster runs heapster, prometheus, and ELK (collecting logs for some pods).
We are seeing very high activity on the masters, with over 90% of the CPU used by the api-server.
Checking the prometheus numbers, we can see that nearly 5000 requests to the kube-apiserver are WATCH verbs, while the remaining verbs (GET, LIST, PATCH, PUT) are each below 50 requests.
Almost all requests are reported with client "Go-Http-client/2.0" (the default User Agent for the Go HTTP library).
Is this a normal situation?
How can we debug which are the pods sending these requests? (How can we add the source IP to the kube-apiserver logs?)
[kube-apiserver.manifest][1]
Thanks,
Charles
[1]: https://pastebin.com/nGxSXuZb
Given the Kubernetes architecture, this is normal behaviour, because all cluster components call the api-server to watch for changes.
That is why you see more than 5000 WATCH entries in your metrics. Please take a look at how the cluster is managed by the kube-apiserver and how master-node communication is organized.
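If you still need to attribute the requests, one option on clusters that support API audit logging (this is newer than 1.7.8, so treat it as an assumption about your upgrade path) is a minimal Metadata-level audit policy, since audit events record the user, userAgent, verb and sourceIPs of each request:

```yaml
# audit-policy.yaml -- log request metadata only (user, userAgent, verb, sourceIPs)
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
```

The policy is wired into the api-server with --audit-policy-file=/etc/kubernetes/audit-policy.yaml and --audit-log-path=/var/log/kube-apiserver-audit.log (paths are illustrative).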

How to kick out the dead replicas of a Kubernetes Deployment

We have deployed services as Kubernetes Deployments with multiple replicas. When a server crashes, Kubernetes migrates its containers to another available server, which takes about 3-5 minutes.
While the migration is in progress, clients can still access the Deployment's Service because other replicas are running. But sometimes requests fail because the load balancer still routes them to the dead or migrating containers.
It would be great if Kubernetes could kick the dead replicas out of the load balancer automatically and add them back once they are running on other servers. Otherwise, we need to set up an LB like HAProxy to do the same job across multiple Deployment instances.
You need to configure health checking for a Service's load balancing to work properly. Please have a read of:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
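A minimal example of such a probe on the serving container (image, path and port are placeholders):

```yaml
# fragment of a Deployment's container spec
containers:
- name: app
  image: example/app:latest          # hypothetical image
  readinessProbe:
    httpGet:
      path: /healthz                 # hypothetical health endpoint
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
    failureThreshold: 3              # removed from Service endpoints after ~30s of failures
```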
1. kubelet
--node-status-update-frequency duration
Specifies how often the kubelet posts node status to the master. Note: be cautious when changing this constant; it must work with nodeMonitorGracePeriod in the node controller. (default 10s)
2. controller-manager
--node-monitor-grace-period duration
Amount of time we allow a running Node to be unresponsive before marking it unhealthy. Must be N times more than the kubelet's nodeStatusUpdateFrequency, where N is the number of retries allowed for the kubelet to post node status. (default 40s)
--pod-eviction-timeout duration
The grace period for deleting pods on failed nodes. (default 5m0s)
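To make pods on a failed node get evicted and rescheduled faster, these flags can be tightened, for example in the kube-controller-manager static Pod manifest; the path and values below are illustrative and depend on how the cluster was deployed:

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (fragment)
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --node-monitor-grace-period=20s   # default 40s
    - --pod-eviction-timeout=30s        # default 5m0s
```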

Kubernetes NodePort routing logic

I have a Kubernetes setup that contains 4 minions (node1, 2, 3, 4). I created a Service that exposes port 80 as NodePort 30010. There are 4 nginx pods that accept the traffic from this Service. However, the distribution of pods among the nodes may vary. For example, node 1 has 2 pods, node 2 has 1 pod and node 3 has 1 pod; node 4 doesn't have any pod deployed. My requirement is that whenever I send a request to node1:30010 it should hit only the 2 pods on node 1 and not the others. Traffic should be routed to other nodes if and only if there is no pod on the local node. For example, node 4 may have to route requests arriving at node4:30010 to other nodes because it has no suitable pod deployed. Can I meet this requirement by changing the kube-proxy configuration?
As far as I'm aware, no. Hitting node1:30010 will pass traffic to the Service, and the Service will then round-robin across its backend pods.
Kubernetes is designed as a layer of abstraction above nodes, so you don't have to worry about where traffic is being sent; trying to control which node traffic goes to works against that idea.
Could you explain your end goal? If your different pods are serving different responses, then you may want to create more Services; or if you are worried about latency and want to serve traffic from the node closest to the user, you may want to look at federating your cluster.
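For what it's worth, newer Kubernetes versions can at least keep NodePort traffic on the node it arrived at with externalTrafficPolicy: Local, but traffic hitting a node without a local pod is dropped rather than forwarded, so it does not provide the fallback asked for here. A minimal sketch (names are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx                       # hypothetical name
spec:
  type: NodePort
  externalTrafficPolicy: Local      # only route to pods on the receiving node
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30010
```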