Kubernetes: how to use a round-robin (rr) load balancing strategy among pods

I created a Deployment and scaled it out to 2 replicas, and I created a Service in front of it.
I found that kube-proxy uses iptables to forward traffic from the Service to the Pods, but the load balancing strategy of iptables is RANDOM.
How can I force my Service to forward requests to the 2 Pods in a round-robin fashion, without switching kube-proxy to userspace or ipvs mode?

You cannot; per-Service load balancing strategies are only supported in ipvs mode. The option is even called --ipvs-scheduler.
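For reference, a minimal sketch of what that looks like in the kube-proxy configuration file (switching to ipvs mode requires the IPVS kernel modules on the nodes and a kube-proxy restart):

    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "ipvs"
    ipvs:
      scheduler: "rr"   # round robin; this is the value behind the --ipvs-scheduler flag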

You cannot.
But if you really don't want to change the --proxy-mode flag on kube-proxy, you can put some third-party proxy/load balancer (like HAProxy) in front of your application. This is usually not the best option, though, as you need to make sure it is deployed highly available, and it also adds complexity to your deployment.
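For illustration, a minimal sketch of such a setup, keeping HAProxy's configuration in a ConfigMap. The Service name myapp-headless, the port 8080, and the cluster DNS address 10.96.0.10 are all assumptions for the example; a headless Service is used so that DNS returns the individual Pod IPs and HAProxy can balance across them itself:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: haproxy-config
    data:
      haproxy.cfg: |
        defaults
          mode tcp
          timeout connect 5s
          timeout client  30s
          timeout server  30s
        resolvers kubedns
          # assumed cluster DNS Service IP; check kubectl -n kube-system get svc kube-dns
          nameserver dns1 10.96.0.10:53
          resolve_retries 3
          timeout resolve 1s
          timeout retry   1s
        frontend fe_app
          bind *:8080
          default_backend be_app
        backend be_app
          balance roundrobin   # explicit round robin across the resolved Pod IPs
          server-template pod 10 myapp-headless.default.svc.cluster.local:8080 check resolvers kubedns init-addr none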

Related

How does Kubernetes balance requests among a cluster's nodes?

I've been studying Kubernetes' main features for days and I've understood many things, really I have. But nowhere did I find the answer to this question: how does Kubernetes balance requests among the cluster's nodes?
Well, I mean, suppose an on-premises private Kubernetes cluster: the LoadBalancer type actually makes a Service publish its ports to the network with an IP; an Ingress is a resource which sets the rules for some third-party IngressController, which handles requests and forwards them to the correct Service.
What I do not understand:
Does any or all of these components, or perhaps others, actually monitor the nodes' (or Pods', I don't know) available resources and choose which node (or Pod) to forward requests to?
If any real load balancer is natively present in Kubernetes, what criteria does it adopt? Maybe the aforementioned resources, or network latency, or does it just use round robin?
If there is a default balancing policy, is it possible to customize it and implement your own rules?
Please tell me if I misunderstood anything and I'll try to focus better on it. Thank you all.
If you don't have something in place that does load balancing externally (e.g. Istio), all of your mentioned options boil down to getting TCP connections into the cluster.
Inside the cluster, the ClusterIP is the real concept for load balancing: all Pods that are assigned to a Service with a ClusterIP will be used (roughly) in a round-robin fashion.
This is handled by iptables DNAT rules configured by kube-proxy on each node.
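For illustration, for a Service with two Pods those rules look roughly like this (a sketch; the hashed chain names and the Pod IP are made up). Strictly speaking, the selection is random with fixed probabilities, which evens out over many connections, hence "roughly" round robin:

    -A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-POD1
    -A KUBE-SVC-EXAMPLE -j KUBE-SEP-POD2
    # each KUBE-SEP-* chain DNATs to a single Pod IP
    -A KUBE-SEP-POD1 -p tcp -m tcp -j DNAT --to-destination 10.244.1.5:8080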
The external LoadBalancer or Ingress usually do not do the load balancing themselves, even if the name might suggest it.

K8s Ingress and IPVS

I am new to k8s, and I have a question regarding the use cases of ingress and IPVS.
According to what I have read in several articles on the internet, ingress is used for load balancing north-south traffic toward Pods. There are several ingress solutions out there, like Traefik, nginx, HAProxy, etc.
Here comes my question, what is the use case of IPVS transport-layer load balancing?
Can we use it for the east-west traffic between Pods?
Please correct me if I have a misconception of the above.
Cheers
IPVS is layer-4 load balancing at the Linux kernel level.
I read somewhere that it can handle around 100,000 forwarding requests per second.
Even though Kubernetes already supports 5000 nodes as of release v1.6, kube-proxy with iptables is actually a bottleneck to scaling the cluster to 5000 nodes. One example is that with NodePort Services in a 5000-node cluster, if we have 2000 Services and each Service has 10 Pods, this will cause at least 20000 iptables records on each worker node, and this can make the kernel pretty busy.
Examples:
https://blog.titanwolf.in/a?ID=00700-de778e7d-72e7-4515-b822-18844b104abd
https://dustinspecker.com/posts/ipvs-how-kubernetes-services-direct-traffic-to-pods/
Question
what is the use case of IPVS transport-layer load balancing?
You can use IPVS with an external IP to expose a service running inside the K8s cluster, instead of using an Ingress.
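For example, a sketch of that (the name and the external IP are placeholders; with kube-proxy in IPVS mode, this IP becomes an IPVS virtual server on the nodes it is routed to):

    apiVersion: v1
    kind: Service
    metadata:
      name: myapp              # hypothetical name
    spec:
      selector:
        app: myapp
      ports:
        - port: 80
          targetPort: 8080
      externalIPs:
        - 203.0.113.10         # placeholder IP that is routed to your nodes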
Can we use it for the east-west traffic between Pods?
Yes, you can. You can run kube-proxy in IPVS mode.
So kube-proxy has three modes: userspace, iptables, or IPVS.
To compare iptables vs. IPVS in very simple words: there is not much performance difference until you are running around 1,000 Services and 10,000 Pods in the cluster. If you are operating at that level, using IPVS with kube-proxy might help you and improve performance.
If you aren't sure whether IPVS will be a win for you, stick with kube-proxy in iptables mode. It's had a ton more in-production hardening.
You can check out this document for more: https://www.tigera.io/blog/comparing-kube-proxy-modes-iptables-or-ipvs/

Q: Efficient Kubernetes load balancing

I've been looking into Kubernetes networking, more specifically how to serve HTTPS users most efficiently.
I was watching this talk: https://www.youtube.com/watch?v=0Omvgd7Hg1I and from 22:18 he explains the problem with a load balancer that is not pod-aware. The way Kubernetes solves this is by letting the nodes also act as 'routers', letting a node pass the request on to another node (explained at 22:46). This does not seem very efficient, but looking around, SoundCloud (https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic) actually seems to do something similar, with NodePorts. They say the overhead costs less than building a better load balancer.
From what I have read, an option might be to use an ingress controller: make sure there is no more than one ingress controller per node, and route traffic only to the specific nodes that host an ingress controller. That way no traffic re-routing is needed. However, this does add another layer of routing.
This information is all from 2017, so my question is: is there any pod-aware load balancer out there, or is there some other method that does not involve sending the HTTP request and response over the network twice?
Thank you in advance,
Hendrik
EDIT:
A bit more information about my use case:
There is a bare-metal setup with Kubernetes. The firewall load-balances the incoming traffic between two HAProxy instances. These HAProxy instances do SSL termination and forward the traffic to a few sites. This includes an Exchange setup, a few internal IIS sites, and an nginx server for a static web app. The idea is to move the app servers into Kubernetes.
Now my main problem is how to get the requests from HAProxy into Kubernetes. I see a few options:
Use the SoundCloud setup. The infrastructure could stay almost the same; the HAProxy servers can still operate the way they do now.
Use an ingress controller on EACH node in the Kubernetes cluster and have the firewall load-balance between the nodes. I believe it is possible to forward traffic from the ingress controller to servers outside the cluster, e.g. Exchange.
Some magic load balancer that I don't know about, which is pod-aware and able to operate outside of the Kubernetes cluster.
Options 1 and 2 are relatively simple and quite close in how they work, but they do come with a performance penalty. This is the case when the node that the firewall forwards a request to does not have the required pod running, or when another pod is doing less work: the request then gets forwarded to another node, thus crossing the network twice.
Is this just the price you pay when using Kubernetes, or is there something that I am missing?
How traffic reaches the Pods depends on whether a managed cluster is used.
Almost all cloud providers can forward traffic in a cloud-native way in their managed K8s clusters. First, you create a managed cluster with some special network settings (e.g. a VPC-native cluster on GKE). Then the only thing you need to do is create a LoadBalancer-type Service to expose your workload. You can also create Ingresses for your L7 workloads; they will be handled by the provided IngressControllers (e.g. ALB on AWS).
In an on-premises cluster without any cloud provider (OpenStack or vSphere), the only built-in way to expose workloads is a NodePort-type Service. That doesn't mean you can't improve on it.
If your cluster sits behind reverse proxies (the SoundCloud case), setting externalTrafficPolicy: Local on the Services stops traffic from being forwarded among worker nodes: traffic received through a NodePort is forwarded to local Pods, or dropped if the Pods reside on other nodes. The reverse proxies will then mark those NodePorts as unhealthy in their backend health checks and refuse to forward traffic to them. Another choice is topology-aware service routing, in which local Pods have priority but traffic is still forwarded between nodes when no local Pod matches.
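A sketch of the externalTrafficPolicy: Local approach on a NodePort Service (the name and ports are placeholders):

    apiVersion: v1
    kind: Service
    metadata:
      name: myapp                      # hypothetical name
    spec:
      type: NodePort
      externalTrafficPolicy: Local     # only deliver to Pods on the node that received the traffic
      selector:
        app: myapp
      ports:
        - port: 80
          targetPort: 8080
          nodePort: 30080              # assumed fixed port configured as a backend in the reverse proxies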
For IngressControllers in on-prem clusters, it is a little different. You may have some worker nodes that have an EIP or public IP. To expose HTTP(S) services, an IngressController is usually deployed on those worker nodes through a DaemonSet with HostNetwork, so that clients reach the IngressController via the well-known ports and the EIPs of the nodes. These worker nodes typically don't accept other workloads (e.g. infra nodes in OpenShift), so one more forward on the Pod network is needed. Alternatively, you can deploy the IngressController on all worker nodes alongside the other workloads, so traffic can be forwarded to a closer Pod if the IngressController supports topology-aware service routing.
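A minimal sketch of that deployment pattern (the image, names, and the node label are placeholders):

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ingress-controller                     # hypothetical name
    spec:
      selector:
        matchLabels:
          app: ingress-controller
      template:
        metadata:
          labels:
            app: ingress-controller
        spec:
          hostNetwork: true                        # bind :80/:443 directly on the node's (E)IP
          nodeSelector:
            node-role/ingress: "true"              # assumed label on the nodes holding the EIPs
          containers:
            - name: controller
              image: example/ingress-controller:v1 # placeholder image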
Hope it helps!

using kube-proxy for load balancing

The official Kubernetes docs clearly state that kube-proxy "will not scale to very large clusters with thousands of Services". However, when a LoadBalancer-type Service is created on GKE, externalTrafficPolicy is set to Cluster by default (meaning that each request will be load-balanced by kube-proxy anyway, in addition to the external load balancing). As explained, for example, in this video from Next '17, this is done to avoid traffic imbalance (as Google's external load balancers are not capable of asking a cluster how many pods of a given service are on each node).
Hence the question: does it mean that:
a) by default GKE cannot be used for "very large clusters with thousands of Services", and to do so I need to risk traffic imbalance by setting externalTrafficPolicy to Local
b) ...or the information about poor scalability of kube-proxy is incorrect or outdated
c) ...or something else that I couldn't come up with
Thanks!
The "will not scale to very large clusters with thousands of Services" quote refers to the userspace proxy, which was the default mode a long time ago, before the full iptables-based implementation happened. So this statement is largely outdated, but...
iptables mode has its own issues that come with scale (extremely large iptables rule chains take a lot of time to update), which is one of the reasons the IPVS work made it into kube-proxy. You'd have to run at a really hardcore scale to hit performance issues with kube-proxy.
According to the official Kubernetes documentation on externalTrafficPolicy, the answer is a):
the Cluster option obscures the client source IP and may cause a second hop to another node, but should have good overall load-spreading, while the Local option preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type Services, but risks potentially imbalanced traffic spreading.
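For reference, a sketch of setting that on a LoadBalancer Service (the name and ports are placeholders):

    apiVersion: v1
    kind: Service
    metadata:
      name: myapp                    # hypothetical name
    spec:
      type: LoadBalancer
      externalTrafficPolicy: Local   # skip kube-proxy's second hop; risks imbalance across nodes
      selector:
        app: myapp
      ports:
        - port: 443
          targetPort: 8443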

Which algorithm does Kubernetes use to route traffic to pods in a ReplicaSet/Deployment?

I was asked about this, and I couldn't find info about it online: which algorithm does Kubernetes use to route traffic to the pods of a ReplicaSet or a Deployment (I guess it's the same for both)?
Let's say I have a replica count of 5 pods in my Kubernetes cluster, defined in a ReplicaSet. How does the cluster pick which pod a new request goes to? Does it use round robin? I couldn't find info about it.
The algorithm applied to determine which pod will process the request depends on the kube-proxy mode that is running.
In 1.0, the proxy works in a mode called userspace, and the default algorithm is round robin.
In 1.2, the iptables proxy mode was added; there the backend pod is picked at random (via probability rules) rather than by true round robin, due to iptables limitations.
In 1.8.0-beta, IP Virtual Server (IPVS) mode was introduced; it allows many more algorithm options, such as (IPVS scheduler names in parentheses):
RoundRobin (rr);
WeightedRoundRobin (wrr);
LeastConnection (lc);
WeightedLeastConnection (wlc);
LocalityBasedLeastConnection (lblc);
LocalityBasedLeastConnectionWithReplication (lblcr);
SourceHashing (sh);
DestinationHashing (dh);
ShortestExpectedDelay (sed);
NeverQueue (nq).
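If kube-proxy is running in IPVS mode, you can inspect which scheduler each virtual server uses on a node with the ipvsadm tool; the scheduler short name (rr, wrr, lc, ...) is printed next to each VIP:port:

    ipvsadm -Ln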
References:
https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies
https://sookocheff.com/post/kubernetes/understanding-kubernetes-networking-model/