How does Kubernetes balance requests among a cluster's nodes?

I've been studying Kubernetes' main features for days, and I've understood many things, really I have. But nowhere have I found the answer to this question: how does Kubernetes balance requests among a cluster's nodes?
Let me be more specific. Suppose an on-premises private Kubernetes cluster: the LoadBalancer type makes a Service publish its ports to the network with an IP; an Ingress is a resource that sets the rules for some third-party IngressController, which handles requests and forwards them to the correct Service.
What I do not understand:
Do any or all of these components (or perhaps others) actually monitor the available resources of nodes (or pods, I don't know) and choose which node (or pod) to forward requests to?
If a real load balancer is present natively in Kubernetes, what criteria does it use? Maybe the aforementioned resources, or network latency, or just round robin?
If there is a default balancing policy, is it possible to customize it and implement your own rules?
Please tell me if I've misunderstood anything and I'll try to focus on that. Thank you all.

If you don't have something in place that does load balancing externally (e.g. Istio), all the options you mentioned boil down to getting TCP connections into the cluster.
Inside the cluster, the ClusterIP is the real load-balancing concept: all Pods assigned to a Service with a ClusterIP are used (roughly) evenly, much like round robin; in the default iptables mode, each new connection actually picks a backend at random.
This is handled by iptables DNAT rules that kube-proxy configures on each node.
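To see why a random pick per connection still comes out even, here is a small Python sketch of the probability chain that kube-proxy's iptables mode is commonly described as generating: for n endpoints, rule i matches with probability 1/(n - i) and the last rule always matches. The function names are hypothetical; this only models the arithmetic and does not read or write real iptables rules.

```python
import random
from fractions import Fraction


def rule_probabilities(n):
    """Match probability of each of the n chained DNAT rules; the last one is 1."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(n):
    """Exact share of new connections that end up at each endpoint."""
    shares = []
    not_yet_matched = Fraction(1)
    for p in rule_probabilities(n):
        shares.append(not_yet_matched * p)  # reached this rule AND it matched
        not_yet_matched *= 1 - p            # fell through to the next rule
    return shares


def pick_endpoint(endpoints, rng=random):
    """Walk the chain like the kernel: the first rule whose coin flip succeeds wins."""
    for endpoint, p in zip(endpoints, rule_probabilities(len(endpoints))):
        if rng.random() < p:
            return endpoint
    return endpoints[-1]  # not reached in practice: the last probability is 1


if __name__ == "__main__":
    print(effective_shares(4))
```

effective_shares(4) comes out as four shares of exactly 1/4: the decreasing per-rule probabilities compensate for the connections already claimed by earlier rules, so no resources, latency or connection counts are involved in the choice.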
The external LoadBalancer or Ingress usually do not do load balancing, even if the name might suggest it.

Related

Is Kubernetes Networking Smart Enough to Route ClusterIP Requests to Local Pod

I think the title has most of the information. If I have a ClusterIP Service, and a Pod accesses that Service, since the load balancing is built into the routing table, are the routing rules smart enough to prefer sending traffic to the local Node? Is there a way to configure it so that the traffic prefers (or is even forced) to go to the local Node?
Reasoning - I'm thinking about running a caching server as a DaemonSet, which makes sense if traffic is likely to go to a local Pod.
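internalTrafficPolicy is probably what you are looking for: with spec.internalTrafficPolicy: Local on the Service, traffic from a Pod is only routed to endpoints on that Pod's own node, and is dropped if that node has none.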
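A minimal Python sketch of which endpoints are eligible under the two internalTrafficPolicy values (Cluster, the default, and Local), for the DaemonSet caching case from the question. The Endpoint type, pod IPs and node names are hypothetical stand-ins, not the Kubernetes API objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    pod_ip: str
    node: str


def eligible_endpoints(endpoints, client_node, internal_traffic_policy="Cluster"):
    """Endpoints a node may send in-cluster Service traffic to.

    "Cluster": every ready endpoint in the cluster.
    "Local": only endpoints on the client's own node; an empty result
    means the traffic is dropped rather than sent elsewhere.
    """
    if internal_traffic_policy == "Cluster":
        return list(endpoints)
    if internal_traffic_policy == "Local":
        return [ep for ep in endpoints if ep.node == client_node]
    raise ValueError(f"unknown policy: {internal_traffic_policy!r}")


if __name__ == "__main__":
    # Hypothetical caching DaemonSet; node-c's cache pod is assumed not ready yet.
    caches = [Endpoint("10.1.0.5", "node-a"), Endpoint("10.1.1.7", "node-b")]
    print(eligible_endpoints(caches, "node-a", "Local"))
    print(eligible_endpoints(caches, "node-c", "Local"))
```

The second call returns an empty list, which is the trade-off of Local: a client on a node without a healthy cache pod gets no fallback.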

Efficient Kubernetes load balancing

I've been looking into Kubernetes networking, more specifically, how to serve HTTPS users most efficiently.
I was watching this talk: https://www.youtube.com/watch?v=0Omvgd7Hg1I and from 22:18 the speaker explains the problem with a load balancer that is not pod-aware. Kubernetes solves this by letting the nodes also act as a 'router' that passes the request on to another node (explained at 22:46). This does not seem very efficient, but looking around, SoundCloud (https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic) actually seems to do something similar, but with NodePorts. They say that the overhead costs less than building a better load balancer.
From what I have read, one option might be to use an ingress controller, making sure there is no more than one ingress controller per node, and to route the traffic only to the nodes that have an ingress controller. That way no traffic re-routing would be needed. However, this does add another layer of routing.
This information is all from 2017, so my question is: is there any pod-aware load balancer out there, or some other method that does not involve sending the HTTP request and response over the network twice?
Thank you in advance,
Hendrik
EDIT:
A bit more information about my use case:
There is a bare-metal setup with Kubernetes. The firewall load balances the incoming data between two HAProxy instances. These HAProxy instances do SSL termination and forward the traffic to a few sites. This includes an Exchange setup, a few internal IIS sites and an nginx server for a static web app. The idea is to move the app servers to Kubernetes.
Now my main problem is how to get the requests from HAProxy into Kubernetes. I see a few options:
1. Use the SoundCloud setup. The infrastructure could stay almost the same, and the HAProxy servers can keep operating the way they do now.
2. Use an ingress controller on EACH node in the Kubernetes cluster and have the firewall load balance between the nodes. I believe it is possible to forward traffic from the ingress controller to servers outside the cluster, e.g. Exchange.
3. Some magic load balancer that I do not know about, which is pod-aware and able to operate outside of the Kubernetes cluster.
Options 1 and 2 are relatively simple and work in much the same way, but they come with a performance penalty when the node that the firewall forwards a request to does not have the required pod running, or when a pod on another node is doing less work. The request then gets forwarded to another node, using the network twice.
Is this just the price you pay when using Kubernetes, or is there something that I am missing?
How traffic reaches pods depends on whether you use a managed cluster.
Almost all cloud providers can forward traffic in a cloud-native way in their managed K8s clusters. First, you create a managed cluster with some special network settings (e.g. a VPC-native GKE cluster). Then, the only thing you need to do is create a LoadBalancer-type Service to expose your workload. You can also create Ingresses for your L7 workloads; they will be handled by the provider's IngressControllers (e.g. the AWS ALB).
In an on-premises cluster without any cloud provider (such as OpenStack or vSphere), the only way to expose workloads is a NodePort-type Service. That doesn't mean you can't improve on it.
If your cluster is behind reverse proxies (the SoundCloud case), setting externalTrafficPolicy: Local on Services stops traffic from being forwarded between worker nodes. Traffic received through a NodePort is forwarded to local Pods, or dropped if the Pods reside on other nodes. The reverse proxy will mark those NodePorts as unhealthy in its backend health check and stop forwarding traffic to them. Another choice is topology-aware service routing: local Pods get priority, and traffic is still forwarded between nodes when no local Pod matches.
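The interplay between externalTrafficPolicy, the reverse proxy's health check and a topology-aware fallback can be sketched in Python. This is a simplified model under stated assumptions: the reverse proxy is assumed to treat a NodePort as healthy exactly when that node has somewhere to send the traffic, topology awareness is modelled at node granularity as described above, and all node and pod names are hypothetical.

```python
def nodeport_targets(pods_by_node, entry_node, policy="Cluster"):
    """Pods that a NodePort on entry_node would forward a new connection to."""
    if policy == "Local":
        return list(pods_by_node.get(entry_node, []))  # no local pod: dropped
    if policy == "Cluster":
        return [pod for pods in pods_by_node.values() for pod in pods]
    raise ValueError(f"unknown policy: {policy!r}")


def healthy_nodes(pods_by_node, nodes, policy):
    """Nodes a reverse proxy keeps in rotation: a NodePort that drops traffic fails its check."""
    return [node for node in nodes if nodeport_targets(pods_by_node, node, policy)]


def topology_aware_targets(pods_by_node, entry_node):
    """Prefer local pods, but fall back to any pod instead of dropping."""
    local = pods_by_node.get(entry_node, [])
    return list(local) if local else nodeport_targets(pods_by_node, entry_node, "Cluster")


if __name__ == "__main__":
    pods = {"node-a": ["web-1"], "node-b": ["web-2"], "node-c": []}
    nodes = ["node-a", "node-b", "node-c"]
    print(healthy_nodes(pods, nodes, "Local"))    # node-c drops out of rotation
    print(healthy_nodes(pods, nodes, "Cluster"))  # every node stays in rotation
    print(topology_aware_targets(pods, "node-c"))
```

With Local, the proxy only ever sends traffic to nodes that can serve it, so no second hop happens; the topology-aware variant keeps every node usable at the cost of an occasional extra hop.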
For an IngressController in on-prem clusters, it is a little different. You may have some worker nodes that have an EIP or public IP. To expose HTTP(S) services, an IngressController is usually deployed on those worker nodes through a DaemonSet with HostNetwork, so that clients reach the IngressController via the well-known ports and the nodes' EIPs. These worker nodes usually don't accept other workloads (e.g. infra nodes in OpenShift), so one more forward over the Pod network is needed. You can also deploy the IngressController on all worker nodes alongside other workloads, so traffic could be forwarded to a closer Pod if the IngressController supports topology-aware service routing, although your controller may not support it yet.
Hope it helps!

Performance considerations for NodePort vs. ClusterIP vs. Headless Service on Kubernetes

We have two types of services that we run on AWS EKS:
external-facing services which we expose through an application-level load balancer using aws-alb-ingress-controller
internal-facing services which we use both directly through the service name (for EKS applications) and through an internal application-level load balancer, also using aws-alb-ingress-controller (for non-EKS applications)
I would like to understand the performance implications of choosing Nodeport, ClusterIP or Headless Service for both the external and internal services. I have the setup working with all three options.
If I understand the networking correctly, it seems that a Headless Service requires fewer hops and would hence be (slightly) faster? This article, however, seems to suggest that a Headless Service would not be properly load balanced when called directly. Is this correct? And would this still hold when called through the external (or internal) ALB?
Is there any difference in performance for NodePort vs ClusterIP?
Finally, what is the most elegant/performant way of using internal services from outside of the cluster (where we don't have access to the Kubernetes DNS) but within the same VPC? Would it be to use ClusterIP and specify the IP address in the service definition so it remains stable? Or are there better options?
I've put more detailed info on each of the connection forwarding types, and on how services are forwarded, under the headings below for context to my answers.
If I understand the networking correctly, it seems that a Headless Service requires fewer hops and would hence be (slightly) faster?
Not substantially faster. The "extra hop" is the packet traversing local lookup tables, which it traverses anyway, so the difference isn't noticeable. The destination pod is still going to be the same number of actual network hops away.
If you have thousands of services that are each backed by a single pod and could be headless, you might make them headless to limit the number of iptables NAT rules and speed up rule processing (see iptables vs. IPVS below).
Is it correct that a headless service is not load balanced? And would this still hold when called through the external (or internal) ALB?
Yes, that's correct: the client (or ALB) needs to implement the load balancing across the Pod IPs.
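For a client calling a headless Service directly, "implementing the load balancing" can be as simple as resolving the Service name (which returns the pod IPs as A records) and rotating through the results. A minimal Python sketch, assuming a hypothetical Service name and port; the resolver is injectable so it can run outside a cluster, and the default one uses socket.gethostbyname_ex (IPv4 only).

```python
import itertools
import socket


class HeadlessRoundRobin:
    """Client-side round robin over the A records of a headless Service."""

    def __init__(self, hostname, port, resolver=None):
        self.hostname = hostname
        self.port = port
        self._resolve = resolver or self._dns_lookup
        self.refresh()

    @staticmethod
    def _dns_lookup(hostname):
        # gethostbyname_ex returns (hostname, aliases, list_of_ipv4_addresses)
        return sorted(socket.gethostbyname_ex(hostname)[2])

    def refresh(self):
        """Re-resolve (e.g. periodically) to pick up pods that came or went."""
        addresses = list(self._resolve(self.hostname))
        if not addresses:
            raise LookupError(f"no endpoints for {self.hostname}")
        self._cycle = itertools.cycle(addresses)

    def next_target(self):
        return next(self._cycle), self.port


if __name__ == "__main__":
    def fake_dns(name):
        # Stand-in for cluster DNS so the sketch runs anywhere.
        return ["10.1.0.5", "10.1.1.7", "10.1.2.9"]

    lb = HeadlessRoundRobin("my-svc.my-namespace.svc.cluster.local", 8080, fake_dns)
    for _ in range(4):
        print(lb.next_target())
```

The catch is staleness: the client only sees pod changes when it re-resolves, which is exactly the work a ClusterIP (or an ALB in ip target mode) does for you.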
Is there any difference in performance for NodePort vs ClusterIP?
A NodePort has a possible extra network hop from the entry node to the node running the pod. Direct ClusterIP access avoids that hop, assuming the ClusterIP ranges are routed to the correct node (and routed at all).
If you happen to be using a Service of type: LoadBalancer, you can change this behaviour by setting .spec.externalTrafficPolicy to Local (https://kubernetes.io/docs/concepts/services-networking/service/#aws-nlb-support), which means traffic will only be directed to a pod on the node that received it.
Finally, what is the most elegant/performant way of using internal services from outside of the cluster
I would say use the AWS ALB Ingress Controller with the alb.ingress.kubernetes.io/target-type: ip annotation. The ingress controller pushes the k8s config from the cluster out to the ALB, which then addresses pods directly without traversing any connection forwarding or extra hops. All cluster reconfiguration is pushed out automatically.
There is a little bit of latency for config to reach the ALB compared to cluster kube-proxy reconfiguration. Something like a rolling deployment might not be as seamless, as the updates may arrive after a pod is gone. The ALBs are equipped to handle the outage themselves, eventually.
Kubernetes Connection Forwarding
There is a kube-proxy process running on each node which manages how and where connections are forwarded. There are 3 options for how kube-proxy does that: userspace proxy, iptables or IPVS. Most clusters will be on iptables, and that will cater for the vast majority of use cases.
Userspace proxy
The forwarding is done by a process that runs in userspace to terminate and forward the connections. It's slow. You're unlikely to be using it; don't.
iptables
iptables forwards connections in the kernel via NAT, which is fast. This is the most common setup and will cover 90% of use cases. New connections are shared evenly (by random selection) among all the pods backing a service.
IPVS
It runs in the kernel and is fast and scalable. If you route traffic to a large number of apps, it might improve forwarding performance. It also supports different service load balancing modes:
- rr: round-robin
- lc: least connection (smallest number of open connections)
- dh: destination hashing
- sh: source hashing
- sed: shortest expected delay
- nq: never queue
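To make the mode list concrete, here is a Python sketch of how rr, lc, sh, sed and nq pick a real server, simplified from their usual IPVS descriptions (dh is the same idea as sh keyed on the destination IP, so it is left out). kube-proxy's IPVS mode chooses one of these via its --ipvs-scheduler setting. The RealServer type is hypothetical, connection counts ignore inactive connections, and zlib.crc32 stands in for the kernel's own hash.

```python
import itertools
import zlib
from dataclasses import dataclass


@dataclass
class RealServer:
    ip: str
    weight: int = 1   # assumed > 0 throughout
    active: int = 0   # currently open connections


def make_round_robin(servers):
    """rr: hand out servers in a fixed rotation."""
    rotation = itertools.cycle(servers)
    return lambda client_ip=None: next(rotation)


def least_connection(servers, client_ip=None):
    """lc: fewest open connections (weights ignored)."""
    return min(servers, key=lambda s: s.active)


def source_hash(servers, client_ip):
    """sh: the same client IP always lands on the same server (crc32 is a stand-in hash)."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]


def shortest_expected_delay(servers, client_ip=None):
    """sed: minimise (active + 1) / weight, the delay a new connection would expect."""
    return min(servers, key=lambda s: (s.active + 1) / s.weight)


def never_queue(servers, client_ip=None):
    """nq: use an idle server if there is one, otherwise fall back to sed."""
    for server in servers:
        if server.active == 0:
            return server
    return shortest_expected_delay(servers)
```

For example, with servers of (weight 4, 3 active), (weight 1, 1 active) and (weight 1, 2 active), lc picks the second because it has the fewest connections, while sed picks the first because (3 + 1) / 4 = 1 is the smallest expected delay.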
Access to services
My explanations are iptables-based, as I haven't done much detailed work with IPVS clusters yet. I'm going to hand-wave the IPVS complexity away and say it's basically the same as iptables, just with faster rule processing as the number of rules increases on huge clusters (i.e. number of pods/services/network policies).
I'm also ignoring the userspace proxy in the description; because of its overhead, just don't use it.
The basic thing to understand is that a "Service ClusterIP" is a virtual construct in the cluster that only exists as a rule for where the traffic should go. Every node maintains (via kube-proxy) this rule mapping of every ClusterIP/port to PodIP/port.
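A Python sketch of that per-node mapping: a table keyed by (ClusterIP, port) whose only job is to rewrite the destination of a packet. The IPs are hypothetical, the random choice stands in for the iptables probability rules, and the rejection for a Service without endpoints reflects kube-proxy installing reject rules in that case.

```python
import random

# Hypothetical per-node table, as kube-proxy would program it from the API:
# (ClusterIP, port) -> list of (PodIP, targetPort).
SERVICE_TABLE = {
    ("10.96.0.10", 53): [("10.1.0.4", 53), ("10.1.1.6", 53)],
    ("10.96.14.2", 80): [("10.1.0.9", 8080), ("10.1.2.3", 8080), ("10.1.1.8", 8080)],
}


def dnat(dst_ip, dst_port, table=SERVICE_TABLE, rng=random):
    """Rewrite a packet's destination the way the node's NAT rules would.

    Nothing listens on the ClusterIP itself: a destination that is not in
    the table is left untouched and routed normally.
    """
    if (dst_ip, dst_port) not in table:
        return dst_ip, dst_port
    backends = table[(dst_ip, dst_port)]
    if not backends:
        # kube-proxy rejects connections to a Service with no endpoints
        raise ConnectionRefusedError(f"{dst_ip}:{dst_port} has no endpoints")
    return rng.choice(backends)
```

Because every node holds the full table, any node can translate any ClusterIP, which is also why a NodePort on one node can forward to a pod on another.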
NodePort
The ALB routes to any node, and the node/NodePort forwards the connection to a pod handling the service. This could be a remote pod, which would involve sending traffic back out over the "wire".
ALB > wire > Node > Kernel Forward to SVC ( > wire if remote node ) > Pod
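How often does the "( > wire if remote node )" step actually happen? A small Python calculation, assuming the ALB spreads connections evenly across all nodes and kube-proxy then picks uniformly among all of the Service's pods; the node names and pod counts are hypothetical.

```python
from fractions import Fraction


def remote_hop_fraction(pods_per_node):
    """Share of NodePort connections that leave the entry node again.

    Assumes the ALB spreads connections evenly over all nodes and that
    kube-proxy then picks uniformly among all pods of the Service.
    """
    total_pods = sum(pods_per_node.values())
    if total_pods == 0:
        raise ValueError("service has no pods")
    per_node = [1 - Fraction(local, total_pods) for local in pods_per_node.values()]
    return sum(per_node) / len(per_node)


if __name__ == "__main__":
    print(remote_hop_fraction({"node-a": 1, "node-b": 1, "node-c": 0}))
```

With two pods on three nodes this prints 2/3, and with one pod on each of N nodes it works out to (N - 1)/N, so on larger clusters nearly every NodePort connection takes the second hop.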
ClusterIP
Using the ClusterIP for direct access depends on the Service cluster IP ranges being routed to the correct node. Sometimes they aren't routed at all.
ALB > wire > Node > Kernel Forward to SVC > Pod
The "Kernel Forward to SVC" step can be skipped with an ALB annotation without using a headless service.
Headless Service
Again, Pod IPs aren't always addressable from outside the cluster, depending on the network setup. You should be fine on EKS.
ALB > wire > Node > Pod
Note
I'll add that requests are probably looking at < 1 ms of additional latency if a connection is forwarded to another node in a VPC, with enhanced networking instances at the low end of that. Inter-availability-zone communication might be a tad higher than intra-AZ. If you happen to have a geographically separated cluster, controlling traffic flow might become more important, for example with a tunnelled Calico network that actually jumps over a number of real networks.
what is the most elegant/performant way of using internal services from outside of the cluster (where we don't have access to the Kubernetes DNS) but within the same VPC?
To achieve this, I think you should have a look at a service mesh, for example Istio (https://istio.io). It handles your internal service calls itself, so the call doesn't have to go through Kubernetes DNS. Please have a look at Istio's docs (https://istio.io/docs) for more info.
Also, you can have a look at Istio on EKS (https://aws.amazon.com/blogs/opensource/getting-started-istio-eks).
A headless service will not have any load balancing at L4, but if you use it behind an ALB, you get load balancing at L7.
NodePort internally uses the cluster IP, so your request may randomly be routed to a pod on another host when it could have been routed to a pod on the same host, avoiding that extra hop out to the network. NodePort is generally a bad idea for production use.
IMHO the best way to access internal services from outside of the cluster is using an Ingress.
You can use nginx as the ingress controller: deploy the nginx ingress controller on your cluster and expose it via a LoadBalancer-type Service (on AWS this provisions a Network or Classic Load Balancer rather than an ALB). Then you can configure path- or host-based routing using the Ingress API to route traffic between backend Kubernetes services.
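The path- and host-based routing an Ingress gives you can be sketched in Python, following the matching rules in the Kubernetes Ingress documentation as I understand them: Prefix paths match whole path elements, the longest matching path wins, and Exact beats Prefix on a tie. Host handling is simplified (no wildcard hosts), and the hosts and backend names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IngressPath:
    host: Optional[str]  # None: the rule applies to any host
    path: str
    path_type: str       # "Exact" or "Prefix"
    backend: str         # hypothetical "service:port" label


def _elements(path):
    return [part for part in path.split("/") if part]


def _matches(rule, path):
    if rule.path_type == "Exact":
        return path == rule.path
    # Prefix compares whole path elements, so /api matches /api/x but not /apiary
    wanted = _elements(rule.path)
    return _elements(path)[:len(wanted)] == wanted


def route(rules, host, path):
    """Pick the backend: longest matching path wins, Exact beats Prefix on a tie."""
    candidates = [r for r in rules if r.host in (None, host) and _matches(r, path)]
    if not candidates:
        return None  # a real controller would fall back to its default backend
    best = max(candidates, key=lambda r: (len(r.path), r.path_type == "Exact"))
    return best.backend
```

For example, with rules for "/" (Prefix), "/api" (Prefix) and "/api/health" (Exact) on one host, a request for /api/orders goes to the /api backend, while /apiary falls through to the "/" backend.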

Best way to go between private on-premises network and kubernetes

I have set up an on-premises Kubernetes cluster, and I want to ensure that my services that are not in Kubernetes, but exist on a separate class B network, are able to consume the services that have migrated to Kubernetes. There are a number of ways of doing this, by all accounts, and I'm looking for the simplest one.
Ingress plus a controller seems to be the favoured one, and it's interesting because of the virtual hosts and the HAProxy implementation. But where I'm getting confused is how to set up the Kubernetes service:
We don't have a great deal of choice: ClusterIP won't be sufficient to expose it to the outside, which leaves NodePort or LoadBalancer. LoadBalancer seems to be a simpler, cut-down way of switching between network zones, and although there are on-prem solutions (MetalLB), it seems far more geared towards cloud solutions.
But if I stick with NodePort, my entry into the network is going to be on a non-standard port number, and I would prefer a standard port, particularly if I'm running a percentage of the traffic for that service outside Kubernetes and the rest through Kubernetes (for testing purposes, I'd like to monitor the traffic over a period of time before I bite the bullet and move 100% of the traffic for a given microservice to Kubernetes). In that case it would be better if those services were available on the same port (almost always 80, because they're standard REST microservices). More than that, if I have to re-create the service for whatever reason, I'm pretty sure the port will change, and then no traffic will be able to enter the Kubernetes cluster, which is a frightening proposition.
What are the suggested ways of handling communication between existing on-prem and Kubernetes cluster (also on prem, different IP/subnet)?
Is there any way to get traffic coming in without changing the network parameters (the class B networks they are on), and without being forced to use NodePort?
The NodePort Service type may be fine for staging or dev environments, but I recommend going with a LoadBalancer-type Service (for example, one in front of the Nginx ingress controller). Its advantages over the other Service types are:
You can use a standard port (rather than a random NodePort generated by Kubernetes).
Your service is load balanced (the ingress controller takes care of load balancing).
The port is fixed (it will not change unless you modify something in the Ingress object).

Where do services live in Kubernetes?

I am learning Kubernetes and currently deep diving into high availability and while I understand that I can set up a highly available control plane (API-server, controllers, scheduler) with local (or with remote) etcds as well as a highly available set of minions (through Kubernetes itself), I am still not sure where in this concept services are located.
If they live in the control plane: Good I can set them up to be highly available.
If they live on a certain node: Ok, but what happens if the node goes down or becomes unavailable in any other way?
As I understand it, services are needed to expose my pods to the internet as well as for load balancing. So without an HA service, I risk my application not being reachable (even though it might be super highly available in every other aspect of the system).
A Kubernetes Service is another REST object in the k8s cluster. There are the following types of Services, each serving a different purpose in the cluster:
ClusterIP
NodePort
LoadBalancer
Headless
Fundamental purposes of Services:
Providing a single gateway to the pods
Load balancing the pods
Inter-pod communication
Providing stability, as pods can die and restart with a different IP
And more
These objects are stored in etcd, as it is the single source of truth in the cluster.
kube-proxy is responsible for implementing these objects on each node, and Services rely on selectors and labels.
For instance, each Pod object has labels, and the Service object has a selector that matches them. The matching Pods become the Service's endpoints (IP:Port), and kube-proxy maps the Service's IP:Port to those endpoints using iptables rules.
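The label/selector matching can be sketched in Python. In a real cluster this step is done by the control plane (the EndpointSlice controller) and kube-proxy consumes the result; the Pod type, labels and IPs below are hypothetical, and only equality-based selectors are modelled. Pods that aren't Ready are left out, which is what makes readiness probes matter for HA.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict = field(default_factory=dict)
    ready: bool = True


def selector_matches(selector, labels):
    """Equality-based selector: every key/value in the selector must be on the pod."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints_for(selector, target_port, pods):
    """IP:Port pairs the Service would route to."""
    if not selector:
        return []  # a Service without a selector gets no automatic endpoints
    return [
        f"{pod.ip}:{target_port}"
        for pod in pods
        if pod.ready and selector_matches(selector, pod.labels)
    ]
```

For example, a Service selecting app=web over one ready web pod, one not-ready web pod and a database pod ends up with a single endpoint.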
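kube-proxy is usually deployed as a DaemonSet, so it runs on every cluster node; each instance learns about Services and endpoints by watching the API server (which stores them in etcd) rather than by talking to the other instances.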
You can think of a Service as an internal (and in some cases external) load balancer. The definition is stored in the Kubernetes API server, yet the fact that it exists there means nothing if something does not implement it. The most common component that works with Services is kube-proxy, which implements Services on nodes using iptables (meaning that every node has every Service implemented in its local iptables rules), but there are also, e.g., Ingress Controller implementations that use the Service concept from the API to find endpoints and direct traffic to them, effectively skipping the iptables implementation. Finally, there are service mesh solutions like Linkerd or Istio that can leverage Service definitions on their own.
Services load balance between pods in most implementations, meaning that as long as you have one backing pod alive (and with enough capacity) your "service" will respond (so you get HA as well, especially if you implement readiness/liveness probes, which among other things remove unhealthy pods from Services).
The Kubernetes Service documentation provides pretty good insight into that.