k8S Ingress and IPVS

k8S Ingress and IPVS - kubernetes

I am new to k8s, and I have a question regarding the use cases of ingress and IPVS.
According to what I have read in several articles on the internet, ingress is used for load balancing in north-south traffic toward pods. There are several ingress-solutions out there, like traefic-nginx-haproxy, etc.
Here comes my question, what is the use case of IPVS transport-layer load balancing?
Can we use it for the east-west traffic between pods..?
Please correct me if I have a misconception of the above.
Cheers

IPVS is layer 4 load balancing at linux kernel level.
i read somewhere it can handle around 100,000 forwarding requests per second.
Even though Kubernetes already support 5000 nodes in release v1.6, the
kube-proxy with iptables is actually a bottleneck to scale the cluster
to 5000 nodes. One example is that with NodePort Service in a
5000-node cluster, if we have 2000 services and each services have 10
pods, this will cause at least 20000 iptable records on each worker
node, and this can make the kernel pretty busy.
Example : https://blog.titanwolf.in/a?ID=00700-de778e7d-72e7-4515-b822-18844b104abd
https://dustinspecker.com/posts/ipvs-how-kubernetes-services-direct-traffic-to-pods/
Question
what is the use case of IPVS transport-layer load balancing?
You can use the IPVS with external IP to expose the service running inside the K8s cluster instead of ingress.
Can we use it for the east-west traffic between pods..?
Yes, you can use it. You can run the kube-proxy in to the IPVS mode.
So Kube proxy has three modes userspace, iptables, or IPVS
If i explain iptables VS IPVS in very simple words,
there is not much performance changes until you are running around 1000 services and 10000 PODs in the cluster. If you are operating at that level using the IPVS with Kube-proxy might can help you and improve performance.
If you aren’t sure whether IPVS will be a win for you then stick with kube-proxy in iptables mode. It’s had a ton more in-production hardening.
You can checkout this document for more : https://www.tigera.io/blog/comparing-kube-proxy-modes-iptables-or-ipvs/

Related

Kubernetes services : request Assignment algorithm

What is the logic algorithm that a kubernets service uses to assign requests to pods that it exposes? Can this algorithm be customized?
Thanks.

kube-proxy in userspace mode chooses a backend via a round-robin algorithm.
kube-proxy in iptables mode chooses a backend at random.
IPVS provides more options for balancing traffic to backend Pods; these are:rr: round-robin,lc: least connection (smallest number of open connections),dh: destination hashing,sh: source hashing,sed: shortest expected delay, nq: never queue
As mentioned here:- Service
For application level routing you would need to use a service mesh like istio ,envoy, kong.

You can use a component kube-proxy. What is it?
kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept.
kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster.
kube-proxy uses the operating system packet filtering layer if there is one and it's available. Otherwise, kube-proxy forwards the traffic itself.
But why use a proxy when there is a round-robin DNS algorithm? There are a few reasons for using proxying for Services:
There is a long history of DNS implementations not respecting record TTLs, and caching the results of name lookups after they should have expired.
Some apps do DNS lookups only once and cache the results indefinitely.
Even if apps and libraries did proper re-resolution, the low or zero TTLs on the DNS records could impose a high load on DNS that then becomes difficult to manage.
kube-proxy has many modes:
User space proxy mode - In the userspace mode, the iptables rule forwards to a local port where a go binary (kube-proxy) is listening for connections. The binary (running in userspace) terminates the connection, establishes a new connection to a backend for the service, and then forwards requests to the backend and responses back to the local process. An advantage of the userspace mode is that because the connections are created from an application, if the connection is refused, the application can retry to a different backend
Iptables proxy mode - In iptables mode, the iptables rules are installed to directly forward packets that are destined for a service to a backend for the service. This is more efficient than moving the packets from the kernel to kube-proxy and then back to the kernel so it results in higher throughput and better tail latency. The main downside is that it is more difficult to debug, because instead of a local binary that writes a log to /var/log/kube-proxy you have to inspect logs from the kernel processing iptables rules.
IPVS proxy mode - IPVS is a Linux kernel feature that is specifically designed for load balancing. In IPVS mode, kube-proxy programs the IPVS load balancer instead of using iptables. This works, it also uses a mature kernel feature and IPVS is designed for load balancing lots of services; it has an optimized API and an optimized look-up routine rather than a list of sequential rules.
You can read more here - good question about proxy mode on StackOverflow, here - comparing proxy modes and here - good article about proxy modes.
Like rohatgisanat mentioned in his answer you can also use service mesh. Here is also good article about Kubernetes service mesh comparsion.

Kubernetes how to use round-robin(rr) load balancing strategy among pods

I made a deployment and scaled out to 2 replicas. And I made a service to forward it.
I found that kube-proxy uses iptables for forwarding from Service to Pod. But the load balancing strategy of iptables is RANDOM.
How can I force my service to forward requests to 2 pods using round-robin strategy without switching my kube-proxy to userspace or ipvs mode?

You cannot, the strategies are only supported in ipvs mode. The option is even called --ipvs-scheduler.

You cannot.
But if you really don't want to change --proxy-mode flag on kube-proxy you can use some third-party proxy/loadbalancer (like HAProxy) and point it to your application. But this is usually not the best option as you need to make sure it's deployed with HA and it will also add complexity to your deployment.

using kube-proxy for load balancing

The official kubernetes docs clearly state that kube-proxy "will not scale to very large clusters with thousands of Services", however when a LoadBalancer type Service is created on GKE the externalTrafficPolicy is set to Cluster by default (meaning that each request will be load-balanced by kube-proxy anyway in addition to external load balancing). As it is explained for example in this video from Next '17, this is to avoid traffic imbalance (as Google's external load balancers are not capable of asking a cluster how many pods of a given service are on each node).
Hence the question: does it mean that:
a) by default GKE cannot be used for for "very large clusters with thousands of Services" and to do so I need to risk traffic imbalances by setting externalTrafficPolicy to Local
b) ...or the information about poor scalability of kube-proxy is incorrect or outdated
c) ...or something else that I couldn't come up with
Thanks!

will not scale to very large clusters with thousands of services quote refers to userspace proxy, which was the default mode long time ago before full iptables based implementation happened. So this statement is largely outdated, but...
iptables mode has it's own issues that come with scale (extreamly large iptables rule chains take a lot of time to update) which is one of the reasons why IPVS work made it into kube-proxy. You'd have to have a really hardcore scale to run into performance issues with kube-proxy.

According to the Kubernetes official documentation about externalTrafficPolicy the answer is a).
Since Cluster option obscures the client source IP and may cause a second hop to another node, but should have good overall load-spreading, and Local option preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type services, but risks potentially imbalanced traffic spreading.

Performance considerations for NodePort vs. ClusterIP vs. Headless Service on Kubernetes

We have two types of services that we run on AWS EKS:
external-facing services which we expose through an application-level load balancer using aws-alb-ingress-controller
internal-facing services which we use both directly through the service name (for EKS applications) and through an internal application-level loadbalancer also using aws-alb-ingress-controller (for non-EKS applications)
I would like to understand the performance implications of choosing Nodeport, ClusterIP or Headless Service for both the external and internal services. I have the setup working with all three options.
If I understanding the networking correctly, it seems that a Headless Service requires less hops and would hence be (slightly) faster? This article however seems to suggest that a Headless Service would not be properly load balanced when called directly. Is this correct? And would this still hold when called through the external (or internal) ALB?
Is there any difference in performance for NodePort vs ClusterIP?
Finally, what is the most elegant/performant way of using internal services from outside of the cluster (where we don't have access to the Kubernetes DNS) but within the same VPC? Would it be to use ClusterIp and specify the IP address in the service definition so it remains stable? Or are there better options?

I've put more detailed info on the each of the connection forwarding types and how the services are forwarded down under the headings belowfor context to my answers.
If I understanding the networking correctly, it seems that a Headless Service requires less hops and would hence be (slightly) faster?
Not substantially faster. The "extra hop" is the packet traversing local lookup tables which it traverses anyway so not a noticeable difference. The destination pod is still going to be the same number of actual network hops away.
If you have 1000's of services that run on a single pod and could be headless then you might use that to limit the number of iptables NAT rules and speed rule processing up (see iptables v ipvs below).
Is < a headless service not load balanced > correct? And would this still hold when called through the external (or internal) ALB?
Yes it is correct, the client (or ALB) would need to implement the load balancing across the Pod IP's.
Is there any difference in performance for NodePort vs ClusterIP?
A NodePort has a possible extra network hop from the entry node to the node running the pod. Assuming the ClusterIP ranges are routed to the correct node (and routed at all)
If you happen to be using a service type: LoadBalancer this behaviour can change by setting [.spec.externalTrafficPolicy to Local][https://kubernetes.io/docs/concepts/services-networking/service/#aws-nlb-support] which means traffic will only be directed to a local pod.
Finally, what is the most elegant/performant way of using internal services from outside of the cluster
I would say use the AWS ALB Ingress Controller with the alb.ingress.kubernetes.io/target-type: ip annotation. The k8s config from the cluster will be pushed out to the ALB via the ingress controller and address pods directly without traversing any connection forwarding or extra hops. All cluster reconfig will be automatically pushed out.
There is a little bit of latency for config to get to the ALB compared to cluster kube-proxy reconfiguration. Something like a rolling deployment might not be as seamless as the updates arrive after a pod is gone. The ALB's are equipped to handle the outage themselves, eventually.
Kubernetes Connection Forwarding
There is a kube-proxy process running on each node which manages how and where connections are forwared. There are 3 options for how kube-proxy does that: Userspace proxy, iptables or IPVS. Most clusters will be on iptables and that will cater for the vast majority of use cases.
Userspace proxy
The forwarding is via a process that runs in userspace to terminate and forward the connections. It's slow. It's unlikely you are using it, don't use it.
iptables
iptables forwards connections in kernel via NAT, which is fast. This is most common setup and will cover 90% of use cases. New connections are shared evenly between all nodes running pods for a service.
IPVS
Runs in kernel, it is fast and scalable. If you shift a traffic to a large number of apps this might improve the forwarding performance. It also supports different service load balancing modes:
- rr: round-robin
- lc: least connection (smallest number of open connections)
- dh: destination hashing
- sh: source hashing
- sed: shortest expected delay
- nq: never queue
Access to services
My explanations are iptables based as I haven't done much detailed work with ipvs clusters yet. I'm gonna handwave the ipvs complexity away and say it's basically the same as iptables, just with faster rule processing as the number of rules increases on huge clusters (i.e number of pods/services/network policies).
I'm also ignoring the userspace proxy in the description, due to the overhead just don't use it.
The basic thing to understand is a "Service ClusterIP" is a virtual construct in the cluster that only exists as rule for where the traffic should go. Every node maintains this rule mapping of all ClusterIP/port to PodIP/port (via kube-proxy)
Nodeport
ALB routes to any node, The node/nodeport forwards the connection to a pod handling the service. This could be a remote pod which would involve sending traffic back out over the "wire".
ALB > wire > Node > Kernel Forward to SVC ( > wire if remote node ) > Pod
ClusterIP
Using the ClusterIP for direct access depends on the Service cluster IP ranges being routed to the correct node. Sometimes they aren't routed at all.
ALB > wire > Node > Kernel Forward to SVC > Pod
The "Kernel Forward to SVC" step can be skipped with an ALB annotation without using a headless service.
Headless Service
Again, Pod IP's aren't always addressable from outside the cluster depending on the network setup. You should be fine on EKS.
ALB > wire > Node > Pod
Note
I'll suffix this with requests are probably looking at < 1ms of additional latency if a connection is forwarded to a node in a VPC. Enhanced networking instances at the low end of that. Inter availability-zone comms might be a tad higher than intra-AZ. If you happened to have a geographically separated cluster it might increase the importance of controlling traffic flow. For example having a tunnelled calico network that actually jumped over a number of real networks.

what is the most elegant/performant way of using internal services from outside of the cluster (where we don't have access to the Kubernetes DNS) but within the same VPC?
For this to achieve, I think you should have a look at a Service Mesh. For example, Istio(https://istio.io). It handles your internal service calls manually so that the call doesn't have to go through Kubernetes DNS. Please have a look at Istio's docs (https://istio.io/docs) for more info.
Also, you can have a look at Istio at EKS (https://aws.amazon.com/blogs/opensource/getting-started-istio-eks)

Headless service will not have any load balancing at L4 layer but if you use it behind an ALB you are getting load balancing at L7 layer.
Nodeport internally uses cluster IP but because your request may randomly be routed to a pod on another host when it could have been routed to a pod on the same host, avoiding that extra hop out to the network. Nodeport is generally a bad idea for production usage.
IMHO best way to access internal services from outside of the cluster will be using ingress.
You can use nginx as ingress controller where you deploy the nginx ingress controller on your cluster and expose it via a LoadBalancer type service using ALB. Then you can configure path or host based routing using ingress api to route traffic between backend kubernetes services.

Does kube-router IPVS-least connection algorithm, does load balancing across pods in same node or different nodes?

The application which I am working on runs as a deployment in kubernetes cluster. Pods created for this deployment is spread across various nodes in the cluster. Our application can handle only one TCP connection at a time and would reject further connections. Currently we use kube-proxy (Iptables mode) for distributing loads across pods in various nodes, but pods are chosen in a random way and connections are getting dropped when its passed to a busy pod. Can I use Kube-router's least-connection based load balancing algorithm for my usecase. I want the traffic to be load balanced across various pods running in various nodes. Can I achieve this using Kube-router.
As far I know kube-proxy's IPVS mode load balances traffic only across pods in same node, as kube-proxy runs as a daemon set. Is it the same with Kube-router as well?

Kube-proxy's IPVS mode does load balancing of traffic across pods placed in different nodes.
You can refer to this blog post where have a Deep Dive into this matter: IPVS-Based In-Cluster Load Balancing Deep Dive

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse