How to make kube-proxy distribute the load evenly? - kubernetes

I have a ClusterIP Service which is used to distribute load to 2 Pods internally.
The load is not distributed evenly across the Pods.
How can I make the load be distributed evenly?

Kubernetes uses iptables to distribute the load between pods (the iptables proxy mode is the default).
If you have 2 pods, each is picked with 0.5 (50%) probability. Because it is not round-robin, the backend pod is chosen randomly for each connection, so the distribution only evens out over a longer time frame.
With 3 pods the probability becomes 1/3 (33%), with 4 pods 1/4, and so on.
To check it you can run sudo iptables-save.
Example output for 2 pods (for nginx service):
sudo iptables-save | grep nginx
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:" -m tcp --dport 31554 -j KUBE-SVC-4N57TFCL4MD7ZTDA
(KUBE-SVC-4N57TFCL4MD7ZTDA is the chain for the nginx service)
sudo iptables-save | grep KUBE-SVC-4N57TFCL4MD7ZTDA
-A KUBE-SVC-4N57TFCL4MD7ZTDA -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-SOWYYRHSSTWLCRDY
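With three pods the service chain cascades rather than giving every rule the same probability: the first rule matches 1/3 of new connections, the second matches 1/2 of the remainder, and the last rule catches the rest, so each pod still receives roughly a third of the traffic. An illustrative sketch (the chain names here are made up and the exact decimals kube-proxy prints will differ):
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.33333333333 -j KUBE-SEP-POD1
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-POD2
-A KUBE-SVC-EXAMPLE -j KUBE-SEP-POD3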
If you want to make sure the load is distributed evenly with a round-robin algorithm, you can use the IPVS proxy mode, which uses rr (round-robin) by default. IPVS works as a load balancer in front of a cluster: it directs requests for TCP- and UDP-based services to the real servers and makes the real servers' services appear as virtual services on a single IP address. It is supported on clusters created by the local-up scripts, kubeadm, and GCE.
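A minimal sketch of switching to IPVS mode on a kubeadm cluster, assuming kube-proxy is configured through the kube-proxy ConfigMap in kube-system (module names and pod labels may differ on other distributions):
# load the kernel modules IPVS relies on
for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack; do sudo modprobe $m; done
# set "mode: ipvs" (and optionally an explicit "scheduler: rr") under config.conf
kubectl -n kube-system edit configmap kube-proxy
# recreate the kube-proxy pods so they pick up the new mode
kubectl -n kube-system delete pod -l k8s-app=kube-proxy
# verify the virtual servers and the rr scheduler
sudo ipvsadm -Ln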

kube-proxy in user-space proxy mode chooses the backend pod in a round-robin fashion.
kube-proxy in iptables mode chooses the backend pod randomly.
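To confirm which mode your kube-proxy is actually running in, you can query its proxyMode endpoint from a node (assuming the default metrics bind address of 127.0.0.1:10249):
curl http://localhost:10249/proxyMode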

Related

Escape k8s nodePort range

I am forced to use the usual nodePort range 30000-32000 on managed kubernetes.
However, I need a specific port exposed on every node outside of that range. Let's say that is port 5000. So, I've fixed nodePort=30033 on my service and I am now trying an old-school iptables redirect on my nodes to get port 5000 "redirected" to 30033:
iptables -t nat -I PREROUTING -p tcp --dport 5000 -j REDIRECT --to-port 30033
This doesn't work. I suspect traffic gets hijacked by kube-proxy rules before my rule is even applied.
Any ideas how to make this work with k8s-created iptables rules?
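One way to check whether the kube-proxy rules really are matching before the custom REDIRECT (a diagnostic sketch, not a confirmed fix; <node-ip> is a placeholder for one of your nodes):
# show the nat PREROUTING chain with rule order and per-rule packet counters
sudo iptables -t nat -L PREROUTING -n -v --line-numbers
# send a test request to port 5000 and watch which counters increase
curl http://<node-ip>:5000/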

Kubernetes: How to configure the load balancing between Service and Pod

Inside a K8s cluster, I run a web application with 2 Pods (replicas is 2) and expose them using a Service with type LoadBalancer.
Then I ran an experiment: I sent 2 consecutive requests and found that both requests were handled by the same Pod.
Can anyone help me explain this behavior?
And what should I do to change this behavior to round-robin or something else?
By default, Kubernetes uses the iptables proxy mode to route traffic to the pods. The pod that serves a request is chosen randomly.
For 2 pods, each is picked with 0.5 (50%) probability. Because it is not round-robin, the backend pod is chosen randomly per connection, and the distribution only evens out over a longer time frame.
It can be checked using sudo iptables-save.
Example output for 2 pods (for nginx service):
sudo iptables-save | grep nginx
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:" -m tcp --dport 31554 -j KUBE-SVC-4N57TFCL4MD7ZTDA
(KUBE-SVC-4N57TFCL4MD7ZTDA is the chain for the nginx service)
sudo iptables-save | grep KUBE-SVC-4N57TFCL4MD7ZTDA
-A KUBE-SVC-4N57TFCL4MD7ZTDA -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-SOWYYRHSSTWLCRDY
As mentioned by @Zambozo, the IPVS proxy mode allows you to use a round-robin algorithm (which it uses by default) to spread the traffic equally between the pods.
I think you may have to look into IPVS proxy mode. IPVS provides more options for balancing traffic.
https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-ipvs
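Also note that the balancing decision is made per connection, not per HTTP request, so two requests reusing the same keep-alive connection will always land on the same pod. If you want to observe the distribution yourself, a quick client-side check is to send a batch of requests on separate connections and count which pod answered each one (a sketch; it assumes the application echoes the serving pod's hostname, and <external-ip> is a placeholder):
# send 20 requests and count how many each pod served
for i in $(seq 1 20); do curl -s http://<external-ip>/; echo; done | sort | uniq -c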

kubernetes pods went down when iptables changed

When I modified iptables on the host, the k8s pods went down; it seems communication within the cluster was blocked, and the pod status turned into ContainerCreating.
I just want to do a simple ip white list like below.
iptables -A INPUT -s 10.xxx.4.0/24 -p all -j ACCEPT
iptables -A INPUT -j REJECT
When I deleted the REJECT rule from iptables, the pods went back to Running.
How can I do a simple IP whitelist on a host without affecting the running k8s pods?
Kubernetes uses iptables to control the network connections between pods (and between nodes), handling many of the networking and port-forwarding rules. With iptables -A INPUT -j REJECT you are effectively not allowing it to do that.
Taken from the Understanding Kubernetes Networking Model article:
In Kubernetes, iptables rules are configured by the kube-proxy controller that watches the Kubernetes API server for changes. When a change to a Service or Pod updates the virtual IP address of the Service or the IP address of a Pod, iptables rules are updated to correctly route traffic directed at a Service to a backing Pod. The iptables rules watch for traffic destined for a Service’s virtual IP and, on a match, a random Pod IP address is selected from the set of available Pods and the iptables rule changes the packet’s destination IP address from the Service’s virtual IP to the IP of the selected Pod. As Pods come up or down, the iptables ruleset is updated to reflect the changing state of the cluster. Put another way, iptables has done load-balancing on the machine to take traffic directed to a service’s IP to an actual pod’s IP.
To secure the cluster, it is better to put all custom rules on the gateway (ADC) or into cloud security groups. Cluster-level security can be handled via Network Policies, Ingress, RBAC and others.
Kubernetes also has a good article about Securing a Cluster, and there is a GitHub guide with best practices for Kubernetes security.
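As one concrete example of the Network Policy approach, here is a minimal sketch of an ingress whitelist enforced inside the cluster instead of in the host's INPUT chain (the policy name is made up, <allowed-cidr> stands for the range from the question, and it only takes effect with a CNI plugin that enforces NetworkPolicy, such as Calico):
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: whitelist-ingress        # hypothetical name
  namespace: default
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: <allowed-cidr>     # e.g. the 10.xxx.4.0/24 range from the question
EOF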

k8s: forwarding from public VIP to clusterIP with iptables

I'm trying to understand in depth how forwarding from publicly exposed load-balancer's layer-2 VIPs to services' cluster-IPs works. I've read a high-level overview how MetalLB does it and I've tried to replicate it manually by setting keepalived/ucarp VIP and iptables rules. I must be missing something however as it doesn't work ;-]
Steps I took:
created a cluster with kubeadm consisting of a master + 3 nodes running k8s-1.17.2 + calico-3.12 on libvirt/KVM VMs on a single computer. all VMs are in 192.168.122.0/24 virtual network.
created a simple 2-pod deployment and exposed it as a NodePort service with externalTrafficPolicy set to Cluster:
$ kubectl get svc dump-request
NAME           TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
dump-request   NodePort   10.100.234.120   <none>        80:32292/TCP   65s
I've verified that I can reach it from the host machine on every node's IP at 32292 port.
created a VIP with ucarp on all 3 nodes:
ucarp -i ens3 -s 192.168.122.21 -k 21 -a 192.168.122.71 -v 71 -x 71 -p dump -z -n -u /usr/local/sbin/vip-up.sh -d /usr/local/sbin/vip-down.sh (example from knode1)
I've verified that I can ping the 192.168.122.71 VIP. I even could ssh through it to the VM that was currently holding the VIP.
Now if kube-proxy was in iptables mode, I could also reach the service on its node-port through the VIP at http://192.168.122.71:32292. However, to my surprise, in ipvs mode this always resulted in connection timing out.
added an iptables rule on every node for packets incoming to 192.168.122.71 to be forwarded to the service's cluster IP 10.100.234.120:
iptables -t nat -A PREROUTING -d 192.168.122.71 -j DNAT --to-destination 10.100.234.120
(later I've also tried to narrow the rule only to the relevant port, but it didn't change the results in any way:
iptables -t nat -A PREROUTING -d 192.168.122.71 -p tcp --dport 80 -j DNAT --to-destination 10.100.234.120:80)
Results:
in iptables mode all requests to http://192.168.122.71:80/ resulted in connection timing out.
in ipvs mode it worked partially:
if the 192.168.122.71 VIP was being held by a node that had a pod on it, then about 50% of requests were succeeding and they were always served by the local pod. The app was also getting the real remote IP of the host machine (192.168.122.1). The other 50% (presumably being sent to the pod on another node) were timing out.
if the VIP was being held by a node without pods then all requests were timing out.
I've also checked if it affects the results in anyway to keep the rule on all nodes at all times vs. to keep it only on the node holding the VIP and deleting it at the release of the VIP: results were the same in both cases.
Does anyone know why it doesn't work and how to fix it? I will appreciate help with this :)
You need to add a MASQUERADE rule as well, so that the source address is rewritten accordingly. For example:
iptables -t nat -A POSTROUTING -j MASQUERADE
Tested with IPVS mode.
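A blanket MASQUERADE rewrites the source of every NATed packet leaving the node; a narrower variant (a sketch reusing the cluster IP from the question) limits it to the traffic that was just DNATed to the service:
iptables -t nat -A POSTROUTING -d 10.100.234.120 -p tcp --dport 80 -j MASQUERADE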

Kubernetes traffic forwarding between services

I have 1 master and 2 worker nodes. There is 1 service running on 1 node and a similar service is running on the other node. Both of them are of NodePort type. How do I forward http requests coming to the pod of first service to a pod of second service?
I have tried using these iptable rules on the first worker node:
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport <nodeport-of-service1> -j DNAT --to <public IP of worker2>:<nodeport-of-service2>
iptables -t nat -A POSTROUTING -p tcp -d <public IP of worker2> --dport <nodeport of service2> -j MASQUERADE
but they don't seem to work. eth0 is the WAN interface on worker-1.
Any idea how this can be done?
In the comments, @VonC gave you a pretty good link with an explanation of the networking.
I just want to add one point to that topic.
The rules you tried to add are pretty similar to the rules kube-proxy adds to iptables when you create a Service inside the cluster.
If you create a Service of NodePort type instead of exposing the port of your Pod directly, you will get exactly what you need: each connection to the Service's NodePort will be load-balanced to one of the backing Pods (by default the backend is chosen randomly in iptables mode).
That is from the official documentation:
For each Service, kube-proxy installs iptables rules which capture traffic to the Service’s clusterIP (which is virtual) and Port and redirects that traffic to one of the Service’s backend sets. For each Endpoints object, it installs iptables rules which select a backend Pod. By default, the choice of backend is random.
It always works like that, but with a NodePort Service kube-proxy creates an additional rule that forwards requests from the selected port on the nodes to the Service's clusterIP.
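A minimal sketch of that approach with hypothetical names and ports (it assumes the second workload is a Deployment whose pods the Service should select; adjust names and ports to your setup):
# expose the backend Deployment as a NodePort Service; kube-proxy then installs
# the forwarding and load-balancing rules on every node for you
kubectl expose deployment <backend-deployment> --name=backend-svc --type=NodePort --port=80 --target-port=8080
# show the node port that was assigned
kubectl get svc backend-svc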