k8s: forwarding from public VIP to clusterIP with iptables - kubernetes

I'm trying to understand in depth how forwarding from a publicly exposed load-balancer's layer-2 VIP to services' cluster-IPs works. I've read a high-level overview of how MetalLB does it and I've tried to replicate it manually by setting up a keepalived/ucarp VIP and iptables rules. I must be missing something, however, as it doesn't work ;-]
Steps I took:
created a cluster with kubeadm consisting of a master + 3 nodes running k8s-1.17.2 + calico-3.12 on libvirt/KVM VMs on a single computer. All VMs are in the 192.168.122.0/24 virtual network.
created a simple 2-pod deployment and exposed it as a NodePort service with externalTrafficPolicy set to Cluster:
$ kubectl get svc dump-request
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dump-request NodePort 10.100.234.120 <none> 80:32292/TCP 65s
I've verified that I can reach it from the host machine on every node's IP on port 32292.
created a VIP with ucarp on all 3 nodes:
ucarp -i ens3 -s 192.168.122.21 -k 21 -a 192.168.122.71 -v 71 -x 71 -p dump -z -n -u /usr/local/sbin/vip-up.sh -d /usr/local/sbin/vip-down.sh (example from knode1)
I've verified that I can ping the 192.168.122.71 VIP. I could even ssh through it to the VM that was currently holding the VIP.
Now, if kube-proxy was in iptables mode, I could also reach the service on its NodePort through the VIP at http://192.168.122.71:32292. However, to my surprise, in ipvs mode this always resulted in the connection timing out.
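A quick way to double-check which mode kube-proxy is actually running in (a sketch assuming kube-proxy's default metrics port 10249 and the kubeadm-generated ConfigMap):
# ask kube-proxy directly on a node; prints "iptables" or "ipvs"
curl -s http://localhost:10249/proxyMode
# or inspect the ConfigMap kubeadm renders for kube-proxy
kubectl -n kube-system get configmap kube-proxy -o yaml | grep 'mode:'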
added an iptables rule on every node for packets incoming to 192.168.122.71 to be forwarded to the service's cluster-IP 10.100.234.120:
iptables -t nat -A PREROUTING -d 192.168.122.71 -j DNAT --to-destination 10.100.234.120
(later I also tried narrowing the rule to just the relevant port, but it didn't change the results in any way:
iptables -t nat -A PREROUTING -d 192.168.122.71 -p tcp --dport 80 -j DNAT --to-destination 10.100.234.120:80)
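A diagnostic sketch, using the cluster-IP from above, to confirm whether the DNAT rule matches at all; the per-rule packet counters and the conntrack entries should both move when a request comes in:
# show per-rule packet counters for the nat PREROUTING chain
sudo iptables -t nat -L PREROUTING -n -v
# list connection-tracking entries whose original destination is the service's cluster-IP
sudo conntrack -L -d 10.100.234.120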
Results:
in iptables mode all requests to http://192.168.122.71:80/ resulted in the connection timing out.
in ipvs mode it worked partially:
if the 192.168.122.71 VIP was being held by a node that had a pod on it, then about 50% of requests were succeeding and they were always served by the local pod. The app was also getting the real remote IP of the host machine (192.168.122.1). The other 50% (presumably those sent to the pod on another node) were timing out.
if the VIP was being held by a node without pods then all requests were timing out.
I've also checked whether it affects the results in any way to keep the rule on all nodes at all times vs. keeping it only on the node holding the VIP and deleting it when the VIP is released: the results were the same in both cases.
Does anyone know why it doesn't work and how to fix it? I will appreciate help with this :)

You need to add a MASQUERADE rule as well, so that the source address is rewritten accordingly. For example:
iptables -t nat -A POSTROUTING -j MASQUERADE
Tested with ipvs.
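A blanket MASQUERADE on POSTROUTING works, but it rewrites the source of all forwarded traffic; a narrower sketch, assuming the same VIP and cluster-IP as in the question, masquerades only what was DNATed to the service:
# rewrite the destination of traffic hitting the VIP to the service's cluster-IP
iptables -t nat -A PREROUTING -d 192.168.122.71 -p tcp --dport 80 -j DNAT --to-destination 10.100.234.120:80
# rewrite the source only for traffic heading to that cluster-IP, so replies come back through this node
iptables -t nat -A POSTROUTING -d 10.100.234.120 -p tcp --dport 80 -j MASQUERADE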

Related

Escape k8s nodePort range

I am forced to use the usual nodePort range 30000-32000 on managed kubernetes.
However, I need a specific port being exposed from every node outside of that range. Let's say that is port 5000. So, I've fixed nodePort=30033 on my service and I am now trying an old-school iptables redirect on my nodes to get port 5000 "redirected" to 30033:
iptables -t nat -I PREROUTING -p tcp --dport 5000 -j REDIRECT --to-port 30033
This doesn't work. I suspect the traffic gets hijacked by kube-proxy's rules before my rule is even reached.
Any ideas how to make this work with k8s-created iptables rules?
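One way to check whether kube-proxy's chains really intercept the traffic before the custom rule, a diagnostic sketch assuming the port numbers from the question:
# show rule order and per-rule packet counters in the nat PREROUTING chain
sudo iptables -t nat -L PREROUTING -n -v --line-numbers
# see what kube-proxy installed for the chosen NodePort
sudo iptables-save -t nat | grep 30033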

Kubernetes: How to configure the load balancing between Service and Pod

Inside a K8s cluster, I run a web application with 2 Pods (replicas: 2) and expose them using a Service with type LoadBalancer.
Then I ran an experiment sending 2 consecutive requests and found that both requests were handled by the same Pod.
Can anyone help me explain this behavior?
And what should I do to change this behavior to round-robin or something else?
By default, Kubernetes uses iptables mode to route traffic between the pods. The pod serving a request is chosen randomly, not by round-robin.
For 2 pods, traffic is distributed with a 0.5 (50%) probability per pod, so it evens out over a longer time frame.
It can be checked using sudo iptables-save.
Example output for 2 pods (for nginx service):
sudo iptables-save | grep nginx
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:" -m tcp --dport 31554 -j KUBE-SVC-4N57TFCL4MD7ZTDA
(KUBE-SVC-4N57TFCL4MD7ZTDA is the chain for the nginx service)
sudo iptables-save | grep KUBE-SVC-4N57TFCL4MD7ZTDA
-A KUBE-SVC-4N57TFCL4MD7ZTDA -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-SOWYYRHSSTWLCRDY
As mentioned by @Zambozo, IPVS proxy mode allows you to use the round-robin algorithm (which it uses by default) to spread the traffic equally between the pods.
I think you may have to look into IPVS proxy mode; IPVS provides more options for balancing traffic.
https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-ipvs
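On a kubeadm-provisioned cluster, switching kube-proxy to IPVS mode is roughly a matter of editing its ConfigMap and recreating the kube-proxy pods; a sketch, assuming kube-proxy runs as the usual DaemonSet labelled k8s-app=kube-proxy:
# set mode: "ipvs" (and optionally ipvs.scheduler, e.g. "rr") in the config
kubectl -n kube-system edit configmap kube-proxy
# recreate the kube-proxy pods so they pick up the new mode
kubectl -n kube-system delete pod -l k8s-app=kube-proxy
# note: the nodes need the IPVS kernel modules (ip_vs, ip_vs_rr, ...) loaded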

How to make kube-proxy distribute the load evenly?

I have a ClusterIP Service which is used to distribute load to 2 Pods internally.
The load is not distributed evenly across the Pods.
How can I make the load distributed evenly?
Kubernetes uses iptables to distribute the load between pods (iptables proxy mode by default).
If you have 2 pods, traffic is distributed with a 0.5 (50%) probability per pod. Because it is not using round-robin, the backend pod is chosen randomly; it evens out over a longer time frame.
With 3 pods the probability changes to 1/3 (33%), for 4 pods 1/4, and so on.
To check it you can run sudo iptables-save.
Example output for 2 pods (for nginx service):
sudo iptables-save | grep nginx
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:" -m tcp --dport 31554 -j KUBE-SVC-4N57TFCL4MD7ZTDA
(KUBE-SVC-4N57TFCL4MD7ZTDA is the chain for the nginx service)
sudo iptables-save | grep KUBE-SVC-4N57TFCL4MD7ZTDA
-A KUBE-SVC-4N57TFCL4MD7ZTDA -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-SOWYYRHSSTWLCRDY
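For comparison, a sketch of what the same kind of chain typically looks like with 3 backend pods (the chain and KUBE-SEP names below are made up): the first rule matches 1/3 of the traffic, the second matches 1/2 of what is left, and the last rule catches the rest, which works out to roughly 1/3 per pod:
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.33333333333 -j KUBE-SEP-AAAAAAAAAAAAAAAA
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-BBBBBBBBBBBBBBBB
-A KUBE-SVC-EXAMPLE -j KUBE-SEP-CCCCCCCCCCCCCCCC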
If you want to make sure the load is distributed evenly using a round-robin algorithm, you can use IPVS, which uses rr (round-robin) by default. It acts as a load balancer in front of a cluster, directing requests for TCP- and UDP-based services to the real servers and making the services of the real servers appear as virtual services on a single IP address. It is supported on clusters created by local-up scripts, kubeadm, and GCE.
kube-proxy in userspace proxy mode chooses the backend pod in a round-robin fashion.
kube-proxy in iptables mode chooses the backend pod randomly.

Kubernetes traffic forwarding between services

I have 1 master and 2 worker nodes. There is 1 service running on one node and a similar service running on the other node. Both of them are of NodePort type. How do I forward HTTP requests coming to a pod of the first service to a pod of the second service?
I have tried using these iptable rules on the first worker node:
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport <nodeport-of-service1> -j DNAT --to <public IP of worker2>:<nodeport-of-service2>
iptables -t nat -A POSTROUTING -p tcp -d <public IP of worker2> --dport <nodeport of service2> -j MASQUERADE
but they don't seem to work. eth0 is the WAN interface on worker-1.
Any idea how this can be done?
In the comments, @VonC gave you a pretty good link explaining the networking.
I just want to add one point to that topic.
The rules you tried to add are pretty similar to the rules that kube-proxy adds to iptables when you create a Service inside a cluster.
If you create a Service of NodePort type instead of exposing the port of your Pod, you will get exactly what you need: each connection to the NodePort of the service will be forwarded to one of the backing pods with load balancing.
That is from the official documentation:
For each Service, kube-proxy installs iptables rules which capture traffic to the Service’s clusterIP (which is virtual) and Port and redirects that traffic to one of the Service’s backend sets. For each Endpoints object, it installs iptables rules which select a backend Pod. By default, the choice of backend is random.
It always works like that, but if you have a Service of NodePort type, kube-proxy creates an additional rule which forwards requests from the selected port on the nodes to the Service's clusterIP.
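A sketch of that approach, assuming the second application runs as a Deployment named service2-app listening on port 8080 (both names are hypothetical): exposing it as a NodePort Service lets kube-proxy install and maintain the forwarding rules for you.
# create a NodePort Service in front of the second application's pods
kubectl expose deployment service2-app --type=NodePort --port=8080
# inspect the rules kube-proxy installed for it
sudo iptables-save -t nat | grep -E 'KUBE-NODEPORTS|KUBE-SVC'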

Kubernetes external access to application in container

I'm using Kubernetes on my cluster on DigitalOcean.
I have an application inside a container in a Pod, and I need to make external connections to this application. I need access to a concrete instance (because I have more than 10 Pods running this application).
So, my question is: how can I provide external access to this application?
For example, I have the public IP 192.168.9.9,
and 2 pods with instances: the first listens on port 8990 and its Pod IP is 10.0.0.1, and the second listens on port 8991 and its Pod IP is 10.0.0.1.
So, I need to redirect traffic from 192.168.9.9:8990 to 10.0.0.1:8990 and from 192.168.9.9:8991 to 10.0.0.1:8991.
Yes, I can do it manually with iptables, but I want to do it automatically: when a new Pod comes up, I want a record added to iptables.
I can watch for services by using the API:
127.0.0.1:8080/api/v1beta1/watch/services
and can get the IP of a pod here:
127.0.0.1:8080/api/v1beta1/pods
I found a solution that does something similar to my needs here, but it looks like a poor architectural decision. Is there a better way to redirect external traffic to a pod automatically after a new Pod comes up?
If your public IP is configured on an interface on one of your minions, then all you need to do is set the publicIPs value in your service description. For example, if you define a service like this:
kind: Service
apiVersion: v1beta1
id: test-web
port: 8888
selector:
  name: test-web
containerPort: 80
publicIPs:
  - 192.168.1.43
Then Kubernetes will create iptables rules like this:
-A KUBE-PORTALS-CONTAINER -d 192.168.1.43/32 -p tcp -m comment --comment test-web -m tcp --dport 8888 -j REDIRECT --to-ports 38541
-A KUBE-PORTALS-HOST -d 192.168.1.43/32 -p tcp -m comment --comment test-web -m tcp --dport 8888 -j DNAT --to-destination 192.168.1.20:38541
These rules redirect traffic sent to your publicIP and port to the appropriate port maintained by the local kube-proxy instance. I only wrote kiwi (and I'm sorry you don't like it!) to provide a mechanism for dynamically allocating public IP addresses. As long as you don't mind pre-configuring the addresses on your interfaces, you should be all set.