How does Kubernetes external IP multi-pod routing work?

I have a bare-metal Kubernetes cluster with the haproxy ingress controller (running as a DaemonSet) on an external IP. Is it possible to restrict kube-proxy to route only to the local haproxy ingress pod?
To be more specific, I have 2 pods of the haproxy ingress controller and use one external IP for both of them. As I understand it, kube-proxy will route to the pods in round-robin fashion. I haven't found any way to restrict this particular behaviour.

Set externalTrafficPolicy: Local in the NodePort Service.
This makes traffic arriving at node X go only to the pod(s) on node X. If there is no pod on node X, the traffic is dropped (but this should not be an issue here, since you're using a DaemonSet).
Another benefit is that this preserves the true source IP that haproxy sees. Without externalTrafficPolicy: Local, haproxy may see the source IP of another node instead of the original client, since nodes can proxy traffic to each other.
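For reference, a minimal sketch of such a NodePort Service, assuming the haproxy ingress pods carry the label app: haproxy-ingress (the name, namespace, label and external IP below are placeholders):
apiVersion: v1
kind: Service
metadata:
  name: haproxy-ingress            # placeholder name
  namespace: ingress-controller    # adjust to your namespace
spec:
  type: NodePort
  externalTrafficPolicy: Local     # keep traffic on the node it arrived at
  externalIPs:
    - 203.0.113.10                 # your external IP (placeholder)
  selector:
    app: haproxy-ingress           # must match the DaemonSet's pod labels
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443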

Related

Real IP (Domains and Subdomains) on Bare Metal Cluster with MetalLB and Ingress

Help me figure this out.
I have a bare-metal Kubernetes cluster with three nodes; each node has a public IP.
I have installed MetalLB and an Ingress Controller.
It is not clear to me which IP I should point domains and subdomains to so that they can be resolved by the Ingress Controller.
Do I need to define up front which node the Ingress Controller will be launched on?
Should I install the Ingress Controller, then look at which worker node it ends up on and point all domains or subdomains there?
What happens if, after restarting the cluster, the Ingress Controller is deployed on another node?
All the tutorials I've seen show how it works locally or with a cloud load balancer.
Help me understand how this should work correctly.
Usually, when you install MetalLB, you configure a pool of addresses which can be used to assign new IPs to LoadBalancer services whenever they are created. Such IP addresses need to be available; they cannot be created out of nothing, of course. They could, for example, be leased from your hosting provider.
If instead you have a private bare-metal cluster which serves only your LAN, you can simply select a private range of IP addresses which is not otherwise used.
Then, once MetalLB is running, what happens is the following:
Someone / something creates a LoadBalancer service (a Helm chart, a user with a manifest or with commands, etc.)
The newly created service needs an external IP. MetalLB will select one address from the configured range and assign it to that service
MetalLB will start to announce, using standard protocols, that the IP address can now be reached by contacting the cluster. It can work either in Layer 2 mode (one node of the cluster holds the additional IP address) or BGP mode (true load balancing across all nodes of the cluster)
From that point, you can reach the new service by contacting this newly assigned IP address (which is NOT the IP of any of the cluster nodes)
Usually, the Ingress Controller will just bring a LoadBalancer service (which will grab an external IP address from MetalLB), and you can then reach the Ingress Controller at that IP.
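As an illustration, a minimal sketch of such a LoadBalancer Service for an ingress controller (the name, namespace and labels are placeholders; in practice the ingress controller's Helm chart usually creates this for you):
apiVersion: v1
kind: Service
metadata:
  name: ingress-controller          # placeholder, usually created by the chart
  namespace: ingress-system         # placeholder namespace
spec:
  type: LoadBalancer                # MetalLB assigns an external IP from its pool
  selector:
    app: ingress-controller         # must match the controller pods' labels
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443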
As for your other questions, you don't need to worry about where the Ingress Controller is running or similar; that is handled automatically.
The only thing you may want to do is make the domain names you want to serve point to the external IP address assigned to the Ingress Controller.
Some docs:
MetalLB explanations
Bitnami MetalLB chart
LoadBalancer service docs
As an alternative (especially when you want "static" IP addresses) I should mention HAProxy, installed external to the Kubernetes cluster on a bare server/VM/LXC container/etc. and configured to send all incoming 80/443 traffic to the ingress controller's NodePort on all Kubernetes workers (if no ingress pod is running on a given worker, the traffic will be forwarded on by Kubernetes).
Of course, nowadays IP addresses are also "cattle", not "pets" anymore, so MetalLB is more of a "Kubernetes-ish" solution, but who knows ...
This is the link describing HAProxy solution (I am not affiliated with the author):
https://itnext.io/bare-metal-kubernetes-with-kubeadm-nginx-ingress-controller-and-haproxy-bb0a7ef29d4e

How to pass incoming traffic in a bare-metal cluster with MetalLB and Ingress Controllers?

I'm trying to wrap my head around exposing internal load balancing to the outside world on a bare-metal k8s cluster.
Let's say we have a basic cluster:
Some master nodes and some worker nodes, each with two interfaces: one public-facing (eth0) and one local (eth1) with an IP within the 192.168.0.0/16 network
MetalLB deployed and configured with the range 192.168.200.200-192.168.200.254 for its internal IPs
An Ingress controller with its Service of type LoadBalancer
As I currently understand it, MetalLB should now assign one of the IPs from 192.168.200.200-192.168.200.254 to the ingress Service.
But I have the following questions:
Can I curl the ingress controller's external IP from every node (as long as it is reachable via eth1) with a Host header attached and get a response from the service configured in the corresponding Ingress resource, or is that valid only on the node where the Ingress pods are currently placed?
What are my options for passing external traffic arriving on eth0 to an ingress listening on the eth1 network?
Is it possible to forward requests while preserving the source IP address, or is attaching an X-Forwarded-For header the only option?
Assuming that we are talking about MetalLB in Layer 2 mode.
Addressing the following questions:
Can I curl the ingress controller's external IP from every node (as long as it is reachable via eth1) with a Host header attached and get a response from the service configured in the corresponding Ingress resource, or is that valid only on the node where the Ingress pods are currently placed?
Is it possible to forward requests while preserving the source IP address, or is attaching an X-Forwarded-For header the only option?
Splitting the solution on whether or not you need to preserve the source IP, this can go both ways:
Preserve the source IP address
To do that, you would need to set your Ingress controller's Service of type LoadBalancer to use the "Local" traffic policy by setting (in your YAML manifest):
.spec.externalTrafficPolicy: Local
This setup will be valid as long as there is a replica of your Ingress controller on each node, since all of the networking coming to your controller will be contained within a single node.
Citing the official docs:
With the Local traffic policy, kube-proxy on the node that received the traffic sends it only to the service’s pod(s) that are on the same node. There is no “horizontal” traffic flow between nodes.
Because kube-proxy doesn’t need to send traffic between cluster nodes, your pods can see the real source IP address of incoming connections.
The downside of this policy is that incoming traffic only goes to some pods in the service. Pods that aren’t on the current leader node receive no traffic, they are just there as replicas in case a failover is needed.
Metallb.universe.tf: Usage: Local traffic policy
Do not preserve the source IP address
If your use case does not require preserving the source IP address, you can go with:
.spec.externalTrafficPolicy: Cluster
This setup does not require a replica of your Ingress controller to be present on each node.
Citing the official docs:
With the default Cluster traffic policy, kube-proxy on the node that received the traffic does load-balancing, and distributes the traffic to all the pods in your service.
This policy results in uniform traffic distribution across all pods in the service. However, kube-proxy will obscure the source IP address of the connection when it does load-balancing, so your pod logs will show that external traffic appears to be coming from the service’s leader node.
Metallb.universe.tf: Usage: Cluster traffic policy
Addressing the 2nd question:
What are my options to pass incoming external traffic to eth0 to an ingress listening on eth1 network?
MetalLB listens by default on all interfaces; all you need to do is specify the address pool for that interface within the MetalLB config.
You can find more reference on this topic by following:
Metallb.universe.tf: FAQ: In layer 2 mode how to specify the host interface for an address pool
An example of such a configuration could be the following:
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools: # HERE
    - name: my-ip-space
      protocol: layer2
      addresses:
      - 192.168.1.240/28

Kubernetes LoadBalancer type service's external IP is unreachable from pods within the cluster when externalTrafficPolicy is set to Local in GCE

The external IP is perfectly reachable from outside the cluster. It's perfectly reachable from all nodes within the cluster. However, when I try to telnet to the URL from a pod within the cluster that is not on the same node as a pod that is part of the service backend, the connection always times out.
The external IP is reachable by pods that run on the same node as a pod that is part of the service backend.
All pods can perfectly reach the cluster IP of the service.
When I set externalTrafficPolicy to Cluster, the pods are able to reach the external URL regardless of what node they're on.
I am using iptables proxy mode and Kubernetes 1.16.
I'm completely at a loss here as to why this is happening. Is someone able to shed some light on this?
From the official docs:
service.spec.externalTrafficPolicy - denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints. There are two available options: Cluster (default) and Local. Cluster obscures the client source IP and may cause a second hop to another node, but should have good overall load-spreading. Local preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type services, but risks potentially imbalanced traffic spreading.
The Service's endpoints can be either node-local or cluster-wide. When you set externalTrafficPolicy to Local, only node-local endpoints are used, so traffic arriving at a node without a backend pod is not forwarded, which is why pods on other nodes cannot reach the external IP.
So you will need to set externalTrafficPolicy to Cluster instead.
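If it helps, the change is a one-line edit to the Service spec; a sketch of the relevant fragment (the rest of the Service stays as it is):
spec:
  externalTrafficPolicy: Cluster   # any node may forward traffic to the backends (extra hop, source IP not preserved)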

Bare-Metal K8s: How to preserve the source IP of the client and direct traffic to the nginx replica on the current server

I would like to ask for some assistance:
The entrypoint to the cluster for http/https is NGINX: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.0, running as a DaemonSet
I want to achieve 2 things:
preserve the source IP of the client
direct traffic to the nginx replica on the current server (so if a request is sent to server A, listed as the externalIP address, nginx on node A should handle it)
Questions:
How is this possible?
Is it possible without a NodePort? The control plane can be started with a custom --service-node-port-range so I could add node ports for 80 and 443, but that looks a little bit like a hack (after reading about NodePort's intended usage)
I was considering MetalLB, but a Layer 2 configuration would create a bottleneck (high traffic on the cluster). I am not sure whether BGP would solve this problem.
Kubernetes v1.15
Bare-metal
Ubuntu 18.04
Docker (18.9) and WeaveNet (2.6)
You can preserve the source IP of the client by setting externalTrafficPolicy to Local; this proxies requests only to local endpoints. This is explained in Source IP for Services with Type=NodePort.
You should also have a look at Using Source IP.
In the case of MetalLB:
MetalLB respects the service’s externalTrafficPolicy option, and implements two different announcement modes depending on what policy you select. If you’re familiar with Google Cloud’s Kubernetes load balancers, you can probably skip this section: MetalLB’s behaviors and tradeoffs are identical.
“Local” traffic policy
With the Local traffic policy, nodes will only attract traffic if they are running one or more of the service’s pods locally. The BGP routers will load-balance incoming traffic only across those nodes that are currently hosting the service. On each node, the traffic is forwarded only to local pods by kube-proxy, there is no “horizontal” traffic flow between nodes.
This policy provides the most efficient flow of traffic to your service. Furthermore, because kube-proxy doesn’t need to send traffic between cluster nodes, your pods can see the real source IP address of incoming connections.
The downside of this policy is that it treats each cluster node as one “unit” of load-balancing, regardless of how many of the service’s pods are running on that node. This may result in traffic imbalances to your pods.
For example, if your service has 2 pods running on node A and one pod running on node B, the Local traffic policy will send 50% of the service’s traffic to each node. Node A will split the traffic it receives evenly between its two pods, so the final per-pod load distribution is 25% for each of node A’s pods, and 50% for node B’s pod. In contrast, if you used the Cluster traffic policy, each pod would receive 33% of the overall traffic.
In general, when using the Local traffic policy, it’s recommended to finely control the mapping of your pods to nodes, for example using node anti-affinity, so that an even traffic split across nodes translates to an even traffic split across pods.
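As an illustration of that recommendation, a pod anti-affinity rule along these lines (labels are placeholders) asks the scheduler not to put two ingress pods on the same node; for a DaemonSet this is already implicit, but it matters if you run the controller as a Deployment:
# fragment of the controller's pod template spec; the app label is a placeholder
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: nginx-ingress               # must match the controller pods' labels
        topologyKey: kubernetes.io/hostname  # at most one such pod per node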
You also need to take into account the limitations of the BGP routing protocol when using MetalLB.
Please also have a look at this blog post: Using MetalLB with Kind.

Routing traffic to a Kubernetes cluster

I have a question related to Kubernetes networking.
I have a microservice (say numcruncherpod) running in a pod and serving requests on port 9000, and I have created a corresponding Service of type NodePort (numcrunchersvc); the node port this service is exposed on is 30900.
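A sketch of such a Service under those assumptions (the pod label is illustrative):
apiVersion: v1
kind: Service
metadata:
  name: numcrunchersvc
spec:
  type: NodePort
  selector:
    app: numcruncherpod    # assumed pod label
  ports:
    - port: 9000           # service port
      targetPort: 9000     # container port
      nodePort: 30900      # exposed on every node in the cluster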
My cluster has 3 nodes with the following IPs:
192.168.201.70,
192.168.201.71
192.168.201.72
I will be routing traffic to my cluster via a reverse proxy (nginx). As I understand it, in nginx I need to specify the IPs of all these cluster nodes to route traffic to the cluster. Is my understanding correct?
My worry is that since nginx has no knowledge of the cluster, it might not be a good judge of which cluster node the traffic should be sent to. So is there a better way to route traffic to my Kubernetes cluster?
PS: I am not running the cluster on any cloud platform.
This answer is a little late, and a little long, so I ask for forgiveness before I begin. :)
For people not running Kubernetes clusters on cloud providers, there are 4 distinct options for exposing services running inside the cluster to the outside world.
Service of type: NodePort. This is the simplest option and the default. Kubernetes assigns a random port to your service. Every node in the cluster listens for traffic on this particular port and forwards it to one of the pods backing the service. This is handled by kube-proxy, which leverages iptables and load-balances using a round-robin strategy. Since the UX of this setup is not pretty, people often add an external "proxy" server, such as HAProxy, Nginx or httpd, to listen for traffic on a single IP and forward it to one of these backends. This is the setup you, OP, described.
A step up from this is a Service with externalIPs set. This is identical to the NodePort service, except that it also gets Kubernetes to add an additional rule on all nodes that says "all traffic that arrives for destination IP == <external IP> must also be forwarded to the pods". This basically allows you to specify any arbitrary IP as the "external IP" for the service. As long as traffic destined for that IP reaches one of the nodes in the cluster, it will be routed to the correct pod. Getting that traffic to any of the nodes, however, is your responsibility as the cluster administrator. The advantage here is that you no longer have to run an haproxy/nginx setup if you specify the IP of one of the physical interfaces of one of your nodes (for example one of your master nodes). Additionally, you cut down the number of hops by one.
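A minimal sketch of a Service using externalIPs, reusing the names from your question and one of your node IPs as the external address (both are assumptions):
apiVersion: v1
kind: Service
metadata:
  name: numcrunchersvc
spec:
  selector:
    app: numcruncherpod        # assumed pod label
  ports:
    - port: 9000
      targetPort: 9000
  externalIPs:
    - 192.168.201.70           # e.g. the address of one of your nodes' interfaces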
Service of type: LoadBalancer. This service type brings bare-metal clusters to parity with cloud providers. A fully functioning load balancer provider is able to select an IP from a pre-defined pool, automatically assign it to your service and advertise it to the network, assuming it is configured correctly. This is the most "seamless" experience you'll have when it comes to Kubernetes networking on bare metal. Most LoadBalancer provider implementations use BGP to talk to and advertise to an upstream L3 router. MetalLB and kube-router are two FOSS projects that fit this niche.
Kubernetes Ingress. If your requirement is limited to L7 applications, such as REST APIs, HTTP microservices, etc., you can set up a single Ingress provider (nginx is one such provider) and then configure Ingress resources for all your microservices, instead of Service resources. You deploy your Ingress provider and make sure it has an externally available and routable IP (you can pin it to a master node and use the physical interface IP of that node, for example). The advantage of using Ingress over Services is that Ingress objects understand HTTP microservices natively and you can do smarter health checking, routing and management.
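As a sketch, an Ingress resource for the numcruncher service might look like this (the hostname and path are placeholders, and the exact apiVersion depends on your cluster version):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: numcruncher-ingress
spec:
  ingressClassName: nginx             # assumes an nginx ingress controller is installed
  rules:
    - host: numcruncher.example.com   # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: numcrunchersvc
                port:
                  number: 9000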
Often people combine one of (1), (2), (3) with (4), since the first three are L4 (TCP/UDP) and (4) is L7. So things like URL path/domain-based routing, SSL termination etc. are handled by the ingress provider, while IP lifecycle management and routing are taken care of by the service layer.
For your use case, the ideal setup would involve:
A deployment for your microservice, with health endpoints on your pod
An Ingress provider, so that you can tweak/customize your routing and load balancing, as well as use it for SSL termination, domain matching, etc.
(optional): Use a LoadBalancer provider to front your Ingress provider, so that you don't have to manually configure your Ingress's networking.
Correct. You can route traffic to any or all of the K8s nodes (minions). The K8s network layer will forward it to the appropriate node if necessary.
If you are running only a single pod, for example, nginx will most likely round-robin the requests across the nodes. When a request hits a node which does not have the pod running on it, the request will be forwarded to the node that does.
If you run 3 pods, one on each node, the request will be handled by whichever node receives it from nginx.
If you run more than one pod on each node, the requests will be round-robined to each node, and then round-robined to each pod on that node.