We have a Kubernetes (K8s) cluster with 1 master node and 2 worker nodes, all running Linux, and we are using Flannel.
An example is given below:
Master (CentOS 7) - 192.168.10.1
Worker Node-1 (CentOS 7) - 192.168.10.2
Worker Node-2 (CentOS 7) - 192.168.10.3
Worker Node-3 (Windows) - 192.168.10.4
Now we have to add a Windows node (e.g. 192.168.10.4) to the existing cluster (192.168.10.1).
According to this link, it appears that we have to update the cni-conf.json section of Flannel, changing the network name from cbr0 to vxlan0, and to my understanding this is done so the cluster can communicate with Windows.
My question: will this change (from cbr0 to vxlan0) break the existing communication between the Linux nodes?
Let's start with definitions.
cbr0 is Kubernetes' own bridge, created to differentiate it from the docker0 bridge used by Docker.
VXLAN stands for Virtual Extensible LAN; it is an overlay network, which means it encapsulates packets inside other packets.
A more precise definition:
VXLAN is an encapsulation protocol that provides data center
connectivity using tunneling to stretch Layer 2 connections over an
underlying Layer 3 network.
The VXLAN tunneling protocol encapsulates Layer 2 Ethernet frames in
Layer 3 UDP packets, enabling you to create virtualized Layer 2
subnets, or segments, that span physical Layer 3 networks. Each Layer
2 subnet is uniquely identified by a VXLAN network identifier (VNI)
that segments traffic.
Answer
No, it won't break communication between the Linux nodes. This is just another option for how nodes can communicate with each other using the Flannel CNI. I also tested this on my two-node Linux cluster and everything worked fine.
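If it helps, this is roughly where that change lives, assuming the stock kube-flannel.yml manifest (the ConfigMap name, namespace and labels may differ in your deployment):

kubectl -n kube-system edit cm kube-flannel-cfg
# in the cni-conf.json key, change  "name": "cbr0"  to  "name": "vxlan0"
# (the official Windows guide also sets VNI 4096 and Port 4789 in net-conf.json)
kubectl -n kube-system delete pod -l app=flannel   # recreate the flannel pods so they pick up the change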
The main difference is in how Flannel handles the packets. It will be visible via netstat or Wireshark, while for Pods nothing changes, because the packets are decapsulated back to normal before they reach the Pods.
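For example, on a Linux node you can inspect the overlay device that the vxlan backend creates and capture the encapsulated traffic (flannel.1 is the device name flannel normally uses; 8472 is its default UDP port on Linux, which the Windows guides switch to 4789):

ip -d link show flannel.1
tcpdump -ni eth0 udp port 8472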
Note! I recommend testing this change on a small dev/test cluster first, as there may be some additional setup needed for firewalld (the usual rule before making any changes in production).
Useful links:
Flannel - recommended backends for VXLAN
Kubernetes Journey — Up and running out of the cloud — flannel
How Kubernetes Networking Works – Under the Hood
Related
While learning about Kubernetes CNI, I heard that some plugins use BGP or VXLAN under the hood.
On the internet, the Border Gateway Protocol (BGP) manages how packets are routed between edge routers.
Autonomous systems (AS) are networks of routers managed by a single enterprise or service provider, for example Facebook or Google.
Autonomous systems communicate with their peers and form a mesh.
But I still can't figure out how a CNI plugin takes advantage of BGP.
Imagine there is a Kubernetes cluster, which is composed of 10 nodes. Calico is the chosen CNI plugin.
Who plays the autonomous system (AS) role? Is each node an AS?
How are packets forwarded from one node to another? Is iptables still required?
The CNI plugin is responsible for allocating IP addresses (IPAM) and ensuring that packets get where they need to go.
For Calico specifically, you can get a lot of information from the architecture page as well as the Calico network design memoirs.
Whenever a new Pod is created, the IPAM plugin allocates an IP address from the global pool and the Kubernetes scheduler assigns the Pod to a Node. The Calico CNI plugin (like any other) configures the networking stack to accept connections to the Pod IP and routes them to the processes inside. This happens with iptables and uses a helper process called Felix.
Each Node also runs a BIRD (BGP) daemon that watches for these configuration events: "IP 10.x.y.z is hosted on node A". These configuration events are turned into BGP updates and sent to other nodes using the open BGP sessions.
When the other nodes receive these BGP updates, they program the node route table (with simple ip route commands) to ensure the node knows how to reach the Pod. In this model, yes, every node is an AS.
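As a purely hypothetical illustration (the addresses are made up for this example), the route that BIRD ends up programming on node B after such an update is equivalent to:

ip route add 10.244.1.0/26 via 172.16.0.11 dev eth0 proto bird
# "pods in 10.244.1.0/26 are reachable via node A at 172.16.0.11";
# BIRD installs these automatically, you would normally just see them with: ip route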
What I just described is the "AS per compute server" model: it is suitable for small deployments in environments where nodes are not necessarily on the same L2 network. The problem is that each node needs to maintain a BGP session with every other node, which scales as O(N^2).
For larger deployments therefore, a compromise is to run one AS per rack of compute servers ("AS per rack"). Each top of rack switch then runs BGP to communicate routes to other racks, while the switch internally knows how to route packets.
I have a single-master cluster with 3 worker nodes. The master node has one network interface of 10Gb capacity, and all worker nodes have two interfaces: a 10Gb and a 40Gb interface. They are all connected via a switch.
By default, Kubernetes binds to the default network interface, eth0, which is the 10Gb interface on the worker nodes. How do I specify the 40Gb interface when joining?
The kubeadm init command has an --apiserver-advertise-address argument, but this is for the API server. Is there an equivalent option for the worker nodes, so that the communication between master and workers (and between workers) is carried over the 40Gb link?
Please note that this is a bare-metal on-prem installation with OSS Kubernetes v1.20.
You can use the --hostname-override flag to override the default kubelet behavior. By default the kubelet's node name equals the hostname, and its IP address defaults to the address of the interface holding the default route (default gateway).
For more details please visit this issue.
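As a sketch only (the file path depends on your distro's kubeadm packaging and the value here is a placeholder), extra kubelet flags such as --hostname-override are usually passed via the KUBELET_EXTRA_ARGS environment file:

# /etc/sysconfig/kubelet on CentOS/RHEL, /etc/default/kubelet on Debian/Ubuntu
KUBELET_EXTRA_ARGS="--hostname-override=worker-1-40g"
# then restart the kubelet so the flag takes effect
systemctl restart kubelet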
There is nothing specific for this; you would have to manage it at the routing level. If you're using BGP internally, it would usually happen automatically because the faster link gets a better metric and is preferred, but if you're using a simpler static routing setup you may need to tweak things.
Pods live on internal virtual adapters so they don't listen on any physical interface (for all CNIs I know of anyway, except the AWS one).
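For the static case, a purely illustrative sketch (the peer address and interface name are placeholders for your environment): check which NIC the node currently uses to reach a peer, and pin that traffic to the 40Gb interface if needed:

ip route get 10.0.0.12                        # shows which interface is used to reach this peer today
ip route replace 10.0.0.12/32 dev enp65s0f0   # host route forcing traffic to that peer over the 40Gb NIC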
Do we need any specific Wi-Fi router / LAN router to use MetalLB in Kubernetes?
How does MetalLB help if it runs on a machine? All the router traffic would first have to reach that machine and then get routed, making that machine the bottleneck.
Shouldn't the MetalLB solution fit somewhere in the router itself?
Maybe first, what MetalLB is and why to use it:
MetalLB is a load-balancer implementation for bare metal Kubernetes clusters, using standard routing protocols.
...
Bare metal cluster operators are left with two lesser tools to bring user traffic into their clusters, “NodePort” and “externalIPs” services. Both of these options have significant downsides for production use, which makes bare metal clusters second class citizens in the Kubernetes ecosystem.
MetalLB aims to redress this imbalance by offering a Network LB implementation that integrates with standard network equipment, so that external services on bare metal clusters also “just work” as much as possible.
There is nothing special needed besides correctly routing the traffic to your bare metal server. You might set it up as a DMZ host or just forward ports to the server behind the router.
If you are looking into load balancing the traffic before it reaches the server, that will only work with several servers.
If you have 4 bare metal servers, you can set up one as the master node and the other three as worker nodes, so the master node would be responsible for balancing the load across the worker nodes.
You can use MetalLB in Layer 2 Mode
In layer 2 mode, one node assumes the responsibility of advertising a service to the local network. From the network’s perspective, it simply looks like that machine has multiple IP addresses assigned to its network interface.
Under the hood, MetalLB responds to ARP requests for IPv4 services, and NDP requests for IPv6.
The major advantage of the layer 2 mode is its universality: it will work on any ethernet network, with no special hardware required, not even fancy routers.
and BGP Mode
In BGP mode, each node in your cluster establishes a BGP peering session with your network routers, and uses that peering session to advertise the IPs of external cluster services.
Assuming your routers are configured to support multipath, this enables true load-balancing: the routes published by MetalLB are equivalent to each other, except for their nexthop. This means that the routers will use all nexthops together, and load-balance between them.
Once the packets arrive at the node, kube-proxy is responsible for the final hop of traffic routing, to get the packets to one specific pod in the service.
You can read more about the usage of MetalLB here.
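For a sense of how little is needed, here is a minimal layer 2 sketch, assuming the legacy ConfigMap-based configuration (newer MetalLB releases use IPAddressPool/L2Advertisement custom resources instead) and a placeholder address range on your LAN, outside any DHCP scope:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250
EOF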
I am trying to install Kubernetes on my on-premises Ubuntu 16.04 server, and I am referring to the following documentation:
https://medium.com/#Grigorkh/install-kubernetes-on-ubuntu-1ac2ef522a36
After installing kubelet, kubeadm, and kubernetes-cni, I found that I have to initialize kubeadm with the following command:
kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=10.133.15.28 --kubernetes-version stable-1.8
Here I am totally confused about why we are setting the CIDR and the API server advertise address. I am listing my points of confusion here:
Why are we specifying a CIDR and --apiserver-advertise-address here?
How can I find these two addresses for my server?
And why is Flannel used in a Kubernetes installation?
I am new to this containerization and Kubernetes world.
Why are we specifying a CIDR and --apiserver-advertise-address here?
And why is Flannel used in a Kubernetes installation?
Kubernetes uses a Container Network Interface (CNI) plugin to create a virtual network inside your cluster for communication between pods.
Here is some explanation of the "why" from the documentation:
Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):
all containers can communicate with all other containers without NAT
all nodes can communicate with all containers (and vice-versa) without NAT
the IP that a container sees itself as is the same IP that others see it as
Kubernetes applies IP addresses at the Pod scope - containers within a Pod share their network namespaces - including their IP address. This means that containers within a Pod can all reach each other’s ports on localhost. This does imply that containers within a Pod must coordinate port usage, but this is no different than processes in a VM. This is called the “IP-per-pod” model.
So, Flannel is one of the CNI plugins which can be used to create the network that connects all your pods, and the CIDR option defines the subnet for that network. There are many alternative CNI plugins with similar functionality.
If you want more details about how networking works in Kubernetes, you can read the link above or, for example, here.
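One practical detail: the --pod-network-cidr you pass to kubeadm has to match the "Network" value in Flannel's net-conf.json (the upstream kube-flannel.yml uses 10.244.0.0/16, which is why that value appears in the command above). Assuming the stock manifest, you can check what your deployed Flannel uses with:

kubectl -n kube-system get configmap kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'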
How can I find these two addresses for my server?
The API server advertise address has to be a single, static address. That address is used by all components to communicate with the API server. Unfortunately, Kubernetes has no support for multiple API server addresses per master.
But you can still have as many addresses on your server as you want; only one of them can be set as --apiserver-advertise-address. The only requirement is that it has to be reachable from all the nodes in your cluster.
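To find a suitable address, just list the addresses configured on the master and pick the one that every node can reach (10.133.15.28 in the example command above):

ip -4 addr show
# or, more compactly
hostname -I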
I'm in the process of setting up a Kubernetes cluster from scratch. I am looking to install Flannel as part of the installation process. When I look at online guides/examples I can see that it is necessary to configure the Flannel subnetwork.
I can see that some guides (deploying-kubernetes-using-ansible.html) set up the flannel network like this:
{
"Network": "172.16.0.0/12",
"SubnetLen": 24,
"Backend": {
"Type": "vxlan"
}
}
whereas another guide here (Kubernetes – simple install on CentOS 7) sets up the network like this:
{"Network":"172.17.0.0/16"}
I am still learning about CIDR notation, so I can see that there are more IP addresses available with the first approach than the second. The second URL states that:
All your kubernetes nodes will be in 3 different subnets at the same
time:
External interface subnet: 10.0.1.0/24
Flannel subnet: 172.17.0.0/16 #Do not use existing subnet
Service cluster subnet: 10.10.10.0/24 # Do not use existing subnet
I can see from Wikipedia (Private IPv4 address spaces) that the 172 range is a private address space of up to /12.
The implications of the quote as I see them are:
External interface: /24 (set by the network admin) == up to 255 hosts on the external network. This is the max number of nodes in the cluster.
Flannel subnet: 172.17.0.0/16 (set by Flannel config) == up to 65535 IPs in the Flannel network. What does this mean?
Service cluster: 10.10.10.0/24 (set by KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.10.10.0/24") == up to 255 services in the cluster? (docs here)
What are the practical implications of changing the Flannel config to /12 (or any other number from 12..31)?
Same question for service-cluster-ip-range and how do you deconflict the service IPs from the IPs of pods?
Actually, the Flannel Network range (/12 here) must be bigger than the per-node SubnetLen (/24): Flannel carves one subnet of size SubnetLen out of Network for every node. With Network 172.16.0.0/12 and SubnetLen 24 that gives 4096 possible node subnets of roughly 254 pod IPs each, while 172.17.0.0/16 gives 256 node subnets. The external interface subnet is simply the subnet your hosts themselves live on. service-cluster-ip-range is the range used for ClusterIPs, which are virtual (implemented by iptables in Kubernetes by default). If these ranges overlap you will get conflicts between iptables rules and routes, so you should pick a distinct, non-overlapping range for each of them (host network, pod network, and service network).
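A quick way to sanity-check those numbers yourself (plain arithmetic, shown with python3 only for convenience):

python3 -c 'print(2 ** (24 - 12))'   # Network /12 with SubnetLen 24 -> 4096 per-node subnets
python3 -c 'print(2 ** (24 - 16))'   # Network /16 with SubnetLen 24 -> 256 per-node subnets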