Kubernetes iptables - should the master node be running pods?

In my kubernetes cluster I have a flannel overlay configured. The iptables rules on the master node and the minions are not the same. I understand that this is by design, but it creates a problem: pods can't run on the master node because, on the master, external IP addresses (for a service) are not resolved.
On the minions there are KUBE-PORTALS-CONTAINER and KUBE-PORTALS-HOST chains which redirect service cluster and external IP addresses. Static routes redirect this traffic to the master, which actually resolves the addresses.
On the master the two chains mentioned above are missing; instead there is a KUBE-SERVICES chain which resolves service IPs to pod IP addresses.
Is there a way to configure the master node to have the chains that the nodes have as well as the service resolution chain?

Pods run on Kubernetes nodes (formerly called minions) but not on masters; see the Kubernetes architecture docs. You may also find the Debug Services doc helpful.
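If you want to compare what kube-proxy has programmed on each machine, a quick hedged check is to list the NAT chains mentioned above on the master and on a minion. Chain names depend on the kube-proxy version, so some of these may simply not exist on a given node:
# run on the master and on a minion to compare the service NAT chains
sudo iptables -t nat -vnL KUBE-SERVICES 2>/dev/null
sudo iptables -t nat -vnL KUBE-PORTALS-CONTAINER 2>/dev/null
sudo iptables -t nat -vnL KUBE-PORTALS-HOST 2>/dev/null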

Related

kubernetes pods went down when iptables changed

When I modified the iptables rules on the host, the k8s pods went down. It seems like communication within the cluster was blocked; the pod status turned into ContainerCreating.
I just wanted to add a simple IP whitelist like below.
iptables -A INPUT -s 10.xxx.4.0/24 -p all -j ACCEPT
iptables -A INPUT -j REJECT
Then I deleted the REJECT rule from iptables and the pods went back to Running.
I just want to know how to set up a simple IP whitelist on a host without affecting the running k8s pods.
Kubernetes uses iptables to control the network connections between pods (and between nodes), handling many of the networking and port forwarding rules. With iptables -A INPUT -j REJECT you are effectively not allowing it to do that.
Taken from the Understanding Kubernetes Networking Model article:
In Kubernetes, iptables rules are configured by the kube-proxy controller that watches the Kubernetes API server for changes. When a change to a Service or Pod updates the virtual IP address of the Service or the IP address of a Pod, iptables rules are updated to correctly route traffic directed at a Service to a backing Pod. The iptables rules watch for traffic destined for a Service’s virtual IP and, on a match, a random Pod IP address is selected from the set of available Pods and the iptables rule changes the packet’s destination IP address from the Service’s virtual IP to the IP of the selected Pod. As Pods come up or down, the iptables ruleset is updated to reflect the changing state of the cluster. Put another way, iptables has done load-balancing on the machine to take traffic directed to a service’s IP to an actual pod’s IP.
To secure the cluster it is better to put all custom rules on the gateway (ADC) or into cloud security groups. Cluster-level security can be handled via Network Policies, Ingress, RBAC and other mechanisms.
Kubernetes also has a good article about Securing a Cluster, and there is a GitHub guide with best practices for Kubernetes security.
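If you still need a host-level whitelist, a minimal sketch that should not break kube-proxy is to accept loopback, established connections, and the cluster's own subnets before the final REJECT. The pod CIDR below (10.244.0.0/16, the common flannel default) is an assumption; adjust it to your cluster's actual pod and node ranges:
iptables -A INPUT -i lo -j ACCEPT                                       # loopback
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT  # replies to outbound traffic
iptables -A INPUT -s 10.xxx.4.0/24 -p all -j ACCEPT                     # your whitelist
iptables -A INPUT -s 10.244.0.0/16 -j ACCEPT                            # assumed pod CIDR (flannel default)
iptables -A INPUT -j REJECT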

How do Kubernetes NodePort services with Service.spec.externalTrafficPolicy=Local route traffic?

There seem to be two contradictory explanations of how NodePort services route traffic; they can route traffic to one of these two, not both:
Nodes (through the kube-proxy): according to kubectl explain Service.spec.externalTrafficPolicy and this article that adds more detail, packets incoming to NodePort services with Service.spec.externalTrafficPolicy=Local set get routed to a kube-proxy, which then routes the packets to the corresponding pods running on that node.
This kube-proxy networking documentation further supports this theory, adding that endpoints add a rule to the service's iptables that forwards traffic to nodes through the kube-proxy.
Pods: services update their iptables from Endpoints, which contain the IP addresses of the pods they can route to. Furthermore, if you remove your service's label selectors and edit the Endpoints, you can change where your traffic is routed.
If one of these is right, then I must be misunderstanding something.
If services route to nodes, then why can I edit endpoints without breaking the IPtables?
If services route to pods, then why would services go through the trouble of routing to nodes when Service.spec.externalTrafficPolicy is set?
A Service is a virtual address/port managed by kube-proxy. Services forward traffic to their associated endpoints, which are usually pods but, as you mentioned, can be set to any destination IP/port.
A NodePort Service doesn't change the endpoint side of the service; the NodePort allows external traffic into the Service via a port on a node.
Breakdown of a Service
kube-proxy can use 3 methods to implement the forwarding of a service from Node to destination.
a userspace proxy
iptables
ipvs
Most clusters use iptables, which is what is described below. I use the term "forward" instead of "route" because services use Network Address Translation (or the proxy) to "forward" traffic rather than standard network routing.
The service ClusterIP is a virtual entity managed by kube-proxy. This address/port combination is available on every node in the cluster and forwards any local (pod) service traffic to the endpoints IP and port.
                                         / Pod (remote node)
Pod -- ClusterIP/Port -- KUBE-SVC-NAT -- Pod
                                         \ Pod (remote node)
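To relate the diagram to a live cluster, a hedged check (the service name my-svc is a placeholder) is to look at the virtual address and the endpoints behind it:
kubectl get svc my-svc -o wide        # shows the ClusterIP and port
kubectl get endpoints my-svc          # shows the pod IP:port pairs the service forwards to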
A service with a NodePort is the same as above, with the addition of a way to forward external traffic into the cluster via a Node. kube-proxy manages an additional rule to watch for external traffic and forward it into the same service rules.
Ext -- NodePort       \                   / Pod (remote node)
                       KUBE-SVC-NAT -- Pod
Pod -- ClusterIP/Port /                   \ Pod (remote node)
The externalTrafficPolicy=Local setting makes a NodePort service use only a local Pod to serve the incoming traffic. This avoids a network hop, which removes the need to rewrite the source of the packet (via NAT). The result is that the real client source IP arrives at the pod servicing the connection, rather than the IP of one of the cluster nodes.
Ext -- NodePort       \                     Pod (remote node)
                       KUBE-SVC-NAT -- Pod (local)
Pod -- ClusterIP/Port /                     Pod (remote node)
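As a hedged example (again with a placeholder service name), you can switch an existing NodePort service to the Local policy and confirm the change:
kubectl patch svc my-svc -p '{"spec":{"externalTrafficPolicy":"Local"}}'
kubectl get svc my-svc -o jsonpath='{.spec.externalTrafficPolicy}'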
iptables
I recommend attempting to trace a connection from source to destination for a service or nodeport on a host. It requires a bit of iptables knowledge, but I think it's worthwhile.
To list all the services ip/ports that will be forwarded:
iptables -t nat -vnL KUBE-SERVICES
To list all the nodeports that will be forwarded:
iptables -t nat -vnL KUBE-NODEPORTS
Once you have the rule, you can follow the KUBE-SVC-XXX "target" chains in the full output.
iptables -vnL -t nat | less
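For example, a hedged sketch of following a service chain down to its endpoint rules (the XXX suffixes are placeholders for the random hashes, which will differ on your cluster):
iptables -t nat -vnL KUBE-SVC-XXX   # one rule per endpoint, each jumping to a KUBE-SEP-* chain
iptables -t nat -vnL KUBE-SEP-XXX   # the DNAT rule that rewrites the destination to a pod IP:port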
externalTrafficPolicy: Cluster cannot be used on a ClusterIP service; try removing it and applying the manifest again, and it'll work.

what is the use of cluster IP in kubernetes

Can someone help me understand the IP address I see as the cluster IP when I list services?
What is the cluster IP (not the service type, but the actual IP)?
How is it used?
Where does it come from?
Can I define the range for the cluster IP (like we do for the pod network)?
Good question to start learning something new (also for me):
Your questions relate to kube-proxy; by default, in a K8s cluster it works in iptables mode.
Every node in a Kubernetes cluster runs a kube-proxy. Kube-proxy is responsible for implementing a form of virtual IP for Services.
In this mode, kube-proxy watches the Kubernetes control plane for the addition and removal of Service and Endpoint objects. For each Service, it installs iptables rules, which capture traffic to the Service’s clusterIP and port, and redirect that traffic to one of the Service’s backend sets. For each Endpoint object, it installs iptables rules which select a backend Pod.
From the Node components documentation, on kube-proxy:
kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept.
kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster.
kube-proxy uses the operating system packet filtering layer if there is one and it’s available. Otherwise, kube-proxy forwards the traffic itself.
As described here:
Due to these iptables rules, whenever a packet is destined for a service IP, it’s DNATed (DNAT=Destination Network Address Translation), meaning the destination IP is changed from service IP to one of the endpoints pod IP chosen at random by iptables. This makes sure the load is evenly distributed among the backend pods.
When this DNAT happens, this info is stored in conntrack — the Linux connection tracking table (stores 5-tuple translations iptables has done: protocol, srcIP, srcPort, dstIP, dstPort). This is so that when a reply comes back, it can un-DNAT, meaning change the source IP from the Pod IP to the Service IP. This way, the client is unaware of how the packet flow is handled behind the scenes.
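You can observe these translations directly on a node, assuming the conntrack tool is installed (the service IP below is just a placeholder):
conntrack -L -d 10.96.0.10    # list tracked connections whose original destination is the service IP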
There are also different modes, you can find more information here
During cluster initialization you can set this range with the --service-cidr parameter (default: "10.96.0.0/12").
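For example, a hedged kubeadm sketch (the range shown is just the default, written out explicitly):
kubeadm init --service-cidr 10.96.0.0/12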
ClusterIP: The IP address assigned to a Service
Kubernetes assigns a stable, reliable IP address to each newly-created Service (the ClusterIP) from the cluster's pool of available Service IP addresses. Kubernetes also assigns a hostname to the ClusterIP, by adding a DNS entry. The ClusterIP and hostname are unique within the cluster and do not change throughout the lifecycle of the Service. Kubernetes only releases the ClusterIP and hostname if the Service is deleted from the cluster's configuration. You can reach a healthy Pod running your application using either the ClusterIP or the hostname of the Service.
Pod IP: The IP address assigned to a given Pod.
Kubernetes assigns an IP address (the Pod IP) to the virtual network interface in the Pod's network namespace from a range of addresses reserved for Pods on the node. This address range is a subset of the IP address range assigned to the cluster for Pods, which you can configure when you create a cluster.
Resources:
Iptables Mode
Network overview
Understanding Kubernetes Kube-Proxy
Hope this helped
The cluster IP is the address where your service can be reached from inside the cluster. You won't be able to reach the cluster IP from an external network unless you do some kind of tunneling (e.g. SSH). This IP is auto-assigned by k8s; it might be possible to define a range (I'm not sure, and I don't see why you would need to do so).
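If you want to check which range your cluster actually uses, one hedged approach (it assumes the API server's --service-cluster-ip-range flag is visible in the dump, as on kubeadm clusters) is:
kubectl cluster-info dump | grep -m 1 service-cluster-ip-range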

How to provide for 2 different IP ranges? --pod-network-cidr= for multiple IP ranges

I have 2 different IP sets in the same network. My kubeadm master is in a different IP range than my other nodes. How should I set the property here: kubeadm init --pod-network-cidr=
cat /etc/hosts
#kubernetes slaves ebdp-ch2-d587p.sys.***.net 172.26.0.194, ebdp-ch2-d588p.sys.***.net 172.26.0.195
10.248.43.214 kubemaster
172.26.0.194 kube2
172.26.0.195 kube3
--pod-network-cidr is for the IPs of the pods that Kubernetes will manage. It is not related to the nodes of the cluster.
For nodes, the requirement is (from the Kubernetes docs):
Full network connectivity between all machines in the cluster (public or private network is fine)
In addition to Yavuz Sert's answer, the --pod-network-cidr flag identifies the Container Network Interface (CNI) IP pool used for Pod communication within a Kubernetes cluster. You have to choose a separate IP subnet for Pod networking; it has to be different from your existing network ranges. Once --pod-network-cidr has been applied successfully, kube-proxy reflects the Pod IP subnet and adds the appropriate routes for network communication between Pods through the cluster overlay network. Indeed, you can find the clusterCIDR flag within the kube-proxy ConfigMap, which corresponds to --pod-network-cidr.
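A hedged sketch for this cluster: the node IPs sit in the 10.248.43.x and 172.26.0.x ranges, so pick a pod CIDR that overlaps with neither (10.244.0.0/16 below is just the common flannel default, not a requirement):
kubeadm init --pod-network-cidr=10.244.0.0/16
# afterwards, confirm the value kube-proxy picked up:
kubectl -n kube-system get configmap kube-proxy -o yaml | grep clusterCIDR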

Ip addressing of pods in Kubernetes

How do pods get unique IP addresses even if they reside on the same worker node?
Also, a pod is not a physical device, so what is the logic behind giving it an IP address?
Is the IP address assigned to a pod a virtual IP?
A pod is part of a cluster (group of nodes), and cluster networking tells you that:
In reality, Kubernetes applies IP addresses at the Pod scope - containers within a Pod share their network namespaces - including their IP address.
This means that containers within a Pod can all reach each other’s ports on localhost.
This does imply that containers within a Pod must coordinate port usage, but this is no different than processes in a VM.
This is called the “IP-per-pod” model.
The constraints are:
all containers can communicate with all other containers without NAT
all nodes can communicate with all containers (and vice-versa) without NAT
the IP that a container sees itself as is the same IP that others see it as
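A quick way to see this on a running cluster is shown below (hedged: a node's .spec.podCIDR is only populated when the control plane allocates per-node pod ranges, which depends on the CNI setup):
kubectl get pods -o wide     # each pod's IP and the node it runs on
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'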
See more with "Networking with Kubernetes" from Alok Kumar Singh:
Here:
We have a machine, it is called a node in kubernetes.
It has an IP 172.31.102.105 belonging to a subnet having CIDR 172.31.102.0/24.
(CIDR: Classless Inter-Domain Routing, a method for allocating IP addresses and IP routing)
The node has an network interface eth0 attached. It belongs to root network namespace of the node.
For pods to be isolated, they were created in their own network namespaces — these are pod1 n/w ns and pod2 n/w ns.
The pods are assigned IP addresses 100.96.243.7 and 100.96.243.8 from the CIDR range 100.96.0.0/11.
For more on this, see "Kubernetes Networking" from CloudNativelabs:
Kubernetes does not orchestrate setting up the network and offloads the job to the CNI (Container Network Interface) plug-ins. Please refer to the CNI spec for further details on CNI specification.
Below are possible network implementation options through CNI plugins which permits pod-to-pod communication honoring the Kubernetes requirements:
layer 2 (switching) solution
layer 3 (routing) solution
overlay solutions
layer 2 (switching)
You can see their IPs allocated as part of a container subnet address range.
layer 3 (routing)
This is about populating the default gateway router with routes for the subnet as shown in the diagram.
Routes to 10.1.1.0/24 and 10.1.2.0/24 are configured to be through node1 and node2 respectively.
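As an illustration only, using the subnets from the diagram above (the node IPs in angle brackets are placeholders), the routes on the gateway would look roughly like:
ip route add 10.1.1.0/24 via <node1-ip>
ip route add 10.1.2.0/24 via <node2-ip>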
overlay solutions
Generally not used.
Note: See also (Oct. 2018): "Google Kubernetes Engine networking".
Kubernetes creates a network within your network for the containers. In GKE, for example, by default it is a /14, but it can be overridden by the user with a range between /11 and /19.
When Kubernetes creates a pod, it assigns an IP address from this range. Now, you can't have another VM in your network, outside your cluster, with the same IP address that a pod has.
Why? Imagine you have a VPN tunnel that needs to deliver a packet to an address that both the pod and the VM are using. Which one should it deliver to?
So, answering your question: no, it is not a virtual IP; it is a physical IP address from your network.