kubernetes pods went down when iptables changed

When I modified the iptables rules on the host, the k8s pods went down. It seems like communication within the cluster was blocked; the pod status turned into ContainerCreating.
I just want to set up a simple IP whitelist like the one below.
iptables -A INPUT -s 10.xxx.4.0/24 -p all -j ACCEPT
iptables -A INPUT -j REJECT
Then I deleted the REJECT rule from iptables and the pods went back to Running.
How can I set up a simple IP whitelist on a host without affecting the running k8s pods?

Kubernetes uses iptables to control the network connections between pods (and between nodes), handling many of the networking and port-forwarding rules. With iptables -A INPUT -j REJECT you are effectively preventing it from doing that.
Taken from the Understanding Kubernetes Networking Model article:
In Kubernetes, iptables rules are configured by the kube-proxy controller that watches the Kubernetes API server for changes. When a change to a Service or Pod updates the virtual IP address of the Service or the IP address of a Pod, iptables rules are updated to correctly route traffic directed at a Service to a backing Pod. The iptables rules watch for traffic destined for a Service’s virtual IP and, on a match, a random Pod IP address is selected from the set of available Pods and the iptables rule changes the packet’s destination IP address from the Service’s virtual IP to the IP of the selected Pod. As Pods come up or down, the iptables ruleset is updated to reflect the changing state of the cluster. Put another way, iptables has done load-balancing on the machine to take traffic directed to a service’s IP to an actual pod’s IP.
To secure the cluster, it's better to put all custom rules on the gateway (ADC) or into cloud security groups. Security at the cluster level can be handled via Network Policies, Ingress, RBAC and others.
Kubernetes also has a good article about Securing a Cluster, and there is a GitHub guide with best practices for Kubernetes security.
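If you still need a host-level whitelist, a minimal sketch is to accept loopback traffic, established connections, and the cluster's own address ranges before the final REJECT. The pod CIDR 10.244.0.0/16 and service CIDR 10.96.0.0/12 below are assumptions; substitute your cluster's actual ranges (and the subnet your nodes live in):
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -s 10.244.0.0/16 -j ACCEPT  # assumed pod CIDR
iptables -A INPUT -s 10.96.0.0/12 -j ACCEPT   # assumed service CIDR
iptables -A INPUT -s 10.xxx.4.0/24 -p all -j ACCEPT
iptables -A INPUT -j REJECT
Even then, kube-proxy and the CNI plugin keep rewriting their own chains, so a gateway or cloud security group remains the safer place for this kind of rule.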

Related

How to modify source IP for a Pod in Kubernetes?

To change the source IP to 100.101.102.103 for outgoing data to a specific destination, I modified iptables inside the container of a Kubernetes Pod using the iptables CLI tool:
iptables -t nat -A POSTROUTING --destination 100.200.150.50/32 -j SNAT --to-source 100.101.102.103
But it blocks my outgoing data to that destination, and it seems like the data gets stuck inside the container, for example when I send a simple request with curl and watch it with tcpdump.
The main question is: how do I modify the source IP of a Pod in Kubernetes for a destination outside the cluster?
P.S. I deployed my pod in privileged mode with NET_ADMIN and NET_RAW access.
I modified iptables inside the container of a Kubernetes Pod
I suggest not doing this, as it may break the Kubernetes CNI and/or kube-proxy rules. Instead, consider using a Kubernetes egress mechanism to get a well-known source IP address on outgoing packets to a destination outside the cluster.
Egress packets from a k8s cluster to a destination outside the cluster have the node's IP as the source IP.
https://kubernetes.io/docs/tutorials/services/source-ip/ says egress packets from k8s get source-NATed with the node's IP:
Source NAT: replacing the source IP on a packet, usually with a node’s IP
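A quick way to observe this default behaviour from inside a pod (the image and the echo service are just examples, and the pod needs outbound internet access):
kubectl run src-test --rm -it --restart=Never --image=curlimages/curl --command -- curl -s https://ifconfig.me
The address printed is the node's IP (or the cloud NAT gateway's), not the pod IP.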
The following can be used to manage egress packets from a k8s cluster:
k8s network policies
calico CNI egress policies
istio egress gateway
nirmata/kube-static-egress-ip GitHub project

How do Kubernetes NodePort services with Service.spec.externalTrafficPolicy=Local route traffic?

There seem to be two contradictory explanations of how NodePort services route traffic. Services can route traffic to one of the two, not both:
Nodes (through the kube-proxy): According to kubectl explain Service.spec.externalTrafficPolicy and this article that adds more detail, packets incoming to NodePort services with Service.spec.externalTrafficPolicy=Local set get routed to a kube-proxy, which then routes the packets to the corresponding pods it is running.
This kube-proxy networking documentation further supports this theory, adding that endpoints add a rule in the service's iptables that forwards traffic to nodes through the kube-proxy.
Pods: services update their iptables rules from Endpoints, which contain the IP addresses of the pods they can route to. Furthermore, if you remove your service's label selectors and edit the Endpoints, you can change where your traffic is routed.
If one of these is right, then I must be misunderstanding something.
If services route to nodes, then why can I edit endpoints without breaking the iptables rules?
If services route to pods, then why would services go through the trouble of routing to nodes when Service.spec.externalTrafficPolicy is set?
A Service is a virtual address/port managed by kube-proxy. Services forward traffic to their associated endpoints, which are usually pods but, as you mentioned, can be set to any destination IP/port.
A NodePort Service doesn't change the endpoint side of the service; the NodePort allows external traffic into the Service via a port on each node.
Breakdown of a Service
kube-proxy can use 3 methods to implement the forwarding of a service from node to destination:
a userspace proxy
iptables
ipvs
Most clusters use iptables, which is what is described below. I use the term "forward" instead of "route" because services use Network Address Translation (or the proxy) to "forward" traffic rather than standard network routing.
The service ClusterIP is a virtual entity managed by kube-proxy. This address/port combination is available on every node in the cluster and forwards any local (pod) service traffic to the endpoint IPs and ports.
                                       / Pod (remote node)
Pod -- ClusterIP/Port -- KUBE-SVC-NAT -- Pod
                                       \ Pod (remote node)
A service with a NodePort is the same as above, with the addition of a way to forward external traffic into the cluster via a Node. kube-proxy manages an additional rule to watch for external traffic and forward it into the same service rules.
Ext -- NodePort       \                / Pod (remote node)
                       KUBE-SVC-NAT -- Pod
Pod -- ClusterIP/Port /                \ Pod (remote node)
The externalTrafficPolicy=Local setting makes a NodePort service use only a local Pod to service the incoming traffic. This avoids a network hop, which removes the need to rewrite the source of the packet (via NAT). As a result, the real client IP arrives at the pod servicing the connection, rather than one of the cluster nodes appearing as the source IP.
Ext -- NodePort       \                  Pod (remote node)
                       KUBE-SVC-NAT -- Pod (local)
Pod -- ClusterIP/Port /                  Pod (remote node)
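To try this on an existing NodePort service (my-service below is a placeholder name), you can patch the field and compare the client address your backend sees before and after:
kubectl patch svc my-service -p '{"spec":{"externalTrafficPolicy":"Local"}}'
Note that with Local, a node that has no local endpoint for the service will not answer NodePort traffic, so external load balancers should health-check the nodes.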
iptables
I recommend attempting to trace a connection from source to destination for a service or NodePort on a host. It requires a bit of iptables knowledge, but I think it's worthwhile.
To list all the service IPs/ports that will be forwarded:
iptables -t nat -vnL KUBE-SERVICES
To list all the NodePorts that will be forwarded:
iptables -t nat -vnL KUBE-NODEPORTS
Once you have the rule, you can jump through the KUBE-SVC-XXX "target" chains in the full output:
iptables -vnL -t nat | less
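To narrow this down to one service, kube-proxy tags each rule with a namespace/name comment, so you can grep for it (default/my-service is a placeholder):
iptables -t nat -vnL KUBE-SERVICES | grep 'default/my-service'
The KUBE-SVC-XXX target printed there is the chain to follow next in the full listing.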
externalTrafficPolicy: Cluster cannot be used on a ClusterIP service; try removing it and applying the manifest again, and it'll work.

what is the use of cluster IP in kubernetes

Can someone help me understand the IP address I see as the cluster IP when I list services?
what is cluster IP (not the service type, but the real IP)?
how is it used?
where does it come from?
can I define the range for cluster IP (like we do for pod network)?
Good question to start learning something new (also for me):
Your concerns are related to kube-proxy, which by default works in iptables mode in a K8s cluster.
Every node in a Kubernetes cluster runs a kube-proxy. Kube-proxy is responsible for implementing a form of virtual IP for Services.
In this mode, kube-proxy watches the Kubernetes control plane for the addition and removal of Service and Endpoint objects. For each Service, it installs iptables rules, which capture traffic to the Service’s clusterIP and port, and redirect that traffic to one of the Service’s backend sets. For each Endpoint object, it installs iptables rules which select a backend Pod.
From the Node Components documentation on kube-proxy:
kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept.
kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster.
kube-proxy uses the operating system packet filtering layer if there is one and it’s available. Otherwise, kube-proxy forwards the traffic itself.
As described here:
Due to these iptables rules, whenever a packet is destined for a service IP, it’s DNATed (DNAT=Destination Network Address Translation), meaning the destination IP is changed from service IP to one of the endpoints pod IP chosen at random by iptables. This makes sure the load is evenly distributed among the backend pods.
When this DNAT happens, this info is stored in conntrack — the Linux connection tracking table (stores 5-tuple translations iptables has done: protocol, srcIP, srcPort, dstIP, dstPort). This is so that when a reply comes back, it can un-DNAT, meaning change the source IP from the Pod IP to the Service IP. This way, the client is unaware of how the packet flow is handled behind the scenes.
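On a node you can inspect these entries with the conntrack tool, assuming it is installed (10.96.0.10 is just an example ClusterIP):
conntrack -L -d 10.96.0.10
Each entry shows the original tuple (destination = service IP) and the reply tuple (source = the chosen pod IP), which is what makes the un-DNAT on the way back possible.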
There are also different modes; you can find more information here.
During cluster initialization you can use the --service-cidr parameter (default: "10.96.0.0/12").
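For example, with a kubeadm-based setup (the service CIDR shown is the default, and the pod CIDR is just a range commonly used with flannel):
kubeadm init --service-cidr 10.96.0.0/12 --pod-network-cidr 10.244.0.0/16
Other installers expose the same setting through their own configuration.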
ClusterIP: The IP address assigned to a Service
Kubernetes assigns a stable, reliable IP address to each newly-created Service (the ClusterIP) from the cluster's pool of available Service IP addresses. Kubernetes also assigns a hostname to the ClusterIP, by adding a DNS entry. The ClusterIP and hostname are unique within the cluster and do not change throughout the lifecycle of the Service. Kubernetes only releases the ClusterIP and hostname if the Service is deleted from the cluster's configuration. You can reach a healthy Pod running your application using either the ClusterIP or the hostname of the Service.
Pod IP: The IP address assigned to a given Pod.
Kubernetes assigns an IP address (the Pod IP) to the virtual network interface in the Pod's network namespace from a range of addresses reserved for Pods on the node. This address range is a subset of the IP address range assigned to the cluster for Pods, which you can configure when you create a cluster.
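To see both addresses for your own objects (my-service and my-pod are placeholder names):
kubectl get svc my-service -o jsonpath='{.spec.clusterIP}'
kubectl get pod my-pod -o jsonpath='{.status.podIP}'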
Resources:
Iptables Mode
Network overview
Understanding Kubernetes Kube-Proxy
Hope this helped
The cluster IP is the address where your service can be reached from inside the cluster. You won't be able to reach the cluster IP from the external network unless you do some kind of SSH tunneling. This IP is auto-assigned by k8s, and you can define its range with the --service-cidr flag mentioned above, though you rarely need to do so.
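If you just need to reach a ClusterIP service from your workstation for debugging, kubectl can tunnel to it without iptables or SSH tricks (my-service and the ports are placeholders):
kubectl port-forward svc/my-service 8080:80
After that, http://localhost:8080 on your machine reaches the service inside the cluster.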

Kubernetes Load balancing among pods

I made a deployment and scaled out to 2 replicas.
And I made a Service to forward traffic to it.
I found that kube-proxy uses iptables for forwarding from Service to Pod.
But I am confused about which one is actually responsible for load-balancing.
The Service or kube-proxy?
Actually it's iptables that's responsible for load-balancing.
At the beginning, you set up a service. At the same time, kube-proxy is watching the apiserver, sees the new service, and sets up iptables rules. Finally, when somebody tries to access the service IP, iptables forwards the request to an actual pod IP according to those rules.
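On clusters using the iptables proxy mode you can see these load-balancing rules on any node; each KUBE-SVC-XXX chain holds one rule per endpoint using the statistic match with a random probability:
iptables -t nat -vnL | grep -E 'KUBE-SVC|statistic'
With 2 replicas, the first endpoint is picked with probability 0.5 and the second one takes the remainder, which gives the even distribution across the pods.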

Kubernetes iptables - should the master node be running pods?

In my Kubernetes cluster I have a flannel overlay configured. The iptables rules on the master node and on the minions are not the same. I understand that this is by design, but it creates the problem that pods can't run on the master node: on the master, external IP addresses (for a service) are not resolved.
On the minions there is a KUBE-PORTALS-CONTAINER and KUBE-PORTALS-HOST chain which redirects service cluster and external IP addresses. Static routes redirect this traffic to the master which actually resolves the addresses.
On the master, the two chains mentioned above don't exist; instead there is a KUBE-SERVICES chain which resolves service IPs to pod IP addresses.
Is there a way to configure the master node to have the chains that the nodes have as well as the service resolution chain too?
Pods run on Kubernetes nodes (formerly called minions) but not on masters; see the Kubernetes architecture documentation. You may also find the Debug Services doc helpful.