GKE Autopilot with shared VPC: IP space exhausted - kubernetes

I have set up a new subnet in my shared VPC for GKE Autopilot as follows:
Node IP range: 10.11.1.0/24
First secondary IP range: 10.11.2.0/24
Second secondary IP range: 10.11.3.0/24
I tried to test it by running a simple nginx deployment with 30 replicas.
Based on my understanding:
I have 256 possible node IPs
I have 256 possible Pod IPs
I have 256 possible Service IPs
After deploying, somehow my cluster is stuck with only 2 Pods deployed and running; the rest are stuck in Pending state with the error code IP_SPACE_EXHAUSTED.
My question is: how come? I still have plenty of IP addresses, and this is a freshly deployed Kubernetes cluster.

Autopilot sets "max pods per node" to 32. This results in a /26 (64 IP addresses) being assigned to each Autopilot node from the Pod secondary IP range. Since your Pod range is a /24, this means your Autopilot cluster can support a max of 4 nodes.
By default, Autopilot clusters start with 2 nodes (one of which runs some system workloads). It looks like your Pods did not fit on either of these nodes, so Autopilot provisioned new nodes as required. Generally, Autopilot tries to find the best-fit node sizes for your deployments, and in this case it looks like you ended up with one Pod per node.
I'd recommend a /17 or a /16 for your Pod range to maximize the number of nodes.
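A rough back-of-the-envelope sketch of that math (assuming the /26-per-node allocation described above):

# 32 max Pods per node => GKE reserves a /26 (64 addresses) per node from the
# Pod secondary range, so: max nodes = 2^(per-node prefix - Pod range prefix)
echo $(( 2 ** (26 - 24) ))   # /24 Pod range -> 4 nodes
echo $(( 2 ** (26 - 17) ))   # /17 Pod range -> 512 nodes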

Pod CIDR ranges in Autopilot clusters
The default settings for Autopilot cluster CIDR sizes are as follows:
Subnetwork range: /23
Secondary IP address range for Pods: /17
Secondary IP address range for Services: /22
Autopilot has a maximum of 32 Pods per node; you may check this link.
An Autopilot cluster's maximum number of nodes is pre-configured and immutable; you may check this link.
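If you are creating the shared-VPC subnet yourself, a sketch along these lines gives the Pod range room to grow. The project, network, region, and range values below are placeholders, not taken from the question:

# Create the subnet in the shared VPC host project, with named secondary ranges
gcloud compute networks subnets create gke-autopilot-subnet \
    --project HOST_PROJECT_ID \
    --network shared-vpc \
    --region us-central1 \
    --range 10.11.0.0/23 \
    --secondary-range pods=10.12.0.0/17,services=10.13.0.0/22

# Then reference the named secondary ranges when creating the Autopilot cluster
gcloud container clusters create-auto autopilot-cluster \
    --region us-central1 \
    --network shared-vpc \
    --subnetwork gke-autopilot-subnet \
    --cluster-secondary-range-name pods \
    --services-secondary-range-name services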

Related

How kubernetes assigns podCIDR for nodes?

I'm currently learning about Kubernetes networking.
What I've got so far is that we have CNI plugins which take care of handling network connectivity for pods - they create network interfaces inside a network namespace when a pod is created, set up routes for the pod, etc.
So basically kubernetes delegates some network-related tasks to the CNI plugins.
But I suppose there is some portion of networking tasks that kubernetes does by itself. For example - kubernetes assigns to each node a podCIDR.
For example, I've set up a kubernetes cluster using kubeadm, with the command:
kubeadm init --pod-network-cidr=192.168.0.0/16 --kubernetes-version=1.24.0
And when I then look at the nodes I see that each received its own podCIDR range, for example:
spec:
podCIDR: 192.168.2.0/24
podCIDRs:
- 192.168.2.0/24
My question is: How does kubernetes calculate CIDR ranges for the nodes? Does it always assign a /24 subnet for each node?
When you configure the maximum number of Pods per node for the cluster, Kubernetes uses this value to size the CIDR block it allocates to each node. You can then calculate the maximum number of nodes in the cluster from the cluster's secondary IP address range for Pods and the per-node CIDR block size.
Kubernetes assigns each node a range of IP addresses, a CIDR block, so that each Pod can have a unique IP address. The size of the CIDR block corresponds to the maximum number of Pods per node.
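To the "always a /24?" part: the default per-node mask for IPv4 is /24, but it is configurable on the controller manager. A quick way to inspect what each node received (a sketch, nothing here is taken from the original question):

# List the CIDR block assigned to each node
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR

# The per-node block size comes from the kube-controller-manager flag
# --node-cidr-mask-size (default 24 for IPv4), so a /24 per node is the
# default rather than a hard rule.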
Also, please refer to the similar SO question & the CIDR ranges documentation for more information.

Kubernetes relation between worker node IP address and Pod IP

I have two questions.
All the tutorials on YouTube say that if the worker node's internal IP is 10.10.1.0, then the pods inside that node will have internal IPs between 10.10.1.1 and 10.10.1.254. But in my Google Kubernetes Engine cluster it is very different and I don't see any relation between them.
The pod rc-server-1x769's IP is 10.0.0.8, but its corresponding node gke-kubia-default-pool-6f6eb62a-qv25 has 10.160.0.7.
How do I release the external IPs assigned to my worker nodes?
For Q2:
GKE manages the VMs created in your cluster, so if they go down or if the cluster needs to scale down/up, VMs are recreated with the same characteristics. I do not believe what you are asking (releasing the external IPs) is possible. You will need to consider a private cluster.
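If you go that route, a minimal sketch of creating one (cluster name and master range are placeholders, and the exact flag set you need depends on your VPC setup):

# Private nodes have no external IPs at all
gcloud container clusters create private-cluster \
    --enable-ip-alias \
    --enable-private-nodes \
    --master-ipv4-cidr 172.16.0.0/28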
Regarding Q1: the Pod CIDR and the node (cluster) CIDR are different entities.
So Pod-to-Pod communication happens within the Pod CIDR, not within the node CIDR.
Your nodes have interfaces that correspond to the Pod CIDR, but from the cluster's point of view they are addressed by their node IPs (which is what the kubectl output shows).
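To make Q1 concrete, compare the two address spaces side by side (the addresses below are the ones from the question; the real column layout varies):

kubectl get nodes -o wide    # INTERNAL-IP column: node address from the VPC subnet, e.g. 10.160.0.7
kubectl get pods -o wide     # IP column: Pod address from the cluster's Pod range, e.g. 10.0.0.8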

EKS provisioned LoadBalancers reference all nodes in the cluster. If the pods reside on 1 or 2 nodes is this efficient as the ELB traffic increases?

In Kubernetes (on AWS EKS), when I create a service of type LoadBalancer, the resultant EC2 LoadBalancer is associated with all nodes (instances) in the EKS cluster, even though the selector in the service will only find the pods running on 1 or 2 of these nodes (i.e. a much smaller subset of nodes).
I am keen to understand whether this will be efficient as the volume of traffic increases.
I could not find any advice on this topic and am keen to understand if this is the correct approach.
This could introduce an additional SNAT hop if the request arrives at a node on which the pod is not running, and it also does not preserve the source IP of the request. You can change externalTrafficPolicy to Local, which registers only the nodes that are running your pods with the LoadBalancer.
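A minimal Service sketch with that setting (names and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: my-service               # illustrative name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # traffic only goes to nodes with ready pods; source IP is preserved
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080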
You can get more information from the following links.
Preserve source IP
EKS load balancer support
On EKS, if you are using the AWS VPC CNI (which is the default for EKS), then you can use the aws-alb-ingress-controller to create ELBs and ALBs.
When creating the load balancer you can use the annotation below, so that traffic is routed only to your pods.
alb.ingress.kubernetes.io/target-type: ip
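In context, a hypothetical Ingress using that annotation might look like the following (names are placeholders):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress                                   # illustrative name
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip        # register pod IPs, not nodes, in the target group
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service                     # illustrative backend Service
                port:
                  number: 80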
Reference:
https://github.com/aws/amazon-vpc-cni-k8s
https://github.com/kubernetes-sigs/aws-alb-ingress-controller
https://kubernetes-sigs.github.io/aws-alb-ingress-controller/guide/ingress/annotation/#target-type

Kubernetes: Same node but different IPs

I have created a K8s cluster on GCP, and I deployed an application.
Then I scaled it:
kubectl scale deployment hello-world-rest-api --replicas=3
Now when I run 'kubectl get pods', I see three pods. Their NODE value is the same. I understand that means they are all deployed on the same machine. But I observe that the IP value for all three is different.
If the NODE is the same, then why are the IPs different?
There are several networks in a k8s cluster. The pods are on the pod network, so all pods deployed on the nodes of a k8s cluster can see each other as though they were independent hosts on a network. The pod address space is different from the node address space. So each pod running on a node gets a unique address from the pod network, which is also different from the node network. The k8s networking components running on each node take care of routing (and, where needed, translating) traffic between these address spaces.
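Concretely (names and addresses below are illustrative, not taken from the question):

kubectl get pods -o wide
# NAME                                   IP          NODE
# hello-world-rest-api-7d9f8b6c5-aaaaa   10.44.2.4   gke-pool-1-abcd
# hello-world-rest-api-7d9f8b6c5-bbbbb   10.44.2.5   gke-pool-1-abcd
# hello-world-rest-api-7d9f8b6c5-ccccc   10.44.2.6   gke-pool-1-abcd

kubectl get node gke-pool-1-abcd -o jsonpath='{.spec.podCIDR}'
# 10.44.2.0/24   -- the per-node Pod range that all three Pod IPs come from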

limitation of flannel on kubernetes

I am using flannel on Kubernetes.
On every node, there is a flannel interface and a cni interface.
For example, if I use 10.244.0.0/16 as the subnet, then:
flannel 10.244.3.0
cni 10.244.3.1
They almost always come as a pair like above.
The question is: if I use flannel, must the number of nodes be less than or equal to 255 (10.244.1~255.0)?
That is, can I only manage 255 nodes on Kubernetes with flannel?
Flannel's network range is changeable in its net-conf.json; see the recommended Kubernetes deployment of flannel 0.8.0 for clarification. The actual subnet given to the node is set on node join by the Kubernetes node controller and fetched by flannel via the Kubernetes API server on startup, before network creation, when the --kube-subnet-mgr option of the flannel daemon is set.
I am not familiar with the implementation of the Kubernetes node controller, but I suspect it would assign smaller subnets to the nodes if the third octet of the CIDR is exhausted. If you want to be absolutely sure, set your flannel network to something like 10.0.0.0/8, depending on the number of nodes and pods.
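For reference, that range is the "Network" field in flannel's net-conf.json (part of the kube-flannel ConfigMap); a sketch with a deliberately oversized example value, which you should size to your own node and pod counts:

net-conf.json: |
  {
    "Network": "10.0.0.0/8",
    "Backend": {
      "Type": "vxlan"
    }
  }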