Worker node limit in Google Kubernetes Engine

I plan to use Kubernetes on Google Cloud in production.
However, I found this document explaining that there is a limit of 250 worker nodes per cluster:
In a GKE cluster, an internal forwarding rule points to all the nodes in the cluster. Each node in the cluster is a backend VM for the ILB. The maximum number of backend VMs for an ILB is 250, regardless of how the VMs are associated with instance groups. So the maximum number of nodes in a GKE cluster with an ILB is 250. If you have autoscaling enabled for your cluster, you must ensure that autoscaling does not scale your cluster beyond 250 nodes.
Could you explain the above to me?
Should I create a second and third cluster in case our worker nodes reach ~250?
I will be using the load balancer in front of our services.
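For context, the ILB mentioned in the quote is created when you expose a Service of type LoadBalancer with GKE's internal annotation. A minimal sketch of such a Service follows; the name, selector and ports are hypothetical, and the annotation assumes a reasonably recent GKE version (older clusters used cloud.google.com/load-balancer-type instead):

apiVersion: v1
kind: Service
metadata:
  name: my-internal-service          # hypothetical name
  annotations:
    networking.gke.io/load-balancer-type: "Internal"   # provisions an internal TCP/UDP load balancer
spec:
  type: LoadBalancer
  selector:
    app: my-app                      # hypothetical label
  ports:
  - port: 80
    targetPort: 8080

Every node in the cluster becomes a backend VM of this ILB, which is where the 250-node ceiling in the quoted passage comes from.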

Related

Kubernetes relation between worker node IP address and Pod IP

I have two questions.
All the tutorials on YouTube say that if the worker node's internal IP is 10.10.1.0, then the pods on that node will have internal IPs between 10.10.1.1 and 10.10.1.254. But in my Google Kubernetes Engine cluster it is very different, and I don't see any relation between them.
Pod rc-server-1x769 has IP 10.0.0.8, but its corresponding node gke-kubia-default-pool-6f6eb62a-qv25 has 10.160.0.7.
How do I release the external IPs assigned to my worker nodes?
For Q2:
GKE manages the VMs created in your cluster, so if they go down or the cluster needs to scale up or down, new VMs are created with the same characteristics. I do not believe that what you are asking (releasing the external IPs) is possible. You will need to consider a private cluster instead.
The Pod CIDR and the cluster CIDR are different entities.
So Pod-to-Pod communication happens within the Pod CIDR, not within the cluster CIDR.
Your nodes have interfaces that correspond to your Pod CIDR, but from the cluster's point of view they have addresses from the cluster CIDR (which is what the kubectl output shows).
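To make this concrete, here is a trimmed-down sketch of what kubectl get node <node-name> -o yaml might show for the node from the question; the podCIDR value is illustrative. The node's own InternalIP (10.160.0.7) comes from the VPC subnet, while spec.podCIDR is the separate range that pods scheduled on this node (such as 10.0.0.8) are assigned from:

apiVersion: v1
kind: Node
metadata:
  name: gke-kubia-default-pool-6f6eb62a-qv25
spec:
  podCIDR: 10.0.0.0/24      # pods on this node get IPs from this range, e.g. 10.0.0.8
status:
  addresses:
  - type: InternalIP
    address: 10.160.0.7     # the node's own IP from the VPC subnet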

How does kubernetes help in reducing the cost of hosting?

I am trying to understand this hosting and scaling stuff. Say I have a website with huge traffic on weekends that would require at least 2 VPS to handle the load.
We could do either of two things:
We could simply upgrade to a larger VPS plan and forget about it, which is inefficient and also the costlier option.
We could get 2 VPS and set up a load balancer to spread the traffic between them, just like Kubernetes does.
So how is Kubernetes helpful if we are still paying for the 2nd VPS?
Can Kubernetes spin up a full VPS before deploying new pods on it?
You can use the Cluster Autoscaler for your Kubernetes cluster, which will add or remove nodes on demand.
Kubernetes can run virtually anywhere - on bare metal as well as in a private or public cloud.
However, where you choose to run Kubernetes determines the scalability of your Kubernetes cluster.
Deploying Kubernetes on VPS servers requires more effort on your side, and the cluster is less scalable compared to managed Kubernetes services such as GKE, EKS and AKS.
In general, the Cluster Autoscaler is available primarily for managed Kubernetes services (see: Supported cloud providers).
Cluster Autoscaler:
Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:
there are pods that failed to run in the cluster due to insufficient resources.
there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.
For VPS, you can still use the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) to optimize the resource utilization of your application.
Horizontal Pod Autoscaler:
The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics).
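As an illustration, a minimal HPA manifest might look like the sketch below; the Deployment name my-app, the replica bounds and the 50% CPU target are hypothetical values:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app               # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # add replicas when average CPU usage exceeds 50% of requests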
Vertical Pod Autoscaler:
The Vertical Pod Autoscaler automatically adjusts the amount of CPU and memory requested by pods running in the Kubernetes cluster.
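A corresponding VPA sketch, assuming the VPA components are installed in the cluster (the Deployment name is again hypothetical):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"        # apply recommendations by evicting and recreating pods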

Azure Kubernetes Cluster Autoscaler - set memory threshold for scaling out nodes

In my 1-node AKS cluster, I deploy multiple Job resources (kind: Job) that are terminated after the task is completed. I have enabled the Cluster Autoscaler to add a second node when too many jobs are consuming the first node's memory; however, it only scales out after a job/pod fails to be created due to lack of memory.
In my Job YAML I also defined the memory resource limit and request.
Is it possible to configure the Cluster Autoscaler to scale out proactively when it reaches a certain memory threshold (e.g., 70% of the node's memory), not just when it cannot deploy a job/pod?
In Kubernetes you can find 3 autoscaling mechanisms: the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler, which can both be driven by metrics usage, and the Cluster Autoscaler.
As per Cluster Autoscaler Documentation:
Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:
there are pods that failed to run in the cluster due to insufficient resources.
there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.
In the AKS Cluster Autoscaler documentation you can find a note that the CA is a Kubernetes component, not something AKS-specific:
The cluster autoscaler is a Kubernetes component. Although the AKS cluster uses a virtual machine scale set for the nodes, don't manually enable or edit settings for scale set autoscale in the Azure portal or using the Azure CLI. Let the Kubernetes cluster autoscaler manage the required scale settings.
In the Azure documentation - About the cluster autoscaler - you can find that AKS clusters can scale in one of two ways:
The cluster autoscaler watches for pods that can't be scheduled on nodes because of resource constraints. The cluster then automatically increases the number of nodes.
The horizontal pod autoscaler uses the Metrics Server in a Kubernetes cluster to monitor the resource demand of pods. If an application needs more resources, the number of pods is automatically increased to meet the demand.
On AKS you can adjust the autoscaler profile a bit to change some default values. More detail can be found in Using the autoscaler profile.
I would suggest reading the Understanding Kubernetes Cluster Autoscaling article, which explains how the CA works. Under the Limitations part you will find:
The cluster autoscaler doesn’t take into account actual CPU/GPU/Memory usage, just resource requests and limits. Most teams overprovision at the pod level, so in practice we see aggressive upscaling and conservative downscaling.
Conclusion
The Cluster Autoscaler doesn't consider actual resource usage, only requests and limits. CA downscaling or upscaling might take a few minutes depending on the cloud provider.
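In other words, what the CA reasons about is the requests in your Job spec, not runtime memory consumption. A hedged sketch of such a Job (image, command and sizes are hypothetical):

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-task            # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox        # placeholder image
        command: ["sh", "-c", "echo working; sleep 30"]
        resources:
          requests:
            memory: "512Mi"   # the Cluster Autoscaler scales based on this request, not actual usage
            cpu: "250m"
          limits:
            memory: "512Mi"

If the sum of pending requests no longer fits on the existing node, the CA adds a node; a threshold on actual memory usage (e.g. 70%) cannot be expressed this way.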

EKS provisioned LoadBalancers reference all nodes in the cluster. If the pods reside on 1 or 2 nodes is this efficient as the ELB traffic increases?

In Kubernetes (on AWS EKS), when I create a Service of type LoadBalancer, the resulting EC2 load balancer is associated with all nodes (instances) in the EKS cluster, even though the selector in the Service will only find the pods running on 1 or 2 of these nodes (i.e. a much smaller subset of nodes).
I am keen to understand whether this will be efficient as the volume of traffic increases.
I could not find any advice on this topic and am keen to understand whether this is the correct approach.
This can introduce additional SNAT if the request arrives at a node the pod is not running on, and it also does not preserve the source IP of the request. You can change externalTrafficPolicy to Local, which associates only the nodes that have running pods with the load balancer (see the sketch below).
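A minimal sketch of such a Service; the name, selector and ports are hypothetical:

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # only nodes with ready pods are registered as LB targets; source IP is preserved
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080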
You can get more information from the following links.
Preserve source IP
EKS load balancer support
On EKS, if you are using the AWS VPC CNI, which is the default for EKS, then you can use the aws-alb-ingress-controller to create an ELB or ALB.
When creating the load balancer you can use the annotation below, so that traffic is routed directly to your pods.
alb.ingress.kubernetes.io/target-type: ip
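For example, a minimal Ingress sketch using this annotation; the path, backend service name and port are hypothetical, and the class and scheme annotations assume the aws-alb-ingress-controller referenced below:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip   # register pod IPs directly as ALB targets
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app        # hypothetical backend Service
            port:
              number: 80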
Reference:
https://github.com/aws/amazon-vpc-cni-k8s
https://github.com/kubernetes-sigs/aws-alb-ingress-controller
https://kubernetes-sigs.github.io/aws-alb-ingress-controller/guide/ingress/annotation/#target-type

Kubernetes Service IP entry in IP tables

I deployed a pod using a replication controller with replicas set to 3. The cluster has 5 nodes. I created a service (type NodePort) for the pod. Now kube-proxy adds entries for the service into the iptables of all 5 nodes. Would it not be an overhead if there were 50 nodes in the cluster?
This is not an overhead. Every node needs to be able to communicate with services even if it does not host the pods of that service (i.e. it may have pods that connect to that service).
That said, in some very large clusters it has been reported that the performance of iptables updates can be poor (mind that this is at a very, very big scale). If that is the case, you might prefer to look into solutions like Linkerd (https://linkerd.io/) or Istio (https://istio.io/).
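For reference, a minimal sketch of the NodePort Service described in the question (names and ports are hypothetical). kube-proxy programs rules for it on every node, so the service is reachable via any node's IP on the node port, regardless of where the 3 replicas actually run:

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
  - port: 80          # cluster-internal service port
    targetPort: 8080  # container port on the pods
    nodePort: 30080   # port opened on every node by kube-proxy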