Why is there no concept of node pool in Kubernetes?

I can see that GKE, AKS, and EKS all have a built-in node pool concept, but Kubernetes itself doesn't provide that support. What could be the reason behind this?
We usually need different node types for different requirements, such as:
Some pods require CPU- or memory-optimized nodes.
Some pods run ML/AI algorithms and need GPU-enabled nodes. These GPU-enabled nodes should be used only by certain pods, as they are expensive.
Some pods/jobs want to leverage spot/preemptible nodes to reduce cost.
Is there any specific reason why Kubernetes doesn't have such support built in?

Node Pools are cloud-provider specific technologies/groupings.
Kubernetes is intended to be deployed on various infrastructures, including on-prem/bare metal. Node Pools would not mean anything in this case.
Node Pools generally are a way to provide Kubernetes with a group of identically configured nodes to use in the cluster.
You would specify the node you want using node selectors and/or taints/tolerations.
So you could taint nodes with a GPU and then require pods to have the matching toleration in order to schedule onto those nodes. Node Pools wouldn't make a difference here. You could join a physical server to the cluster and taint that node in exactly the same way -- Kubernetes would not see that any differently to a Google, Amazon or Azure-based node that was also registered to the cluster, other than some different annotations on the node.
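For example, a minimal sketch of that pattern (the node name, taint key, and label here are made up for illustration; the GPU resource line needs the vendor's device plugin installed):

```yaml
# Taint and label the GPU node first (hypothetical node name):
#   kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
#   kubectl label nodes gpu-node-1 accelerator=nvidia
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  nodeSelector:
    accelerator: nvidia          # only consider the labelled GPU nodes
  tolerations:
  - key: "gpu"                   # matches the taint above, so this pod is allowed on those nodes
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: example.com/ml-trainer:latest   # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1        # requires the NVIDIA device plugin on the node
```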

As Blender Fox mentioned, node groups/pools are a cloud-provider-specific grouping option.
In AWS (EKS) these are called node groups (managed or self-managed), while GKE calls them node pools.
You configure the Cluster Autoscaler and it scales the node count in the node pool or node group up and down.
If you are running Kubernetes on-prem there may be no node pool option, since a node group is mostly a group of VMs in the cloud, whereas on-prem the bare-metal machines themselves act as worker nodes.
To scale up and down, Kubernetes has the Cluster Autoscaler (CA adds or removes nodes from the cluster by creating/deleting VMs), which relies on the cloud provider's node group API; on bare metal it may not work out of the box.
Each provider has its own implementation and logic, which is selected on the Kubernetes side by the --cloud-provider flag (code link).
So if you are on an on-prem private cloud, you can write your own cloud client and interface.
It's not necessary to have node groups; they are more of a cloud-provider-side implementation.
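For reference, a rough sketch of pointing the Cluster Autoscaler at a cloud node group on AWS, loosely following the upstream example (the image tag and ASG name are assumptions; RBAC and the service account are omitted):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.2  # pick a tag matching your cluster version
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws              # selects the provider-specific node-group implementation
        - --nodes=1:10:my-node-group-asg    # min:max:ASG name (hypothetical)
        - --balance-similar-node-groups
```

On another provider you would swap --cloud-provider and the node-group reference accordingly.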
For the scenario:
Some pods require CPU- or memory-optimized nodes. Some pods run ML/AI algorithms and need GPU-enabled nodes. These GPU-enabled nodes should be used only by certain pods, as they are expensive. Some pods/jobs want to leverage spot/preemptible nodes to reduce cost.
You can use taints/tolerations, affinity, or node selectors as needed to schedule the pods onto the specific type of nodes.
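For instance, a sketch of the spot/preemptible case using a preferred node affinity (the label key and value are assumptions; providers expose their own labels, e.g. eks.amazonaws.com/capacityType=SPOT on EKS or cloud.google.com/gke-preemptible="true" on GKE):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report
spec:
  template:
    spec:
      restartPolicy: Never
      affinity:
        nodeAffinity:
          # Prefer cheap spot/preemptible capacity, but fall back to other nodes if none is free.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node-lifecycle        # hypothetical label set on the spot nodes
                operator: In
                values: ["spot"]
      containers:
      - name: report
        image: example.com/report-runner:latest   # hypothetical image
```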

Related

In Kubernetes / AKS, how do you direct services to specific node pools' individual nodes? Is there a way for the deployments to choose, or is it automatic?

I am new to Kubernetes. One thing I am not sure of is that, when creating all of these deployments, I am starting to max out the CPU/memory usage on my current node pool.
This article states that nodes of the SAME configuration are grouped together into a "node pool":
In Azure Kubernetes Service (AKS), nodes of the same configuration are grouped together into node pools.
System node pools serve the primary purpose of hosting critical system pods such as CoreDNS and tunnelfront. User node pools serve the primary purpose of hosting your application pods.
Also, you can create "taints (lol what)", tags, and labels, but these only seem to "label", per se, the node pool, not an individual node.
When creating a node pool, you can add taints, labels, or tags to that node pool. When you add a taint, label, or tag, all nodes within that node pool also get that taint, label, or tag.
So with all of that said, it doesn't seem like you have control over the individual nodes inside a node pool. So how does it work for the nodes in a node pool when deploying workloads and services?
Do I need to worry about managing that, or is it managed automatically, with pods created across the plane of nodes in a node pool? I'm not really seeing documentation for this.
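For illustration, this is the kind of thing I mean: as far as I can tell a Deployment can only target the whole pool through its label (the pool name here is made up), and the individual node within the pool is then picked by the scheduler:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      nodeSelector:
        agentpool: userpool1      # AKS puts this label on every node in the pool
      containers:
      - name: web
        image: nginx:1.25
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
```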
One more thing, "Vertical Pod Autoscaling"
This seems like a good option, but the documentation doesn't really explain what is going on in terms of the nodes in a node pool, except for this one statement at the end:
This article showed you how to automatically scale resource utilization, such as CPU and memory, of cluster nodes to match application requirements.
My question about the vertical autoscaler (which is really vertical + horizontal, IMHO, but I understand the reference/verbiage) is: what happens if you aren't using this? Do you have to manage each deployment on your own? How do deployments distribute over the individual nodes of a node pool?

AWS EKS Communication Between Node Groups

I have an EKS cluster with a single node group (3 nodes) that is currently only running Jenkins.
I want to start utilising this EKS cluster for other things but want to separate out deployments into specific node groups.
For example, I want to create a 'monitoring' node group to which I will deploy prometheus and grafana. I also want another larger node group for application deployments.
I know I can create a second node group in EKS and label it with 'monitoring' so I can use nodeSelector to deploy to the correct node group.
My question is about whether I need to consider networking between the node groups, for example for Prometheus to be able to scrape exporters running on pods in the other node groups.
Is that something that requires some sort of ingress rule, or is it not required? If it is required, what is the correct way to implement it?
As long as the nodes are in the same cluster, belong to the same master, and no custom network policy prevents the node groups from reaching each other, you should be able to rely on ClusterIPs.
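For example, a minimal sketch (names and port are assumptions): an exporter exposed through an ordinary ClusterIP Service is reachable from Prometheus on the monitoring node group just like from anywhere else in the cluster.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app-metrics
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: my-app          # matches the exporter pods' labels, wherever they are scheduled
  ports:
  - name: metrics
    port: 9100
    targetPort: 9100
```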
My concern is more about the reason why you would prefer dedicated node groups for separating tasks. Is that because of specific requirements? As long as you have available resources in your cluster, I would leverage the existing nodes and deploy the Kubernetes resources (Deployments/Services/etc.) in dedicated namespaces, which is the kind of separation that looks most appropriate to me in your case. Then, at the time you need more horsepower, you can scale your cluster horizontally, even with different hardware, specific labels, and NodeAffinity (instead of NodeSelector, for better customisation).
Hope I helped.

K8s fault tolerance

I was going through the differences between Swarm and K8s; one of the cons of Swarm is that it has limited fault-tolerance functionality. How does K8s achieve fault tolerance? Is it via K8s multi-master? Please share your inputs.
Yes! To achieve fault tolerance in Kubernetes, it is recommended to have multiple control plane (master) nodes, and if you are running on a cloud provider, multiple availability zones are also recommended.
The Control Plane’s components make global decisions about the cluster (for example, scheduling), as well as detecting and responding to cluster events (for example, starting up a new pod when a deployment’s replicas field is unsatisfied).
Basically, the control plane is composed of these components:
kube-apiserver - Exposes the Kubernetes API; it is the front end of the Kubernetes control plane.
etcd - The key/value backing store for all cluster data.
kube-scheduler - Watches for newly created pods with no assigned node and selects a node for them to run on.
kube-controller-manager - Its controllers are responsible for, among other things, maintaining the correct number of pods for every replication controller, populating endpoints objects, and responding when nodes go down.
cloud-controller-manager - Interacts with the underlying cloud provider.
Every cluster needs at least one worker node; the worker nodes are responsible for running your workloads.
(Diagram: a Kubernetes cluster with all the components tied together.)
For more info see here
Yes, all Kubernetes control plane components are either clustered (etcd), run a leader election (the controllers), or are flat (the apiserver). Traditionally you run three control plane nodes, but you can do 5 in some complex topologies.
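As a rough sketch of what a multi-master setup can look like with kubeadm (the endpoint name, version, and file name are assumptions, and a load balancer in front of the API servers is assumed):

```yaml
# cluster-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.0
controlPlaneEndpoint: "k8s-api.example.com:6443"   # load balancer in front of all API servers
etcd:
  local:
    dataDir: /var/lib/etcd
# Initialise the first control plane node, then join the others as control planes:
#   kubeadm init --config cluster-config.yaml --upload-certs
#   kubeadm join k8s-api.example.com:6443 --control-plane --certificate-key <key> ...
```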

Preemptive Cloud Run on GKE

Is it possible to create a Cloud Run on GKE (Anthos) Kubernetes cluster with preemptible nodes, and if so, can you also enable plugins such as gke-node-pool-shifter and gke-pvm-killer, or will that interfere with Cloud Run actions such as autoscaling pods?
https://hub.helm.sh/charts/rimusz/gke-node-pool-shifter
https://hub.helm.sh/charts/rimusz/gke-pvm-killer
Technically, a Cloud Run on GKE cluster is still a GKE cluster at the end of the day, so it can have preemptible node pools.
However, some Knative Serving components, such as the activator and autoscaler, are in the hot path of serving requests. You need to make sure they don't end up in a preemptible pool. Similarly, the controller and webhook are somewhat central to the control-plane lifecycle of Knative API objects, so you also need to make sure these pods end up in a non-preemptible node pool.
Secondly, Knative (for now) does not support node selectors or taints/tolerations: https://knative.tips/pod-config/node-affinity/. It simply doesn't give you a way to specify nodeSelector or other affinity fields in the Pod template of a Knative Service object.
Therefore, you have to find a way (like implementing your own mutating admission webhook for Knative-created pods) to add such node selectors to the pods, which is quite tedious.
However, by combining node taints and pod tolerations, I think you can have the Knative system components end up in a non-preemptible pool, and everything else (i.e. Knative-created pods) on the other nodes (i.e. preemptible nodes).
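As a rough sketch of that idea (the pool name and taint key are assumptions; the Knative system components in knative-serving are ordinary Deployments, so they can be patched directly):

```yaml
# pin-to-stable.yaml -- apply to the activator, autoscaler, controller and webhook, e.g.:
#   kubectl -n knative-serving patch deployment activator --patch "$(cat pin-to-stable.yaml)"
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: stable-pool   # hypothetical non-preemptible pool
      tolerations:
      - key: "reserved-for"        # only needed if you also taint the stable pool
        operator: "Equal"
        value: "knative-system"
        effect: "NoSchedule"
```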

What's the maximum number of Kubernetes namespaces?

Is there a maximum number of namespaces supported by a Kubernetes cluster? My team is designing a system to run user workloads via K8s and we are considering using one namespace per user to offer logical segmentation in the cluster, but we don't want to hit a ceiling with the number of users who can use our service.
We are using Amazon's EKS managed Kubernetes service and Kubernetes v1.11.
This is quite difficult to answer, as it depends on a lot of factors. The Kubernetes scalability thresholds document (based on a Kubernetes 1.7 cluster) lists the number of namespaces (ns) as 10,000, with a few assumptions.
There are no limits from the code point of view, because a namespace is just a Go type that gets instantiated as a variable.
In addition to the link that @SureshVishnoi posted, the limits will depend on your setup, but some of the factors that can affect how your namespaces (and resources in the cluster) scale are:
The physical or VM hardware size where your masters are running (unfortunately, EKS doesn't let you control that, as it's a managed service after all).
The number of nodes your cluster is handling.
The number of pods in each namespace
The number of overall K8s resources (deployments, secrets, service accounts, etc)
The hardware behind your etcd database:
Storage: how many resources you can persist.
Raw performance: how much memory and CPU you have.
The network connectivity between your master components and etcd store if they are on different nodes.
If they are on the same nodes then you are bound by the server's memory, CPU and storage.
There is no limit on the number of namespaces. You can create as many as you want. A namespace by itself doesn't actually consume cluster resources like CPU or memory.
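For example, the one-namespace-per-user pattern from the question is just a matter of creating objects like this (the name and label are assumptions):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: user-alice          # one namespace per user
  labels:
    owner: alice
```

If you need to cap what each user can consume, a ResourceQuota per namespace is the usual companion.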