Which component in Kubernetes is responsible for resource allocation?

After scheduling the pods in a node in Kubernetes, which component is responsible for sharing resources among the pods in that node?

From https://kubernetes.io/docs/concepts/overview/components :
kube-scheduler - Component on the master that watches newly created pods
that have no node assigned, and selects a node for them to run on.
Factors taken into account for scheduling decisions include individual
and collective resource requirements, hardware/software/policy
constraints, affinity and anti-affinity specifications, data locality,
inter-workload interference and deadlines.
After the pod is scheduled, the node's kubelet is responsible for enforcing the pod's requests and limits. Depending on the pod's quality of service class and the node's resource pressure, the pod can be evicted or restarted by the kubelet.
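For illustration, a minimal sketch of a pod spec with the requests and limits that the scheduler and kubelet act on (the name, image and values below are only placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: demo-app                # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                # example image
    resources:
      requests:                 # used by kube-scheduler to pick a node
        cpu: "250m"
        memory: "128Mi"
      limits:                   # enforced on the node by the kubelet/runtime
        cpu: "500m"
        memory: "256Mi"

Because the requests are lower than the limits, this pod gets the Burstable QoS class, so under node memory pressure it is evicted after BestEffort pods but before Guaranteed ones.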

After scheduling, that will be the OS kernel.
You can reserve/limit pod resources: https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits.
Then it is passed from the kubelet down to the container runtime (e.g. Docker), then to cgroups, and finally to the kernel.
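As a sketch of that chain (names and values are only examples): a pod whose requests equal its limits gets the Guaranteed QoS class, and the kubelet hands those numbers to the container runtime, which translates them into cgroup settings that the kernel enforces (CPU limits become a CPU quota, memory limits a cgroup memory cap).

apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo         # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                # example image
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"             # enforced by the kernel as a CPU quota via cgroups
        memory: "256Mi"         # exceeding this gets the container OOM-killed by the kernel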


Controlling pods kubelet vs. controller in control plane

I'm a little confused. I've been ramping up on Kubernetes and reading about all the different objects: ReplicaSet, Deployment, Service, Pod, etc.
In the documentation it mentions that the kubelet manages liveness and readiness checks which are defined in our ReplicaSet manifests.
Reference: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
If this is the case does the kubelet also manage the replicas? Or does that stay with the controller?
Or do I have it all wrong and it's the kubelet that is creating and managing all these resources on a pod?
Thanks in advance.
Basically the kubelet is the "node agent" that runs on each node. It gets notified through the kube-apiserver, then starts the containers through the container runtime; it works in terms of PodSpecs. It ensures the containers described in those PodSpecs are running and healthy.
The flow of kubelet tasks is like: kube-apiserver <--> kubelet <--> CRI
To check whether a pod is running healthily, the kubelet uses the liveness probe; if the probe fails, it restarts the container.
The kubelet does not maintain replicas; replicas are maintained by the ReplicaSet. As the k8s docs say: A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
For more info, see the ReplicaSet and kubelet docs.
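A minimal sketch of such a liveness probe (the endpoint, port and timings are assumptions for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo           # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                # example image
    livenessProbe:              # executed by the kubelet on the node, not by the control plane
      httpGet:
        path: /healthz          # assumed health endpoint
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    # if the probe keeps failing, the kubelet restarts this container
    # (subject to the pod's restartPolicy)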
When starting your journey with Kubernetes, it is important to understand its main components for both the Control Plane and the Worker Nodes.
Based on your question we will focus on two of them:
kube-controller-manager:
Logically, each controller is a separate process, but to reduce
complexity, they are all compiled into a single binary and run in a
single process.
Some types of these controllers are:
Node controller: Responsible for noticing and responding when nodes go down.
Job controller: Watches for Job objects that represent one-off tasks, then creates Pods to run those tasks to completion.
Endpoints controller: Populates the Endpoints object (that is, joins Services & Pods).
Service Account & Token controllers: Create default accounts and API access tokens for new namespaces.
kubelet:
An agent that runs on each node in the cluster. It makes sure that
containers are running in a Pod.
The kubelet takes a set of PodSpecs that are provided through various
mechanisms and ensures that the containers described in those PodSpecs
are running and healthy. The kubelet doesn't manage containers which
were not created by Kubernetes.
So answering your question:
If this is the case does the kubelet also manage the replicas? Or does
that stay with the controller?
No, replication can be managed by the Replication Controller, a ReplicaSet or, more commonly recommended, a Deployment. The kubelet runs on Nodes and makes sure that the Pods are running according to their PodSpecs.
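A minimal sketch to make the split concrete (all names are placeholders): the replicas field below is reconciled by the Deployment and ReplicaSet controllers inside kube-controller-manager, while the kubelet on each chosen node only runs and health-checks the resulting Pods.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                     # hypothetical name
spec:
  replicas: 3                   # maintained by the ReplicaSet controller, not by the kubelet
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx            # example image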
You can find synopsis for kubelet and kube-controller-manager in the linked docs.
EDIT:
There is one exception, however, in the form of Static Pods:
Static Pods are managed directly by the kubelet daemon on a specific
node, without the API server observing them. Unlike Pods that are
managed by the control plane (for example, a Deployment); instead, the
kubelet watches each static Pod (and restarts it if it fails).
Note that it does not apply to multiple replicas.
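As a sketch, a static Pod is just a plain Pod manifest dropped into the kubelet's staticPodPath (commonly /etc/kubernetes/manifests on kubeadm-provisioned nodes, but check your kubelet config):

# e.g. /etc/kubernetes/manifests/static-demo.yaml (path depends on the kubelet's staticPodPath)
apiVersion: v1
kind: Pod
metadata:
  name: static-demo             # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                # example image

The kubelet runs and restarts it on its own and only publishes a read-only mirror Pod to the API server; there is no replicas field, so you get exactly one Pod per node that carries the file.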

What will happen when a node is almost out of resource, when deploying K8s daemonset?

When deploying a Kubernetes DaemonSet, what will happen when a single node (out of a few nodes) is almost out of resources, a pod can't be created, and there are no pods that can be evicted? Though Kubernetes can be horizontally scaled, I believe it is meaningless to scale horizontally, as a DaemonSet needs a pod on every node.
Though Kubernetes can be horizontally scaled, I believe it is meaningless to scale horizontally, as a DaemonSet needs a pod on every node.
DaemonSet is a workload type that is mostly for operations workloads, e.g. transporting logs from the node or similar "system services". It is rarely a good fit for workloads that serve your users, but it can be.
what will happen when a single node (out of a few nodes) is almost out of resources, a pod can't be created, and there are no pods that can be evicted?
As I described above, workloads deployed with a DaemonSet are typically operations workloads that have, e.g., an infrastructure role in your cluster. Since these pods may be more critical (or less, depending on what you want), I would use a higher Quality of Service class for them, so that other pods are evicted first when the node is low on resources.
See Configure Quality of Service for Pods for how to configure your Pods to be in a Quality of Service class, one of:
Guaranteed
Burstable
Best Effort
You might also consider using Pod Priority and Preemption.
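A rough sketch combining both suggestions, equal requests and limits for Guaranteed QoS plus a priority class (the names, image and values are placeholders, and the priority class is assumed to already exist):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent              # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      priorityClassName: high-priority   # assumes a PriorityClass with this name exists
      containers:
      - name: agent
        image: fluentd          # example image
        resources:
          requests:
            cpu: "100m"
            memory: "200Mi"
          limits:
            cpu: "100m"         # equal to requests, so the pod gets the Guaranteed QoS class
            memory: "200Mi"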
The question was about DaemonSet, but as a final note: workloads that serve requests from your users are typically deployed as a Deployment, and for those it is very easy to do horizontal scaling using the Horizontal Pod Autoscaler.

How to force Eviction on a Kubernetes Cluster (minikube)

I am relatively new to Kubernetes, and my current task at work is to debug evicted pods.
I'm trying to replicate the behaviour on a local k8s cluster in minikube.
So far I just cannot get pods to be evicted.
Can you help me trigger this mechanism?
The eviction of pods is governed by their QoS classes (Quality of Service).
There are 3 categories:
Guaranteed (limits equal to requests for CPU and RAM): evicted last, effectively not evictable
Burstable
BestEffort
If you want to test this mechanism, first launch your example pods with different requests and limits, then scale up a pod that consumes a lot of memory or CPU to observe the behavior. This only applies to eviction, so your pods must already be running before the load starts.
Afterwards, if you want to test the scheduling mechanism at launch time, you can configure a priorityClassName to get a pod scheduled even when the cluster is full.
For example, if your cluster is full you cannot schedule a new pod because it does not have sufficient priority.
If you want to schedule a pod anyway despite that, you can add the priorityClassName system-node-critical or create your own PriorityClass; one of the pods with a lower priority will be evicted (preempted) and your pod will be launched.
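A rough sketch of both ideas (the image, names and values are only examples): a BestEffort memory hog to trigger node-pressure eviction, plus a custom PriorityClass you can attach to a pod via priorityClassName to test preemption.

apiVersion: v1
kind: Pod
metadata:
  name: memory-hog              # hypothetical name
spec:
  containers:
  - name: stress
    image: polinux/stress       # example stress image; any memory-hungry workload works
    command: ["stress"]
    args: ["--vm", "2", "--vm-bytes", "1G", "--vm-hang", "1"]
    # no requests/limits at all, so this pod is BestEffort and is evicted first
    # once the node crosses its memory eviction threshold
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority           # hypothetical name
value: 1000000
globalDefault: false
description: "Pods allowed to preempt lower-priority pods when the node is full."

You can watch the pods with kubectl get pods -w and the node conditions with kubectl describe node to see MemoryPressure and the resulting Evicted pods.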

Kubernetes (connection)-drain node with local persistent storage

We use local persistent storage as the storage backend for SOLR pods. The pods are redundantly scheduled to multiple Kubernetes nodes. If one of the nodes goes down, there are always enough instances on other nodes.
How can we drain these nodes (without "migrating" the SOLR pods to other nodes) in case we want to do a maintenance on a node? The most important thing for us would be that kube-proxy would no longer send new requests to the pods on the node in question so that after some time we could do the maintenance without interrupting service for running requests.
We tried cordon but cordon will only make sure no new pods are scheduled to a node. Drain does not seem to work with pods with local persistent volumes.
You can check out pod anti-affinity.
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
These constructs allow you to repel or attract pods when certain conditions are met.
In your case, pod anti-affinity with 'requiredDuringSchedulingIgnoredDuringExecution' may be your best bet. I haven't personally used it yet; I hope it can point you in the right direction.
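A minimal sketch of that, assuming the SOLR pods carry a label such as app: solr (the names and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: solr                    # hypothetical name; a StatefulSet would work the same way
spec:
  replicas: 3
  selector:
    matchLabels:
      app: solr
  template:
    metadata:
      labels:
        app: solr
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: solr
            topologyKey: kubernetes.io/hostname   # at most one SOLR pod per node
      containers:
      - name: solr
        image: solr             # example image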

How to migrate the pods automatically to another node in kubernetes?

I am a newbie to Kubernetes. I am wondering whether Kubernetes will automatically switch pods to another node if that node's resources become critical.
For example, Pod A, Pod B and Pod C are running on Node A and Pod D is running on Node B. The resource usage of the pods on Node A becomes high. In this case, will Kubernetes migrate any of the pods running on Node A to Node B?
I have learnt about node affinity and node selectors, which are used to run pods on certain nodes. It would be helpful if Kubernetes offered this feature of migrating pods to another node automatically when resources are heavily used.
Does anyone know how we can achieve this in Kubernetes?
Thanks
-S
Yes, Kubernetes can move pods to another node automatically when resources are heavily used: the pod is killed (evicted) and a new pod is started on another node. You would probably want to learn about Quality of Service classes to understand which pod would be killed first.
That said, you may want to read about Automatic Horizontal Pod Autoscaling. This may give you more control.
With Horizontal Pod Autoscaling, Kubernetes automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization (or, with alpha support, on some other, application-provided metrics).
With an increase in load it makes more sense to spin up a new pod rather than moving a pod between different nodes, to avoid disruption of the processes currently running inside the pod on the busy node.
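A minimal HorizontalPodAutoscaler sketch (the names and thresholds are placeholders), scaling a Deployment on observed CPU utilization; this assumes a metrics source such as metrics-server is installed in the cluster:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                 # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                   # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # target average CPU utilization across the pods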
You can also set a nodeSelector in the Deployment to move the pods to a specific node:
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
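A minimal sketch of that (the label and names are placeholders; the target nodes are assumed to be labeled accordingly, e.g. with kubectl label nodes <node-name> disktype=ssd):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                     # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      nodeSelector:
        disktype: ssd           # pods are only scheduled onto nodes carrying this label
      containers:
      - name: web
        image: nginx            # example image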