Controlling pods: kubelet vs. controllers in the control plane - Kubernetes

I'm a little confused. I've been ramping up on Kubernetes and reading about all the different objects: ReplicaSet, Deployment, Service, Pod, etc.
In the documentation it mentions that the kubelet manages liveness and readiness checks which are defined in our ReplicaSet manifests.
Reference: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
If this is the case does the kubelet also manage the replicas? Or does that stay with the controller?
Or do I have it all wrong and it's the kubelet that is creating and managing all these resources on a pod?
Thanks in advance.

Basically, the kubelet is the "node agent" that runs on each node. It gets notified through the kube-apiserver, then starts containers through the container runtime; it works in terms of Pod specs. It ensures the containers described in those PodSpecs are running and healthy.
The flow of kubelet tasks is like: kube apiserver <--> kubelet <--> CRI
To check whether the pod is running healthy it uses the liveness probe; if the probe fails, it restarts the container according to the pod's restart policy.
The kubelet does not maintain replicas; replicas are maintained by a ReplicaSet. As the k8s docs say: A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
See more on ReplicaSets: ReplicaSet
For more info you can see: kubelet
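A minimal sketch to illustrate the split (the name, labels and image below are placeholders): the ReplicaSet controller in the control plane reconciles .spec.replicas, while the kubelet on whichever node each Pod lands on executes the livenessProbe defined in the pod template.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend                # placeholder name
spec:
  replicas: 3                   # reconciled by the ReplicaSet controller (control plane)
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: web
        image: nginx:1.25       # placeholder image
        livenessProbe:          # executed by the kubelet on the node running this Pod
          httpGet:
            path: /
            port: 80
          periodSeconds: 10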

When starting your journey with Kubernetes it is important to understand its main components for both Control Planes and Worker Nodes.
Based on your question we will focus on two of them:
kube-controller-manager:
Logically, each controller is a separate process, but to reduce
complexity, they are all compiled into a single binary and run in a
single process.
Some types of these controllers are:
Node controller: Responsible for noticing and responding when nodes go down.
Job controller: Watches for Job objects that represent one-off tasks, then creates Pods to run those tasks to completion.
Endpoints controller: Populates the Endpoints object (that is, joins Services & Pods).
Service Account & Token controllers: Create default accounts and API access tokens for new namespaces.
kubelet:
An agent that runs on each node in the cluster. It makes sure that
containers are running in a Pod.
The kubelet takes a set of PodSpecs that are provided through various
mechanisms and ensures that the containers described in those PodSpecs
are running and healthy. The kubelet doesn't manage containers which
were not created by Kubernetes.
So answering your question:
If this is the case does the kubelet also manage the replicas? Or does
that stay with the controller?
No, replication can be managed by a ReplicationController, a ReplicaSet, or (recommended) a Deployment. The kubelet runs on nodes and makes sure that the Pods are running according to their PodSpecs.
You can find synopsis for kubelet and kube-controller-manager in the linked docs.
EDIT:
There is one exception however in a form of Static Pods:
Static Pods are managed directly by the kubelet daemon on a specific
node, without the API server observing them. Unlike Pods that are
managed by the control plane (for example, a Deployment); instead, the
kubelet watches each static Pod (and restarts it if it fails).
Note that it does not apply to multiple replicas.

Related

Are Kubernetes liveness probe failures voluntary or involuntary disruptions?

I have an application deployed to Kubernetes that depends on an outside application. Sometimes the connection between these 2 goes to an invalid state, and that can only be fixed by restarting my application.
To do automatic restarts, I have configured a liveness probe that will verify the connection.
This has been working great, however, I'm afraid that if that outside application goes down (such that the connection error isn't just due to an invalid pod state), all of my pods will immediately restart, and my application will become completely unavailable. I want it to remain running so that functionality not depending on the bad service can continue.
I'm wondering if a pod disruption budget would prevent this scenario, as it limits the # of pods down due to a "voluntary" disruption. However, the K8s docs don't state whether liveness probe failures are voluntary disruptions. Are they?
I would say, according to the documentation:
Voluntary and involuntary disruptions
Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.
We call these unavoidable cases involuntary disruptions to an application. Examples are:
a hardware failure of the physical machine backing the node
cluster administrator deletes VM (instance) by mistake
cloud provider or hypervisor failure makes VM disappear
a kernel panic
the node disappears from the cluster due to cluster network partition
eviction of a pod due to the node being out-of-resources.
Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.
We call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator. Typical application owner actions include:
deleting the deployment or other controller that manages the pod
updating a deployment's pod template causing a restart
directly deleting a pod (e.g. by accident)
Cluster administrator actions include:
Draining a node for repair or upgrade.
Draining a node from a cluster to scale the cluster down (learn about Cluster Autoscaling ).
Removing a pod from a node to permit something else to fit on that node.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Disruptions
So your example is quite different; to my knowledge it's neither a voluntary nor an involuntary disruption.
Also taking a look on another Kubernetes documentation:
Pod lifetime
Like individual application containers, Pods are considered to be relatively ephemeral (rather than durable) entities. Pods are created, assigned a unique ID (UID), and scheduled to nodes where they remain until termination (according to restart policy) or deletion. If a Node dies, the Pods scheduled to that node are scheduled for deletion after a timeout period.
Pods do not, by themselves, self-heal. If a Pod is scheduled to a node that then fails, the Pod is deleted; likewise, a Pod won't survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a higher-level abstraction, called a controller, that handles the work of managing the relatively disposable Pod instances.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: Pod lifetime
Container probes
The kubelet can optionally perform and react to three kinds of probes on running containers (focusing on a livenessProbe):
livenessProbe: Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: Container probes
When should you use a liveness probe?
If the process in your container is able to crash on its own whenever it encounters an issue or becomes unhealthy, you do not necessarily need a liveness probe; the kubelet will automatically perform the correct action in accordance with the Pod's restartPolicy.
If you'd like your container to be killed and restarted if a probe fails, then specify a liveness probe, and specify a restartPolicy of Always or OnFailure.
-- Kubernetes.io: Docs: Concepts: Workloads: Pods: Pod lifecycle: When should you use a liveness probe
Based on that information, it would be better to create a custom liveness probe that distinguishes internal process health checks from external dependency (liveness) health checks. In the first case your container should stop/terminate the process, unlike the second case with an external dependency.
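As a rough sketch only (the container name, image, port and endpoint paths below are assumptions), one way to implement that split is to point the livenessProbe at an internal-only health endpoint and move the external-dependency check into a readinessProbe, so a broken dependency takes the pod out of the Service without restarting it:
containers:
- name: app                         # hypothetical container
  image: example/app:1.0            # hypothetical image
  livenessProbe:                    # internal process health only -> restart on failure
    httpGet:
      path: /healthz/internal       # assumed endpoint that ignores external dependencies
      port: 8080
    periodSeconds: 10
  readinessProbe:                   # external dependency check -> removed from endpoints, not restarted
    httpGet:
      path: /healthz/dependencies   # assumed endpoint that checks the outside application
      port: 8080
    periodSeconds: 10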
Answering the following question:
I'm wondering if a pod disruption budget would prevent this scenario.
In this particular scenario PDB will not help.
I reckon giving more visibility to the comment I've made, with additional resources on the matter, could prove useful to other community members:
Blog.risingstack.com: Designing microservices architecture for failure
Loft.sh: Blog: Kubernetes readiness probes examples common pitfalls: External dependencies
Cloud.google.com: Architecture: Scalable and resilient apps: Resilience designing to withstand failures
Testing with a PodDisruptionBudget: the pods will still restart at the same time.
Example: https://github.com/AlphaWong/PodDisruptionBudgetAndPodProbe
So yes, like @Dawid Kruk said, you should create a customized probe like the following:
# something like this
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # generate a random number for sleep
    - 'SLEEP_TIME=$(shuf -i 2-40 -n 1);sleep $SLEEP_TIME; curl -L --max-time 5 -f nginx2.default.svc.cluster.local'
  initialDelaySeconds: 10
  # think about the gap between each call
  periodSeconds: 30
  # it is required after k8s v1.12
  timeoutSeconds: 90
I'm wondering if a pod disruption budget would prevent this scenario.
Yes, it will prevent it.
As you stated, when a pod goes down (or a node fails), nothing can stop pods from becoming unavailable. However, certain services require that a minimum number of pods always keep running.
There could be another way (a stateful resource), but a PodDisruptionBudget is one of the simplest Kubernetes resources available.
Note: You can also use a percentage instead of an absolute number in the minAvailable field. For example, you could state that 60% of all pods with the app=run-always label need to be running at all times.
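A minimal PodDisruptionBudget sketch matching that note (the name and label are only examples; clusters older than v1.21 use policy/v1beta1):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: run-always-pdb        # example name
spec:
  minAvailable: "60%"         # percentage instead of an absolute number
  selector:
    matchLabels:
      app: run-always         # label from the note above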

Kubernetes node without master

The cluster consists of one master and one worker node. If the master is down and the worker is restarted, no workloads (deployments) are started on boot. How, if at all, is it possible to make the worker resume its last state without the master?
Kubernetes 1.18.3
On worker node are installed: kubelet, kubectl, kubeadm
Ideally you should have more than one node (typically an odd number like 3 or 5) serving as master, accessible from worker nodes via a load balancer.
The state is stored in ETCD which is accessed by worker nodes via the API Server. So without master nodes running there is no way for workers to know the desired state.
Although it's not recommended, you can use static pods as a potential solution here. Static Pods are managed directly by the kubelet daemon on a specific node, without the API server observing them. Unlike Pods that are managed by the control plane (for example, a Deployment), the kubelet watches each static Pod (and restarts it if it crashes).
The caveat of using static pods is that, since those pods do not depend on the API server, they cannot be managed with kubectl or other Kubernetes API clients.
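For illustration only (the path and names are assumptions; check staticPodPath in your kubelet configuration), a static Pod is just a plain Pod manifest dropped into the kubelet's manifest directory, and the kubelet keeps it running even while the API server is unreachable:
# /etc/kubernetes/manifests/my-app.yaml   (common kubeadm default for staticPodPath)
apiVersion: v1
kind: Pod
metadata:
  name: my-app                  # hypothetical workload
spec:
  containers:
  - name: my-app
    image: example/my-app:1.0   # hypothetical image
    ports:
    - containerPort: 8080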

Which component in Kubernetes is responsible for resource allocation?

After scheduling the pods in a node in Kubernetes, which component is responsible for sharing resources among the pods in that node?
From https://kubernetes.io/docs/concepts/overview/components :
kube-scheduler - Component on the master that watches newly created pods
that have no node assigned, and selects a node for them to run on.
Factors taken into account for scheduling decisions include individual
and collective resource requirements, hardware/software/policy
constraints, affinity and anti-affinity specifications, data locality,
inter-workload interference and deadlines.
After a pod is scheduled, the node's kubelet is responsible for dealing with the pod's requests and limits. Depending on the pod's quality of service and node resource pressure, the pod can be evicted or restarted by the kubelet.
After scheduling, that will be the OS kernel.
You can reserve/limit pod resources: https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits.
Then it is passed from the kubelet down to the container runtime (e.g. Docker), then to cgroups, and finally to the kernel.
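For illustration (the names and values are arbitrary), requests and limits are declared per container; the scheduler uses the requests for placement, and the kubelet hands the limits to the container runtime, which enforces them through cgroups:
containers:
- name: app                    # hypothetical container
  image: example/app:1.0       # hypothetical image
  resources:
    requests:                  # used by kube-scheduler to pick a node
      cpu: "250m"
      memory: "128Mi"
    limits:                    # enforced on the node via cgroups
      cpu: "500m"
      memory: "256Mi"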

Difference between daemonsets and deployments

In Kelsey Hightower's Kubernetes Up and Running, he gives two commands :
kubectl get daemonSets --namespace=kube-system kube-proxy
and
kubectl get deployments --namespace=kube-system kube-dns
Why does one use daemonSets and the other deployments?
And what's the difference?
Kubernetes Deployments manage stateless services running on your cluster (as opposed to, for example, StatefulSets, which manage stateful services). Their purpose is to keep a set of identical pods running and upgrade them in a controlled way. For example, you define how many replicas (pods) of your app you want to run in the deployment definition, and Kubernetes will spread that many replicas of your application over the nodes. If you ask for 5 replicas over 3 nodes, then some nodes will run more than one replica of your app.
DaemonSets manage groups of replicated Pods. However, DaemonSets attempt to adhere to a one-Pod-per-node model, either across the entire cluster or a subset of nodes. A Daemonset will not run more than one replica per node. Another advantage of using a Daemonset is that, if you add a node to the cluster, then the Daemonset will automatically spawn a pod on that node, which a deployment will not do.
DaemonSets are useful for deploying ongoing background tasks that you need to run on all or certain nodes, and which do not require user intervention. Examples of such tasks include storage daemons like ceph, log collection daemons like fluentd, and node monitoring daemons like collectd.
Let's take the example you mentioned in your question: why is kube-dns a Deployment and kube-proxy a DaemonSet?
The reason is that kube-proxy is needed on every node in the cluster to maintain iptables rules, so that every node can reach every pod no matter which node it resides on. Hence, when we make kube-proxy a DaemonSet and another node is added to the cluster at a later time, kube-proxy is automatically spawned on that node.
Kube-dns's responsibility is to resolve a service name to its IP, and only one replica of kube-dns is enough for that. Hence we make kube-dns a Deployment, because we don't need kube-dns on every node.
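A minimal DaemonSet sketch (name and image are placeholders): note that there is no replicas field; the number of pods simply follows the number of matching nodes.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent               # placeholder name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:                      # one Pod from this template per matching node
    metadata:
      labels:
        app: node-agent
    spec:
      containers:
      - name: agent
        image: example/agent:1.0   # placeholder image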

How to migrate the pods automatically to another node in kubernetes?

I am a newbie to Kubernetes. I am wondering whether Kubernetes can automatically move pods to another node if that node's resources become critically low.
For example, if Pod A, Pod B and Pod C are running on Node A and Pod D is running on Node B, and the resources on Node A used by the pods get high, will Kubernetes migrate any of the pods running on Node A to Node B?
I have learnt about node affinity and node selectors, which are used to run pods on certain nodes. It would be helpful if Kubernetes offered a feature to migrate pods to another node automatically when resources are heavily used.
Does anyone know how we can achieve this in Kubernetes?
Thanks
-S
Yes, Kubernetes can move pods to another node automatically if a node's resources are heavily used. The pod would be killed (evicted) and a new pod would be started on another node. You would probably want to learn about Quality of Service classes to understand which pods would be killed first.
That said, you may want to read about Automatic Horizontal Pod Autoscaling. This may give you more control.
With Horizontal Pod Autoscaling, Kubernetes automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization (or, with alpha support, on some other, application-provided metrics).
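A minimal HorizontalPodAutoscaler sketch (the target Deployment name and the thresholds are made up), using the autoscaling/v2 API to scale on average CPU utilization:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # add pods when average CPU goes above ~70%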
With an increase of load it makes more sense to spin up a new pod rather than to move a pod between nodes, to avoid disrupting the processes currently running inside the pod on the busy node.
You can set a node selector in the deployment to steer the pods onto specific nodes:
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
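For example (the label key and value are assumptions), adding a nodeSelector to the Deployment's pod template makes the scheduler place the pods only on nodes carrying that label; changing the template causes the pods to be recreated on matching nodes:
spec:
  template:
    spec:
      nodeSelector:
        disktype: ssd             # assumed node label, e.g. set with: kubectl label nodes <node-name> disktype=ssd
      containers:
      - name: app
        image: example/app:1.0    # placeholder image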