Kubernetes worker node went down, what will happen to the pod? - kubernetes

I have set up an EKS cluster in AWS with 2 worker nodes and configured autoscaling on those nodes with 3 as the desired capacity.
Sometimes a worker node goes down due to "an EC2 health check indicating it has been terminated or stopped," which results in my pod being restarted. I have not configured any replicas for the pods. There is only one now.
I just wanted to know: how can my services (pods) be highly available even when a worker node goes down or restarts?

If you have only one pod for your service, then your service is NOT highly available. It is a single point of failure. If that pod dies or is restarted, as has happened here, then during the time the pod is being restarted, your service is dead.
You need, at a bare minimum, TWO pods for a service to be highly available, and they should be on different nodes. You can force Kubernetes to schedule the pods on different nodes using pod anti-affinity (https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/), so that if one node goes down, as in your example, it takes out only one pod, leaving the other pod(s) to handle requests until the lost pod can be rescheduled.
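As a rough illustration of the advice above, the following Python sketch builds a Deployment manifest with two replicas and a required pod anti-affinity rule keyed on `kubernetes.io/hostname`, then prints it as JSON. The name `my-service`, the image `example.com/my-service:1.0`, and the helper `build_deployment` are hypothetical placeholders, not anything from the original question.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 2) -> dict:
    """Return a Deployment manifest whose replicas must land on different nodes."""
    labels = {"app": name}
    # Required anti-affinity: no two pods with these labels on the same hostname.
    anti_affinity = {
        "requiredDuringSchedulingIgnoredDuringExecution": [
            {
                "labelSelector": {"matchLabels": labels},
                "topologyKey": "kubernetes.io/hostname",
            }
        ]
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {"podAntiAffinity": anti_affinity},
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


# Hypothetical service name and image, used only for illustration.
manifest = build_deployment("my-service", "example.com/my-service:1.0")
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be piped to `kubectl apply -f -`. Note that with a required rule, a replica stays Pending if no other node is available; a preferred rule is the softer alternative.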

Related

Node networking issue with Openshift

I am running my services on an OpenShift cluster with all the nodes in Ready status.
I found that a few microservice pods have networking issues on certain nodes, even though the pods are up and running.
When they run on other nodes, they are fine.
Also, what could be the reason a pod shows stickiness? Even after a restart, the pod is deployed on the same node again and again, and there is no taint/toleration scenario involved.

Controlling pods kubelet vs. controller in control plane

I'm a little confused, I've been ramping up on Kubernetes and I've been reading about all the different objects ReplicaSet, Deployment, Service, Pods etc.
In the documentation it mentions that the kubelet manages liveness and readiness checks which are defined in our ReplicaSet manifests.
Reference: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
If this is the case does the kubelet also manage the replicas? Or does that stay with the controller?
Or do I have it all wrong and it's the kubelet that is creating and managing all these resources on a pod?
Thanks in advance.
Basically, the kubelet is the "node agent" that runs on each node. It is notified through the kube-apiserver, then starts the containers through the container runtime, working in terms of Pod specs. It ensures that the containers described in the Pod specs are running and healthy.
The flow of kubelet tasks is like: kube apiserver <--> kubelet <--> CRI
To check whether a container is healthy, it uses the liveness probe; if the probe fails, the kubelet restarts the container.
The kubelet does not maintain replicas; replicas are maintained by the ReplicaSet. As the Kubernetes docs say: A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
See more of ReplicaSet
For more info you can see: kubelet
When starting your journey with Kubernetes it is important to understand its main components for both Control Planes and Worker Nodes.
Based on your question we will focus on two of them:
kube-controller-manager:
Logically, each controller is a separate process, but to reduce
complexity, they are all compiled into a single binary and run in a
single process.
Some types of these controllers are:
Node controller: Responsible for noticing and responding when nodes go down.
Job controller: Watches for Job objects that represent one-off tasks, then creates Pods to run those tasks to completion.
Endpoints controller: Populates the Endpoints object (that is, joins Services & Pods).
Service Account & Token controllers: Create default accounts and API access tokens for new namespaces.
kubelet:
An agent that runs on each node in the cluster. It makes sure that
containers are running in a Pod.
The kubelet takes a set of PodSpecs that are provided through various
mechanisms and ensures that the containers described in those PodSpecs
are running and healthy. The kubelet doesn't manage containers which
were not created by Kubernetes.
So answering your question:
If this is the case does the kubelet also manage the replicas? Or does
that stay with the controller?
No, replication is managed by a Replication Controller, a ReplicaSet, or (the recommended option) a Deployment. The kubelet runs on nodes and makes sure that the Pods are running according to their PodSpecs.
You can find synopsis for kubelet and kube-controller-manager in the linked docs.
EDIT:
There is one exception, however, in the form of Static Pods:
Static Pods are managed directly by the kubelet daemon on a specific
node, without the API server observing them. Unlike Pods that are
managed by the control plane (for example, a Deployment); instead, the
kubelet watches each static Pod (and restarts it if it fails).
Note that it does not apply to multiple replicas.
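To make the division of labour concrete, here is a toy model of what a replica controller does on each pass: compare the desired count with the pods it sees and emit create or delete actions. This is only a sketch of the idea, not the real ReplicaSet controller code; the function `reconcile` and the pod names are made up for illustration.

```python
from typing import List


def reconcile(desired_replicas: int, running_pods: List[str]) -> List[str]:
    """Return the actions a toy replica controller would take in one pass."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: ask for the missing ones to be created.
        return ["create pod"] * diff
    if diff < 0:
        # Too many pods: delete the surplus (here simply the last ones listed).
        return [f"delete {name}" for name in running_pods[diff:]]
    return []


print(reconcile(3, ["pod-a"]))
print(reconcile(1, ["pod-a", "pod-b", "pod-c"]))
```

The kubelet never runs this kind of loop across the cluster; it only looks after the containers of the Pods already assigned to its own node.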

How to migrate the pods automatically to another node in kubernetes?

I am new to Kubernetes. I am wondering whether Kubernetes automatically moves pods to another node if a node's resources are critically low.
For example, suppose Pod A, Pod B, and Pod C are running on Node A and Pod D is running on Node B. The resource usage on Node A would be high. In this case, will Kubernetes migrate any of the pods running on Node A to Node B?
I have learnt about node affinity and node selectors, which are used to run pods on certain nodes. It would be helpful if Kubernetes offered a feature to migrate pods to another node automatically when resource usage is high.
Does anyone know how we can achieve this in Kubernetes?
Thanks
-S
Yes, in a sense. When a node runs critically low on resources such as memory or disk, the kubelet can evict pods from it. The evicted pod is killed, and if it is managed by a controller such as a Deployment, a replacement pod is started, possibly on another node. You would probably want to learn about Quality of Service Classes to understand which pod would be killed first.
That said, you may want to read about Automatic Horizontal Pod Autoscaling. This may give you more control.
With Horizontal Pod Autoscaling, Kubernetes automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization (or, with alpha support, on some other, application-provided metrics).
As load increases, it makes more sense to spin up a new pod than to move a pod between nodes, because moving it would disrupt the processes currently running inside the pod on the busy node.
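The scaling rule documented for the Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). A minimal Python sketch of that formula follows; it deliberately ignores the tolerance band and the min/max replica bounds that the real autoscaler also applies, and the numbers are illustrative only.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Replica count suggested by the basic HPA scaling formula."""
    return math.ceil(current_replicas * current_metric / target_metric)


# 2 pods averaging 90% CPU against a 50% target: scale up.
print(desired_replicas(2, 90, 50))  # 4
# 4 pods averaging 25% CPU against a 50% target: scale down.
print(desired_replicas(4, 25, 50))  # 2
```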
You can also set a nodeSelector in the Deployment to move its pods to a different node:
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/

How to kickoff the dead replicas of Kubernetes Deployment

We have deployed services as Kubernetes Deployments with multiple replicas. When a server crashes, Kubernetes migrates its containers to another available server, which takes about 3~5 minutes.
While migrating, clients can still access the Deployment's service because we have other running replicas. But sometimes requests fail because the load balancer routes them to the dead or migrating containers.
It would be great if Kubernetes could remove the dead replicas automatically and add them back once they are running on other servers. Otherwise, we need to set up a load balancer like haproxy to do the same job with multiple Deployment instances.
You need to configure health checking to have properly working load balancing for a Service. Please have a read of:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
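As an illustration, a container entry with an HTTP readiness probe might look like the fragment below, written here as a Python dict and printed as JSON. The container name, image, port `8080`, path `/healthz`, and timing values are all assumptions chosen for the example; they would need to match the real application.

```python
import json

# Hypothetical container spec; name, image, port and path are placeholders.
container = {
    "name": "web",
    "image": "example.com/web:1.0",
    "ports": [{"containerPort": 8080}],
    "readinessProbe": {
        "httpGet": {"path": "/healthz", "port": 8080},
        "initialDelaySeconds": 5,  # wait before the first probe
        "periodSeconds": 10,       # probe every 10 seconds
        "failureThreshold": 3,     # mark not ready after 3 consecutive failures
    },
}

print(json.dumps(container, indent=2))
```

This fragment belongs in the `containers` list of the Deployment's pod template. With these example values, a pod that stops answering would be dropped from the Service backends after roughly three failed probes.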
The time it takes to detect a failed node and evict its pods is controlled by the following flags.
1. kubelet
--node-status-update-frequency duration
Specifies how often kubelet posts node status to master. Note: be cautious when changing the constant, it must work with nodeMonitorGracePeriod in nodecontroller. (default 10s)
2. controller-manager
--node-monitor-grace-period duration
Amount of time which we allow running Node to be unresponsive before marking it unhealthy. Must be N times more than kubelet's nodeStatusUpdateFrequency, where N means number of retries allowed for kubelet to post node status. (default 40s)
--pod-eviction-timeout duration
The grace period for deleting pods on failed nodes. (default 5m0s)
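Putting the default values quoted above together gives a rough estimate of how long pods can linger on a crashed node. The sketch below simply adds the grace period and the eviction timeout; it is an approximation that ignores scheduling and container start-up time, and the variable names are only illustrative.

```python
from datetime import timedelta

# Defaults as listed in the flag descriptions above.
node_status_update_frequency = timedelta(seconds=10)
node_monitor_grace_period = timedelta(seconds=40)
pod_eviction_timeout = timedelta(minutes=5)

# N: how many status updates the kubelet may miss before the node is unhealthy.
allowed_retries = node_monitor_grace_period // node_status_update_frequency

# Approximate time from the last heartbeat until pods are deleted.
time_to_eviction = node_monitor_grace_period + pod_eviction_timeout

print(allowed_retries)   # 4
print(time_to_eviction)  # 0:05:40
```

Lowering these values shortens the failover window, at the cost of reacting more aggressively to brief network or node hiccups.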

Does HorizontalPodAutoscaler make sense when there is only one Deployment on GKE (Google Container Engine) Kubernetes cluster?

I have a "homogeneous" Kubernetes setup. By this I mean that I am only running instances of a single type of pod (an http server) with a load balancer service distributing traffic to them.
By my reasoning, to get the most out of my cluster (edit: to be concrete -- getting the best average response times to http requests) I should have:
At least one pod running on every node: Not having a pod running on a node, means that I am paying for the node and not having it ready to serve a request.
At most one pod running on every node: The pods are threaded http servers so they can maximize utilization of a node, so running multiple pods on a node does not net me anything.
This means that I should have exactly one pod per node. I achieve this using a DaemonSet.
The alternative way is to configure a Deployment and apply a HorizontalPodAutoscaler to it and have Kubernetes handle the number of pods and pod to node mapping. Is there any disadvantage of my approach in comparison to this?
My evaluation is that the HorizontalPodAutoscaler is relevant mainly in heterogeneous situations, where one HorizontalPodAutoscaler can scale up a Deployment at the expense of another Deployment. But since I have only one type of pod, I would have only one Deployment and I would be scaling up that deployment at the expense of itself, which does not make sense.
HorizontalPodAutoscaler is actually a valid solution for your needs. To address your two concerns:
1. At least one pod running on every node
This isn't your real concern. The concern is underutilizing your cluster. However, you can be underutilizing your cluster even if you have a pod running on every node. Consider a three-node cluster:
Scenario A: pod running on each node, 10% CPU usage per node
Scenario B: pod running on only one node, 70% CPU usage
Even though Scenario A has a pod on each node the cluster is actually being less utilized than in Scenario B where only one node has a pod.
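The comparison can be checked with a little arithmetic. The sketch below assumes three equally sized nodes and treats a node without a pod as 0% busy; both are simplifications made for the example.

```python
from typing import List


def cluster_utilization(per_node_cpu_percent: List[float]) -> float:
    """Average CPU utilization across equally sized nodes, in percent."""
    return sum(per_node_cpu_percent) / len(per_node_cpu_percent)


scenario_a = [10, 10, 10]  # a pod on every node, 10% CPU each
scenario_b = [70, 0, 0]    # a pod on one node only, 70% CPU

print(cluster_utilization(scenario_a))            # 10.0
print(round(cluster_utilization(scenario_b), 1))  # 23.3
```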
2. At most one pod running on every node
The Kubernetes scheduler tries to spread pods around so that you don't end up with multiple pods of the same type on a single node. Since in your case the other nodes should be empty, the scheduler should have no problems starting the pods on the other nodes. Additionally, if you have the pod request resources equivalent to the node's resources, that will prevent the scheduler from scheduling a new pod on a node that already has one.
Now, you can achieve the same effect whether you go with a DaemonSet or HPA, but I personally would go with HPA since I think it fits your semantics better, and it would also work much better if you eventually decide to add other types of pods to your cluster.
Using a DaemonSet means that the pod has to run on every node (or some subset). This is a great fit for something like a logger or a metrics collector, which is per-node. But you really just want to use available cluster resources to power your pod as needed, which matches up better with the intent of HPA.
As an aside, I believe GKE supports cluster autoscaling, so you should never be paying for nodes that aren't needed.