Pods are not rescheduled to a failed node when the node comes back alive - kubernetes

My situation is: I have a Kubernetes cluster with 5 nodes. Initially, pods are distributed across the 5 nodes. Sometimes we need to restart a particular node's server. That node goes down and its pods get created on other nodes. But once the failed/down node comes back up, no pods are created on it automatically, because the replica count has already been reached. We need every node to run a minimum of 1 pod. Could you please help with this?

We need every node to run a minimum of 1 pod
Instead of running a Deployment, you can run a DaemonSet if you want a minimum of one pod on each node.
That way, whenever a new node joins the cluster, it will automatically run one replica of your pod.
Read more about DaemonSets: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
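A minimal DaemonSet sketch of that suggestion (the name my-app and the image are placeholders, not taken from the question):

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: my-app
    spec:
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: my-app
            image: my-app:1.0        # placeholder image
            resources:
              requests:              # small requests so the pod fits on every node
                cpu: 100m
                memory: 128Mi

With this in place, the DaemonSet controller creates one pod per node, including on a node that rejoins the cluster after a restart.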
You can also spread the replicas of a Deployment across nodes, but that is trickier to manage, using affinity rules and pod topology spread constraints.

Related

What if I delete a Node in GKE

I have set up GKE with free trial access.
[screenshot of the cluster]
I have already set up a VM instance in GCE, so my Kubernetes cluster has fewer resources; I set it up for testing. But I want to know: if I delete 1 node out of the 3, what will happen?
My pods are running on all 3 nodes (distributed).
So if I delete one node, will it create a new node, or will it deploy my running pods onto the other 2 nodes so they become heavily loaded?
How do I know it is HA, and how do I test scale up and scale down?
Please help clear up my questions.
So if I delete one node, will it create a new node, or will it deploy my running pods onto the other 2 nodes so they become heavily loaded?
GKE manages the nodes using the node pool configuration.
If you have set 3 nodes in your GKE node pool and manually remove 1 instance, it will automatically create a new node in the cluster.
Your pod might get moved to another node if there is space left there; otherwise it will go into the Pending state and wait for a new node to join the GKE cluster.
If you want to reduce the number of nodes in GKE, you have to reduce the minimum count in the GKE node pool.
If you want to test scaling up and down, you can enable autoscaling on the node pool and increase the pod count on the cluster; GKE will automatically add nodes. Make sure you have correctly set the min and max node counts in the node pool section for autoscaling.
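For example, assuming placeholder cluster, zone, and pool names, node pool autoscaling can be enabled with gcloud:

    # Enable autoscaling on an existing node pool (names and zone are placeholders)
    gcloud container clusters update my-cluster \
        --enable-autoscaling --min-nodes=1 --max-nodes=5 \
        --zone=us-central1-a --node-pool=default-pool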
When you delete a node, its pods are also deleted. What happens next depends on your Deployment: if you have a pod scale of 3, after the deletion one remaining node will hold 2 pods and the other 1. Whether your app suffers or not depends on the actual traffic.

In Kubernetes, when does it make sense to have pods replicated on the same node?

I understand the logic behind replicating a pod across nodes, but is there any benefit to replicating pods on the same node? From my understanding this doesn't increase the degree of redundancy, because one pod can take down a node.
Replication of pods within a node?
You can control the replica count using metrics, but you cannot control whether pods are spawned on the same node unless some affinity is set, since placement is handled by the kube-scheduler. However, you can set various pod/node affinities, anti-affinities, and a topologyKey if you wish to keep a minimum number of pods on a particular node.
One pod can take a node down?
I highly doubt that. It can only happen if you have no CPU/memory requests/limits set, or if the maximum replica count in your HPA is set higher than the node's capacity while a pod affinity rule forces all the pods to spawn on the same node.
When can such a scenario make sense?
It can make sense if your nodes are of irregular sizes and you want to utilize a particular node more heavily than the other nodes in the cluster.
In a best-practice environment, all nodes are of uniform size, an HPA is set, and limits/quotas are provided so that a pod cannot crash a node by exceeding the node's limits.
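As a sketch of what such limits look like in a pod spec (the name, image, and values are illustrative only):

    containers:
    - name: app              # placeholder name
      image: app:1.0         # placeholder image
      resources:
        requests:            # what the scheduler reserves on the node
          cpu: 250m
          memory: 256Mi
        limits:              # hard ceiling enforced at runtime
          cpu: 500m
          memory: 512Mi

With requests and limits set this way, a single pod cannot consume a whole node, and the HPA's replica math stays predictable.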
EDIT from comments:
So what if I want to run multiple instances of a Node.js app to utilize all the cores on my kube node? Let's say 8 cores: does it make sense to replicate the Node.js app pod 8 times on the same kube node, or is it better to have 1 pod spin up 8 instances of the Node.js app?
Since yours is a single-threaded application, in your case a pod will be treated as one instance. A pod spinning up 8 instances would mean a multi-container pod with 8 containers of the same image; that would really be bad practice and hardly worth even a test environment. However, running multiple replicas of the same Deployment is a workable practice. The open question is request routing: if a pod is already serving a request and is blocked, how will the Kubernetes Service route new requests to another pod? That is only manageable via the HPA, and only if the maximum requests per pod is set to 1.
Why not use the Node.js cluster module to utilize all the cores of the node where the app is deployed? -- suggestion by #DanielKobe
node-js-scaling-out-on-kubernetes
Should-you-use-pm2-node-cluster-or-neither-in-kubernetes
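If you do go with one single-threaded pod per core, a sketch might look like the following (the name, image, and 8-core sizing come from the comment's scenario; they are assumptions, not tested guidance). Note that with default scheduling these replicas may spread across nodes, which is usually preferable anyway:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nodejs-app
    spec:
      replicas: 8                    # one single-threaded instance per core
      selector:
        matchLabels:
          app: nodejs-app
      template:
        metadata:
          labels:
            app: nodejs-app
        spec:
          containers:
          - name: nodejs-app
            image: nodejs-app:1.0    # placeholder image
            resources:
              requests:
                cpu: "1"             # roughly one core per instance
              limits:
                cpu: "1"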

Question about the concept of Kubernetes pod assignment to nodes

I am quite a beginner in Kubernetes and would like to ask about some concepts related to Kubernetes pod assignment.
Suppose there is a Deployment to be made with a requirement of 3 replicas.
(1)
Assume that there are 4 nodes, each being a different physical server with different CPU and memory.
When the Deployment is made, how would Kubernetes assign the pods to the nodes? Could there be a scenario where it puts multiple pods on the same server while another server has no pods assigned (due to resource considerations)?
(2)
Assume there are 4 nodes (on 4 identical physical servers), and 1 pod is created on each of the 4 nodes.
Suppose that one of the nodes now goes down. How would Kubernetes handle this? Will it recreate the pod on one of the other 3 nodes, based on which one has more available resources?
Thank you for any advice in advance.
There's a brief discussion of the Kubernetes Scheduler in the Kubernetes documentation. Generally scheduling is fairly opaque, but it tends to aim for fairly well-loaded nodes; the important thing from your application's point of view is to set appropriate resource requests in your pod specifications. As long as there's enough room on each node to meet the resource requests, it usually doesn't matter to you which node gets picked.
In the scenario you describe (1), it is possible that two replicas will be placed on the same node, and so two nodes will go unused. That's especially true if the nodes aren't identical and they have resource constraints: if your pods require 4 GB of RAM, but some nodes have less than that (after accounting for system pods and DaemonSet pods), the pods can't get scheduled there.
If a node fails (2), Kubernetes will automatically reschedule the pods running on that node if possible. "Fail" is a broad case and can include a node being intentionally stopped to be upgraded or replaced. In the latter case you have some control over the cluster's behavior; see Disruptions in the documentation.
Many environments will run a cluster autoscaler. This can cause nodes to come and go automatically: if you try to schedule a pod and it won't fit, the autoscaler will allocate a new node, and if a node is under 50% utilization, it will be removed (and its pods rescheduled). In your first scenario you might start with only one node, but when the pod replicas don't all fit, the autoscaler would create a new node and once it's available the excess pods could be scheduled there.
Kubernetes will try to deploy pods to multiple nodes for better availability and resiliency, based on the resource availability of the nodes. So if a node does not have enough capacity to host a pod, it's possible that more than one replica of a pod is scheduled onto the same node.
Kubernetes will reschedule pods from the failed node to another available node that has enough capacity to host them. Again, if there are not enough nodes that can host the replicas, more than one replica may end up scheduled on the same node.
You can read more on the scheduling algorithm here.
You can influence the scheduler with node and pod affinity and anti-affinity.
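For example, a soft (preferred) pod anti-affinity rule, placed under the pod template's spec, asks the scheduler to spread replicas across nodes but still allows co-location when capacity is short (the app label is a placeholder):

    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: my-app          # placeholder; must match the pods' own label
            topologyKey: kubernetes.io/hostname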

How to implement graceful termination of nodes without service downtime when using cluster auto-scaler?

I have set up a K8s cluster using EKS. The Cluster Autoscaler (CA) has been configured to increase/decrease the number of nodes based on resource availability for pods. The CA terminates a node if it is unneeded and its pods can be scheduled onto another node. Here, the CA terminates the node before rescheduling the pods onto another node, so the pods get scheduled onto another node only after the node has been terminated. Hence, there is some downtime for some services until the rescheduled pods become healthy.
How can I avoid the downtime by ensuring that the pods get scheduled on another node before the node gets terminated?
The graceful termination period for nodes is set to 10 minutes (the default).
You need to have multiple replicas of your application running. That will allow your application to survive even a sudden node death. You may also want to configure anti-affinity rules in your app manifest to ensure that the replicas reside on different nodes.
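A sketch of that suggestion as a Deployment fragment (the my-service label is a placeholder): a hard (required) anti-affinity rule guarantees that the replicas land on different nodes, so terminating one node never removes all of them at once:

    spec:
      replicas: 2
      template:
        metadata:
          labels:
            app: my-service
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app: my-service  # matches the pods' own label above
                topologyKey: kubernetes.io/hostname

Note the trade-off: with the required rule, a replica stays Pending if no other node has room, whereas a preferred rule would let it co-locate.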

Deployment with replicaset & node shutdown

We have a deployment with replicas: 1
We deploy it in a 3-agent-node k8s cluster (k8s 1.8.13) and it gets deployed to a node (say agent node-0). When I shut down node-0, the rs does not get rescheduled (it's been more than an hour now).
I have checked that the selector labels are correct and we have plenty of capacity in the cluster (and also we don’t specify resource requests). Also I checked that our node selectors are just checking for agent nodes and there are 2 other agent nodes available.
Is there any special treatment around this shutdown scenario that k8s applies?
It's the pod that gets rescheduled, not the ReplicaSet. If you were doing rolling updates based on the image version, for example, then every time a new image became available, the controller manager would take the number of desired and available pods in the old ReplicaSet down to 0 and create a new ReplicaSet.
But when you shut down a node and the pod gets rescheduled while keeping the same ReplicaSet, your cluster is working as intended.
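To confirm what the control plane actually recorded after the shutdown, a few standard kubectl checks help (the pod name is a placeholder):

    kubectl get nodes                  # the shut-down node should report NotReady
    kubectl get pods -o wide           # shows which node each pod landed on; look for pods stuck in Terminating
    kubectl describe pod <pod-name>    # events explain why (or why not) the pod was rescheduled
    kubectl get rs                     # the ReplicaSet object itself never moves; only its pods do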