Deployment with ReplicaSet & node shutdown - Kubernetes

We have a deployment with replicas: 1
We deploy it in a 3-agent-node k8s cluster (k8s 1.8.13) and it gets deployed to a node (say agent node-0). When I shut down node-0, the RS does not get rescheduled (it's been more than an hour now).
I have checked that the selector labels are correct and that we have plenty of capacity in the cluster (we also don't specify resource requests). I also checked that our node selectors only check for agent nodes, and there are 2 other agent nodes available.
Is there any special treatment that k8s applies in this shutdown scenario?

It's the pod that gets rescheduled, not the ReplicaSet. If you were doing rolling updates, based on the image version for example, then every time a new image became available the controller manager would take the number of desired and available pods in the old ReplicaSet down to 0 and create a new ReplicaSet.
But when you shut down a node and the pod gets rescheduled, keeping the same ReplicaSet, your cluster is working fine.
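As a side note, if you want the pod off a dead node faster than the default eviction timeout, you can add explicit tolerations with tolerationSeconds to the pod template; on clusters where taint-based evictions are active, these bound how long the pod stays bound to a NotReady/unreachable node (on 1.8 the node controller's --pod-eviction-timeout, default 5 minutes, usually governs this instead). A sketch only; all names and values below are placeholders:

apiVersion: apps/v1                   # apps/v1beta2 on a 1.8 cluster
kind: Deployment
metadata:
  name: my-app                        # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      nodeSelector:
        role: agent                   # stand-in for the agent-node selector mentioned above
      tolerations:
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 60         # evict ~60s after the node goes NotReady
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 60         # evict ~60s after the node becomes unreachable
      containers:
      - name: my-app
        image: my-app:1.0             # placeholder image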

Related

Pods are not rescheduled to a failed node when the node comes back alive

My situation is: I have a 5-node K8s cluster. Initially, pods are distributed across the 5 nodes. Sometimes we need to restart a particular node's server; then that node goes down and its pods get created on other nodes. But once the failed/down node comes back up, no pods are created on it automatically, because the replica count has already been reached. We need every node to run a minimum of 1 pod. Could you please help with this?
We need every node to run a minimum of 1 pod
Instead of running a Deployment, you can run a DaemonSet if you want a minimum of one pod on every node.
That way, whenever a new node joins the cluster, it will get one running replica of your pod.
Read more about DaemonSets: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
You can also spread the replicas of a Deployment across nodes, but that is trickier to manage with affinity or pod topology spread constraints.
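A minimal DaemonSet sketch (the name, labels, and image are placeholders):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-app                        # placeholder name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:1.0             # placeholder image

With this, the DaemonSet controller places exactly one copy of the pod on every matching node, including nodes that join the cluster later.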

How to migrate StatefulSet to different nodes?

I have a Kubernetes cluster of 3 nodes in Amazon EKS. It's running 3 pods of Cockroachdb in a StatefulSet. Now I want to use another instance type for all nodes of my cluster.
So my plan was this:
1. Add 1 new node to the cluster, increase replicas in my StatefulSet to 4, and wait for the new Cockroachdb pod to fully sync.
2. Decommission and stop one of the old Cockroachdb nodes.
3. Decrease replicas of the StatefulSet back to 3 to get rid of one of the old pods.
Repeat steps 1-3 two more times.
Obviously, that doesn't work because StatefulSet deletes most recent pods first when scaling down, so my new pod gets deleted instead of the old one.
I guess I could just create a new StatefulSet and make it use existing PVs, but that doesn't seem like the best solution for me. Is there any other way to do the migration?
You can consider making a copy of your ASG's current launch template, upgrading the instance type in the copied template, pointing your ASG at the new launch template, and then performing an ASG instance refresh. With a cluster of 3 nodes and a minimum healthy percentage of 90%, only 1 instance will be replaced at a time. Pods on the drained node will sit in the Pending state for 5-10 minutes and then be redeployed on the new node. This way you do not need to scale the StatefulSet up unnecessarily.
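If you take the instance-refresh route, a PodDisruptionBudget is a useful complement so that node drains never take down more than one CockroachDB pod at a time. A sketch, assuming the StatefulSet's pods carry an app: cockroachdb label (adjust the selector to whatever labels your pods actually have):

apiVersion: policy/v1                 # policy/v1beta1 on older clusters
kind: PodDisruptionBudget
metadata:
  name: cockroachdb-pdb
spec:
  maxUnavailable: 1                   # at most one CockroachDB pod disrupted at a time
  selector:
    matchLabels:
      app: cockroachdb                # assumed pod label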

What if I delete a node in GKE?

I have set up GKE with free trial access.
Here is a screenshot of the cluster.
I have already set up a VM instance in GCE, so my Kubernetes cluster has fewer resources; I set it up for testing, but I want to know: if I delete 1 node out of 3, what will happen?
My pods are running on all 3 nodes (distributed).
If I delete one node, will it create a new node, or will my running pods be deployed onto the other 2 nodes, making them heavy?
How do I know it is HA, and how do I test scale up and scale down?
Please clarify my questions.
If I delete one node, will it create a new node, or will my running pods be deployed onto the other 2 nodes, making them heavy?
GKE manages the nodes using the node pool configuration.
If your GKE node pool is set to 3 nodes and you manually delete 1 instance, GKE will automatically create a new node in the cluster.
Your pods may get moved to another node if there is room left on it; otherwise they will go into the Pending state and wait for a new node to join the GKE cluster.
If you want to reduce the number of nodes in GKE, you have to reduce the minimum node count in the GKE node pool.
If you want to test scale up and down, enable autoscaling on the node pool and increase the pod count on the cluster; GKE will automatically add nodes. Make sure you have set the min and max node counts correctly in the node pool's autoscaling section.
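One way to exercise that scale-up/scale-down behaviour is to deploy something with explicit resource requests and then raise its replica count until the existing nodes can no longer fit the pods; with node pool autoscaling enabled, GKE should add nodes, and remove them again some time after you scale back down. A sketch, with placeholder names and sizes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-test                    # placeholder name
spec:
  replicas: 2                         # raise this until pods go Pending and trigger scale-up
  selector:
    matchLabels:
      app: scale-test
  template:
    metadata:
      labels:
        app: scale-test
    spec:
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9   # tiny no-op container
        resources:
          requests:
            cpu: 500m                 # big enough that only a few replicas fit per node
            memory: 256Mi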
When you delete a node, its pods are also deleted. Depending on your deployment (e.g. with a pod scale of 3), one node will hold 2 pods and the other 1. Whether your app suffers or not depends on the actual traffic.

Can I force Kubernetes not to run more than X replicas of a pod in the same node?

I have a tiny Kubernetes cluster consisting of just two nodes running on t3a.micro AWS EC2 instances (to save money).
I have a small web app that I am trying to run in this cluster. I have a single Deployment for this app. This deployment has spec.replicas set to 4.
When I run this Deployment, I noticed that Kubernetes scheduled 3 of its pods in one node and 1 pod in the other node.
Is it possible to force Kubernetes to schedule at most 2 pods of this Deployment per node? Having 3 instances on the same node puts me dangerously close to running out of memory on these tiny EC2 instances.
Thanks!
The correct solution for this would be to set memory requests and limits correctly, matching your steady-state and burst RAM consumption levels on every pod; then the scheduler will do all this math for you.
But for the future, and for others, there is a newer feature which kind of allows this: https://kubernetes.io/blog/2020/05/introducing-podtopologyspread/. It's not an exact match; you can't set a global per-node cap, but you can require that pods be spread evenly over the cluster, subject to a maximum skew cap.
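A rough sketch combining both suggestions: memory requests/limits plus a topology spread constraint that keeps the replicas balanced across nodes (names, labels, and sizes are placeholders; with 4 replicas, 2 nodes, and maxSkew: 1 this works out to 2 pods per node). Note that topologySpreadConstraints needs a reasonably recent cluster (GA since 1.19):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app                       # placeholder name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                    # pod counts per node may differ by at most 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: web-app
      containers:
      - name: web-app
        image: web-app:1.0            # placeholder image
        resources:
          requests:
            memory: 128Mi             # match your steady-state usage
          limits:
            memory: 256Mi             # match your burst ceiling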

Kubernetes Horizontal Pod Autoscaler not utilising node resources

I am currently running Kubernetes 1.9.7 and successfully using the Cluster Autoscaler and multiple Horizontal Pod Autoscalers.
However, I recently started noticing the HPA would favour newer pods when scaling down replicas.
For example, I have 1 replica of service A running on a node alongside several other services. This node has plenty of available resources. During load, the target CPU utilisation for service A rose above the configured threshold, so the HPA decided to scale it to 2 replicas. As there were no other nodes available, the CAS spun up a new node on which the new replica was successfully scheduled - so far so good!
The problem is, when the target CPU utilisation drops back below the configured threshold, the HPA decides to scale down to 1 replica. I would expect to see the new replica on the new node removed, therefore enabling the CAS to turn off that new node. However, the HPA removed the existing service A replica that was running on the node with plenty of available resources. This means I now have service A running on a new node, by itself, that can't be removed by the CAS even though there is plenty of room for service A to be scheduled on the existing node.
Is this a problem with the HPA or the Kubernetes scheduler? Service A has now been running on the new node for 48 hours and still hasn't been rescheduled despite there being more than enough resources on the existing node.
After scouring my cluster configuration, I managed to work out why this was happening.
Service A was configured to run in a public subnet, and the new node created by the CA was public. The existing node running the original replica of Service A was private, which led the HPA to remove that replica.
I'm not sure how Service A was scheduled onto this node in the first place, but that is a different issue.
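For anyone who hits something similar: this kind of behaviour usually comes from a node-placement constraint on the workload, e.g. a nodeSelector or node affinity keyed on a subnet or zone label. A purely hypothetical sketch of what such a constraint can look like (the label key/value and names are made up for illustration):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a                     # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      nodeSelector:
        subnet-type: public           # hypothetical node label restricting pods to public-subnet nodes
      containers:
      - name: service-a
        image: service-a:1.0          # placeholder image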