How to migrate a StatefulSet to different nodes?

I have a Kubernetes cluster of 3 nodes in Amazon EKS. It's running 3 pods of Cockroachdb in a StatefulSet. Now I want to use another instance type for all nodes of my cluster.
So my plan was this:
1. Add 1 new node to the cluster, increase replicas in my StatefulSet to 4, and wait for the new Cockroachdb pod to fully sync.
2. Decommission and stop one of the old Cockroachdb nodes.
3. Decrease replicas of the StatefulSet back to 3 to get rid of one of the old pods.
4. Repeat steps 1-3 two more times.
Obviously, that doesn't work, because a StatefulSet deletes the highest-ordinal (i.e. most recently created) pods first when scaling down, so my new pod gets deleted instead of an old one.
I guess I could just create a new StatefulSet and make it use existing PVs, but that doesn't seem like the best solution for me. Is there any other way to do the migration?

You can consider making a copy of your ASG's current launch template, changing the instance type in the copy, pointing your ASG to the new launch template, and then performing an ASG instance refresh. For a cluster of 3 nodes, a minimum healthy percentage of 90% ensures only one instance is replaced at a time. The affected pod on the drained node will be Pending for 5-10 minutes and then be redeployed on the new node. This way you do not need to scale the StatefulSet up unnecessarily.
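If you go this route, a PodDisruptionBudget helps ensure that node drains never take down more than one CockroachDB pod at a time. A minimal sketch, assuming the StatefulSet's pods carry a label like app: cockroachdb (the name and selector are assumptions; adjust them to your actual labels):

```yaml
# Allow at most one CockroachDB pod to be unavailable while nodes are
# drained during the instance refresh.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cockroachdb-pdb          # assumed name
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: cockroachdb           # assumed label; must match the StatefulSet's pod labels
```

This only helps if the instance refresh actually drains the node (for example via a node termination handler or a managed node group update) rather than terminating it abruptly.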

Related

Running DB as Kubernetes Deployment or StatefulSet?

I would like to run a single MongoDB pod in my Kubernetes cluster, using a node selector to get the pod scheduled on a specific node.
Since Mongo is a database and I am using a node selector, is there any reason for me not to use a Kubernetes Deployment instead of a StatefulSet? Please elaborate on whether we should never use a Deployment for this.
You should not run a database (or any other stateful workload) as a Deployment; use a StatefulSet for those.
They have different semantics when updating or when a pod becomes unreachable: StatefulSets use at-most-X semantics and Deployments use at-least-X semantics, where X is the number of replicas.
For example, if a node becomes unreachable (say, due to a network issue), a Deployment will create a new Pod on a different node (to maintain your desired 1 replica), but a StatefulSet will make sure the existing Pod is terminated before creating a new one, so that there is never more than one (when the desired number of replicas is 1).
If you run a database, I assume you want the data to be consistent, so you don't want duplicate instances with different data (though you should probably run a distributed database instead).
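For illustration, a minimal single-replica StatefulSet sketch for MongoDB with a nodeSelector and a per-pod volume claim; the names, labels, node label and sizes are assumptions to adapt:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo               # headless Service assumed to exist with this name
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      nodeSelector:
        disktype: ssd              # assumed node label; replace with your own
      containers:
        - name: mongo
          image: mongo:6.0
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: data
              mountPath: /data/db
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

The per-pod claim from volumeClaimTemplates and the stable pod identity are exactly what a Deployment would not give you here.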

Added new nodes to EKS, but new pods are still being scheduled on the old nodes

I have a terraform-managed EKS cluster. It used to have 2 nodes; I doubled the number of nodes to 4.
I have a kubernetes_deployment resource that automatically deploys a fixed number of pods to the cluster. It was set to 20 when I had 2 nodes and seemed evenly distributed, with roughly 10 per node. I doubled that number to 40.
All of the new pods for the deployment are being scheduled on the first 2 (original) nodes. Now the two original nodes have 20 pods each, while the 2 new nodes have 0. The new nodes are up and ready to go, but I cannot get Kubernetes to schedule the new pods on them.
I am unsure where to even begin searching, as I am fairly new to k8s and ops in general.
A few beginner questions that may be related:
I'm reading about pod affinity, and it seems like I could give a Deployment's pods an anti-affinity with themselves. However, I am having trouble setting up the anti-affinity rules. I see that the kubernetes_deployment resource has a scheduling argument, but I can't seem to get the syntax right.
Naively it seems that the issue may be that the deployment somehow isn't aware of the new nodes. If that is the case, how could I reboot the entire deployment (without taking down the already-running pods)?
Is there a cluster level scheduler that I need to set? I was under the impression that the default does round robin, which doesn't seem to be happening at the node level.
EDIT:
The EKS Terraform module's node_groups submodule has fields for desired/min/max_capacity. To increase my worker nodes, I just increased those numbers. The change is reflected in the AWS EKS console.
Check a couple of things:
Do your nodes show up correctly in the output of kubectl get nodes -o wide, and do they have a status of Ready?
Instead of pod affinity, look into pod topology spread constraints; required anti-affinity against the Deployment's own pods stops working as soon as there are more pods than nodes.
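A rough sketch of what that could look like inside the Deployment's pod template (the app: my-app label is an assumption; match it to your pod labels):

```yaml
# Fragment of the Deployment manifest: spread this Deployment's pods
# evenly across nodes.
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread across individual nodes
          whenUnsatisfiable: ScheduleAnyway     # prefer spreading, but never block scheduling
          labelSelector:
            matchLabels:
              app: my-app                       # assumed pod label
```

Note that the scheduler only applies this to newly created pods; pods that are already running are not rebalanced, so you may need to roll the Deployment (e.g. kubectl rollout restart deployment/<name>) for the spread to take effect.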

What if I delete a node in GKE?

I have set up GKE on the free trial.
Here is a screenshot of the cluster.
I have already set up a VM instance in GCE, so my Kubernetes cluster has fewer resources; I set it up for testing. I want to know what will happen if I delete 1 node out of the 3.
My pods are running on all 3 nodes (distributed).
If I delete one node, will it create a new node, or will my running pods be redeployed onto the other 2 nodes, making them heavily loaded?
How do I know whether it is HA, and how can I test scale up and scale down?
Please clarify these questions.
If I delete one node, will it create a new node, or will my running pods be redeployed onto the other 2 nodes, making them heavily loaded?
GKE manages the nodes using the node pool configuration.
If your GKE node pool is set to 3 nodes and you manually remove 1 instance, it will automatically create a new node in the cluster.
Your pod might get moved to another node if there is space left there; otherwise it will go to the Pending state and wait for a new node to join the GKE cluster.
If you want to reduce the number of nodes in GKE, you have to reduce the node count in the GKE node pool.
If you want to test scale up and scale down, you can enable autoscaling on the node pool and increase the pod count on the cluster; GKE will automatically add nodes. Make sure you have set the min and max node counts correctly in the node pool's autoscaling settings.
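To make that concrete, one way to trigger a scale-up is to deploy something with explicit resource requests and raise its replica count until pods no longer fit on the existing nodes; a minimal sketch (the name, image and sizes are just placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-test                 # placeholder name
spec:
  replicas: 10                     # raise this until pods stop fitting on existing nodes
  selector:
    matchLabels:
      app: scale-test
  template:
    metadata:
      labels:
        app: scale-test
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # tiny placeholder image
          resources:
            requests:
              cpu: 500m            # requests are what the autoscaler reacts to
              memory: 256Mi
```

Pods left Pending because they do not fit are what cause the node pool autoscaler to add nodes; deleting the Deployment or lowering the replica count lets the pool scale back down after the cooldown period.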
When you delete a node, its pods are also deleted. Depending on your deployment, e.g. with a pod scale of 3, one remaining node will hold 2 pods and the other 1. Whether your app suffers depends on the actual traffic.

What happens when we scale a Kubernetes deployment and change the configuration of one of the pods or containers?

When I scale the application by creating a Deployment, let's say I am running an nginx service on a 3-node cluster.
Nginx is running in containers in multiple pods.
If I change the nginx configuration in one of the pods, does it propagate to all the nodes and pods, since it is running in a cluster and is scaled?
does it propagate to all the nodes and pods, since it is running in a cluster and is scaled?
No. It propagates only when you change the Deployment YAML; then the pods are re-created one by one with the new configuration.
I would like to add a few more things to what was already said. First of all, you are not supposed to make any changes to Pods that are managed by, say, a ReplicaSet, ReplicationController or Deployment. These are objects that provide an additional abstraction layer, and it is their responsibility to ensure that a given number of Pods of a certain kind are running in your Kubernetes cluster.
It doesn't matter how many nodes your cluster consists of, as the mentioned controllers span all nodes in the cluster.
Changes made in a single Pod will not only fail to propagate to other Pods, but may also easily be lost if the Pod with the changed configuration crashes.
Remember that one of the tasks of a Deployment is to make sure that a certain number of Pods of a given type (specified in the Pod template section of the Deployment) are always up and running. When your manually reconfigured Pod goes down, your Deployment (actually the ReplicaSet created by the Deployment) acts behind the scenes and recreates that Pod. But how does it recreate it? Does it take into account the changes you introduced to that Pod? Of course not; it recreates it based on the template given in the Deployment.
If you want to make changes to your Pods one by one, Kubernetes allows you to do so through its rolling update mechanism.
Here you can read about the old-fashioned approach using a ReplicationController, which is no longer used as it has been replaced by Deployments and ReplicaSets, but I think it's still worth reading just to grasp the concept.
Currently, a Deployment is the way to go. You can read about updating a Deployment here. Note that the default update strategy is RollingUpdate, which ensures that changes are not applied to all Pods at once, but one by one.
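For illustration, a sketch of how that rolling behaviour can be tuned in the Deployment spec (the name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  strategy:
    type: RollingUpdate            # the default strategy
    rollingUpdate:
      maxUnavailable: 1            # at most one Pod down at a time
      maxSurge: 1                  # at most one extra Pod created during the update
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25        # changing this (or anything else in the template) triggers a rolling update
          ports:
            - containerPort: 80
```

In practice the nginx configuration itself would also live in the pod template, e.g. baked into the image or mounted from a ConfigMap, so that every recreated Pod gets the same configuration.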

Deployment with replicaset & node shutdown

We have a deployment with replicas: 1
We deploy it in a 3-agent-node k8s cluster (k8s 1.8.13) and it gets deployed to a node (say agent node-0). When I shut down node-0, the rs does not get rescheduled (it's been more than an hour now).
I have checked that the selector labels are correct and that we have plenty of capacity in the cluster (and we don't specify resource requests). I also checked that our node selectors just check for agent nodes, and there are 2 other agent nodes available.
Is there any special treatment of this shutdown scenario that k8s applies?
It's the pod that gets re-scheduled, not the ReplicaSet. If you were doing rolling updates, based on the image version for example, then every time a new image became available the controller manager would take the number of desired and available pods in the old ReplicaSet down to 0 and create a new ReplicaSet.
But when you shut down a node and the pod gets re-scheduled while keeping the same ReplicaSet, your cluster is working fine.
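For what it's worth, on clusters with taint-based evictions enabled (the taint key names differ on a version as old as 1.8.13), the delay before pods on an unreachable node are evicted and recreated elsewhere can be tuned per pod via tolerations; a sketch with an arbitrary 60-second timeout:

```yaml
# Fragment of the Deployment's pod template: evict this pod 60s after its
# node becomes unreachable or not ready, instead of the usual 300s default.
spec:
  template:
    spec:
      tolerations:
        - key: node.kubernetes.io/unreachable
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 60
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 60
```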