For my own deployments I can use node selectors to make them run on a specific node pool. I want to do the same thing for kubernetes master pods.
(Motivation: I upgraded the kubernetes cluster today causing the aforementioned pods to be moved to some random node pool which I want to prevent from happening again in the future.)
Related
I have a GKE cluster I'm trying to switch the default node machine type on.
I have already tried:
Creating a new node pool with the machine type I want
Deleting the default-pool. GKE processes for a bit, then does not remove the default-pool. I assume this is some undocumented behavior that prevents you from deleting the default-pool.
I'd prefer to not re-create the cluster and re-apply all of my deployments/secrets/configs/etc.
k8s version: 1.14.10-gke.24 (Stable channel)
Cluster Type: Regional
The best approach to changing, increasing, or decreasing your node pool specification is:
Migration
To migrate your workloads without incurring downtime, you need to:
Create a new node pool.
Mark the existing node pool as unschedulable.
Drain the workloads running on the existing node pool.
Check that the workload is running correctly on the new node pool.
Delete the existing node pool.
Your workload will be scheduled automatically onto the new node pool.
Kubernetes, which is the cluster orchestration system of GKE clusters, automatically reschedules the evicted Pods onto the new node pool as the existing node pool is drained.
There is official documentation about migrating your workload:
This tutorial demonstrates how to migrate workloads running on a GKE cluster to a new set of nodes within the same cluster without incurring downtime for your application. Such a migration can be useful if you want to migrate your workloads to nodes with a different machine type.
-- GKE: Migrating workloads to different machine types
Please take a look at the guide above and let me know if you have any questions on that topic.
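A rough sketch of those steps with gcloud and kubectl, assuming a regional cluster; the cluster, region, pool, and machine-type values below are placeholders, not taken from the question:
# 1. Create the new node pool with the desired machine type.
gcloud container node-pools create new-pool \
  --cluster=my-cluster --region=us-central1 \
  --machine-type=n1-standard-4 --num-nodes=1
# 2. Cordon every node in the old pool so no new Pods land there.
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o name); do
  kubectl cordon "$node"
done
# 3. Drain the old nodes one by one; evicted Pods are rescheduled onto the new pool.
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o name); do
  kubectl drain "$node" --force --ignore-daemonsets --delete-local-data
done
# 4. Once the workloads look healthy on the new pool, delete the old pool.
gcloud container node-pools delete default-pool \
  --cluster=my-cluster --region=us-central1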
Disable the default-pool's autoscaler and set the pool size to 0 nodes.
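If you go that route, a hedged sketch of the commands (cluster, region, and pool names are placeholders):
# Turn off autoscaling on the default pool, then shrink it to zero nodes.
gcloud container clusters update my-cluster --region=us-central1 \
  --no-enable-autoscaling --node-pool=default-pool
gcloud container clusters resize my-cluster --region=us-central1 \
  --node-pool=default-pool --num-nodes=0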
Wish there was a way I could just switch the machine type on the default-pool...
What would be the behavior of a multi-node Kubernetes cluster if it only has a single master node and that node goes down?
The control plane would be unavailable. Existing pods would continue to run; however, calls to the API wouldn't work, so you wouldn't be able to make any changes to the state of the system. Additionally, self-repair mechanisms like failed pods being rescheduled elsewhere would not happen, since that functionality lives in the control plane as well.
You wouldn't be able to create or query Kubernetes objects (Pods, Deployments, etc.) since the required control plane components (API server and etcd) are not running.
Existing pods on the worker nodes will keep running. If a pod crashes, the kubelet on that node will restart it.
If a worker node goes down while the master is down, even pods created by controllers such as a Deployment/ReplicaSet won't be rescheduled to a different node, since the controller-manager (a control plane component) is not running.
I have a GKE cluster with one node pool attached.
I want to make some changes to the node pool though, like adding tags, etc.
So I created a new node pool with my new config and attached it to the cluster, so now the cluster has 2 node pools.
At this point I want to move the pods to the new node pool and destroy the old one
How is this process done? Am I doing this right?
There are multiple ways to move your pods to the new node pool.
One way is to steer your pods to the new node pool using a label selector in your pod spec, as described in the "More fun with node pools" section of the Google blog post that announced node pools (with the caveat that you need to forcibly terminate the existing pods for them to be rescheduled). This leaves all nodes in your cluster functional, and you can easily shift the pods back and forth between pools using the labels on the node pools (GKE automatically adds the node pool name as a label to make this easier).
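A minimal sketch of that label-selector approach for a Deployment; the deployment name my-app and pool name new-pool are placeholders, while cloud.google.com/gke-nodepool is the node label GKE adds automatically:
# Pin the pod template to the new pool; for a Deployment this triggers a
# rolling update that moves the pods, whereas bare pods would need to be
# deleted and recreated by hand.
kubectl patch deployment my-app -p '{"spec":{"template":{"spec":{"nodeSelector":{"cloud.google.com/gke-nodepool":"new-pool"}}}}}'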
Another way is to follow the tutorial for Migrating workloads to different machine types, which describes how to cordon / drain nodes to shift workloads to the new node pool.
Finally, you can just use GKE to delete your old node pool. GKE will automatically drain nodes prior to deleting them, which will cause your workload to shift to the new pool without you needing to run any extra commands yourself.
You can also create a new node pool (or update an existing one) with a taint, and then update the manifest of the pod you want to move with a matching toleration. This ensures that only the pods you are interested in moving actually move, and not the others.
Refer to this doc for more details.
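A hedged sketch of that taint/toleration approach; the pool name, taint key/value, and cluster details are made up for illustration:
# Create the new pool with a taint so nothing schedules there by default.
gcloud container node-pools create tainted-pool \
  --cluster=my-cluster --region=us-central1 \
  --node-taints=dedicated=special:NoSchedule
# Then give only the pod you want to move a matching toleration (plus a
# nodeSelector so it actually lands on that pool), e.g. in its spec:
#   tolerations:
#   - key: "dedicated"
#     operator: "Equal"
#     value: "special"
#     effect: "NoSchedule"
#   nodeSelector:
#     cloud.google.com/gke-nodepool: tainted-pool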
You can use :
kubectl drain <node_name>
to move all pods from a specific node onto other nodes.
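Note that drain cordons the node for you; in practice you will usually also want a couple of common flags so DaemonSet-managed and unmanaged pods don't block the eviction, for example:
kubectl drain <node_name> --ignore-daemonsets --force
# later, if you want the node schedulable again:
kubectl uncordon <node_name>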
From this article (https://cloudplatform.googleblog.com/2018/06/Kubernetes-best-practices-upgrading-your-clusters-with-zero-downtime.html) I learned that it is possible to create a new node pool, and cordon and drain the old nodes one by one, so that workloads get rescheduled to new nodes in the new pool.
To me, a new node pool seems to indicate a new cluster. The reason: we have two node pools in GKE, and they're listed as two separate clusters.
My question is: after the pods under a service get moved to a new node, if that service is being called from other pods in the old node, will this inter-cluster service call fail?
You don't create a new cluster per se. You upgrade the master(s) and then you create a new node pool with nodes that have a newer version. Make sure the new node pool shares the same network as the original node pool.
If you have a service with one replica (one pod) and that pod lives on one of the nodes you are upgrading, you need to allow time for Kubernetes to create a new replica on a different node that is not being upgraded. During that time, your service will be unavailable.
If you have a service with multiple replicas chances are that you won't see any downtime unless for some odd reason all your replicas are scheduled on the same node.
Recommendation: scale the resources that serve your services (Deployments, DaemonSets, StatefulSets, etc.) up by one or two replicas before doing node upgrades.
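For example (the deployment name and replica counts are placeholders):
kubectl scale deployment my-app --replicas=4   # temporarily over-provision
# ... perform the node upgrade / pool migration ...
kubectl scale deployment my-app --replicas=2   # back to the normal count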
StatefulSet tip: you will have some write downtime if you are running something like MySQL in a master-slave configuration when the MySQL master gets rescheduled.
Note that creating a new node pool does not create a new cluster. You can have multiple node pools within the same cluster. Workloads in the different node pools will still interact with each other since they are in the same cluster.
gcloud container node-pools create (the command to create node pools) requires that you specify the --cluster flag so that the new node pool is created within an existing cluster.
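For instance (all names and values are placeholders), the new pool is simply added to the existing cluster:
gcloud container node-pools create new-pool \
  --cluster=my-cluster --region=us-central1 --machine-type=n1-standard-2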
So to answer the question directly, following the steps from that Google link will not cause any service interruption nor will there be any issues with pods from the same cluster communicating with each other during your migration.
What should I do with pods after adding a node to the Kubernetes cluster?
I mean, ideally I want some of them to be stopped and started on the newly added node. Do I have to manually pick some for stopping and hope that they'll be scheduled for restarting on the newly added node?
I don't care about affinity, just semi-even distribution.
Maybe there's a way to always have the number of pods be equal to the number of nodes?
For the sake of having an example:
I'm using juju to provision a small Kubernetes cluster on AWS. One master and two workers. This is just a playground.
My application is Apache serving PHP and static files. So I have a deployment, a service of type NodePort, and an ingress using nginx-ingress-controller.
I turned off one of the worker instances and my application pods were recreated on the one that remained working.
I then started the instance back up; the master picked it up and started the nginx ingress controller there. But when I tried deleting my application pods, they were recreated on the instance that had kept running, not on the one that was restarted.
Not sure if it's important, but I don't have any DNS set up. I just added the IP of one of the instances to /etc/hosts with the host value from my ingress.
The descheduler, a Kubernetes incubator project, could be helpful. The following is from its introduction:
As Kubernetes clusters are very dynamic and their state changes over time, it may be desirable to move already-running pods to other nodes for various reasons:
Some nodes are under or over utilized.
The original scheduling decision no longer holds true, as taints or labels have been added to or removed from nodes and pod/node affinity requirements are no longer satisfied.
Some nodes failed and their pods moved to other nodes.
New nodes are added to clusters.
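The descheduler is deployed into the cluster (its README shows a Job or CronJob) together with a policy enabling the strategies you want. A rough sketch of such a policy ConfigMap, assuming the project's v1alpha1 policy API; the thresholds are arbitrary placeholders:
cat <<'EOF' | kubectl apply -n kube-system -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemoveDuplicates":
        enabled: true
      "LowNodeUtilization":
        enabled: true
        params:
          nodeResourceUtilizationThresholds:
            thresholds:
              cpu: 20
              memory: 20
              pods: 20
            targetThresholds:
              cpu: 50
              memory: 50
              pods: 50
EOF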
Kubernetes will not actively move already-running pods when you add a new node, but the scheduler prefers the new, less-utilized node for newly created pods, so you get a redistribution over time. You can force a redistribution of single pods by deleting them and having a host-based anti-affinity policy in place so the replacements spread across nodes.
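A hedged sketch of such a host-based anti-affinity rule; the app label and names are made up, and the snippet goes into the Deployment's pod template so replicas prefer to land on different nodes:
# Pod anti-affinity on kubernetes.io/hostname keeps replicas off the same node:
#   affinity:
#     podAntiAffinity:
#       preferredDuringSchedulingIgnoredDuringExecution:
#       - weight: 100
#         podAffinityTerm:
#           labelSelector:
#             matchLabels:
#               app: my-app
#           topologyKey: "kubernetes.io/hostname"
# With that in place, deleting a pod lets its replacement land on the new node:
kubectl delete pod -l app=my-app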
What are your reasons for a manual triggered redistribution?