I have a Kubernetes cluster with 3 masters and 3 workers, and I want to restart one of the masters to update the operating system of that machine.
Can I just reboot the machine directly from the console with reboot,
or do some steps need to be done before the reboot to avoid the risk of an outage and data loss?
If you need to reboot a node (such as for a kernel upgrade, libc upgrade, hardware repair, etc.), and the downtime is brief, then when the Kubelet restarts, it will attempt to restart the pods scheduled to it. If the reboot takes longer (the default time is 5 minutes, controlled by --pod-eviction-timeout on the controller-manager), then the node controller will terminate the pods that are bound to the unavailable node. If there is a corresponding replica set (or replication controller), then a new copy of the pod will be started on a different node. So, in the case where all pods are replicated, upgrades can be done without special coordination, assuming that not all nodes will go down at the same time.
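For instance (assuming a kubeadm-based control plane, where the kube-controller-manager runs as a static pod; this is only a sketch), you can check whether an explicit eviction timeout is configured and watch the node's status while it reboots:

# On a master: check for an explicitly configured eviction timeout (kubeadm static pod manifest)
$ sudo grep pod-eviction-timeout /etc/kubernetes/manifests/kube-controller-manager.yaml

# From another machine: watch how long the node stays NotReady during the reboot
$ kubectl get nodes --watch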
If you want more control over the upgrading process, you may use the following workflow:
Use kubectl drain to gracefully terminate all pods on the node while marking the node as unschedulable:
kubectl drain $NODENAME
This keeps new pods from landing on the node while you are trying to get them off.
For pods with a replica set, the pod will be replaced by a new pod which will be scheduled to a new node. Additionally, if the pod is part of a service, then clients will automatically be redirected to the new pod.
For pods with no replica set, you need to bring up a new copy of the pod, and assuming it is not part of a service, redirect clients to it.
Perform maintenance work on the node.
Make the node schedulable again:
kubectl uncordon $NODENAME
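Putting the workflow together, a minimal sketch might look like this (the node name master-1 is just an example, and --ignore-daemonsets is usually needed because DaemonSet pods such as kube-proxy cannot be evicted):

# Evict all evictable pods and mark the node unschedulable
$ kubectl drain master-1 --ignore-daemonsets

# ... perform the OS update and reboot the machine ...

# Allow scheduling on the node again once it is back and Ready
$ kubectl uncordon master-1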
Additionally, if the node hosts an etcd member, you need to be extra careful with the rolling upgrade of etcd and with backing up its data.
Take a backup of etcd if the node hosts it. You can use the built-in snapshot command to back up the data, for example:
ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /tmp/snapshot-pre-boot.db
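You may also want to verify the snapshot afterwards, for example with etcdctl's snapshot status subcommand (same API version and snapshot path as above):

ETCDCTL_API=3 etcdctl --write-out=table snapshot status /tmp/snapshot-pre-boot.db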
Now drain the node using
kubectl drain <master01>
Do the system update / patches and reboot.
Now uncordon the node so it rejoins the cluster:
kubectl uncordon <master01>
Whenever you reboot the OS on a particular node (master or worker), the Kubernetes control plane is not aware of that action by itself; it keeps all cluster-related state in the etcd key-value store, which always holds the most recent data. So when you carefully prepare a node reboot, you should plan a maintenance window for that node, drain it so nothing new is scheduled there, and gracefully terminate all the existing pods.
If you define a Kubernetes resource with a set number of replicas, the ReplicationController (or ReplicaSet) guarantees that the specified number of pod replicas is running at any one time across the available nodes. It simply re-spawns pods if they fail a health check or are deleted or terminated, to match the desired replica count. In the case of master nodes that host etcd, you need to be extra careful in terms of rolling upgrades of etcd and backing up the data.
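As a quick illustration (the deployment name and namespace below are hypothetical), you can check and, if needed, raise the replica count of a workload before rebooting a node, so the remaining replicas keep serving while the node is down:

# Check current replica counts across all namespaces
$ kubectl get deployments --all-namespaces

# Scale an example deployment to 3 replicas
$ kubectl scale deployment my-app --namespace my-namespace --replicas=3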
1. Backup a single master
As mentioned previously, we need to back up etcd. In addition to that, we need the certificates and,
optionally, the kubeadm configuration file for easily restoring the
master. If you set up your cluster using kubeadm (with no special
configuration), you can do it similar to this:
Backup certificates:
$ sudo cp -r /etc/kubernetes/pki backup/
Make etcd snapshot:
$ sudo docker run --rm -v $(pwd)/backup:/backup \
--network host \
-v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
--env ETCDCTL_API=3 \
k8s.gcr.io/etcd-amd64:3.2.18 \
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
snapshot save /backup/etcd-snapshot-latest.db
Backup kubeadm-config:
$ sudo cp /etc/kubeadm/kubeadm-config.yaml backup/
Note that the contents of the backup folder should then be stored somewhere safe, where they can survive even if the master is completely destroyed. You may want to use e.g. AWS S3 (or something similar) for this.
There are three commands in the example and all of them should be run on the master node. The first one copies the folder containing all the certificates that kubeadm creates. These certificates are used for secure communications between the various components in a Kubernetes cluster. The final command is optional and only relevant if you use a configuration file for kubeadm. Storing this file makes it easy to initialize the master with the exact same configuration as before when restoring it.
If the master update goes wrong, you can then simply restore the old version of the master node.
You can also automate etcd backups.
Doing a single backup manually may be a good first step, but you really need to make regular backups for them to be useful. The easiest way to do this is probably to take the commands from the example above, create a small script, and set up a cron job that runs the script every now and then. But since we are running Kubernetes anyway, you can use a Kubernetes CronJob. This would allow you to keep track of the backup jobs inside Kubernetes, just like you monitor your workloads.
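As a rough sketch of the "small script plus cron job" approach (paths and schedule are just examples, and this assumes etcdctl is installed directly on the master rather than run through Docker as above):

#!/bin/bash
# /usr/local/bin/backup-k8s-master.sh - example master backup script
set -euo pipefail

BACKUP_DIR=/var/backups/kubernetes/$(date +%Y-%m-%d_%H%M)
mkdir -p "$BACKUP_DIR"

# Copy the kubeadm-generated certificates
cp -r /etc/kubernetes/pki "$BACKUP_DIR/pki"

# Take an etcd snapshot (same endpoints and certificates as in the example above)
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  snapshot save "$BACKUP_DIR/etcd-snapshot.db"

A matching crontab entry on the master could then run it every night at 02:00:

0 2 * * * /usr/local/bin/backup-k8s-master.sh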
You can find more information here: backups-kubernetes.
2. The next step is to mark the node unschedulable; run this command:
$ kubectl drain $NODENAME
The kubectl drain command should only be issued to a single node at a time. However, you can run multiple kubectl drain commands for different nodes in parallel, in different terminals or in the background. Multiple drain commands running concurrently will still respect the PodDisruptionBudget you specify.
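If you want drain to hold back until a minimum number of replicas stays available, you can create a PodDisruptionBudget for the workload first; a minimal sketch (the app=my-app label and the numbers are placeholders; older clusters may need apiVersion policy/v1beta1):

kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # drain will not evict below this many ready pods
  selector:
    matchLabels:
      app: my-app
EOF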
3. Execute the system update or patch and reboot.
4. Finally, uncordon the node so it rejoins the cluster; execute the command below:
$ kubectl uncordon $NODENAME
On GCP there is an option for auto-upgrading nodes, which makes managing node updates easier.
You can read more about maintaining Kubernetes nodes here: node-maintenance.
Related
There is no clear information about how to make a backup and restore from a regular node like node01, for instance. I mean:
Operating etcd clusters for Kubernetes shows information like how to use it, and
ETCD - backup and restore management shows some of the necessary steps.
But what about in the cert exam, where you are operating most of the time from a regular node01 and the config files are not the same? Can someone elaborate?
Thanks
It is not possible to back up the cluster from a regular node using etcd; etcd only runs on the master nodes.
But you can back up your Kubernetes cluster with the etcdctl backup command. Here you can find a complete guide on how to use the etcdctl backup command.
Another way is to make a snapshot of your cluster with the etcdctl snapshot save command.
On top of full snapshots like this, there is also an incremental backup approach.
With an incremental backup of etcd, a full snapshot is taken first, and then a watch is applied and the logs accumulated over a certain period are persisted to the snapshot store. The restore process restores from the full snapshot, starts an embedded etcd, and applies the logged events one by one.
You can find more about the incremental backup function here.
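For completeness, restoring from a full snapshot is done with etcdctl snapshot restore; a minimal sketch (the snapshot path and data directory are just examples):

# Restore the snapshot into a fresh data directory
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot-latest.db \
  --data-dir=/var/lib/etcd-restored

# Then point the etcd static pod (or the etcd service) at the restored data directory.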
I have a 3-node Kubernetes cluster used for development.
One of the nodes has had the status "Attempting to reclaim ephemeral-storage" for 11 days.
How can I reclaim the storage?
Since it is just a development instance, I cannot extend the storage, and I don't care about the existing data in it. How can I clear the storage?
Thanks
Just run the docker system prune command to free up space on the node; refer to the command below:
$ docker system prune -a --volumes
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all volumes not used by at least one container
- all images without at least one container associated to them
- all build cache
Are you sure you want to continue? [y/N] y
Since it's a development environment, you can just drain the node to clear all pods and their data, and then uncordon it so pods can be scheduled again:
kubectl drain --delete-local-data --ignore-daemonsets $NODE_NAME && kubectl uncordon $NODE_NAME
The --delete-local-data flag allows drain to delete pods that use local (emptyDir) data, removing that data in the process.
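Afterwards you can check whether the node's disk pressure has cleared, for example (with $NODE_NAME as a placeholder):

# Should print "False" once the node is no longer under disk pressure
$ kubectl get node $NODE_NAME -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'

# See the ephemeral-storage capacity and allocatable figures the kubelet reports
$ kubectl describe node $NODE_NAME | grep -i ephemeral-storage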
While reducing the number of nodes in a node pool in GKE, I want a specific node not to be killed by GKE. Is it possible to do that?
As explained in detail here, you can drain the node and then delete the instance from the managed instance group (not delete the instance directly):
kubectl drain gke_instance_name --force
Then run the delete-instances command on the instance group, followed by the wait-until-stable command:
$ gcloud compute instance-groups managed delete-instances $GROUP_ID --instances=gke_instance_name
$ gcloud compute instance-groups managed wait-until-stable $GROUP_ID
This does not let you blacklist some instances from being deleted, but it works the other way around: you can delete specific instances and leave the others untouched.
I'm running a three node cluster on GCE. I want to drain one node and delete the underlying VM.
Documentation for kubectl drain command says:
Once it returns (without giving an error), you can power down the node (or equivalently, if on a cloud platform, delete the virtual machine backing the node)
I execute the following commands:
Get the nodes
$ kl get nodes
NAME STATUS AGE
gke-jcluster-default-pool-9cc4e660-6q21 Ready 43m
gke-jcluster-default-pool-9cc4e660-rx9p Ready 6m
gke-jcluster-default-pool-9cc4e660-xr4z Ready 23h
Drain node rx9p.
$ kl drain gke-jcluster-default-pool-9cc4e660-rx9p --force
node "gke-jcluster-default-pool-9cc4e660-rx9p" cordoned
WARNING: Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: fluentd-cloud-logging-gke-jcluster-default-pool-9cc4e660-rx9p, kube-proxy-gke-jcluster-default-pool-9cc4e660-rx9p
node "gke-jcluster-default-pool-9cc4e660-rx9p" drained
Delete the VM with gcloud.
$ gcloud compute instances delete gke-jcluster-default-pool-9cc4e660-rx9p
List VMs.
$ gcloud compute instances list
In the result, I still see the VM I deleted above (rx9p). If I do kubectl get nodes, I see the rx9p node too.
What's going on? Is something restarting the VM I'm deleting? Do I have to wait for some timeout between the commands?
You are on the right track with draining the node first.
The nodes (compute instances) are part of a managed instance group. If you delete just the instances with the gcloud compute instances delete command, the managed instance group will recreate them.
To delete one properly use this command (after you have drained it!):
gcloud compute instance-groups managed delete-instances \
gke-jcluster-default-pool-9cc4e660-grp \
--instances=gke-jcluster-default-pool-9cc4e660-rx9p \
--zone=...
I'm wondering about a graceful way to reduce the number of nodes in a Kubernetes cluster on GKE.
I have some nodes, each of which has some pods watching a shared job queue and executing jobs. I also have a script that monitors the length of the job queue and increases the number of instances when the length exceeds a threshold by executing the gcloud compute instance-groups managed resize command, and it works OK.
But I don't know of a graceful way to reduce the number of instances when the length falls below the threshold.
Is there any good way to stop the pods working on the terminating instance before the instance gets terminated? Or any other good practice?
Note
Each job can take roughly between 30 minutes and 1 hour.
It is acceptable if a job gets executed more than once (in the worst case...).
I think the best approach is, instead of using a bare pod to run your tasks, to use the Kubernetes Job object. That way, when the task is completed, the Job terminates the container. You would only need a small pod that could initiate Kubernetes Jobs based on the queue.
The more Jobs that get created, the more resources will be consumed, and the cluster autoscaler will see that it needs to add more nodes. A Job will need to complete even if its pod gets terminated; it will get re-scheduled to completion.
There is no direct information in the GKE docs about whether a downsize will happen if a Job is running on the node, but the stipulation seems to be that if a pod can be easily moved to another node and the resources are under-utilized, it will drain the node.
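A minimal sketch of such a Job (the image and command are placeholders for whatever processes one item from your queue):

kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker-1
spec:
  backoffLimit: 3              # retry a few times if the pod gets terminated
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: example.com/queue-worker:latest   # placeholder image
        command: ["/worker", "--one-job"]        # placeholder command
EOF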
References
https://cloud.google.com/container-engine/docs/cluster-autoscaler
http://kubernetes.io/docs/user-guide/kubectl/kubectl_drain/
http://kubernetes.io/docs/user-guide/jobs/
Before resizing the cluster, let's set the project context in Cloud Shell by running the commands below:
gcloud config set project [PROJECT_ID]
gcloud config set compute/zone [COMPUTE_ZONE]
gcloud config set compute/region [COMPUTE_REGION]
gcloud components update
Note: You can also set the project, compute zone, and region as flags in the command below, using the --project, --zone, and --region flags.
gcloud container clusters resize [CLUSTER_NAME] --node-pool [POOL_NAME] --num-nodes [NUM_NODES]
Run the above command for each node pool. You can omit the --node-pool flag if you have only one node pool.
Reference: https://cloud.google.com/kubernetes-engine/docs/how-to/resizing-a-cluster