I'd like to get some clarification for preparation for maintenance when you drain nodes in a Kubernetes cluster:
Here's what I know when you run kubectl drain MY_NODE:
Node is cordoned
Pods are gracefully shut down
You can opt to ignore Daemonset pods because if they are shut down, they'll just be re-spawned right away again.
I'm confused as to what happens when a node is drained though.
Questions:
What happens to the pods? As far as I know, there's no 'live migration' of pods in Kubernetes.
Will the pod be shut down and then automatically started on another node? Or does this depend on my configuration? (i.e. could a pod be shut down via drain and not start up on another node)
I would appreciate some clarification on this and any best practices or advice as well. Thanks in advance.
By default kubectl drain is non-destructive, you have to override to change that behaviour. It runs with the following defaults:
--delete-local-data=false
--force=false
--grace-period=-1
--ignore-daemonsets=false
--timeout=0s
Each of these safeguard deals with a different category of potential destruction (local data, bare pods, graceful termination, daemonsets). It also respects pod disruption budgets to adhere to workload availability. Any non-bare pod will be recreated on a new node by its respective controller (e.g. daemonset controller, replication controller).
It's up to you whether you want to override that behaviour (for example you might have a bare pod if running jenkins job. If you override by setting --force=true it will delete that pod and it won't be recreated). If you don't override it, the node will be in drain mode indefinitely (--timeout=0s)).
I just want to add a few things to eamon1234's answer:
You may find this useful as well:
Link to official docummentation (in case default flags change etc.). According to it:
The 'drain' evicts or deletes all pods except mirror pods (which
cannot be deleted through the API server). If there are
DaemonSet-managed pods, drain will not proceed without
--ignore-daemonsets, and regardless it will not delete any DaemonSet-managed pods, because those pods would be immediately
replaced by the DaemonSet controller, which ignores unschedulable
markings. If there are any pods that are neither mirror pods nor
managed by ReplicationController, ReplicaSet, DaemonSet, StatefulSet
or Job, then drain will not delete any pods unless you use --force.
--force will also allow deletion to proceed if the managing resource of one or more pods is missing.
Simple chart illustrating what actually happens when using kubectl drain.
Using kubectl drain with --dry-run option may be also a good idea so you can see its outcome before any actual changes are applied e.g.:
kubectl drain foo --force --dry-run
however it will not show any errors about existing local data or daemonsets which you can see whithout using --dry-run flag:
... error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore) ...
We can use kubectl drain to safely evict all of our pods from a node before we perform maintenance on the node.
If you want to update or patch or any kind of maintenance on Hardware/Node you should first drain all the pods(Migrate pods one node to another) kubectl drain
When kubectl drain returns successfully, that indicates that all of the pods have been safely evicted. It is then safe to bring down the node
After maintenance work we can use kubectl uncordon to tell Kubernetes that it can resume scheduling new pods onto the node.
Related
I created a two nodes clusters and I created a new job using the busybox image that sleeps for 300 secs. I checked on which node this job is running using
kubectl get pods -o wide
I deleted the node but surprisingly the job was still finishing to run on the same node. Any idea if this is a normal behavior? If not how can I fix it?
Jobs aren't scheduled or running on nodes. The role of a job is just to define a policy by making sure that a pod with certain specifications exists and ensure that it runs till the completion of the task whether it completed successfully or not.
When you create a job, you are declaring a policy that the built-in job-controller will see and will create a pod for. Then the built-in kube-scheduler will see this pod without a node and patch the pod to it with a node's identity. The kubelet will see a pod with a node matching it's own identity and hence a container will be started. As the container will be still running, the control-plane will know that the node and the pod still exist.
There are two ways of breaking a node, one with a drain and the second without a drain. The process of breaking a node without draining is identical to a network cut or a server crash. The api-server will keep the node resource for a while, but it 'll cease being Ready. The pods will be then terminated slowly. However, when you drain a node, it looks as if you are preventing new pods from scheduling on to the node and deleting the pods using kubectl delete pod.
In both ways, the pods will be deleted and you will be having a job that hasn't run to completion and doesn't have a pod, therefore job-controller will make a new pod for the job and the job's failed-attempts will be increased by 1, and the loop will start over again.
Does anyone know how to delete pod from kubernetes master node? I have this one master node on bare-metal ubuntu server. When i'm trying to delete it with "kubectl delete pod .." or force deleting from there: https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/ it doesnt work. the pod is creating again and again...
The pods in a Statefulsets are managed by ReplicaSets and will be recreated again if the current and the desired replicas defined in the spec do not match.
The document you linked provides instructions as to how to kill the pods forcefully avoiding the graceful shutdown behaviour which can have unexpected behaviour depending on the application.
The link clearly states the pods will be recreated in the section:
Force deletions do not wait for confirmation from the kubelet that the Pod has been terminated. Irrespective of whether a force deletion is successful in killing a Pod, it will immediately free up the name from the apiserver. This would let the StatefulSet controller create a replacement Pod with that same identity; this can lead to the duplication of a still-running Pod, and if said Pod can still communicate with the other members of the StatefulSet, will violate the at most one semantics that StatefulSet is designed to guarantee.
If you want the pods to be stopped and new pods for the Statefulset do not get created, you need to scale down the Statefulset by changing the replicas to 0.
You can read the official docs for how to scale the Statefulset replicas.
The key to figuring out how to kill the pod will be to understand how it was created. For example, if the pod is part of a deployment with a declared replicas count as 1, Once you kill/ force kill, Kubernetes detects a mismatch between the desired state (the number of replicas defined in the deployment configuration) to the current state and will create a new pod to replace the one that was deleted - therefor in this example you will need to either scale the deployment to 0 or delete the deployment.
If we need to kill any pod we can just scale down the replica set.
kubectl scale deploy <deployment_name> --replicas=<expected_no_of_replicas>
Way of deleting pods will depends on how you created it. If you created it individually ( not part of a ReplicaSet/ReplicationController/Deployment ) then you can delete pod directly. otherwise the only option to delete is the scale option. In production setup what I believe is all are using Deployment option out of ReplicaSet/ReplicationController/Deployment( Please refer documents and understand the difference between all those three options )
I created a pod that I want to delete but it does not seem te work.
I did all ready tried a lot of different things.
I do not have deployments or replica sets, as a lot of people suggest to delete those.
When i type:
kubectl get all
Only the pod and the service is visible, both of which will return upon deleting them.
When i type:
kubectl delete pods <pod-name> --grace-period=0 --force
The pod will come back again.
Is there something else I can do?
If pod not part of the deployment, rs, sts, ds, then it must be part of the static pod. Static pod name combines with node (i.e staticpod-node01). By default location of the static path is /etc/Kubernetes/manifest/.
Doc: https://kubernetes.io/docs/tasks/configure-pod-container/static-pod/
Apparently, it was some sort of invisible job that started the pod.
It was not showing when entering kubectl get jobs
So when finding this job and deleting it the pod disappeared!
I am running k8s on aws, and I updated the deployment of nginx - which normally, it works fine-, but after this time, the nginx deployment won't show up in "kubectl get deployments".
I want to kill all the pods related to nginx, but they keep reproduce themselves. I deleted all deployments "kubectl delete --all deployments", other pods just got terminated, but not nginx.
I have no idea where I can stop the pods recreating.
any idea where to start ?
check the deployment, replication controller and replica set and remove them.
kubectl get deploy,rc,rs
In modern kubernetes, there is also an annotation kubernetes.io/created-by on the Pod showing its "owner", as seen here, but I can't lay my hands on the documentation link right now. However, I found a pastebin containing a concrete example of the contents of the annotation
We are using Heat + Kubernetes (V0.19) to manage our apps. When do rolling update, sometimes container staring will always fail on a node but kubelet on the node will always retry but always fail. So the updating will hang there which is not the behavior we expected.
I found that using "kubectl delete node" to remove the node can avoid pods scheduled to that node. But in our env, the node to be deleted may have running pods on it.
So my question is:
After using "kubectl delete node" to remove the node, will the pods on that node still worked correctly ?
If you just want to cancel the rolling update, remove the failed pods and try again later, I have found that it is best to stop the update loop with CTRL+c and then delete the replication controller corresponding to the new app that is failing.
^C
kubectl delete replicationcontrollers your-app-v1.2.3