How to rollout without killing processes in K8s? - kubernetes

I'm using:
kubectl rollout restart deployment my_cool_workers
This terminates the workers and starts new ones.
However, I want to roll out in a way where, if a task is running on a specific worker, that task is allowed to finish - I don't want to kill running tasks (so the worker should finish its current tasks but not accept new ones).
Meaning: roll out new workers -> old workers no longer accept traffic -> when an old worker is no longer running anything, terminate it.
How can this be done?

If a Pod gets killed, manually via kubectl or by any k8s controller like during a deployment, it will instantly change from Running into Terminating state. At the same time, the SIGTERM signal will be sent to all containers inside that Pod.
While in the Terminating state, the containers of a Pod are not restarted if they exit. By contrast, whenever a container inside a Pod stops while the Pod is in the Running state, the container is restarted. This is done because a Pod should always be running unless an error occurred.
For more information refer to this document.
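One common way to get the behavior the question asks for is to combine a generous terminationGracePeriodSeconds with a preStop hook, so that an old worker stops accepting new tasks when it is asked to terminate and is only killed once its in-flight tasks finish (or the grace period runs out). A rough sketch, where the grace period, image, and drain command are all assumptions to adapt:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-cool-workers            # adapted from the question (underscores aren't valid in resource names)
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-cool-workers
  template:
    metadata:
      labels:
        app: my-cool-workers
    spec:
      # Assumed value: give in-flight tasks up to 10 minutes to finish after SIGTERM.
      terminationGracePeriodSeconds: 600
      containers:
      - name: worker
        image: my-worker-image:latest   # placeholder image
        lifecycle:
          preStop:
            exec:
              # Hypothetical drain step; replace with however your worker
              # stops pulling new tasks and waits for the current ones.
              command: ["/bin/sh", "-c", "touch /tmp/draining && sleep 30"]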

Related

Is it possible to kill previous replicaset without waiting for new replicaset to be rolled out?

Context
Say we have d.yaml in which a deployment, whose strategy is RollingUpdate, is defined.
We first create a deployment:
kubectl apply -f d.yaml
After some time, we modify d.yaml and re-apply it to update the deployment.
vi d.yaml
kubectl apply -f d.yaml
This starts rolling out a new replicaset R_new.
Normally, the old (previous) replicaset R_old is killed only after R_new has successfully been rolled out.
Question (tl;dr)
Is it possible to kill R_old without waiting for rolling out R_new to complete?
By "kill", I mean completely stopping a replicaset; it should never restart. (So kubectl delete replicaset didn't help.)
Question (long)
In my specific situation, my containers connect to an external database. This single database is also connected from many containers managed by other teams.
If the maximum number of connections allowed is already reached, new containers associated with R_new fail to start (i.e. CrashLoopBackOff).
If I could forcefully kill R_old, the number of connections would be lowered by N where N is the number of replicas, and thus R_new's containers would successfully start.
FAQ:
Q. Why not temporarily stop using RollingUpdate strategy?
A. Actually I have no permission to edit d.yaml. It is edited by CI/CD.
Q. Why not just make the maximum number of connections larger?
A. I have no permission for the database either...
1. Make changes to the deployment.
2. Scale the deployment down to replicas=0, which is effectively the same as stopping the old replicaset.
3. Scale the deployment back up to the desired number of replicas; a new replicaset will be created with the new configuration changes in the deployment.
Steps 1 and 2 can be interchanged based on the requirement; see the commands sketched below.
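A sketch of the scale steps with kubectl (the deployment name and replica count are placeholders):

kubectl scale deployment my-deployment --replicas=0    # stop the old replicaset's pods
# ...apply the new configuration here (or let CI/CD do it)...
kubectl scale deployment my-deployment --replicas=3    # new replicaset comes up with the new spec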
Technically, you can delete the old replicaset by running kubectl delete replicaset R_old and this would terminate the old pod. I just verified it in my cluster (kubernetes version 1.21.8).
However, terminating a pod doesn't necessarily mean it is killed immediately.
The actual termination of the pod depends on whether or not a preStop lifecycle hook is defined, and on the value of terminationGracePeriodSeconds (see description by typing kubectl explain pod.spec.terminationGracePeriodSeconds).
By default, the kubelet deletes the pod only after all of the processes in its containers have stopped, or after the grace period is over.
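For example (the label selector and replicaset name below are illustrative):

kubectl get replicaset -l app=my-app                       # find the old replicaset behind the deployment
kubectl delete replicaset R_old                            # its pods go into Terminating
kubectl explain pod.spec.terminationGracePeriodSeconds     # see how long they may linger before being killed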

Job still running even when deleting nodes

I created a two-node cluster and then created a new job using the busybox image that sleeps for 300 secs. I checked which node this job is running on using
kubectl get pods -o wide
I deleted the node, but surprisingly the job still kept running on the same node. Any idea whether this is normal behavior? If not, how can I fix it?
Jobs aren't scheduled on or run on nodes themselves. The role of a job is just to define a policy: it makes sure that a pod with certain specifications exists and runs until the task completes, whether it completes successfully or not.
When you create a job, you declare a policy that the built-in job-controller will see and create a pod for. The built-in kube-scheduler will then see this pod without a node and patch it with a node's identity. The kubelet will see a pod whose node matches its own identity, and hence a container will be started. As long as the container is running, the control plane knows that the node and the pod still exist.
There are two ways of breaking a node: with a drain, or without one. Breaking a node without draining is identical to a network cut or a server crash: the api-server keeps the node resource around for a while, but it ceases being Ready, and its pods are then terminated slowly. Draining a node, by contrast, amounts to preventing new pods from scheduling onto the node and deleting the existing pods, much like kubectl delete pod.
Either way, the pods are deleted and you are left with a job that hasn't run to completion and has no pod, so the job-controller creates a new pod for the job, the job's failed-attempts counter increases by 1, and the loop starts over again.
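As a rough illustration of the drain path (the node name is a placeholder):

kubectl drain <node-name> --ignore-daemonsets    # cordon the node and evict its pods
kubectl get pods -o wide -w                      # watch the job-controller create a replacement pod elsewhere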

How can a pod have status ready and terminating?

Curiously, I saw that one of my pods had both a READY of 1/1 and a STATUS of Terminating when I ran kubectl get pods. Aren't these states mutually exclusive? Why or why not?
For context, this was noticed immediately after I had killed skaffold so these pods were in the middle of shutting down.
When pods are in terminating state, they could still be functioning. The pod could be delayed in termination due to many reasons (eg. could be that you have a PVC attached, other pods are being terminated at the same time, etc). You could test this by running the following on a pod with a PVC attached or another reason to be terminated with a delay:
$ kubectl delete pod mypod-xxxxx-xxxxxx
pod mypod-xxxxx-xxxxxx deleted
$ kubectl delete pod mypod-xxxxx-xxxxxx
pod mypod-xxxxx-xxxxxx deleted
$ kubectl apply -f mypod.yaml
pod mypod-xxxxx-xxxxxx configured
Sometimes this happens because the pod is still in the terminating period and is functioning normally, so it will be treated as an existing pod that gets configured (neglecting the fact that you usually can't configure pods like this, but you get the point).
The ready column says how many containers are up.
The status terminating means no more traffic is being sent to that pod by the controllers. From kubernetes' docs:
When a user requests deletion of a pod, the system records the intended grace period before the pod is allowed to be forcefully killed, and a TERM signal is sent to the main process in each container. Once the grace period has expired, the KILL signal is sent to those processes, and the pod is then deleted from the API server.
That's the state it is in: the containers are up, finishing whatever work they already had, and a TERM signal has been sent.
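So during the grace period you can see output like this (the pod name and age here are illustrative):

NAME                     READY   STATUS        RESTARTS   AGE
mypod-5d4b9c8f6d-abcde   1/1     Terminating   0          12m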
I want to add to nrxr's answer:
The status terminating means no more traffic is being sent to that pod by the controllers.
That is what we want, but in reality it is not always like that: the pod may have terminated completely while traffic is still being forwarded to it.
For detail please read this blog: https://learnk8s.io/graceful-shutdown.

Suspending a container in a kubernetes pod

I would like to suspend the main process in a docker container running in a kubernetes pod. I have attempted to do this by running
kubectl exec <pod-name> -c <container-name> kill -STOP 1
but the signal will not stop the container. Investigating other approaches, it looks like docker stop --signal=SIGSTOP or docker pause might work. However, as far as I know, kubectl exec always runs in the context of a container, and these commands would need to be run in the pod outside the context of the container. Does kubectl's interface allow for anything like this? Might I achieve this behavior through a call to the underlying kubernetes API?
You could set the replicas to 0, which scales the deployment down to zero running pods. This isn't quite a pause, but it does stop the deployment until you set the replica count back to >0.
kubectl scale --replicas=0 deployment/<deployment-name> --namespace=<namespace>
Kubernetes does not support suspending pods, because that is VM-like behavior; since starting a new pod is cheap, it simply schedules a new pod in case of failure. In effect your pods should be stateless, and any application that needs to store state should have a persistent volume mounted inside the pod.
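A minimal sketch of that pattern (the pod name, image, and PVC name are all placeholders, and the claim is assumed to already exist):

apiVersion: v1
kind: Pod
metadata:
  name: stateful-worker
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data            # state written here survives pod replacement
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-data-pvc      # assumed existing claim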
The simple mechanics (and general behavior) of Kubernetes are: if the process inside the container fails, Kubernetes will restart it by creating a new pod.
If you also comment what you are trying to achieve as an end goal, I think I can help you better.

kubernetes pods are restarting with new ID

The pods I am working with are managed by Kubernetes. When I use the docker restart command to restart a pod, sometimes the pod gets a new ID and sometimes it keeps the old one. When the pod gets a new ID, its state first goes from Running -> Error -> CrashLoopBackOff. Can anyone please tell me why this is happening? Also, how frequently does Kubernetes do the health check?
Kubernetes currently does not use the docker restart command for many reasons (e.g., preserving the logs of older containers). Kubelet, the daemon on the node, creates a new container if the existing container terminated. In any case, users should not perform container lifecycle operations (e.g., stop, restart) on kubernetes-managed containers directly using docker, as it could cause unexpected behaviors.
EDIT: If you want kubernetes to restart your container automatically, set RestartPolicy in your pod spec to "Always" or "OnFailure". For more details, see http://kubernetes.io/docs/user-guide/pod-states/
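A minimal sketch of that field in a pod spec (the name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  restartPolicy: Always          # or OnFailure; the kubelet recreates the container when it exits
  containers:
  - name: app
    image: nginx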