Ensuring graceful shutdown of a pod without ordinal pod ids - kubernetes

When a pod is in a restart loop is it eligible for being removed during scaling down before it restarts successfully? (without stateful sets)
Also what happens if a pod container exits with a non-zero exit code when scaling that pod down? Will it be restarted and shutdown again or just removed? (with or without stateful sets)
Can I ensure that a pod is always gracefully shutdown without using stateful sets (because I want lifetime-unique UIDs instead of distinct reusable ordinal ids)?

Can I ensure that a pod is always gracefully shutdown without using stateful sets (because I want lifetime-unique UIDs instead of distinct reusable ordinal ids)?
Pods which are part of Job or Cronjob resources will run until all of the containers in the pod complete. However, the Linkerd proxy container runs continuously until it receives a TERM signal. Since Kubernetes does not give the proxy a means to know when the Cronjob has completed, by default, Job and Cronjob pods which have been meshed will continue to run even once the main container has completed.
it means we can stop graecfull shutdown
You can achieve this in three steps:
Add a label to all pods except the one you want to delete. Because the labels of the pods still satisfy the selector of the Replica Set, so no new pods will be created.
Update the Replica Set: adding the new label to the selector and decreasing the replicas of the Replica Set atomically. The pod you want to delete won't be selected by the Replica Set because it doesn't have the new label.
Delete the selected pod.
Fore more information refer to these documents.

Related

Job still running even when deleting nodes

I created a two nodes clusters and I created a new job using the busybox image that sleeps for 300 secs. I checked on which node this job is running using
kubectl get pods -o wide
I deleted the node but surprisingly the job was still finishing to run on the same node. Any idea if this is a normal behavior? If not how can I fix it?
Jobs aren't scheduled or running on nodes. The role of a job is just to define a policy by making sure that a pod with certain specifications exists and ensure that it runs till the completion of the task whether it completed successfully or not.
When you create a job, you are declaring a policy that the built-in job-controller will see and will create a pod for. Then the built-in kube-scheduler will see this pod without a node and patch the pod to it with a node's identity. The kubelet will see a pod with a node matching it's own identity and hence a container will be started. As the container will be still running, the control-plane will know that the node and the pod still exist.
There are two ways of breaking a node, one with a drain and the second without a drain. The process of breaking a node without draining is identical to a network cut or a server crash. The api-server will keep the node resource for a while, but it 'll cease being Ready. The pods will be then terminated slowly. However, when you drain a node, it looks as if you are preventing new pods from scheduling on to the node and deleting the pods using kubectl delete pod.
In both ways, the pods will be deleted and you will be having a job that hasn't run to completion and doesn't have a pod, therefore job-controller will make a new pod for the job and the job's failed-attempts will be increased by 1, and the loop will start over again.

How to delete a pod from Kubernetes master node?

Does anyone know how to delete pod from kubernetes master node? I have this one master node on bare-metal ubuntu server. When i'm trying to delete it with "kubectl delete pod .." or force deleting from there: https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/ it doesnt work. the pod is creating again and again...
The pods in a Statefulsets are managed by ReplicaSets and will be recreated again if the current and the desired replicas defined in the spec do not match.
The document you linked provides instructions as to how to kill the pods forcefully avoiding the graceful shutdown behaviour which can have unexpected behaviour depending on the application.
The link clearly states the pods will be recreated in the section:
Force deletions do not wait for confirmation from the kubelet that the Pod has been terminated. Irrespective of whether a force deletion is successful in killing a Pod, it will immediately free up the name from the apiserver. This would let the StatefulSet controller create a replacement Pod with that same identity; this can lead to the duplication of a still-running Pod, and if said Pod can still communicate with the other members of the StatefulSet, will violate the at most one semantics that StatefulSet is designed to guarantee.
If you want the pods to be stopped and new pods for the Statefulset do not get created, you need to scale down the Statefulset by changing the replicas to 0.
You can read the official docs for how to scale the Statefulset replicas.
The key to figuring out how to kill the pod will be to understand how it was created. For example, if the pod is part of a deployment with a declared replicas count as 1, Once you kill/ force kill, Kubernetes detects a mismatch between the desired state (the number of replicas defined in the deployment configuration) to the current state and will create a new pod to replace the one that was deleted - therefor in this example you will need to either scale the deployment to 0 or delete the deployment.
If we need to kill any pod we can just scale down the replica set.
kubectl scale deploy <deployment_name> --replicas=<expected_no_of_replicas>
Way of deleting pods will depends on how you created it. If you created it individually ( not part of a ReplicaSet/ReplicationController/Deployment ) then you can delete pod directly. otherwise the only option to delete is the scale option. In production setup what I believe is all are using Deployment option out of ReplicaSet/ReplicationController/Deployment( Please refer documents and understand the difference between all those three options )

Set a Pod as owner reference for another pod in client go program

Is it possible to set a running pod as owner of another pod which is to be created. I tired but in that case pod creation fails.
This is not directly supported by Kubernetes. When you have a Pod that depends on the existence of another one (e.g. needs a Database or similar), you could use a Init Container. This will delay the container start until the init container finishes. This is a good way to apply e.g. waiting conditions.
I think you can use Kubernetes Jobs.
A Job creates one or more Pods and ensures that a specified number of them successfully terminate. As pods successfully complete, the Job tracks the successful completions. When a specified number of successful completions is reached, the task (ie, Job) is complete. Deleting a Job will clean up the Pods it created.
A simple case is to create one Job object in order to reliably run one Pod to completion. The Job object will start a new Pod if the first Pod fails or is deleted (for example due to a node hardware failure or a node reboot).
More information you can find here: jobs-kubernetes.

Kubernetes Queue with Pod Per Work Item autoscaling

I want an application to pull an item off a queue, process the item on the queue and then destroy itself. Pull -> Process -> Destroy.
I've looked at using the job pattern Queue with Pod Per Work Item as that fits the usecase however it isn't appropriate when I need the job to autoscale aka 0/1 pods when queue is empty and scale to a point when items are added. The only way I can see doing this is via a deployment but that removes the pattern of Queue with Pod per Work Item. There must be a fresh container per item.
Is there a way to have the job pattern Queue with Pod Per Work Item but with auto-scaling?
I am a bit confused, so I'll just say this: if you don't mind a failed pod, and you wish that a failed pod will not be recreated by Kubernetes, you can do that in your code by catching all errors and exiting gracefully (not advised).
Please also note, that for deployments, the only accepted restartPolicy is always. So pods of a deployments who crash will always be restarted by Kubernetes, and will probably fails for the same reason, leading to a CrashLoopBackOff.
If you want to scale a deployment depending on the length of a RabbitMQ queue's length, check KEDA. It is an event-driven autoscaling platform.
Make sure to also check their example with RabbitMQ
Another possibility is a job/deployment, that routinely checks the length of the queue in question and executes kubectl commands to scale your deployment.
Here is the cleanest one I could find, at least for my taste

Graceful shutdown of explicitly selected pod

I would like to remove a given, exact, selected pod from a set of pods controlled by the same Replication Controller, or the same Replica Set.
The use case is the following: each pod in the set runs a stateful (but in-memory) application. I would like to remove a pod from the set on a graceful way, i.e. before the removal I would like to be sure, that there are no ongoing application sessions handled by the pod. Let's say I can solve the task of emptying the pod on application level, i.e. no new sessions are directed to the selected pod, and I can measure the number of ongoing sessions in the pod, so I can decide when to remove the pod. The hard part is to remove this pod so, that RC or RS does not replace the pod with a new one based on the value of "replicas".
I could not find a solution for this. The nearest one would be to isolate the pod from the RC or RS as suggested by http://kubernetes.io/docs/user-guide/replication-controller/#isolating-pods-from-a-replication-controller
Though, the RC or RS replaces the isolated pod with a new one, according to the same document. And I as can understand there is no way to isolate the pod and decrease the value of "replicas" on an atomic way.
I have checked the coming PetSet support, but my application does not require e.g. persistent storage, or persistent pod ID. Such features are not necessary in my case, so my application is not really a pet from this perspective.
Maybe a new pod state (e.g. "target for removal" - state name is not important for me) would make it, which could be patched via the API, and which would be considered by RC or RS when the value of "replicas" is decreased?
You can achieve this in three steps:
Add a label to all pods except the one you want to delete. Because the labels of the pods still satisfy the selector of the Replica Set, so no new pods will be created.
Update the Replica Set: adding the new label to the selector and decrease the replicas of the Replica Set atomically. The pod you want to deleted won't be selected by the Replica Set because it doesn't have the new label.
Delete the selected pod.