I would like to remove one specific, selected pod from a set of pods controlled by the same ReplicationController or the same ReplicaSet.
The use case is the following: each pod in the set runs a stateful (but in-memory) application. I would like to remove a pod from the set in a graceful way, i.e. before the removal I would like to be sure that there are no ongoing application sessions handled by the pod. Let's say I can solve the task of emptying the pod at the application level, i.e. no new sessions are directed to the selected pod, and I can measure the number of ongoing sessions in the pod, so I can decide when to remove it. The hard part is to remove this pod in such a way that the RC or RS does not replace it with a new one based on the value of "replicas".
I could not find a solution for this. The nearest one would be to isolate the pod from the RC or RS as suggested by http://kubernetes.io/docs/user-guide/replication-controller/#isolating-pods-from-a-replication-controller
Though, the RC or RS replaces the isolated pod with a new one, according to the same document. And as far as I can understand, there is no way to isolate the pod and decrease the value of "replicas" in an atomic way.
I have checked the upcoming PetSet support, but my application does not require e.g. persistent storage or a persistent pod ID. Such features are not necessary in my case, so my application is not really a pet from this perspective.
Maybe a new pod state (e.g. "target for removal" - the state name is not important to me) would do the trick, which could be patched via the API, and which would be considered by the RC or RS when the value of "replicas" is decreased?
You can achieve this in three steps (a sketch with example commands follows the list):
Add a label to all pods except the one you want to delete. Because the labels of the pods still satisfy the selector of the ReplicaSet, no new pods will be created.
Update the ReplicaSet: add the new label to the selector and decrease the replicas of the ReplicaSet atomically. The pod you want to delete won't be selected by the ReplicaSet because it doesn't have the new label.
Delete the selected pod.
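For illustration only, here is a minimal sketch of these three steps against a hypothetical ReplicationController named my-rc with selector app=myapp (all names are made up; note that for apps/v1 ReplicaSets the selector is immutable after creation, so this relabelling trick is easiest with a ReplicationController):

# 1. Label every pod except the one to be removed (pod names are hypothetical)
kubectl label pod my-rc-aaaaa my-rc-bbbbb keep=true
# 2. In a single update, require the new label in the selector, add it to the pod
#    template, and decrease replicas (here from 3 to 2)
kubectl patch rc my-rc --type merge -p '{"spec":{"replicas":2,"selector":{"app":"myapp","keep":"true"},"template":{"metadata":{"labels":{"app":"myapp","keep":"true"}}}}}'
# 3. Delete the isolated pod; the controller no longer selects it, so it is not replaced
kubectl delete pod my-rc-ccccc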
Related
When a pod is in a restart loop is it eligible for being removed during scaling down before it restarts successfully? (without stateful sets)
Also what happens if a pod container exits with a non-zero exit code when scaling that pod down? Will it be restarted and shut down again or just removed? (with or without stateful sets)
Can I ensure that a pod is always gracefully shutdown without using stateful sets (because I want lifetime-unique UIDs instead of distinct reusable ordinal ids)?
Pods which are part of Job or Cronjob resources will run until all of the containers in the pod complete. However, the Linkerd proxy container runs continuously until it receives a TERM signal. Since Kubernetes does not give the proxy a means to know when the Cronjob has completed, by default, Job and Cronjob pods which have been meshed will continue to run even once the main container has completed.
In other words, graceful shutdown can be blocked in this situation.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#creating-a-deployment mentions that a Deployment creates a ReplicaSet, but appends a pod-template-hash to the name of the ReplicaSet and also adds pod-template-hash as a ReplicaSet label.
My best guess is that a Deployment creates multiple ReplicaSets and this hash ensures that their replicas do not overlap. Is that correct?
Correct, the documentation states this really well:
The pod-template-hash label is added by the Deployment controller to every ReplicaSet that a Deployment creates or adopts. This label ensures that child ReplicaSets of a Deployment do not overlap. It is generated by hashing the PodTemplate of the ReplicaSet and using the resulting hash as the label value that is added to the ReplicaSet selector, Pod template labels, and in any existing Pods that the ReplicaSet might have.
This is necessary for a bunch of different reasons:
When you apply a new version of a Deployment, depending on how the Deployment and its probes are configured, the previous Pod or Pods may stay up until the new ones are Running and Ready, and only then are they gracefully terminated. So it may happen that Pods of different ReplicaSets (previous and current) run at the same time.
Deployment history is available to be consulted, and you may also want to roll back to an older revision should the current one stop behaving correctly (for example, you changed the image to be used and it just crashes with an error). Each revision has its own ReplicaSet, ready to be scaled up or down as necessary, as explained in the docs (the commands below show how to inspect this).
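To see this in practice, you can list the ReplicaSets behind a Deployment and inspect the revision history (the Deployment name and the app=nginx label below are just examples):

kubectl get rs -l app=nginx --show-labels     # each ReplicaSet carries a pod-template-hash label
kubectl rollout history deployment/nginx      # one revision per ReplicaSet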
Does anyone know how to delete a pod from a Kubernetes master node? I have this one master node on a bare-metal Ubuntu server. When I try to delete it with "kubectl delete pod ..." or force-delete it as described here: https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/ it doesn't work. The pod is created again and again...
The pods in a StatefulSet are managed by the StatefulSet controller and will be recreated again if the current and the desired replicas defined in the spec do not match.
The document you linked provides instructions on how to kill pods forcefully, bypassing the graceful shutdown behaviour, which can have unexpected consequences depending on the application.
The link clearly states the pods will be recreated in the section:
Force deletions do not wait for confirmation from the kubelet that the Pod has been terminated. Irrespective of whether a force deletion is successful in killing a Pod, it will immediately free up the name from the apiserver. This would let the StatefulSet controller create a replacement Pod with that same identity; this can lead to the duplication of a still-running Pod, and if said Pod can still communicate with the other members of the StatefulSet, will violate the at most one semantics that StatefulSet is designed to guarantee.
If you want the pods to be stopped and no new pods to be created for the StatefulSet, you need to scale the StatefulSet down by changing the replicas to 0.
You can read the official docs for how to scale the StatefulSet replicas.
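As a rough example (the StatefulSet name web is hypothetical), scaling down and later back up looks like this:

kubectl scale statefulset web --replicas=0    # all pods are terminated and none are recreated
kubectl scale statefulset web --replicas=3    # later, to bring the pods back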
The key to figuring out how to kill the pod is to understand how it was created. For example, if the pod is part of a Deployment with a declared replicas count of 1, then once you kill or force-kill it, Kubernetes detects a mismatch between the desired state (the number of replicas defined in the Deployment configuration) and the current state, and will create a new pod to replace the one that was deleted. In this example you would therefore need to either scale the Deployment to 0 or delete the Deployment.
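For example, if the owning Deployment were called my-app (a hypothetical name), either of these would stop the pod from coming back:

kubectl scale deployment my-app --replicas=0  # keep the Deployment, run zero pods
kubectl delete deployment my-app              # remove the Deployment entirely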
If you need to kill a pod, you can just scale down its Deployment / ReplicaSet:
kubectl scale deploy <deployment_name> --replicas=<expected_no_of_replicas>
The way of deleting a pod depends on how it was created. If you created it individually (not as part of a ReplicaSet/ReplicationController/Deployment), then you can delete the pod directly. Otherwise, the only option is to scale down the owning resource. In a production setup, I believe everyone uses the Deployment option out of ReplicaSet/ReplicationController/Deployment (please refer to the documents and understand the difference between those three options).
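If you are not sure how a pod was created, you can check its owner references; an empty result means it is a standalone pod that can simply be deleted (the pod name my-pod is a placeholder):

kubectl get pod my-pod -o jsonpath='{.metadata.ownerReferences[*].kind}'
kubectl delete pod my-pod                     # only sufficient if there is no owner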
I want an application to pull an item off a queue, process the item, and then destroy itself. Pull -> Process -> Destroy.
I've looked at using the job pattern Queue with Pod Per Work Item, as that fits the use case; however, it isn't appropriate when I need the job to autoscale, i.e. 0/1 pods when the queue is empty, scaling up as items are added. The only way I can see of doing this is via a Deployment, but that removes the pattern of Queue with Pod per Work Item. There must be a fresh container per item.
Is there a way to have the job pattern Queue with Pod Per Work Item but with auto-scaling?
I am a bit confused, so I'll just say this: if you don't mind a failed pod, and you want a failed pod not to be recreated by Kubernetes, you can do that in your code by catching all errors and exiting gracefully (not advised).
Please also note that for Deployments, the only accepted restartPolicy is Always. So pods of a Deployment that crash will always be restarted by Kubernetes, and will probably fail for the same reason, leading to a CrashLoopBackOff.
If you want to scale a deployment depending on the length of a RabbitMQ queue, check out KEDA. It is an event-driven autoscaling platform.
Make sure to also check their example with RabbitMQ
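As a rough sketch of such a setup (the Deployment name, queue name and the RABBITMQ_HOST environment variable are assumptions; field names can differ between KEDA versions, so check the KEDA RabbitMQ scaler docs for your version):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker-scaler
spec:
  scaleTargetRef:
    name: queue-worker           # the Deployment to scale
  minReplicaCount: 0             # scale to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
  - type: rabbitmq
    metadata:
      queueName: work-queue
      mode: QueueLength
      value: "5"                 # target number of messages per replica
      hostFromEnv: RABBITMQ_HOST # AMQP connection string exposed as an env var on the workload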
Another possibility is a job/deployment that routinely checks the length of the queue in question and executes kubectl commands to scale your deployment.
Here is the cleanest one I could find, at least for my taste
I am trying to change the priority of an existing Kubernetes Pod using the 'patch' command, but it returns an error saying that this is not one of the fields that can be modified. I can patch the priority in the Deployment spec, but it would cause the Pod to be recreated (following the defined update strategy).
The basic idea is to implement a mechanism conceptually similar to nice levels (for my application), so that certain Pods can be de-prioritized based on certain conditions (by my controller), and preempted by the default scheduler in case of resource congestion. But I don't want them to be restarted if there is no congestion.
Is there a way around this, or is there something inherent in the way the scheduler works that would prevent something like this from working properly?
Priority values are applied to a pod based on the priority value of the PriorityClass assigned to its Deployment at the time that the pod is scheduled. Any changes made to the PriorityClass will not be applied to pods which have already been scheduled, so you would have to redeploy the pod for the priority to take effect anyway.
As far as I know, pod priority only takes effect when the pod is being scheduled.
First you need to create a PriorityClass.
Then create the Pod with priorityClassName set to that PriorityClass in the pod definition (see the sketch after the reference link).
If you try to add a priority to an already scheduled pod, it will not work.
For reference: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#pod-priority
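A minimal sketch of the two objects (the names and the value 1000 are arbitrary examples):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1000                      # higher value = higher priority
globalDefault: false
description: "Pods that may be preempted under resource congestion"
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  priorityClassName: low-priority   # the PriorityClass must exist before the pod is scheduled
  containers:
  - name: app
    image: nginx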