The extention of updateable fields in Pod - kubernetes

I wonder why only some fields of a pod can be modified。
The updateable fields are:
spec.containers[*].image
spec.initContainers[*].image
spec.activeDeadlineSeconds
1
In my actual business, I have a need to change the schedulername of a pod.
So i change the validating codes in the file of validation.go. And i created a second scheduler named kube-scheduler-test.
That is pod whose schedulername is kube-scheduler-test
When i created a new pod whose schedulername is kube-scheduler-test, then the kube-scheduler-test will update the schedulername of pod to default-scheduler.
And then default-scheduler will schedule this pod to specified node.
Could you explain why only some fields of a pod can be modified and whether my method of changing the schedulername acceptable or not?

Pods are designed to be disposable, think of them as immutable. When you want to change anything about a Pod, you create a new Pod, and throw away the old Pod.
There are many articles explaining immutable infrastructure.

Related

What's the exact reason a pod-template-hash is added to the name of the replicaset when a deployment is created?

https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#creating-a-deployment mentions that a deployment creates a replicaSet but appends a pod-template-hash to the name of the replicaSet and also adds pod-template-hash as replicaSet's label.
my best guess is that deployment creates multiple replicaSets and this hash ensures that the replicas do not overlap. Is that correct?
Correct, the documentation states this really well:
The pod-template-hash label is added by the Deployment controller to
every ReplicaSet that a Deployment creates or adopts.
This label ensures that child ReplicaSets of a Deployment do not
overlap. It is generated by hashing the PodTemplate of the ReplicaSet
and using the resulting hash as the label value that is added to the
ReplicaSet selector, Pod template labels, and in any existing Pods
that the ReplicaSet might have.
This is necessary for a bunch of different reasons:
When you apply a new version of a Deployment, depending on how the deployment is configured and on probes, the previous Pod / Pods could stay up until the new one / ones is not Running and Ready and only then is gracefully terminated. So it may happens that Pods of different ReplicaSet (previous and current) run at the same time.
Deployment History is available to be consulted and you may also want to rollback to an older revision, should the current one stops behaving correctly (for example you changed the image that needs to be used and it jsut crash in error). Each revision has its own ReplicaSet ready to be scaled up or down as necessary as explained in the docs

Facing "The Pod "web-nginx" is invalid: spec.initContainers: Forbidden: pod updates may not add or remove containers" applying pod with initcontainers

I was trying to make file before application gets up in kubernetes cluster with initcontainers,
But when i am setting up the pod.yaml and trying to apply it with "kubectl apply -f pod.yaml" it throws below error
error-image
Like the error says, you cannot update a Pod adding or removing containers. To quote the documentation ( https://kubernetes.io/docs/concepts/workloads/pods/#pod-update-and-replacement )
Kubernetes doesn't prevent you from managing Pods directly. It is
possible to update some fields of a running Pod, in place. However,
Pod update operations like patch, and replace have some limitations
This is because usually, you don't create Pods directly, instead you use Deployments, Jobs, StatefulSets (and more) which are high-level resources that defines Pods templates. When you modify the template, Kubernetes simply delete the old Pod and then schedule the new version.
In your case:
you could delete the pod first, then create it again with the new specs you defined. But take into consideration that the Pod may be scheduled on a different node of the cluster (if you have more than one) and that may have a different IP Address as Pods are disposable entities.
Change your definition with a slightly more complex one, a Deployment ( https://kubernetes.io/docs/concepts/workloads/controllers/deployment/ ) which can be changed as desired and, each time you'll make a change to its definition, the old Pod will be removed and a new one will be scheduled.
From the spec of your Pod, I see that you are using a volume to share data between the init container and the main container. This is the optimal way but you don't necessarily need to use a hostPath. If the only needs for the volume is to share data between init container and other containers, you can simply use emptyDir type, which acts as a temporary volume that can be shared between containers and that will be cleaned up when the Pod is removed from the cluster for any reason.
You can check the documentation here: https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

How to delete a pod from Kubernetes master node?

Does anyone know how to delete pod from kubernetes master node? I have this one master node on bare-metal ubuntu server. When i'm trying to delete it with "kubectl delete pod .." or force deleting from there: https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/ it doesnt work. the pod is creating again and again...
The pods in a Statefulsets are managed by ReplicaSets and will be recreated again if the current and the desired replicas defined in the spec do not match.
The document you linked provides instructions as to how to kill the pods forcefully avoiding the graceful shutdown behaviour which can have unexpected behaviour depending on the application.
The link clearly states the pods will be recreated in the section:
Force deletions do not wait for confirmation from the kubelet that the Pod has been terminated. Irrespective of whether a force deletion is successful in killing a Pod, it will immediately free up the name from the apiserver. This would let the StatefulSet controller create a replacement Pod with that same identity; this can lead to the duplication of a still-running Pod, and if said Pod can still communicate with the other members of the StatefulSet, will violate the at most one semantics that StatefulSet is designed to guarantee.
If you want the pods to be stopped and new pods for the Statefulset do not get created, you need to scale down the Statefulset by changing the replicas to 0.
You can read the official docs for how to scale the Statefulset replicas.
The key to figuring out how to kill the pod will be to understand how it was created. For example, if the pod is part of a deployment with a declared replicas count as 1, Once you kill/ force kill, Kubernetes detects a mismatch between the desired state (the number of replicas defined in the deployment configuration) to the current state and will create a new pod to replace the one that was deleted - therefor in this example you will need to either scale the deployment to 0 or delete the deployment.
If we need to kill any pod we can just scale down the replica set.
kubectl scale deploy <deployment_name> --replicas=<expected_no_of_replicas>
Way of deleting pods will depends on how you created it. If you created it individually ( not part of a ReplicaSet/ReplicationController/Deployment ) then you can delete pod directly. otherwise the only option to delete is the scale option. In production setup what I believe is all are using Deployment option out of ReplicaSet/ReplicationController/Deployment( Please refer documents and understand the difference between all those three options )

changing Pod priority without restarting the Pod?

I am trying to change priority of an existing Kubernetes Pod using 'patch' command, but it returns error saying that this is not one of the fields that can be modified. I can patch the priority in the Deployment spec, but it would cause the Pod to be recreated (following the defined update strategy).
The basic idea is to implement a mechanism conceptually similar to nice levels (for my application), so that certain Pods can be de-prioritized based on certain conditions (by my controller), and preempted by the default scheduler in case of resource congestion. But I don't want them to be restarted if there is no congestion.
Is there a way around it, or there is something inherent in the way scheduler works that would prevent something like this from working properly?
Priority values are applied to a pod based on the priority value of the PriorityClass assigned to their deployment at the time that the pod is scheduled. Any changes made to the PriorityClass will not be applied to pods which have already been scheduled, so you would have to redeploy the pod for the priority to take effect anyway.
As far I know,
Pod priority will work on when pod is getting scheduled.
First you need to create PriorityClasses
Create Pod with priorityClassName and mention priorityclass in pod definition.
If you are trying to add priority to already scheduled pod I will not work.
For reference: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#pod-priority

Graceful shutdown of explicitly selected pod

I would like to remove a given, exact, selected pod from a set of pods controlled by the same Replication Controller, or the same Replica Set.
The use case is the following: each pod in the set runs a stateful (but in-memory) application. I would like to remove a pod from the set on a graceful way, i.e. before the removal I would like to be sure, that there are no ongoing application sessions handled by the pod. Let's say I can solve the task of emptying the pod on application level, i.e. no new sessions are directed to the selected pod, and I can measure the number of ongoing sessions in the pod, so I can decide when to remove the pod. The hard part is to remove this pod so, that RC or RS does not replace the pod with a new one based on the value of "replicas".
I could not find a solution for this. The nearest one would be to isolate the pod from the RC or RS as suggested by http://kubernetes.io/docs/user-guide/replication-controller/#isolating-pods-from-a-replication-controller
Though, the RC or RS replaces the isolated pod with a new one, according to the same document. And I as can understand there is no way to isolate the pod and decrease the value of "replicas" on an atomic way.
I have checked the coming PetSet support, but my application does not require e.g. persistent storage, or persistent pod ID. Such features are not necessary in my case, so my application is not really a pet from this perspective.
Maybe a new pod state (e.g. "target for removal" - state name is not important for me) would make it, which could be patched via the API, and which would be considered by RC or RS when the value of "replicas" is decreased?
You can achieve this in three steps:
Add a label to all pods except the one you want to delete. Because the labels of the pods still satisfy the selector of the Replica Set, so no new pods will be created.
Update the Replica Set: adding the new label to the selector and decrease the replicas of the Replica Set atomically. The pod you want to deleted won't be selected by the Replica Set because it doesn't have the new label.
Delete the selected pod.