How does k8s know which pod to update?

I'm currently getting started with Kubernetes, and so far, I have a question that I could not find answered anywhere.
Until now, I have learned what containers, pods, and replica sets are. I basically understand these concepts, but one thing I did not get is: if I update a manifest of a pod (or of a replica set) and re-POST it to k8s, how does k8s know which existing manifest this refers to?
Is this matching done by the manifest's name, i.e. by the name of the pod or the replica set? Or …?
In other words: If I update a manifest, how does k8s know that it is an updated one, and how does it detect which one is the one with the previous version?

You are right, k8s uses metadata.name for identifying resources. That name is unique per resource type (Pod/ReplicaSet/...) and namespace.
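For example, a manifest like the following (the names here are just illustrative) always refers to the one ReplicaSet called my-replicaset in the default namespace; re-applying an updated version with the same kind and metadata.name (e.g. with kubectl apply -f) updates that existing object instead of creating a new one:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: my-replicaset      # identity = kind + name + namespace
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx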

Well, for starters, let's get things straight. When you update a manifest, it is obvious what to update in the first place: the object you edited, i.e. the Deployment or ReplicaSet. Once that object is updated, the RollingUpdate kicks in, and this is what I assume you are really wondering about, along with how ownership of a Pod is established in general. If you run kubectl get pod -o yaml you will find keys like ownerReferences, the pod-template-hash label, and the kubernetes.io/created-by annotation, which should be rather self-explanatory once you see their content. In the other direction (from the Deployment rather than from the Pod) there is a selector field, which defines which labels are used to filter Pods and find the right ones.
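As a rough illustration of what you would see there (output heavily trimmed; names and hash values are made up):
$ kubectl get pod nginx-test-5d59d67564-2xkqm -o yaml
metadata:
  labels:
    app: nginx-test
    pod-template-hash: 5d59d67564      ## added by the Deployment controller
  ownerReferences:
  - apiVersion: apps/v1
    controller: true
    kind: ReplicaSet                   ## this ReplicaSet owns the Pod
    name: nginx-test-5d59d67564
and on the Deployment side, the selector that is used to find those Pods:
$ kubectl get deployment nginx-test -o yaml
spec:
  selector:
    matchLabels:
      app: nginx-test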

Related

What's the exact reason a pod-template-hash is added to the name of the replicaset when a deployment is created?

https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#creating-a-deployment mentions that a deployment creates a replicaSet but appends a pod-template-hash to the name of the replicaSet and also adds pod-template-hash as a label on the replicaSet.
My best guess is that a deployment creates multiple replicaSets and this hash ensures that the replicas do not overlap. Is that correct?
Correct, the documentation states this really well:
The pod-template-hash label is added by the Deployment controller to every ReplicaSet that a Deployment creates or adopts. This label ensures that child ReplicaSets of a Deployment do not overlap. It is generated by hashing the PodTemplate of the ReplicaSet and using the resulting hash as the label value that is added to the ReplicaSet selector, Pod template labels, and in any existing Pods that the ReplicaSet might have.
This is necessary for a bunch of different reasons:
When you apply a new version of a Deployment, depending on how the Deployment and its probes are configured, the previous Pod (or Pods) may stay up until the new one (or ones) is Running and Ready, and only then be gracefully terminated. So it may happen that Pods from different ReplicaSets (the previous and the current one) run at the same time.
The Deployment history can be consulted, and you may also want to roll back to an older revision, should the current one stop behaving correctly (for example, you changed the image to one that just crashes with an error). Each revision has its own ReplicaSet ready to be scaled up or down as necessary, as explained in the docs.
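You can see this hash both in the ReplicaSet names and in their labels, for example (output trimmed; names, ages, and hash values are made up):
$ kubectl get rs -l app=nginx-test --show-labels
NAME                    DESIRED   CURRENT   READY   AGE   LABELS
nginx-test-5d59d67564   3         3         3       2m    app=nginx-test,pod-template-hash=5d59d67564
nginx-test-7c6b8f9d4b   0         0         0       1d    app=nginx-test,pod-template-hash=7c6b8f9d4b
The second ReplicaSet here is an older revision kept around at zero replicas so that a rollback can scale it back up.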

how to take a problematic pod offline to troubleshoot

Hi, I know there's a way I can pull a problematic node out of a load balancer to troubleshoot it. But how can I pull a pod out of a Service to troubleshoot it? What tools or commands can do this?
Change its labels so they no longer match the selector: in the Service; we used to do that all the time. You can even put the pod back into rotation if you want to test a hypothesis. I don't recall exactly how quickly it takes effect, but I would guess "real quick" is a good approximation. :-)
## for example, remove the label the Service selects on:
$ kubectl label pod $the_pod app.kubernetes.io/name-
## or, change it to a non-matching value:
$ kubectl label pod $the_pod app.kubernetes.io/name=i-am-debugging-this-pod --overwrite
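To confirm the pod has actually dropped out of rotation, you can check which addresses the Service currently has (the service name here is just a placeholder):
## the relabeled pod's IP should no longer be listed
$ kubectl get endpoints $the_service -o wide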
As mentioned in O'Reilly's "Kubernetes recipes: Maintenance and troubleshooting" page here
Removing a Pod from a Service
Problem
You have a well-defined service backed by several pods. But one of the pods is misbehaving, and you would like to take it out of the list of endpoints to examine it at a later time.
Solution
Relabel the pod using the --overwrite option. This will allow you to change the value of the run label on the pod. By overwriting this label, you can ensure that it will not be selected by the service selector and will be removed from the list of endpoints. At the same time, the replica set watching over your pods will see that a pod has disappeared and will start a new replica.
To see this in action, start with a straightforward deployment generated with kubectl run:
For commands, check the recipes page mentioned above. There is also a section talking about "Debugging Pods" which will be helpful
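As a minimal sketch of the relabel step the recipe describes, assuming the pod still carries the run label that kubectl run puts on it (the pod name is made up):
$ kubectl get pods --show-labels
$ kubectl label pod nginx-abc12 run=nginx-debug --overwrite
## the controller starts a replacement pod; the relabeled one stays around for inspection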

Editing Kubernetes pod on-the-fly

For debugging and testing purposes, I'd like to find the most convenient way of launching Kubernetes pods and altering their specification on the fly.
The launching part is quite easy with imperative commands.
Running
kubectl run nginx-test --image nginx --restart=Never
gives me exactly what I want: a single pod that is not managed by any controller like a Deployment or ReplicaSet. Easy to play with and clean up when needed.
However when I'm trying to edit the spec with
kubectl edit po nginx-test
I'm getting the following warning:
pods "nginx-test" was not valid:
* spec: Forbidden: pod updates may not change fields other than spec.containers[*].image, spec.initContainers[*].image, spec.activeDeadlineSeconds or spec.tolerations (only additions to existing tolerations)
i.e. only a limited set of Pod spec fields is editable at runtime.
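For example, changing only the image does go through, since spec.containers[*].image is one of the allowed fields (assuming the container is also named nginx-test, which is what kubectl run uses by default; the tag is arbitrary):
$ kubectl set image pod/nginx-test nginx-test=nginx:1.25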
OPTIONS FOUND SO FAR:
Getting the Pod spec saved into a file:
kubectl get po nginx-test -oyaml > nginx-test.yaml
edited and recreated with
kubectl apply -f
A bit heavyweight for changing just one field, though.
Creating a Deployment instead of a single Pod and then editing the spec section in the Deployment itself.
The cons are:
an additional API object (the Deployment) is needed, which you should not forget to clean up when you are done
the Pod names are autogenerated in the form nginx-test-xxxxxxxxx-xxxx and are less convenient to work with.
So is there any simpler option (or possibly some elegant workaround) for editing an arbitrary field in the Pod spec?
I would appreciate any suggestion.
You should absolutely use a Deployment here.
For the use case you're describing, most of the interesting fields on a Pod cannot be updated, so you need to manually delete and recreate the pod yourself. A Deployment manages that for you. If a Deployment owns a Pod, and you delete the Deployment, Kubernetes knows on its own to delete the matching Pod, so there's not really any more work.
(There's not really any reason to want a bare pod; you almost always want one of the higher-level controllers. The one exception I can think of is using kubectl run to get a debugging shell inside the cluster.)
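For reference, a minimal Deployment that matches the kubectl run example above might look like this (the name and labels are just illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      containers:
      - name: nginx
        image: nginx
Edit anything under spec.template (kubectl edit deploy nginx-test) and the Deployment rolls out a replacement Pod, which is exactly the delete-and-recreate you would otherwise have to do by hand.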
The Pod name being generated can be a minor hassle. One trick that's useful here: as of reasonably recent kubectl, you can give the deployment name to commands like kubectl logs:
kubectl logs deployment/nginx-test
There are also various "dashboard" type tools out there that will let you browse your current set of pods, so you can do things like read logs without having to copy-and-paste the full pod name. You may also be able to set up tab completion for kubectl, and type
kubectl logs nginx-test<TAB>
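Setting up that completion is a one-liner in most shells; for bash, for example:
$ source <(kubectl completion bash)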

kubernetes: specifying maxUnavailable in both Deployment and PDB

Assuming I have a Deployment with a specific value set to the .spec.strategy.rollingUpdate.maxUnavailable field.
Then I deploy a PodDisruptionBudget attached to the deployment above, setting its spec.maxUnavailable field to a value different to the above.
Which one will prevail?
By interpreting the documentation, it seems that it depends on the event.
For a rolling update, the Deployment's maxUnavailable will be in effect, even if the PodDisruptionBudget specifies a smaller value.
But for an eviction, the PodDisruptionBudget's maxUnavailable will prevail, even if the Deployment specifies a smaller value.
The documentation does not explicitly compare these two settings, but from the way the documentation is written, it can be deduced that these are separate settings for different events that don't interact with each other.
For example:
Updating a Deployment
Output of kubectl explain deploy.spec.strategy.rollingUpdate.maxUnavailable
Specifying a PodDisruptionBudget
Output of kubectl explain pdb.spec.maxUnavailable
Also, this is more in the spirit of how Kubernetes works. The Deployment Controller is not going to read a field of a PodDisruptionBudget, and vice versa.
But to be really sure, you would just need to try it out.
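To make the two knobs concrete, here is a sketch of where each one lives (names and numbers are just illustrative); the Deployment field only constrains rolling updates, while the PDB only constrains evictions such as kubectl drain:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1          # consulted by the Deployment controller during rollouts
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1              # consulted by the eviction API, e.g. during kubectl drain
  selector:
    matchLabels:
      app: my-app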
I believe they updated the docs clarifying your doubt:
Involuntary disruptions cannot be prevented by PDBs; however they do count against the budget.
Pods which are deleted or unavailable due to a rolling upgrade to an application do count against the disruption budget, but workload resources (such as Deployment and StatefulSet) are not limited by PDBs when doing rolling upgrades. Instead, the handling of failures during application updates is configured in the spec for the specific workload resource
Caution: Not all voluntary disruptions are constrained by Pod Disruption Budgets. For example, deleting deployments or pods bypasses Pod Disruption Budgets.

Is there the concept of uploading a Deployment without causing pods to start?

(I am (all things considered) a Kubernetes rookie.)
I know that kubectl create -f myDeployment.yaml will send my deployment specification off to the cluster to be reified, and if it says to start three replicas of its contained pod template then Kubernetes will set about starting up three pods.
I wonder: is there a Kubernetes concept or practice of somehow uploading the deployment for reference later and then "activating" it later? Perhaps by, say, changing replicas from zero to some positive number? If this is not a meaningful question, or this isn't the Right Way To Think About Things, I'd appreciate pointers as well.
I don't think your idea would work well with Kubernetes. Firstly, there is no way of "pausing" a Deployment or any other ReplicationController or ReplicaSet, besides setting the replicas to 0, as you mentioned.
The next issue is that the YAML you would get back from the apiserver isn't the same as the one you created. The controller manager adds some annotations, default values, and status fields, so it would be hard to verify the Deployment that way.
IMO a better way to verify Deployments is to add them to a version control system and peer-review the YAML files. Then you can create or update them on the apiserver with kubectl apply -f myDeployment.yaml. If the Deployment is wrong in terms of syntax, kubectl will complain about it and you can patch the Deployment accordingly. This also simplifies the update procedure of Deployments.
A Deployment can be paused; please refer to https://kubernetes.io/docs/user-guide/deployments/#pausing-and-resuming-a-deployment, or see the information from kubectl rollout pause -h.
You can adjust the replicas of a paused deployment, but changes to the pod template will not trigger a rollout. If the deployment is paused in the middle of a rollout, it will not continue until you resume it.
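A sketch of both approaches with standard kubectl commands (the deployment name is just an example):
## option 1: create it "inactive" with zero replicas, then scale up to activate
$ kubectl apply -f myDeployment.yaml        ## manifest has spec.replicas: 0
$ kubectl scale deployment my-app --replicas=3
## option 2: pause rollouts on an existing Deployment and resume later
$ kubectl rollout pause deployment/my-app
$ kubectl rollout resume deployment/my-app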