How to take a problematic pod offline to troubleshoot - Kubernetes

Hi, I know there's a way I can pull a problematic node out of a load balancer to troubleshoot it. But how can I pull a pod out of a Service to troubleshoot it? What tools or commands can do this?

Change its labels so they no longer match the selector: in the Service; we used to do that all the time. You can even put it back into rotation if you want to test a hypothesis. I don't recall exactly how quickly it takes effect, but I would guess "real quick" is a good approximation. :-)
## for example, remove the label the Service selects on (note the trailing dash):
$ kubectl label pod $the_pod app.kubernetes.io/name-
## or overwrite it with a non-matching value:
$ kubectl label pod $the_pod --overwrite app.kubernetes.io/name=i-am-debugging-this-pod
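To confirm it took effect and to put the pod back into rotation afterwards, something along these lines should work (the service name and original label value here are placeholders):
## the pod's IP should drop out of the endpoints list almost immediately
$ kubectl get endpoints $the_service
## restore the original label value to put the pod back into rotation
$ kubectl label pod $the_pod --overwrite app.kubernetes.io/name=$original_value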

As mentioned on O'Reilly's "Kubernetes recipes: Maintenance and troubleshooting" page:
Removing a Pod from a Service
Problem
You have a well-defined service (see not available) backed by several pods. But one of the pods is misbehaving, and you would like to take it out of the list of endpoints to examine it at a later time.
Solution
Relabel the pod using the --overwrite option; this will allow you to change the value of the run label on the pod. By overwriting this label, you can ensure that it will not be selected by the service selector (not available) and will be removed from the list of endpoints. At the same time, the replica set watching over your pods will see that a pod has disappeared and will start a new replica.
To see this in action, start with a straightforward deployment generated with kubectl run (see not available):
For the commands, check the recipes page mentioned above. There is also a section on "Debugging Pods" which should be helpful.
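A rough sketch of that recipe, adapted for newer kubectl where kubectl run no longer generates a Deployment (all names here are placeholders, so adjust them to your setup):
$ kubectl create deployment nginx --image=nginx
$ kubectl scale deployment nginx --replicas=2
$ kubectl expose deployment nginx --port=80
$ kubectl get endpoints nginx
## overwrite the label the Service (and ReplicaSet) selects on
$ kubectl label pod <misbehaving-pod> app=nginx-debug --overwrite
## the pod drops out of the endpoints and the Deployment starts a replacement
$ kubectl get endpoints nginx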

Related

How to monitor pod preemption event

I have a bunch of Rancher clusters I take care of, and on some of them developers use PriorityClasses to ensure that some of the more important workloads get scheduled. The three PriorityClasses are in the three-digit range, so they do not interfere with the default ones. However, at present none of the PriorityClasses is set as the default, and preemptionPolicy is not set either, so it defaults to PreemptLowerPriority.
None of the rancher, longhorn, prometheus, grafana, etc., workloads have priorityClassName set.
Long story short, I believe this causes havoc on the cluster when resources are in short supply.
Before I take my opinion to the developers I would like to collect some data to back up my story.
The question: How do I detect if the pod was Terminated due to Preemption?
I tried to google the subject but couldn't find anything. I was hoping kube-state-metrics would have something, but I didn't find anything there either.
Any help would be greatly appreciated.
You can try to look for convincing data, like the pod termination reason, with the help of kubectl.
You can see the last restart logs of a container using the following command:
kubectl logs podname -c containername --previous
You can also use the following command to check the lifecycle events sent by the kubelet to the apiserver about the pod.
kubectl describe pod podname
Finally, you can also write a final message to /dev/termination-log, and this will show up as described in the docs.
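For evidence of preemption specifically, the scheduler also records events on the victim pods. A rough sketch (the exact event reason string can vary across Kubernetes versions, so treat the grep filter as an assumption):
kubectl get events --all-namespaces --sort-by=.lastTimestamp | grep -i preempt
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>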
To use kubectl commands with Rancher, kindly refer to the Rancher documentation.

Editing Kubernetes pod on-the-fly

For debugging and testing purposes, I'd like to find the most convenient way of launching Kubernetes pods and altering their specification on the fly.
The launching part is quite easy with imperative commands.
Running
kubectl run nginx-test --image nginx --restart=Never
gives me exactly what I want: a single pod not managed by any controller like a Deployment or ReplicaSet. Easy to play with and clean up when needed.
However when I'm trying to edit the spec with
kubectl edit po nginx-test
I'm getting the following warning:
pods "nginx-test" was not valid:
* spec: Forbidden: pod updates may not change fields other than spec.containers[*].image, spec.initContainers[*].image, spec.activeDeadlineSeconds or spec.tolerations (only additions to existing tolerations)
i.e., only a limited set of Pod spec fields is editable at runtime.
OPTIONS FOUND SO FAR:
Save the Pod spec to a file:
kubectl get po nginx-test -oyaml > nginx-test.yaml
edit it, then delete the pod and recreate it with
kubectl apply -f nginx-test.yaml
A bit heavyweight for changing just one field, though.
Creating a Deployment instead of a single Pod and then editing the spec section in the Deployment itself.
The cons are:
an additional API object (the Deployment) that you should not forget to clean up when you are done
the Pod names are autogenerated in the form nginx-test-xxxxxxxxx-xxxx and are less convenient to work with.
So is there any simpler option (or possibly some elegant workaround) of editing arbitrary field in the Pod spec?
I would appreciate any suggestion.
You should absolutely use a Deployment here.
For the use case you're describing, most of the interesting fields on a Pod cannot be updated, so you need to manually delete and recreate the pod yourself. A Deployment manages that for you. If a Deployment owns a Pod, and you delete the Deployment, Kubernetes knows on its own to delete the matching Pod, so there's not really any more work.
(There's not really any reason to want a bare pod; you almost always want one of the higher-level controllers. The one exception I can think of is using kubectl run to get a debugging shell inside the cluster.)
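As a minimal sketch of the Deployment-based workflow (reusing the nginx-test name from the question; the exact behavior of these imperative commands depends a bit on your kubectl version):
kubectl create deployment nginx-test --image=nginx   # one replica by default
kubectl edit deployment nginx-test                   # edit any field of the pod template; the old pod is replaced automatically
kubectl delete deployment nginx-test                 # also cleans up the pod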
The Pod name being generated can be a minor hassle. One trick that's useful here: as of reasonably recent kubectl, you can give the deployment name to commands like kubectl logs:
kubectl logs deployment/nginx-test
There are also various "dashboard" type tools out there that will let you browse your current set of pods, so you can do things like read logs without having to copy-and-paste the full pod name. You may also be able to set up tab completion for kubectl, and type
kubectl logs nginx-test<TAB>
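kubectl can generate the completion script itself; for bash, sourcing it into the current shell is enough (add it to your shell rc file to make it permanent):
source <(kubectl completion bash)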

Is there a way to make kubectl apply restart deployments whose image tag has not changed?

I've got a local deployment system that is mirroring our production system. Both are deployed by calling kubectl apply -f deployments-and-services.yaml
I'm tagging all builds with the current git hash, which means that for clean deploys to GKE all the services get a new Docker image tag, so apply will restart them. Locally on minikube, though, the tag often doesn't change, which means new code doesn't run. Before, I was working around this by calling kubectl delete and then kubectl create when deploying to minikube, but as the number of services I'm deploying has increased, that has started to stretch the dev cycle too far.
Ideally, I'd like a better way to tell kubectl apply to restart a deployment rather than relying only on the tag changing.
I'm curious how people have been approaching this problem.
Additionally, I'm building everything with Bazel, which means that I have to be pretty explicit about setting up my build commands. I'm thinking maybe I should switch to just deleting/creating the one service I'm working on and leaving the others running.
But in that case, maybe I should just look at Telepresence and run the service I'm developing outside of minikube altogether? What are the best practices here?
I'm not entirely sure I understood your question but that may very well be my reading comprehension :)
In any case, here are a few thoughts that popped up while reading this (again, not sure what you're trying to accomplish):
Option 1: maybe what you're looking for is to scale down and back up, i.e. scale your deployment to, say, 0 and then back up. Given you're using a ConfigMap and maybe only want to update that, the command would be kubectl scale --replicas=0 -f foo.yaml and then back to whatever it was (see the sketch after this list).
Option 2: if you want to apply the deployment without killing any pods, for example, you would use the --cascade=false flag (google it).
Option 3: look up the rollout option for managing deployments; not sure if it works on services, though.
Finally, and that's only me talking: share some more details, like which version of k8s you're using, and maybe provide an actual use-case example to better describe the issue.
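For option 1, a minimal sketch (the deployment name and replica count are placeholders):
kubectl scale deployment foo --replicas=0
kubectl scale deployment foo --replicas=3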
Kubernetes only triggers a new rollout when something in the spec has changed. If your image pull policy is Always, you can delete your pods to pick up the new image; if you want Kubernetes to handle the rollout, you can update the YAML to contain a constantly changing metadata field (I use seconds since epoch), which will trigger a change. Ideally, you should be tagging your images with unique tags from your CI/CD pipeline, e.g. the commit reference they were built from. This gets around the issue and allows you to take full advantage of the Kubernetes rollback feature.
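A hedged sketch of the "constantly changing metadata field" idea (the deployment name and annotation key are made up; newer kubectl also ships a rollout restart subcommand that does essentially the same thing under the hood):
# patch a pod-template annotation with the current epoch time to force a rollout
kubectl patch deployment my-service \
  -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"redeploy-epoch\":\"$(date +%s)\"}}}}}"
# or, on kubectl v1.15+, simply:
kubectl rollout restart deployment/my-service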

pods keep creating themselves even I deleted all deployments

I am running k8s on AWS, and I updated the nginx deployment, which normally works fine, but this time the nginx deployment won't show up in kubectl get deployments.
I want to kill all the pods related to nginx, but they keep recreating themselves. I deleted all deployments with kubectl delete --all deployments; the other pods got terminated, but not the nginx ones.
I have no idea how to stop the pods from being recreated.
Any idea where to start?
Check for a Deployment, ReplicationController, or ReplicaSet that still owns the pods and remove it:
kubectl get deploy,rc,rs
In modern Kubernetes, there is also an annotation, kubernetes.io/created-by, on the Pod showing its "owner", but I can't lay my hands on the documentation link right now. However, I found a pastebin containing a concrete example of the contents of the annotation.
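On newer clusters that annotation has been superseded by metadata.ownerReferences, which serves the same purpose. A rough sketch (the pod name is a placeholder):
kubectl get pod <pod-name> -o jsonpath='{.metadata.ownerReferences[0].kind}{"/"}{.metadata.ownerReferences[0].name}{"\n"}'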

No support for `kubectl delete <pod-ip>` in the Kubernetes UX

It would be great if we have
kubectl delete pod/<pod-ip>
Is there any particular reason to avoid this UX?
The best way to manage deletions is by label. You can always do kubectl label pod foo ip=1.2.3.4 and then kubectl delete pod -l ip=1.2.3.4 if you really want to target the IP.
Some context: addressing a pod by IP isn't supported out of the box because it would encourage higher-level control systems (either automated ones or cluster admins) to hinge on IPs, and IPs of containers are supposed to be ephemeral. One might draw an analogy to a stack frame in C: an uninitialized variable might have the same value on second entry into a function, but that doesn't mean one should count on that value always being the same.
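If all you have is the IP, a rough sketch that avoids labels entirely (assumes a cluster recent enough to support the status.podIP field selector; the IP is a placeholder):
kubectl get pods --all-namespaces --field-selector status.podIP=1.2.3.4
kubectl delete pod <name-from-above> -n <its-namespace>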