How to clear CrashLoopBackOff - kubernetes

When a Kubernetes pod goes into CrashLoopBackOff state, you will fix the underlying issue. How do you force it to be rescheduled?

For apply new configuration the new pod should be created (the old one will be removed).
If your pod was created automatically by Deployment or DaemonSet resource, this action will run automaticaly each time after you update resource's yaml.
It is not going to happen if your resource have spec.updateStrategy.type=OnDelete.
If problem was connected with error inside docker image, that you solved, you should update pods manually, you can use rolling-update feature for this purpose, In case when new image have same tag, you can just remove broken pod. (see below)
In case of node failure, the pod will recreated on new node after few time, the old pod will be removed after full recovery of broken node. worth noting it is not going to happen if your pod was created by DaemonSet or StatefulSet.
Any way you can manual remove crashed pod:
kubectl delete pod <pod_name>
Or all pods with CrashLoopBackOff state:
kubectl delete pod `kubectl get pods | awk '$3 == "CrashLoopBackOff" {print $1}'`
If you have completely dead node you can add --grace-period=0 --force options for remove just information about this pod from kubernetes.

Generally a fix requires you to change something about the configuration of the pod (the docker image, an environment variable, a command line flag, etc), in which case you should remove the old pod and start a new pod. If your pod is running under a replication controller (which it should be), then you can do a rolling update to the new version.

5 Years later, unfortunately, this scenario seems to still be the case.
#kvaps answer above suggested an alternative (rolling updates), that essentially updates(overwrites) instead of deleting a pod -- the current working link of rolling updates
The alternative to being able to delete a pod, was NOT to create a pod but instead create a deployment, and delete the deployment that contains the pod, subject to deletion.
$ kubectl get deployments -A
$ kubectl delete -n <NAMESPACE> deployment <DEPLOYMENT>
# When on minikube or using docker for development + testing
$ docker system prune -a
The first command displays all deployments, alongside their respective namespaces. This helped me reduce the error of deleting deployments that share the same name(name collision) but from two different namespaces.
The second command deletes a deployment that is exactly located underneath a namespace.
The last command helps when working in development mode. Essentially, removing all unused images, which is not required but helps clean up and save some disk-space.
Another great tip, is to try to understand the reasons why a Pod is failing. The problem may be relying completely somewhere else, and k8s does a good deal of documenting. For that one of the following may help:
$ kubectl logs -f <POD NAME>
$ kubectl get events
Other reference here on StackOveflow:
https://stackoverflow.com/a/55647634/132610

For anyone interested I wrote a simple helm chart and python script which watches the current namespace and deletes any pod that enters CrashLoopBackOff.
The chart is at https://github.com/timothyclarke/helm-charts/tree/master/charts/dr-abc.
This is a sticking plaster. Fixing the problem is always the best option. In my specific case getting the historic apps into K8s so the development teams have a common place to work and strangle the old applications with new ones is preferable to fixing all the bugs in the old apps. Having this in the namespace to keep the illusion of everything running buys that time.

This command will delete all pods that are in any of (CrashLoopBackOff, Init:CrashLoopBackOff, etc.) states. You can use grep -i <keyword> to match different states and then delete the pods that match the state. In your case it should be:
kubectl get pod -n <namespace> --no-headers | grep -i crash | awk '{print $1}' | while read line; do; kubectl delete pod -n <namespace> $line; done

Related

unable to create a pv due to VOL_DIR: parameter not set

I'm running rke2 version v1.22.7+rke2r2 in 3 nodes. Today I decide to reinstall my application and I'm not able to do it anymore due to a problem in claiming PV.
I have had never this problems before, and I think is due to an update on local-path-provisioner but I'm not sure I'm still a newbie about kube.
Anyway these are the commands I run before installing my solution:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
I omitted metallb. Then as a test I try to install the test specified in the local-path-provisioner website (https://github.com/rancher/local-path-provisioner):
kubectl create -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/examples/pvc/pvc.yaml
kubectl create -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/examples/pod/pod.yaml
What I see is that the pvc stays in a PENDING status, then I check the pod creation in local-path-storage namespace and I see that the helper-pod-create-pvc-xxxx goes in error.
I try to get some logs and the only thing I was able to grab is this:
kubectl -n local-path-storage logs helper-pod-create-pvc-dd8cecf3-d65b-48f7-9e04-d56a20573f8e -f
/script/setup: line 3: VOL_DIR: parameter not set
So it seems VOL_DIR is not set for whatever reason. But I never did a custom configuration, it always starts without problem, and to be honest I don't know what put in VOL_DIR env variable and where.
I just answer to my question. It seems to be a bug on local-path-provisioner
they are fixing it.
In the meantime, instead of using the last one present in the master that has the bug, please use 0.0.21, like this:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.21/deploy/local-path-storage.yaml
I tested and it works fine.
The deploy manifest in master branch is already fixed.
The master branch is for development, so please use the v0.0.x (e.g v0.0.21, stable release) for production use.

Kubernetes Pod will restart after deleting

I created a pod that I want to delete but it does not seem te work.
I did all ready tried a lot of different things.
I do not have deployments or replica sets, as a lot of people suggest to delete those.
When i type:
kubectl get all
Only the pod and the service is visible, both of which will return upon deleting them.
When i type:
kubectl delete pods <pod-name> --grace-period=0 --force
The pod will come back again.
Is there something else I can do?
If pod not part of the deployment, rs, sts, ds, then it must be part of the static pod. Static pod name combines with node (i.e staticpod-node01). By default location of the static path is /etc/Kubernetes/manifest/.
Doc: https://kubernetes.io/docs/tasks/configure-pod-container/static-pod/
Apparently, it was some sort of invisible job that started the pod.
It was not showing when entering kubectl get jobs
So when finding this job and deleting it the pod disappeared!

How to find the previous pod names after redeployment in kubernetes?

I redeployed my application with helm upgrade, e.g.
helm upgrade myapp helm/myapp --install --namespace=myapp --set image=myapp:latest
Then with
kubectl get pods
I can see new pods get started and old pods get terminated. I wonder once the old pods are terminated, and gone from kubectl get pods, can I still find their pod names somehow?
Nowadays $ kubectl get pods -a is deprecated, and as a result you cannot get deleted pods.
However you can get a list of recently deleted pod names - up to 1 hour in the past unless you changed the ttl for kubernetes events - by executing command:
$ kubectl get event -o custom-columns=NAME:.metadata.name | cut -d "." -f1
Take a look: deleted-pods.
For future consider auditing. Kubernetes auditing gives a security chronological set of records documenting the sequence of activities that have affected system by administrators, users or other components of the system.

What is the way to make kubernetes nodes have `providerID` spec after creation (manually)?

I'm expecting that kubectl get nodes <node> -o yaml to show the spec.providerID (see reference below) once the kubelet has been provided the additional flag --provider-id=provider://nodeID. I've used /etc/default/kubelet file to add more flags to the command line when kubelet is start/restarted. (On a k8s 1.16 cluster) I see the additional flags via a systemctl status kubelet --no-pager call, so the file is respected.
However, I've not seen the value get returned by kubectl get node <node> -o yaml call. I was thinking it had to be that the node was already registered, but I think kubectl re-registers when it starts up. I've seen the log line via journalctl -u kubelet suggest that it has gone through registration.
How can I add a provider ID to a node manually?
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#nodespec-v1-core
How a kubelet is configured on the node itself is separate (AFAIK) from its definition in the master control plane, which is responsible for updating state in the central etcd store; so it's possible for these to fall out of sync. i.e., you need to communicate to the control place to update its records.
In addition to Subramanian's suggestion, kubectl patch node would also work, and has the added benefit of being easily reproducible/scriptable compared to manually editing the YAML manifest; it also leaves a "paper trail" in your shell history should you need to refer back. Take your pick :) For example,
$ kubectl patch node my-node -p '{"spec":{"providerID":"foo"}}'
node/my-node patched
$ kubectl describe node my-node | grep ProviderID
ProviderID: foo
Hope this helps!
You can edit the node config and append providerID information under spec section.
kubectl edit node <Node Name>
...
spec:
podCIDR:
providerID:

Reload Kubernetes ReplicationController to get newly created Service

Is there a way to reload currently running pods created by replicationcontroller to reapply newly created services?
Example:
I have a running pods created by ReplicationController config file. I have deleted a service called mongo-svc and recreated it again using different port. Is there a way for the pod's env file to be updated with the new IP and ports from the new mongo-svc?
You can restart pods by simply deleting them: if they are linked to a Replication controller, the RC will take care of restarting them
kubectl delete pod <your-pod-name>
if you have a couple pods, it's easy enougth to copy/paste the pod names, but if you have many pods it can become cumbersome.
So another way to delete pods and restart them is to scale the RC down to 0 instances and back up to the number you need.
kubectl scale --replicas=0 rc <your-rc>
kubectl scale --replicas=<n> rc <your-rc>
By-the-way, you may also want to look at 'rolling-updates' to do this in a more production friendly manner, but that implies updating the RC config.
If you want the same pod to have the new service, the clean answer is no. You could (I strongly suggest not to do this) run kubectl exec <pod-name> -c <containers> -- export <service env var name>=<service env var value>. But your best bet is to run kubectl delete <pod-name> and let your replication controller handle the work.
I've ran into a similar issue for services being ran outside of kubernetes, say a DB for instance, to address this I've been creating this https://github.com/cpg1111/kubongo which updates the service's endpoint without deleting the pods. That same idea can also be applied to other pods in kubernetes to automate the service update. Basically it watches a specific service, and when it's IP changes for whatever reason it updates all the pods without deleting them. This does use the same code as kubectl exec however it is automated, sanitizes input and ensures the export is executed on all pods.
What do you mean with 'reapply'?
The pods to which the services point are generally selected based on labels.In other words, you can add / remove labels from the pods to include / exclude them from a service.
Read here for more information about defining services: http://kubernetes.io/v1.1/docs/user-guide/services.html#defining-a-service
And here for more information about labels: http://kubernetes.io/v1.1/docs/user-guide/labels.html
Hope it helps!