I've started experimenting with Argo CD as part of my cluster setup and set it up to watch a test repo containing some YAML files for a small application I wanted to use for the experiment. While getting to know the system, I broke the repo connection. Instead of fixing it, I decided that I had what I wanted and chose to do a clean install with the intention of configuring it for my actual project.
I pressed the button in the web UI for deleting the application, which got stuck. I then read that adding spec.syncPolicy.allowEmpty: true and removing the metadata.finalizers declaration from the application YAML file should help, but this did not allow me to remove the application resource either.
I then ran an uninstall command with the official manifests/install.yaml as an argument, which cleaned up most resources installed, but left the application resource and the namespace. Command: kubectl delete -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
I have tried kubectl delete application NAME with the --force flag and with the --cascade=orphan flag, both on the application resource and on the argocd namespace itself. Now both of them are stuck in Terminating without getting any further.
Now I'm properly stuck: I can't reinstall Argo CD in any way I know because the resources and the namespace are marked for deletion, and I'm at my wits' end as to what else I can try to get rid of the dangling application resource.
Any and all suggestions as to what to look into are much appreciated.
If your problem is that the namespace cannot be deleted, the following two solutions may help:
Check which resources are stuck in the deletion process, delete those resources, and then delete the namespace.
Edit the argocd namespace, check whether there is a finalizers field in the spec, and delete that field together with its contents.
Hopefully this helps.
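As a rough sketch of the second suggestion: in my understanding, a change to a namespace's spec.finalizers made through kubectl edit often does not persist, so a common workaround is to submit the edited object to the namespace's finalize subresource instead. For a stuck namespaced resource such as the Argo CD application, the finalizers live under metadata.finalizers and can be cleared with a merge patch. The function names below are made up for illustration, jq is assumed to be installed, and removing finalizers skips whatever cleanup they were guarding, so treat this as a last resort.

```shell
# Hypothetical helpers; names and arguments are illustrative only.

# Clear metadata.finalizers on a stuck resource.
# usage: clear_finalizers <kind> <name> <namespace>
clear_finalizers() {
  kubectl patch "$1" "$2" -n "$3" --type merge \
    -p '{"metadata":{"finalizers":null}}'
}

# Clear spec.finalizers on a namespace stuck in Terminating by
# submitting the edited object to its finalize subresource.
# usage: finalize_namespace <namespace>   (assumes jq is installed)
finalize_namespace() {
  kubectl get namespace "$1" -o json |
    jq '.spec.finalizers = []' |
    kubectl replace --raw "/api/v1/namespaces/$1/finalize" -f -
}

# Example (not run here):
#   clear_finalizers application my-app argocd
#   finalize_namespace argocd
```

Clearing the application's finalizers first and the namespace's last tends to be the safer order, since the namespace cannot finish terminating while resources inside it still exist.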
I've found that the following commands help greatly...
kubectl api-resources --verbs=list --namespaced -o name | \
xargs -n 1 kubectl get --show-kind \
--ignore-not-found -n <namespace>
kubectl api-resources -n <namespace> | grep argo | grep ...
...help greatly to identify the resources that are "stuck".
Then you have to either use some awk to generate delete commands, or use delete --all to "prune" the resources. If some get stuck, you have to resort to editing them to remove the finalisers so that they can then be deleted.
It can get ugly, but awk and printf combinations can help.
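One way the awk and printf combination could look, as a minimal sketch: the helper below (a hypothetical name) reads the output of kubectl get --show-kind and prints one delete command per resource without running anything, so the list can be reviewed first.

```shell
# Hypothetical helper: reads `kubectl get --show-kind` output on stdin
# and prints one `kubectl delete` command per resource, without running it.
# usage: gen_delete_cmds <namespace>
gen_delete_cmds() {
  awk -v ns="$1" '
    # Resource rows start with kind/name; header rows ("NAME ...") do not.
    $1 ~ /\// { printf "kubectl delete -n %s %s\n", ns, $1 }
  '
}

# Example (not run here): review the generated commands, then pipe them to sh.
#   kubectl api-resources --verbs=list --namespaced -o name |
#     xargs -n 1 kubectl get --show-kind --ignore-not-found -n argocd |
#     gen_delete_cmds argocd
```

Printing the commands instead of executing them keeps the destructive step explicit.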
I'm running rke2 version v1.22.7+rke2r2 on 3 nodes. Today I decided to reinstall my application, and I'm no longer able to do it because of a problem claiming a PV.
I have never had this problem before, and I think it is due to an update of local-path-provisioner, but I'm not sure; I'm still new to Kubernetes.
Anyway, these are the commands I ran before installing my solution:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
I omitted metallb. Then, as a test, I tried to install the example from the local-path-provisioner website (https://github.com/rancher/local-path-provisioner):
kubectl create -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/examples/pvc/pvc.yaml
kubectl create -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/examples/pod/pod.yaml
What I see is that the PVC stays in Pending status. When I check pod creation in the local-path-storage namespace, I see that the helper-pod-create-pvc-xxxx pod goes into an error state.
I tried to get some logs, and the only thing I was able to grab is this:
kubectl -n local-path-storage logs helper-pod-create-pvc-dd8cecf3-d65b-48f7-9e04-d56a20573f8e -f
/script/setup: line 3: VOL_DIR: parameter not set
So it seems VOL_DIR is not set for whatever reason. But I never did any custom configuration and it always started without problems, and to be honest I don't know what to put in the VOL_DIR environment variable, or where.
Answering my own question: it seems to be a bug in local-path-provisioner, and they are fixing it.
In the meantime, instead of using the latest manifest on master, which has the bug, please use v0.0.21, like this:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.21/deploy/local-path-storage.yaml
I tested it and it works fine.
The deploy manifest in the master branch is already fixed.
The master branch is for development, so please use a v0.0.x stable release (e.g. v0.0.21) for production use.
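To make pinning to a release explicit, a small sketch: the helper below (a hypothetical name) builds the manifest URL for a given tag, following the URL pattern used in the commands above.

```shell
# Hypothetical helper: build the deploy manifest URL for a given git ref.
# usage: lpp_manifest_url <tag-or-branch>
lpp_manifest_url() {
  printf 'https://raw.githubusercontent.com/rancher/local-path-provisioner/%s/deploy/local-path-storage.yaml\n' "$1"
}

# Example (not run here):
#   kubectl apply -f "$(lpp_manifest_url v0.0.21)"
```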
I am looking for a way to get the metadata of all objects within a k8s cluster and send it to an external server.
By metadata, I mean each object's name, kind, labels, annotations, etc.
The intention is to build an offline inventory of a cluster.
What would be the best approach to build it? Is there any tool that already does something similar?
Thanks
Posting this as a community wiki, feel free to edit and expand.
There are different ways to achieve it.
From this GitHub issue comment it's possible to iterate through all resources to get all available objects.
in yaml:
kubectl api-resources --verbs=list -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -o yaml
in json:
kubectl api-resources --verbs=list -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -o json
And then parse the output.
Use Kubernetes clients.
There are existing Kubernetes client libraries (available for different languages) which can be used to fetch the required information and work with it later.
Use a kubectl plugin: ketall (I didn't test it).
There's a kubectl plugin which returns all cluster resources; see the ketall GitHub repo. Again, once you have retrieved the cluster objects, you will need to parse and work with them.
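For the first approach (iterating over api-resources), kubectl's custom-columns output can do part of the parsing. The sketch below prints one line per object with its kind, namespace, name and labels; the function name and the choice of columns are assumptions for illustration, and sending the result to an external server is left out.

```shell
# Hypothetical helper: print kind, namespace, name and labels for every
# listable object in the cluster, one object per line.
list_metadata() {
  kubectl api-resources --verbs=list -o name |
    while read -r resource; do
      kubectl get "$resource" --all-namespaces --ignore-not-found --no-headers \
        -o custom-columns='KIND:.kind,NAMESPACE:.metadata.namespace,NAME:.metadata.name,LABELS:.metadata.labels'
    done
}

# Example (not run here):
#   list_metadata > inventory.txt
```

Annotations can be added as a further column in the same way, at the cost of much longer lines.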
Try these commands:
kubectl get all --all-namespaces -o yaml
or
kubectl get all --all-namespaces -o json
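You can then parse the output and use it as you see fit. Note that kubectl get all returns only a subset of resource types (pods, services, deployments and similar workloads), not ConfigMaps, Secrets or custom resources.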
I have K8s deployed on an EC2-based cluster.
There is an application running in a deployment, and I am trying to figure out the manifest files that were used to create the resources.
Deployment, service and ingress files were used to create the app setup.
I tried the following command, but I'm not sure if it's the correct one, as it also returns a lot of extra data such as lastTransitionTime, lastUpdateTime and status:
kubectl get deployment -o yaml
What is the correct command to view the manifest yaml files of an existing deployed resource?
There is no specific way to do that. You should store your source files in source control like any other code. Think of it like decompiling: you can do it, but what you get back is not the same as what you put in. That said, check for the last-applied annotation (kubectl.kubernetes.io/last-applied-configuration): if you use kubectl apply, it will hold a JSON version of a more original-ish manifest, but again probably with some defaulted fields.
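To read that annotation without digging through the full object, kubectl has a view-last-applied subcommand. A minimal sketch, assuming the resource was created or updated with kubectl apply (otherwise the annotation is absent); the wrapper name is hypothetical.

```shell
# Hypothetical wrapper: print the manifest recorded by the last `kubectl apply`.
# usage: last_applied <kind/name> <namespace>
last_applied() {
  kubectl apply view-last-applied "$1" -n "$2" -o yaml
}

# Example (not run here):
#   last_applied deployment/my-app default > my-app-deployment.yaml
```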
You can try using the --export flag, but it is deprecated (and was removed in kubectl 1.18), so it may not work at all on recent versions.
kubectl get deployment -o yaml --export
Refer: https://github.com/kubernetes/kubernetes/pull/73787
KUBE_EDITOR="cat" kubectl edit secrets rook-ceph-mon -o yaml -n rook-ceph 2>/dev/null >user.yaml
I'd like to diff a Kubernetes YAML template against the actual deployed resources. This should be possible using kubectl diff. However, on my Kubernetes cluster in Azure, I get the following error:
Error from server (InternalError): Internal error occurred: admission webhook "aks-webhook-admission-controller.azmk8s.io" does not support dry run
Is there something I can enable on AKS to let this work or is there some other way of achieving the diff?
As a workaround you can use standard GNU/Linux diff command in the following way:
diff -uN <(kubectl get pods nginx-pod -o yaml) example_pod.yaml
I know this is a workaround rather than a solution, but I think it can still be considered a full-fledged replacement.
Thanks, but that doesn't work for me, because it's not just one pod I'm interested in, it's a whole Helm release with deployment, services, jobs, etc. – dploeger
But you won't compare everything at once anyway, will you?
You can use it for any resource you like, not only for Pods. Just substitute Pod with any other resource type.
Anyway, under the hood kubectl diff uses the diff command.
In kubectl diff --help you can read:
KUBECTL_EXTERNAL_DIFF environment variable can be used to select your
own diff command. By default, the "diff" command available in your
path will be run with "-u" (unified diff) and "-N" (treat absent files
as empty) options.
The real problem in your case is that, for some reason, you cannot use --dry-run on your AKS cluster, which is a question for AKS users/experts. Maybe it can be enabled somehow, but unfortunately I have no idea how.
Basically, kubectl diff compares the already deployed resource, which we can get with:
kubectl get resource-type resource-name -o yaml
with the result of:
kubectl apply -f nginx.yaml --dry-run=server --output yaml
and not with the actual content of your YAML file (a simple cat nginx.yaml would be enough for that purpose).
You can additionally use:
kubectl get all -l "app.kubernetes.io/instance=<helm_release_name>" -o yaml
to get the YAML of all resources belonging to a specific Helm release.
As you can read in man diff, it has the following options:
--from-file=FILE1
compare FILE1 to all operands; FILE1 can be a directory
--to-file=FILE2
compare all operands to FILE2; FILE2 can be a directory
so we are not limited to comparing single files; we can also compare against files located in a specific directory. We just can't use these two options together.
So the full diff command for comparing all resources of a specific Helm release currently deployed on our Kubernetes cluster with YAML files from a specific directory may look like this:
diff -uN <(kubectl get all -l "app.kubernetes.io/instance=<helm_release_name>" -o yaml) --to-file=directory_containing_yamls/
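A per-file variant of the same workaround, as a sketch: for every manifest in a directory, fetch the live objects it names with kubectl get -f and diff them against the file. The function name is hypothetical, and since the live YAML also carries status and defaulted fields, the output will be noisier than a real kubectl diff.

```shell
# Hypothetical helper: diff each manifest in a directory against the
# live objects it describes. No server-side dry run is involved.
# usage: diff_live <directory>
diff_live() {
  dir=$1
  for f in "$dir"/*.yaml; do
    [ -e "$f" ] || continue          # skip when the glob matched nothing
    echo "== $f"
    # diff exits non-zero when files differ; do not treat that as failure
    kubectl get -f "$f" -o yaml | diff -uN - "$f" || true
  done
}

# Example (not run here):
#   helm template my-release ./chart --output-dir rendered/
#   diff_live rendered/my-chart/templates
```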
When a Kubernetes pod goes into CrashLoopBackOff state, you will fix the underlying issue. How do you force it to be rescheduled?
To apply a new configuration, a new pod has to be created (the old one will be removed).
If your pod was created automatically by a Deployment or DaemonSet resource, this happens automatically each time you update the resource's YAML.
It will not happen if your resource has spec.updateStrategy.type=OnDelete.
If the problem was caused by an error inside the Docker image, and you have fixed it, you have to update the pods manually; you can use the rolling-update feature for this purpose. If the new image has the same tag, you can just delete the broken pod (see below).
In case of node failure, the pod will be recreated on a new node after some time; the old pod will be removed once the broken node has fully recovered. Note that this will not happen if your pod was created by a DaemonSet or StatefulSet.
In any case, you can manually remove the crashed pod:
kubectl delete pod <pod_name>
Or all pods in the CrashLoopBackOff state:
kubectl delete pod `kubectl get pods | awk '$3 == "CrashLoopBackOff" {print $1}'`
If you have a completely dead node, you can add the --grace-period=0 --force options to remove just the information about this pod from Kubernetes.
Generally a fix requires you to change something about the configuration of the pod (the docker image, an environment variable, a command line flag, etc), in which case you should remove the old pod and start a new pod. If your pod is running under a replication controller (which it should be), then you can do a rolling update to the new version.
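On current clusters pods usually run under a Deployment, DaemonSet or StatefulSet rather than a replication controller, and a rolling replacement can be triggered with kubectl rollout restart. This is a substitute for the older rolling-update command, not what the answers above used; it assumes kubectl 1.15 or later, and the wrapper name is made up.

```shell
# Hypothetical wrapper: replace all pods of a workload one by one and
# wait until the rollout has finished.
# usage: restart_workload <kind/name> <namespace>
restart_workload() {
  kubectl rollout restart "$1" -n "$2" &&
    kubectl rollout status "$1" -n "$2"
}

# Example (not run here):
#   restart_workload deployment/my-app default
```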
5 years later, unfortunately, this still seems to be the case.
@kvaps's answer above suggested an alternative (rolling updates) that essentially updates (overwrites) a pod instead of deleting it; see the current working link on rolling updates.
The alternative to deleting a pod is NOT to create a bare pod, but instead to create a deployment, and then delete the deployment that contains the pod to be deleted.
$ kubectl get deployments -A
$ kubectl delete -n <NAMESPACE> deployment <DEPLOYMENT>
# When on minikube or using docker for development + testing
$ docker system prune -a
The first command displays all deployments alongside their respective namespaces. This helped me avoid deleting the wrong deployment when two deployments share the same name (name collision) in different namespaces.
The second command deletes the deployment in exactly the given namespace.
The last command helps when working in development mode. It removes all unused images, which is not required but helps clean up and save some disk space.
Another great tip is to try to understand why a Pod is failing. The problem may lie somewhere else entirely, and k8s does a good job of documenting it. For that, one of the following may help:
$ kubectl logs -f <POD NAME>
$ kubectl get events
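For a pod in CrashLoopBackOff specifically, the logs of the previous (crashed) container are often more useful than those of the freshly restarted one. A small sketch that gathers the usual three views in one go; the function name is hypothetical.

```shell
# Hypothetical helper: collect the usual diagnostics for a crashing pod.
# usage: crash_report <pod> <namespace>
crash_report() {
  # Pod spec, container states and recent events for the pod
  kubectl describe pod "$1" -n "$2"
  # Logs of the previous container instance, i.e. the one that crashed
  kubectl logs "$1" -n "$2" --previous
  # Events that refer to this pod only
  kubectl get events -n "$2" --field-selector "involvedObject.name=$1"
}

# Example (not run here):
#   crash_report my-app-5d9c7b6f4-abcde default
```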
Another reference here on Stack Overflow:
https://stackoverflow.com/a/55647634/132610
For anyone interested I wrote a simple helm chart and python script which watches the current namespace and deletes any pod that enters CrashLoopBackOff.
The chart is at https://github.com/timothyclarke/helm-charts/tree/master/charts/dr-abc.
This is a sticking plaster. Fixing the problem is always the best option. In my specific case getting the historic apps into K8s so the development teams have a common place to work and strangle the old applications with new ones is preferable to fixing all the bugs in the old apps. Having this in the namespace to keep the illusion of everything running buys that time.
This command will delete all pods that are in any of the crash states (CrashLoopBackOff, Init:CrashLoopBackOff, etc.). You can use grep -i <keyword> to match different states and then delete the pods that match. In your case it should be:
kubectl get pod -n <namespace> --no-headers | grep -i crash | awk '{print $1}' | while read line; do kubectl delete pod -n <namespace> "$line"; done
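The same idea split into two small functions, as a sketch: one that only selects pod names by the STATUS column, so the selection can be checked before anything is deleted, and one that deletes what was selected. Both names are made up for illustration.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints the names of pods whose STATUS column mentions CrashLoopBackOff.
crashloop_pods() {
  awk '$3 ~ /CrashLoopBackOff/ { print $1 }'
}

# Hypothetical helper: delete every crash-looping pod in one namespace.
# usage: delete_crashloop_pods <namespace>
delete_crashloop_pods() {
  kubectl get pods -n "$1" --no-headers | crashloop_pods |
    while read -r pod; do
      kubectl delete pod -n "$1" "$pod"
    done
}

# Example (not run here): preview first, then delete.
#   kubectl get pods -n default --no-headers | crashloop_pods
#   delete_crashloop_pods default
```

Matching on the third column avoids deleting a healthy pod whose name merely contains "crash", which the grep-based pipeline would also pick up.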