Helm stuck on "another operation is in progress" but no other releases or installs exist - kubernetes

I know the title sounds like "did you even try to google", but helm gives me the typical: another operation (install/upgrade/rollback) is in progress
What I can't figure out is that there are no releases anywhere that are actually in progress.
I've run helm list --all --all-namespaces and the list is just blank. Same with running helm history against any namespace I can think of: nothing, all blank. I've even deleted the namespace the app was initially installed in, along with everything in it, and it's still broken.
I've also found answers suggesting to delete Helm's release secrets, which I have done, and it doesn't help.
Is there some way to hard reset helm's state? Because all the answers I find on this topic involve rolling back, uninstalling, or deleting stuck releases, and none exist on this entire cluster.
Helm is v3.8.1 if that helps. Thanks for any help on this, it's driving me crazy.
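For reference, the secret-deletion advice refers to Helm's release records, which (as far as I understand) can be listed, and if stuck, deleted, roughly like this -- namespace and release names below are placeholders:
kubectl get secrets --all-namespaces -l owner=helm
kubectl get secrets --all-namespaces --field-selector type=helm.sh/release.v1
kubectl delete secret sh.helm.release.v1.my-release.v1 -n my-namespace
Both listings came back empty in my case.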

I ended up figuring this out. The pipeline running helm executed from a GitLab runner that runs on one cluster, but uses a kubernetes context to target my desired cluster. At some point the kubernetes context wasn't loaded correctly, and a bad install went to the host cluster the runner lived on.
Because I was still targeting a different cluster, the helm command saw the bad install on the cluster local to the runner, while helm list --all didn't see anything on the target cluster.
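In hindsight, a couple of sanity checks along these lines would have caught it (the context name is a placeholder), since helm honours --kube-context:
kubectl config current-context
helm list --all --all-namespaces --kube-context my-target-context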

Related

Helm deploy does not update/remove code that was changed prior on Cronjobs or Deployments

I am new to the K8s realm but am enjoying it. I do have two issues which I was expecting Helm to be able to handle.
My question revolves around what happens when I do a helm upgrade. Let's say I have a volumeMount of a directory in a Deployment or CronJob. Whenever I update my YAML file to remove the mount, it still exists after Helm completes.
So what I do instead is edit the live YAML of, let's say, the Deployment in a k8s CLI (I use k9s). CronJobs are the hard ones, since I have to delete them and run helm upgrade again. (Please let me know if there is a better way of updating CronJobs, because when I update Deployments the pods are updated, but with CronJobs they are not.)
Would appreciate any help with my questions on how best to work with k8s.
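One way to narrow this down (release, namespace, and resource names below are placeholders) is to compare what Helm rendered for the release against what is actually live in the cluster -- if the mount is gone from the rendered manifest but still present on the live object, the problem is on the cluster side rather than in the chart:
helm get manifest my-release -n my-namespace | grep -n volumeMount
kubectl get deployment my-deployment -n my-namespace -o yaml | grep -n volumeMount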

Helm delete and reinstall deployment. Wait or not to wait?

I have a situation where I am deploying some chart, let's call it "myChart".
Let's suppose I have a pipeline where I am doing the below:
helm delete myChart_1.2 -n <myNamespace>
And right after, I am installing the new one:
helm install myChart_1.3 -n <myNamespace>
Does Kubernetes, or maybe Helm, know that all the resources should be deleted first before the new ones are installed?
For instance, there might be some PVCs and PVs that are still not deleted. Is there any problem with that? Should I add some waits before deployment?
helm delete (aka uninstall) should remove the objects managed by a given release before exiting.
Still, when the command returns, you could be left with resources in a Terminating state, pending actual deletion.
Usually that would be PVCs, which may still be attached to a running container.
Or objects such as ReplicaSets or Pods -- most likely your Helm chart installs Deployments, DaemonSets, StatefulSets, ... the top-level objects may appear to be deleted while their child objects are still being terminated.
Although this shouldn't be an issue for Helm, assuming your application is installed under a generated name, and as long as your chart can create multiple instances of the same application in the same cluster/namespace without them overlapping (i.e. all resources managed through Helm have unique names, which is not always the case).
If your chart is hosted on a public repository, let us know which one so we can check. And if you're not one of the maintainers of that chart, beware that Helm charts can go from amazing to very bad, depending on who's contributing, which use cases have been covered so far, ...
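As an illustration of the generated-name approach (chart path and namespace are placeholders), Helm can pick the release name for you, so repeated pipeline runs never collide on the release name:
helm install ./myChart --generate-name -n my-namespace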
Kubernetes (and Helm by extension) will never clean up PVCs that have been created as part of StatefulSets. This is intentional (see relevant documentation) to avoid accidental loss of data.
Therefore, if you do have PVCs created from StatefulSets in your chart and if your pipeline re-installs your Helm chart under the same name, ensure that PVCs are deleted explicitly after running "helm delete", e.g. with a separate "kubectl delete" command.
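A minimal sketch of that cleanup step, assuming the chart labels its resources with the conventional app.kubernetes.io/instance label (release and namespace names are placeholders):
helm delete myChart -n my-namespace
kubectl delete pvc -n my-namespace -l app.kubernetes.io/instance=myChart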

Check history of OpenShift / Kubernetes deployments

We constantly have issues with our OpenShift deployments: credentials suddenly go missing (or the wrong credentials are suddenly configured), deployments are suddenly scaled up or down, etc.
Nobody on the team is aware of having changed anything. However, from my recent experience I am quite sure this happens unknowingly.
Is there any way to check the history of modifications to a resource? E.g. the last "oc/kubectl apply -f" -- ideally with the contents that were modified and the user who made the change?
For a one-off issue, you can also look at the ReplicaSets present in that namespace and examine them for differences. Depending on how much revision history you keep, it may have already been lost, if it was present to begin with.
Try:
kubectl get rs -n my-namespace
Or, if dealing with DeploymentConfigs, ReplicationControllers:
oc get rc -n my-namespace
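To actually compare two revisions, a diff of the ReplicaSet specs works; the names below are placeholders:
diff <(kubectl get rs my-app-6d4cf56db6 -n my-namespace -o yaml) \
     <(kubectl get rs my-app-7c59dd8b9f -n my-namespace -o yaml)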
For credentials, assuming those are in a secret and not the deployment itself, you wouldn't have that history without going to audit logs.
You need to configure and enable the audit log; check out the OpenShift documentation on audit logging:
In addition to logging metadata for all requests, logs request bodies
for every read and write request to the API servers...
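Once enabled, on OpenShift 4.x the audit logs can be pulled from the control-plane nodes along these lines (the node name and paths are placeholders and vary by version; the grep is only an illustration of filtering on the JSON "verb" field):
oc adm node-logs --role=master --path=kube-apiserver/
oc adm node-logs <master-node> --path=kube-apiserver/audit.log | grep '"verb":"patch"'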
K8s offers only scant functionality for tracking changes. Most prominently, I would look at kubectl rollout history for Deployments, DaemonSets and StatefulSets. Still, this will only tell you when and what was changed, but not who did it.
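For example, assuming a Deployment called my-deployment (placeholder name):
kubectl rollout history deployment/my-deployment -n my-namespace
kubectl rollout history deployment/my-deployment -n my-namespace --revision=2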
OpenShift does not seem to offer much on top, since audit logging is cumbersome to configure and analyze.
With a problem like yours, the best remedy I see would be to revoke the team's direct production access to K8s and mandate that changes be rolled out via a pipeline. That way you can use Git to track who did what.

AKS not deleting orphaned resources

After some time, I have problems with some of our clusters where the automatic deletion of orphaned resources stops working. So if I remove a Deployment, neither the ReplicaSet nor the pods are removed; or if I remove a ReplicaSet, a new one is created but the previous pods are still there.
I can't even update some Deployments, because that just creates a new ReplicaSet + pods.
This is a real problem, as we are creating and removing resources and relying on automatic child removal.
The thing is that destroying and recreating the cluster makes it work perfectly, and we weren't able to trace it back to anything we did that could have caused the problem.
I tried upgrading both master and agent nodes to a newer version and restarting kubelet on the agent nodes, but that didn't solve anything.
Does anyone know where the problem could be, or which component is in charge of the cascading deletion of orphaned resources?
Has this happened to anyone else? It has already happened to us in 3 different clusters with different Kubernetes versions.
I have tested it by creating the test deployment from the K8s documentation and then deleting it:
kubectl apply -f https://k8s.io/examples/application/deployment.yaml
kubectl delete deployments.apps nginx-deployment
But the pods are still there.
Thanks in advance
The problem was caused by a faulty CRD / admission webhook. It may seem strange, but a broken CRD or a faulty pod acting as a webhook will make kube-controller-manager fail for all resources (at least in AKS). After removing the CRDs and the faulty webhook, it started working again. (Why the webhook was failing is a different matter.)
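A hedged sketch for spotting such offenders (these are general checks, not the exact commands from my case): unavailable aggregated API services and webhook configurations pointing at dead services are the usual suspects when garbage collection stalls.
kubectl get apiservices | grep -v True
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations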

Is there a way to make kubectl apply restart deployments whose image tag has not changed?

I've got a local deployment system that is mirroring our production system. Both are deployed by calling kubectl apply -f deployments-and-services.yaml
I'm tagging all builds with the current git hash, which means that for clean deploys to GKE all the services get a new Docker image tag, so apply will restart them. Locally on minikube, though, the tag often doesn't change, which means new code is not run. I was working around this by calling kubectl delete and then kubectl create when deploying to minikube, but as the number of services I'm deploying has increased, that is starting to stretch the dev cycle too far.
Ideally, I'd like a better way to tell kubectl apply to restart a deployment, rather than just depending on the tag.
I'm curious how people have been approaching this problem.
Additionally, I'm building everything with Bazel, which means I have to be pretty explicit about setting up my build commands. I'm thinking maybe I should switch to just deleting/creating the one service I'm working on and leave the others running.
But in that case, maybe I should just look at Telepresence and run the service I'm dev'ing on outside of minikube altogether? What are best practices here?
I'm not entirely sure I understood your question but that may very well be my reading comprehension :)
In any case, here are a few thoughts that popped up while reading this (again, not sure what you're trying to accomplish):
Option 1: maybe what you're looking for is to scale down and back up, i.e. scale your deployment to, say, 0 and then back up. Given you're using a ConfigMap, and maybe you only want to update that, the command would be kubectl scale --replicas=0 -f foo.yaml and then back to whatever count you had before.
Option 2: if you want to re-apply the deployment without killing any pods, for example, you could use --cascade=false on the delete (google it).
Option 3: look up the rollout commands for managing deployments; not sure if they work on services though.
Finally, and that's only me talking: share some more details, like which version of k8s you are using, and maybe provide an actual use-case example to better describe the issue.
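A quick sketch of option 1 (the deployment name is a placeholder, and the original replica count is assumed to be 3):
kubectl scale --replicas=0 deployment/my-deployment
kubectl scale --replicas=3 deployment/my-deployment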
Kubernetes only triggers a new rollout when something in the pod template has changed. If you have the image pull policy set to Always, you can delete your pods to get the new image. If you want Kubernetes to handle the rollout, you can update your YAML so the pod template contains a constantly changing metadata field (I use seconds since epoch), which will trigger a change. Ideally, you should be tagging your images with unique tags from your CI/CD pipeline, using the commit reference they were built from. This gets around the issue and allows you to take full advantage of the Kubernetes rollback feature.
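A minimal sketch of that approach, assuming a Deployment named my-deployment (placeholder) and an epoch-seconds annotation on the pod template:
kubectl patch deployment my-deployment -p \
  "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"redeploy-timestamp\":\"$(date +%s)\"}}}}}"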