helm test failure: timed out waiting for the condition - kubernetes

We have a simple release test for a Redis chart. After running helm test myReleaseName --tls --cleanup, we got
RUNNING: myReleaseName-redis
ERROR: timed out waiting for the condition
There are several issues in the Helm GitHub repository (https://github.com/helm/helm/search?q=timed+out+waiting+for+the+condition&type=Issues), but I did not find a solution there.
What's going on here?

This looks puzzling at first and shows little information because --cleanup kills the test pods after the run. Removing it yields more information, so I reran the test with
helm test myReleaseName --tls --debug
Then use kubectl get pods to examine the pod used for testing. (Its name may differ in your case.)
NAME                  READY   STATUS             RESTARTS   AGE
myReleaseName-redis   0/1     ImagePullBackOff   0          12h
From here it is clearer that something is wrong with the image, and it turned out that the reference used to pull the image was incorrect. (Run kubectl describe pod <pod-name> to see which image reference was used.)
After fixing the image reference, the test passed.
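For example, a quick way to see exactly which image reference the pod is trying to pull (the pod name below is taken from the output above; adjust it to yours):
# Print the image reference(s) used by the test pod
kubectl get pod myReleaseName-redis -o jsonpath='{.spec.containers[*].image}'
# Or inspect the pull error in the events at the bottom of the describe output
kubectl describe pod myReleaseName-redis | grep -i -A 3 image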

For me, Helm couldn't pull the image because it was in a private repository.
kubectl get events helped me see what was failing:
9m38s Warning Failed pod/airflow-scheduler-bbd8696bf-5mfg7 Failed to pull image
After authenticating, the helm install command worked.
REF: https://github.com/helm/charts/issues/11904
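If the image sits in a private registry, a minimal sketch of the authentication step looks like this (the registry URL, credentials, and the chart's values key are placeholders; the exact values key depends on your chart):
# Create a registry credential in the release namespace
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=<user> \
  --docker-password=<password> \
  -n <namespace>
# Reference it at install time, assuming the chart exposes imagePullSecrets in its values
helm install <release> <chart> --set "imagePullSecrets[0].name=regcred" -n <namespace>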

If helm test <ReleaseName> --debug shows that the installation completed successfully but the deployment failed, it may be because the deployment takes more than 300 seconds.
Helm waits only as long as the value set with --timeout. By default the timeout is 5 minutes; for many reasons helm install may take extra time to deploy, so increase the timeout value and validate the installation.
helm install <ReleaseName> --debug --wait --timeout 30m

Related

UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

Yesterday I stopped a helm upgrade while it was running in a release pipeline in Azure DevOps, and the following deployments failed.
I tried to find the failed chart in order to delete it, but the chart for the microservice ("auth") does not appear. I used the command helm list -n [namespace_of_AKS] and it is not listed.
What can I do to solve this problem?
Error in Azure Release Pipeline
2022-03-24T08:01:39.2649230Z Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
2022-03-24T08:01:39.2701686Z ##[error]Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
Helm List
This error can happen for a few reasons, but it most commonly occurs when there is an interruption during the upgrade/install process, as you already mentioned.
To fix it, you may need to first roll back to another revision and then reinstall or run helm upgrade again.
Try the command below to list the releases:
helm ls --namespace <namespace>
but you may find that it does not show any information for the release.
Then check the history of the previous deployment:
helm history <release> --namespace <namespace>
This usually shows that the original installation never completed successfully and the release is stuck in a pending state, for example STATUS: pending-upgrade.
To escape from this state, use the rollback command:
helm rollback <release> <revision> --namespace <namespace>
The revision is optional, but you should try to provide it.
You may then try to issue your original command again to upgrade or reinstall.
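Putting the steps together, a typical recovery sequence looks roughly like this (release name, namespace, chart, and the revision number are examples):
# 1. Find the last revision whose STATUS is "deployed"
helm history <release> --namespace <namespace>
# 2. Roll back to that revision (3 is just an example) to clear the pending state
helm rollback <release> 3 --namespace <namespace>
# 3. Retry the original upgrade or install
helm upgrade <release> <chart> --namespace <namespace>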
helm ls -a -n {namespace} will list all releases within a namespace, regardless of status.
You can also use helm ls -aA instead to list all releases in all namespaces -- in case you actually deployed the release to a different namespace (I've done that before)
Try deleting the latest Helm release secret for the deployment and re-run your helm upgrade command.
kubectl get secret -A | grep <app-name>
kubectl delete secret <secret> -n <namespace>
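In Helm 3 the release state is stored in Secrets named sh.helm.release.v1.<release>.v<revision>, so you usually only need to delete the newest, pending revision (the names and revision number below are examples):
# List the release secrets and spot the latest revision
kubectl get secret -n <namespace> | grep sh.helm.release.v1.<app-name>
# Delete only that newest (pending) revision
kubectl delete secret sh.helm.release.v1.<app-name>.v7 -n <namespace>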

How to get the full progress of kubectl apply until the container is running or errors?

I am currently building a CI/CD pipeline where I am trying to test a simple nginx deployment. The problem is that when I run kubectl apply -f ./nginx-deployment.yaml, I only get output saying that the resources got created/updated.
In my use case, the first thing I get is:
deployment.apps/nginx1.14.2 created
service/my-service created
In the output of kubectl get all, the pod STATUS shows ContainerCreating.
The problem is that in my pipeline I want to run a curl command to check whether my nginx server is working properly once the image is pulled and the pod STATUS is Running. But obviously, if the image hasn't been pulled yet, curl reports connection refused because the container isn't up yet.
How can I do that, and is there at least a way to see the image-pull progress?
The task runs the commands with &&, so curl is executed right after kubectl.
I'm working on a kind cluster with 3 control-plane nodes and 2 worker nodes.
You can use kubectl wait to wait for the deployment to be in a certain condition. Another option (or possibly used in combination) is to retry curl until the request returns a 200. An example of kubectl wait for your nginx deployment to become ready:
kubectl wait --for=condition=available deployment/nginx
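Combining both ideas, a pipeline step could look roughly like this (the service URL, timeout, and retry count are only examples):
# Wait for the Deployment to become available, then poll the service
kubectl wait --for=condition=available deployment/nginx --timeout=120s
for i in $(seq 1 30); do
  curl -sf http://my-service/ >/dev/null && break   # -f makes curl fail on HTTP errors
  sleep 5
done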
I think there is a possible way to do that, which is to download the images first and then deploy nginx.
For minikube users you can run:
minikube image load <image-name>
For kind users (which is my case):
kind load docker-image nginx:1.14.2 --name k8s-cluster
Once the image is already present, the next command, curl, works as wanted.
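For example, a pipeline step along these lines (the cluster name, image tag, deployment name, and manifest path follow the question and may differ in your setup) avoids waiting on the pull entirely:
# Pre-load the image into the kind cluster, then deploy and wait for readiness
docker pull nginx:1.14.2
kind load docker-image nginx:1.14.2 --name k8s-cluster
kubectl apply -f ./nginx-deployment.yaml
kubectl wait --for=condition=available deployment/nginx1.14.2 --timeout=120s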

Helm incorrectly shows upgrade failed status

When using helm install/upgrade, some percentage of the time I get this failure:
Failed to install app MyApp. Error: UPGRADE FAILED: timed out waiting for the condition
This is because the app sometimes needs a bit more time to be up and running.
When I get this message, helm doesn't stop the install/upgrade but keeps working on it, and it succeeds in the end. My whole cluster ends up fully functional.
However, helm still shows this failed status for the release. On one hand it is pretty annoying; on the other hand, it can mess up a correctly installed release.
How can I remove this false error and get into a 'deployed' state (without a new install/upgrade)?
What you might find useful here are the two following options:
--wait: Waits until all Pods are in a ready state, PVCs are bound, Deployments have minimum (Desired minus maxUnavailable) Pods in ready state and Services have an IP address (and Ingress if a LoadBalancer) before marking the release as successful. It will wait for as long as the --timeout value. If timeout is reached, the release will be marked as FAILED. Note: In scenarios where Deployment has replicas set to 1 and maxUnavailable is not set to 0 as part of the rolling update strategy, --wait will return as ready as it has satisfied the minimum Pod in ready condition.
--timeout: A value in seconds to wait for Kubernetes commands to complete. This defaults to 5m0s.
Helm install and upgrade commands include two CLI options to assist in checking the deployments: --wait and --timeout. When using --wait, Helm waits until the minimum expected number of Pods in the deployment are launched before marking the release as successful, and it waits as long as the value set with --timeout.
Also, please note that this is not a full list of cli flags. To see a description of all flags, just run helm <command> --help.
If you want to check why your chart might have failed you can use the helm history command.
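For instance, a sketch of how those flags fit together (the release name and chart path are examples; note that Helm 2 expects the timeout in seconds, while Helm 3 accepts a duration such as 10m):
# Give the release more time before Helm marks it FAILED
helm upgrade --install myapp ./mychart --wait --timeout 10m
# Afterwards, inspect why a previous revision failed
helm history myapp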

kubectl wait sometimes timed out unexpectedly

I just added kubectl wait --for=condition=ready pod -l app=appname --timeout=30s as the last step of a Bitbucket Pipeline to report a deployment failure if the new pod somehow produces an error.
I've realized that the wait isn't really consistent. Sometimes it times out even though the new pod from the new image doesn't produce any error and turns to the ready state.
I always change deployment.yaml or push a newer image to test this, and the result is inconsistent.
By the way, I believe kubectl rollout status isn't suitable; I think it just returns once the deployment is done, without waiting for the pods to be ready.
Note that there is not much difference if I change the timeout from 30s to 5m, since apply or rollout restart is almost instant.
kubectl version: 1.17
AWS EKS: latest 1.16
I'm placing this answer for better visibility: as noted in the comments, this indeed solves some problems with kubectl wait behavior.
I managed to replicate the issue and got some timeouts when my client version was older than the server version. You have to match your client version with the server's for kubectl wait to work properly.
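A quick way to check for such a skew:
# Prints both the client and server versions; they should match (or be within one minor version)
kubectl version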

kubernetes helm: "lost connection to pod" and "transport is closing" errors

I run helm upgrade --install to modify the state of my kubernetes cluster and I sometimes get an error like this:
22:24:34 StdErr: E0126 17:24:28.472048 48084 portforward.go:178] lost connection to pod
22:24:34 Error: UPGRADE FAILED: transport is closing
It seems that I am not the only one, and it happens with many different helm commands. All of these GitHub issues have descriptions or comments mentioning "lost connection to pod" or "transport is closing" errors (usually both):
https://github.com/kubernetes/helm/issues/1183
https://github.com/kubernetes/helm/issues/2003
https://github.com/kubernetes/helm/issues/2025
https://github.com/kubernetes/helm/issues/2288
https://github.com/kubernetes/helm/issues/2560
https://github.com/kubernetes/helm/issues/3015
https://github.com/kubernetes/helm/issues/3409
While it can be educational to read through hundreds of GitHub issue comments, it's usually faster to cut to the chase on Stack Overflow, and this question didn't seem to exist yet, so here it is. Hopefully some quick symptom fixes, and eventually one or more root-cause diagnoses, end up in the answers.
Memory limits were causing this error for me. The following fixed it:
kubectl set resources deployment tiller-deploy --limits=memory=200Mi
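Before raising the limit, it can be worth confirming that tiller was in fact OOM-killed; a minimal sketch, assuming the default tiller labels and the kube-system namespace:
# Tiller normally runs in kube-system with labels app=helm,name=tiller
kubectl get pods -n kube-system -l app=helm,name=tiller
# Look for "Last State: Terminated" with "Reason: OOMKilled" in the describe output
kubectl describe pods -n kube-system -l app=helm,name=tiller | grep -i -A 2 'last state'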
I was able to correct this by adding the tiller host information to the helm install command.
--host=10.111.221.14:44134
You can get your tiller IP this way
$ kubectl get svc -n kube-system tiller-deploy
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
tiller-deploy   ClusterIP   10.111.221.14   <none>        44134/TCP   34h
Full command example
helm install stable/grafana --name=grafana --host=10.111.221.14:44134
I know this is a bit of a workaround, but all other Helm functions perform properly after installing this way. I did not have to add the host information again for upgrades or rollbacks after the initial install. Hope this helps!
Deleting the tiller deployment and recreating it is the only fix I've seen on GitHub (here and here). This has been most helpful to people when the same helm command fails repeatedly (not with intermittent failures, though you could try it).
Delete tiller (Helm's server-side component):
kubectl delete deployment -n kube-system tiller-deploy
# deployment "tiller-deploy" deleted
and recreate it:
helm init --upgrade
# $HELM_HOME has been configured at /root/.helm.
# Tiller (the helm server side component) has been upgraded to the current version.
# Happy Helming!
Bouncing tiller obviously won't fix the root cause. Hopefully a better answer than this is forthcoming, maybe from https://github.com/kubernetes/helm/issues/2025, which is the only GitHub issue still open as of 13 Feb 2018.