kubectl wait sometimes times out unexpectedly - kubernetes

I just added kubectl wait --for=condition=ready pod -l app=appname --timeout=30s as the last step of a Bitbucket Pipeline to report any deployment failure if the new pod somehow produces an error.
I've realized that the wait isn't really consistent. Sometimes it times out even though the new pod from the new image doesn't produce any error and reaches the Ready state.
I've tried always changing deployment.yaml or pushing a newer image every time to test this; the result is inconsistent.
BTW, I believe kubectl rollout status isn't suitable, I think because it just returns after the deployment is done without waiting for the pods to be ready.
Note that there is not much difference if I change the timeout from 30s to 5m, since apply or rollout restart is quite instant.
kubectl version: 1.17
AWS EKS: latest 1.16

I'm placing this answer for better visibility: as noted in the comments, this indeed solves some problems with kubectl wait behavior.
I managed to replicate the issue and got some timeouts when my client version was older than the server version. You have to match your client version with the server's in order for kubectl wait to work properly.
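A quick way to check for skew before running the wait (a minimal sketch; the label selector is the one from the question):
# Compare client and server versions; a mismatch can break kubectl wait.
kubectl version --short
# Once the versions match, the wait behaves consistently:
kubectl wait --for=condition=ready pod -l app=appname --timeout=30s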

Related

Helm incorrectly shows upgrade failed status

When using helm install/upgrade, some percentage of the time I get this failure:
Failed to install app MyApp. Error: UPGRADE FAILED: timed out waiting for the condition
This is because the app sometimes needs a bit more time to be up and running.
When I get this message, helm doesn't stop the install/upgrade but keeps working on it, and it succeeds in the end. My whole cluster ends up fully functional.
However, helm still shows this failed status for the release. On one hand it is pretty annoying; on the other hand it can mess up a correctly installed release.
How can I remove this false error and get into a 'deployed' state (without a new install/upgrade)?
What you might find useful here are the two following options:
--wait: Waits until all Pods are in a ready state, PVCs are bound, Deployments have the minimum (Desired minus maxUnavailable) Pods in a ready state, and Services have an IP address (and Ingress, if a LoadBalancer) before marking the release as successful. It will wait for as long as the --timeout value. If the timeout is reached, the release will be marked as FAILED. Note: in scenarios where the Deployment has replicas set to 1 and maxUnavailable is not set to 0 as part of the rolling update strategy, --wait will return as ready as soon as it has satisfied the minimum number of Pods in a ready condition.
--timeout: A value in seconds to wait for Kubernetes commands to complete. This defaults to 5m0s.
Helm's install and upgrade commands include two CLI options to assist in checking the deployments: --wait and --timeout. When using --wait, Helm will wait until a minimum expected number of Pods in the deployment are launched before marking the release as successful. Helm will wait for as long as what is set with --timeout.
Also, please note that this is not a full list of CLI flags. To see a description of all flags, just run helm <command> --help.
If you want to check why your chart might have failed you can use the helm history command.
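Putting this together in a pipeline looks roughly like the following (a minimal sketch in Helm 3 syntax, where --timeout takes a duration; the release name myapp and chart path ./chart are hypothetical):
# Give the app enough time to become ready before Helm judges the release.
helm upgrade myapp ./chart --install --wait --timeout 10m
# If the release still ends up FAILED, inspect what happened per revision:
helm history myapp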

Is it possible to make restartPolicy never during a kubernetes deployment, but only during the deployment?

Whenever I do a Kubernetes deployment with some sort of configuration error, the pod ends up in CrashLoopBackOff, constantly restarting the (totally broken) pod. What I would like is for any sort of errors during a deployment to immediately fail the deployment, rather than just blindly retrying until the deployment times out.
Deploy with restartPolicy: Never and then use kubectl patch to modify the restart policy of that deployment.
To avoid continuous restart attempts of a failing pod, there is an open issue.
There is also an open pull request to add this feature, which is about to get merged; it will give you the ability to specify a maximum number of retries for the restart policy OnFailure.
Until this feature gets merged and released, kubectl patch seems to be the only way.
You can first deploy with restartPolicy: Never, then use kubectl patch to modify the restart policy of the running deployment.
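A sketch of the patch step described above, assuming a hypothetical Deployment named myapp (the restart policy lives in the pod template spec; note that the Deployment API normally accepts only restartPolicy: Always in its pod template, so in practice this pattern fits bare Pods or Jobs better):
# Flip the pod template's restartPolicy once the rollout has proven healthy.
kubectl patch deployment myapp --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/restartPolicy", "value": "Always"}]'
Keep in mind that changing the pod template triggers a new rollout of the Deployment.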

Is there any kubectl command to poll until all the pods roll to the new code?

I am building a deploy pipeline. I need a kubectl command that would tell me that the rollout has completed on all the pods, so that I can then deploy to the next stage.
The Deployment documentation suggests kubectl rollout status, which among other things will return a non-zero exit code if the deployment isn't complete. kubectl get deployment will print out similar information (how many replicas are expected, available, and up-to-date), and you can add a -w option to watch it.
For this purpose you can also consider using one of the Kubernetes APIs. You can "get" or "watch" the deployment object, and get back something matching the structure of a Deployment object. Using that you can again monitor the replica count, or the embedded condition list, and decide if it's ready or not. If you're using the "watch" API you'll continue to get updates as the object status changes.
The one trick here is detecting failed deployments. Say you're deploying a pod that depends on a database; usual practice is to configure the pod with the hostname you expect the database to have, and just crash (and get restarted) if it's not there yet. You can briefly wind up in CrashLoopBackOff state when this happens. If your application or deployment is totally wrong, of course, you'll also wind up in CrashLoopBackOff state, and your deployment will stop progressing. There's not an easy way to tell these two cases apart; consider an absolute timeout.
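For a pipeline step, kubectl rollout status with an absolute timeout covers both the success and the stuck-deployment case (a minimal sketch; the Deployment name myapp is hypothetical):
# Exits non-zero if the rollout hasn't completed within the timeout,
# which also catches rollouts stuck in CrashLoopBackOff.
if ! kubectl rollout status deployment/myapp --timeout=300s; then
  echo "Rollout did not complete in time; failing this stage"
  exit 1
fi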

Why does scaling down a deployment seem to always remove the newest pods?

(Before I start, I'm using minikube v27 on Windows 10.)
I have created a deployment with the nginx 'hello world' container with a desired count of 2:
I actually went into the '2 hours' old pod and edited the index.html file, changing the welcome message to "broken" - I want to play with k8s to see what it would look like if one pod were 'faulty'.
If I scale this deployment up to more instances and then scale down again, I almost expected k8s to remove the oldest pods, but it consistently removes the newest:
How do I make it remove the oldest pods first?
(Ideally, I'd like to be able to just say "redeploy everything as the exact same version/image/desired count in a rolling deployment" if that is possible)
Pod deletion preference is based on an ordered series of checks, defined in code here:
https://github.com/kubernetes/kubernetes/blob/release-1.11/pkg/controller/controller_utils.go#L737
Summarizing, precedence is given to deleting pods:
that are unassigned to a node, vs assigned to a node
that are in pending or not running state, vs running
that are in not-ready, vs ready
that have been in ready state for fewer seconds
that have higher restart counts
that have newer vs older creation times
These checks are not directly configurable.
Given these rules, if you can make an old pod not ready, or cause an old pod to restart, it will be removed at scale-down time before a newer pod that is ready and has not restarted.
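To see the fields these checks compare for your own pods, you can list them explicitly (a sketch, assuming the deployment's pods carry a hypothetical app=hello label):
# Show restart count and creation time, two of the tie-breakers above.
kubectl get pods -l app=hello \
  -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount,CREATED:.metadata.creationTimestamp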
There is discussion around use cases for the ability to control deletion priority, which mostly involve workloads that are a mix of job and service, here:
https://github.com/kubernetes/kubernetes/issues/45509
What about this:
kubectl scale deployment ingress-nginx-controller --replicas=2
Wait until 2 replicas are up.
kubectl delete pod ingress-nginx-controller-oldest-replica
kubectl scale deployment ingress-nginx-controller --replicas=1
I experienced zero downtime doing so while removing the oldest pod.
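To find the name of the oldest pod for the delete step, you can sort by creation time (a sketch; adjust the label selector to whatever your controller's pods carry, here the standard ingress-nginx label is assumed):
# The oldest pod is listed first.
kubectl get pods -l app.kubernetes.io/name=ingress-nginx --sort-by=.metadata.creationTimestamp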

kubernetes minikube faster uptime

I am using minikube and building my projects by tearing down the previous project and rebuilding it with
kubectl delete -f myprojectfiles
kubectl apply -f myprojectfiles
The files are a deployment and a service.
When I access my website I get a 503 error while I'm waiting for Kubernetes to bring up the deployment. Is there any way to speed this up? I can see that my application is already built because the logs show it is ready. However, it keeps showing a 503 for what feels like a few minutes before everything in Kubernetes triggers and starts serving me the application.
What are some things I can do to speed up the uptime?
Configure what is called a readinessProbe. It won't speed up your boot time, but it will help by not giving a false sense that the application is up and running. With it, traffic will only be sent to your application pod when it is ready to accept connections. Please read about it here.
FWIW, your application might be waiting on some dependency to be up and running; add these kinds of health checks to that dependency's pod as well.
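A minimal sketch of such a probe, assuming the container serves HTTP and exposes a hypothetical /healthz endpoint on port 8080 (this goes in the container spec of the deployment manifest):
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10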
You should not delete your Kubernetes resources. Use either kubectl apply or kubectl replace to update your project.
If you delete them, the nginx ingress controller won't find any upstream for a short period of time and puts the backend on a blacklist for some seconds.
Also, you should make sure that you use a Deployment, which is able to do a rolling update without any downtime.
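If you want the rolling update to never take all replicas down, you can make that explicit in the Deployment spec (a sketch; the values are illustrative):
# Keep every existing replica serving until its replacement is Ready.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1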