Dynamic deployment of pods based on conditions || Deploying a pod only when the process inside another pod has finished [duplicate] - kubernetes

This question already has answers here:
What is the equivalent for depends_on in kubernetes (6 answers)
How can we create service dependencies using kubernetes (3 answers)
Helm install, Kubernetes - how to wait for the pods to be ready? (3 answers)
Closed 1 year ago.
I need some help implementing a better Kubernetes resource deployment.
Essentially, we declare every resource in a single values.yaml file, and when you install the chart, all resources are created in parallel. Among these I have 2 components.
Let's say component1 and component2.
component1's main function is to install some dars onto the server machine. This takes between 45 minutes and an hour.
component2 depends on some of the dars that component1 installs onto the server.
The problem is that when you deploy the Helm chart, every pod is created at the same time.
Even though the pod status for component2 will be Running, when you inspect the container logs they tell you that process startup failed, due to some missing classes (which would have been installed by component1).
I am looking for a way to either introduce some delay until component1 is done, or to keep destroying and recreating the resources for component2 until component1 is done.
The delay would be based on whether all dars are installed onto the server machine.
For restarting all of component2's resources, I was thinking about creating a 3rd pod, a maintenance pod, which would keep watching both component1 and component2 and would keep restarting resource creation for component2 until component1 is done.
Readiness and liveness probes will not work here because even though the service startup has failed, the pod status will be Running.
Any tips or suggestions on how to implement this would greatly help, or a better way to handle this.

You can try adding the flag --wait or --wait-for-jobs according to your use case, as Helm will then wait until a minimum expected number of Pods in the deployment are launched before marking the release as successful. Helm will wait for as long as the value set with --timeout. Please refer to the detailed description of the --wait flag at https://helm.sh/docs/helm/helm_upgrade/#options
helm upgrade --install --wait --timeout 20m demo demo
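Since component1 and component2 here are long-running pods rather than Jobs, the --wait flags alone may not sequence them. Another common pattern for this kind of dependency (also covered by the duplicates linked above) is an init container on component2 that polls until component1's work is visible. A minimal sketch, assuming the dars land on a shared volume and that component1 writes a marker file when it finishes; the images, paths, and marker file are all hypothetical:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: component2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: component2
  template:
    metadata:
      labels:
        app: component2
    spec:
      initContainers:
      # Blocks the main container until the dars appear on the shared volume.
      - name: wait-for-dars
        image: busybox:1.36
        command: ['sh', '-c', 'until [ -f /shared/dars/.install-complete ]; do echo waiting for component1; sleep 30; done']
        volumeMounts:
        - name: dars
          mountPath: /shared/dars
      containers:
      - name: component2
        image: component2:latest     # hypothetical
        volumeMounts:
        - name: dars
          mountPath: /shared/dars
      volumes:
      - name: dars
        persistentVolumeClaim:
          claimName: dars-pvc        # hypothetical PVC shared with component1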

Related

Helm incorrectly shows upgrade failed status

When using helm install/upgrade, some percentage of the time I get this failure:
Failed to install app MyApp. Error: UPGRADE FAILED: timed out waiting for the condition
This is because the app sometimes needs a bit more time to be up and running.
When I get this message, helm doesn't stop the install/upgrade but keeps working on it, and it succeeds in the end; my whole cluster ends up fully functional.
However, helm still shows this failed status for the release. On one hand it is pretty annoying; on the other hand it can mess up a correctly installed release.
How can I remove this false error and get into a 'deployed' state (without a new install/upgrade)?
What you might find useful here are the following two options:
--wait: Waits until all Pods are in a ready state, PVCs are bound, Deployments have the minimum (Desired minus maxUnavailable) Pods in a ready state, and Services have an IP address (and Ingress, if a LoadBalancer) before marking the release as successful. It will wait for as long as the --timeout value. If the timeout is reached, the release will be marked as FAILED. Note: in scenarios where the Deployment has replicas set to 1 and maxUnavailable is not set to 0 as part of the rolling update strategy, --wait will return as ready as soon as it has satisfied the minimum number of Pods in a ready condition.
--timeout: A value to wait for Kubernetes commands to complete, expressed as a duration (this defaults to 5m0s).
Helm install and upgrade commands include two CLI options to assist in checking the deployments: --wait and --timeout. When using --wait, Helm will wait until a minimum expected number of Pods in the deployment are launched before marking the release as successful, and it will wait for as long as the --timeout value.
Also, please note that this is not the full list of CLI flags. To see a description of all flags, just run helm <command> --help.
If you want to check why your chart might have failed you can use the helm history command.
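For example, to retry with a longer timeout and then inspect the release history (the release and chart names here are hypothetical, and 10m0s is just an illustrative value):
helm upgrade --install --wait --timeout 10m0s myapp ./myapp-chart
helm history myapp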

How will a scheduled (rolling) restart of a service be affected by an ongoing upgrade (and vice versa)

Due to a memory leak in one of our services I am planning to add a k8s CronJob to schedule a periodic restart of the leaking service. Right now we do not have the resources to look into the mem leak properly, so we need a temporary solution to quickly minimize the issues caused by the leak. It will be a rolling restart, as outlined here:
How to schedule pods restart
I have already tested this in our test cluster, and it seems to work as expected. The service has 2 replicas in test, and 3 in production.
My plan is to schedule the CronJob to run every 2 hours.
I am now wondering: How will the new CronJob behave if it should happen to execute while a service upgrade is already running? We do rolling upgrades to achieve zero downtime, and we sometimes roll out upgrades several times a day. I don't want to limit the people who deploy upgrades by saying "please ensure you never deploy near to 08:00, 10:00, 12:00 etc". That will never work in the long term.
And vice versa, I am also wondering what will happen if an upgrade is started while the CronJob is already running and the pods are restarting.
Does kubernetes have something built-in to handle this kind of conflict?
This answer to the linked question recommends using kubectl rollout restart from a CronJob pod. That command internally works by adding an annotation to the deployment's pod spec; since the pod spec is different, it triggers a new rolling upgrade of the deployment.
Say you're running an ordinary redeployment; that will change the image: setting in the pod spec. At about the same time, the kubectl rollout restart happens that changes an annotation setting in the pod spec. The Kubernetes API forces these two changes to be serialized, so the final deployment object will always have both changes in it.
This question then reduces to "what happens if a deployment changes and needs to trigger a redeployment, while a redeployment is already running?" The Deployment documentation covers this case: it will start deploying new pods on the newest version of the pod spec and treat all older ones as "old", so a pod with the intermediate state might only exist for a couple of minutes before getting replaced.
In short: this should work consistently and you shouldn't need to take any special precautions.
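For reference, a minimal sketch of the restart CronJob itself, assuming a Deployment named leaky-service and a ServiceAccount restarter that has already been granted RBAC permission to patch deployments (all names are hypothetical):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart-leaky-service
spec:
  schedule: "0 */2 * * *"           # every 2 hours, matching the plan above
  concurrencyPolicy: Forbid         # don't start a new restart while one is still running
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: restarter
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest   # any image that contains kubectl works
            command:
            - kubectl
            - rollout
            - restart
            - deployment/leaky-service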

Kubernetes rolling deploy: terminate a pod only when there are no containers running

I am trying to deploy updates to pods. However, I want the current pods to terminate only when all the containers inside them have terminated and their processes are complete.
The new pods can wait to start until all containers in the old pods have completed. We have a mechanism to stop old pods from picking up new tasks, so they should eventually terminate.
It's okay if twice the number of pods exists for some period of time. I tried finding a solution for this in the Kubernetes docs but wasn't successful. Pointers on how / if this is possible would be helpful.
Well, I guess then you may have to create a duplicate Deployment with the new image as required and change the selector in the Service to the new Deployment. That will prevent external traffic from entering the pre-existing pods, and new calls can go to the new pods. Later you can check the load with something like
kubectl top pods --containers
and if the load appears to be static and low, you can delete the old pods' Deployment.
But with this approach the Service selectors have to be updated every time, and to keep track of things you can append the git commit hash to the Service selector to keep it unique each time. A sketch of that selector switch is below.
Rolling back to previous versions from inside the Kubernetes cluster will be difficult with this approach, though, so preferably you can trigger the wanted build again.
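A sketch of the selector switch, with hypothetical names; the Service initially selects pods labelled version: v1, and the new Deployment's pods carry version: v2 (a git commit hash would work the same way):
kubectl patch service my-service -p '{"spec":{"selector":{"app":"my-app","version":"v2"}}}'
kubectl top pods --containers          # watch the old pods' load drop off
kubectl delete deployment my-app-v1    # once the old pods are idle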
I hope this makes some sense!

kubernetes minikube faster uptime

I am using minikube and building my projects by tearing down the previous project and rebuilding it with
kubectl delete -f myprojectfiles
kubectl apply -f myprojectfiles
The files are a deployment and a service.
When I access my website I get a 503 error while waiting for Kubernetes to bring up the deployment. Is there any way to speed this up? I can see that my application is already built, because the logs show it is ready; however it keeps returning 503 for what feels like a few minutes before everything in Kubernetes triggers and starts serving me the application.
What are some things I can do to speed up the uptime?
Configure what is called a readinessProbe. It won't speed up your boot time, but it will stop giving the false sense that the application is up and running: with it, traffic will only be sent to your application pod when it is ready to accept connections. Please read about it in the Kubernetes documentation.
FWIW, your application might be waiting on some dependency to be up and running, so add this kind of health check to that dependency's pod as well.
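A minimal readinessProbe sketch, as a fragment of the Deployment's pod spec, assuming the application serves HTTP on port 8080 and exposes a health endpoint at /healthz (both hypothetical):
containers:
- name: web
  image: myproject:latest        # hypothetical image
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 5       # give the app a moment before the first check
    periodSeconds: 5             # re-check every 5 seconds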
You should not delete your Kubernetes resources. Use either kubectl apply or kubectl replace to update your project.
If you delete it, the NGINX ingress controller won't find any upstream for a short period of time and puts it on a blacklist for some seconds.
Also, you should make sure that you use a Deployment, which is able to do a rolling update without any downtime.
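For the zero-downtime rolling update, a sketch of the relevant Deployment fields (the values are illustrative):
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0    # never remove a ready pod before its replacement is up
      maxSurge: 1          # allow one extra pod during the rollout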

Updating deployment in GCE leads to node restart

We have an odd issue happening with GCE.
We have 2 clusters, dev and prod, each consisting of 2 nodes.
Production nodes are n1-standard-2, dev nodes n1-standard-1.
Typically the dev cluster is busier, with more pods eating more resources.
We deploy updates mostly with Deployments (a few projects still recreate RCs to update to the latest versions).
Normally, the process is: build the project, build the docker image, docker push, create a new deployment config, and kubectl apply the new config.
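(That flow, as a shell sketch; the project and image names are hypothetical.)
docker build -t gcr.io/my-project/my-app:v2 .
docker push gcr.io/my-project/my-app:v2
kubectl apply -f deployment.yaml    # the config now references the v2 tag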
What constantly happens in production is that after applying a new config, one or both nodes restart. The cluster does not seem to be starved of memory/CPU, and we could not find anything in the logs that would explain those restarts.
The same procedure on staging never causes the nodes to restart.
What can we do to diagnose the issue? Any specific events or logs we should be looking at?
Many thanks for any pointers.
UPDATE:
This is still happening, and I found the following in Compute Engine - Operations:
repair-1481931126173-543cefa5b6d48-9b052332-dfbf44a1
Operation type: compute.instances.repair.recreateInstance
Status message : Instance Group Manager 'projects/.../zones/europe-west1-c/instanceGroupManagers/gke-...' initiated recreateInstance on instance 'projects/.../zones/europe-west1-c/instances/...'. Reason: instance's intent is RUNNING but instance's health status is TIMEOUT.
We still can't figure out why this is happening and it's having a negative effect on our production environment every time we deploy our code.
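Some generic starting points for this kind of investigation (not from the original thread; these are standard kubectl/gcloud commands):
kubectl get events --all-namespaces    # look for NotReady / eviction events around the deploy
kubectl describe node <node-name>      # node conditions and recent events
gcloud compute operations list         # the instance group's repair operations, like the one above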