How to reduce downtime caused by pulling images in the Kubernetes Recreate deployment strategy - kubernetes

Assuming I have a Kubernetes Deployment object with the Recreate strategy and I update the Deployment with a new container image version. Kubernetes will:
scale down/kill the existing Pods of the Deployment,
create the new Pods,
which will pull the new container images
so the new containers can finally run.
Of course, the Recreate strategy is exepected to cause a downtime between steps 1 and 4, where no Pod is actually running. However, step 3 can take a lot of time if the container images in question are or the container registry connection is slow, or both. In a test setup (Azure Kubernetes Services pulling a Windows container image from Docker Hub), I see it taking 5 minutes and more, which makes for a really long downtime.
So, what is a good option to reduce that downtime? Can I somehow get Kubernetes to pull the new images before killing the Pods in step 1 above? (Note that the solution should work with Windows containers, which are notoriously large, in case that is relevant.)
On the Internet, I have found this Codefresh article using a DaemonSet and Docker in Docker, but I guess Docker in Docker is no longer compatible with containerd.
I've also found this StackOverflow answer that suggests using an Azure Container Registry with Project Teleport, but that is in private preview and doesn't support Windows containers yet. Also, it's specific to Azure Kubernetes Services, and I'm looking for a more general solution.
Surely, this is a common problem that has a "standard" answer?
Update 2021-12-21: Because I've got a corresponding answer, I'll clarify that I cannot easily change the deployment strategy. The application in question does not support running Pods of different versions at the same time because it uses a database that needs to be migrated to the corresponding application version, without forwards or backwards compatibility.

Implement a "blue-green" deployment strategy. For instance, the service might be running and active in the "blue" state. A new deployment is created with a new container image, which deploys the "green" pods with the new container image. When all of the "green" pods are ready, the "switch live" step is run, which switches the active color. Very little downtime.
Obviously, this has tradeoffs. Your cluster will need more memory to run the additional transitional pods. The deployment process will be more complex.

Via https://www.reddit.com/r/kubernetes/comments/oeruh9/can_kubernetes_prepull_and_cache_images/, I've found these ideas:
Implement a DaemonSet that runs a "sleep" loop on all the images I need.
Use http://github.com/mattmoor/warm-image, which has no Windows support.
Use https://github.com/ContainerSolutions/ImageWolf, which says, "ImageWolf is currently alpha software and intended as a PoC - please don't run it in production!"
Use https://github.com/uber/kraken, which seems to be a registry, not a pre-pulling solution.
Use https://github.com/dragonflyoss/Dragonfly (now https://github.com/dragonflyoss/Dragonfly2), which also seems to do somethings completely different.
Use https://github.com/senthilrch/kube-fledged, which looks exactly right and more mature than the others, but has no Windows support.
Use https://github.com/dcherman/image-cache-daemon, which has no Windows support.
Use https://goharbor.io/blog/harbor-2.1/, which also seems to be a registry, not a pre-pulling solution.
Use https://openkruise.io/docs/user-manuals/imagepulljob/, which also looks right, but a) OpenKruise is huge and I'm not sure I want to install this just to preload images, and b) it seems it has no Windows support.
So, it seems I have to implement this on my own, with a DaemonSet. I still hope someone can provide a better answer than this one 🙂 .

Related

Is it possible to have hot reload with Kubernetes?

I am trying to get into the way of things with the Kubernetes but I'm facing a problem with hot reload.
In the development mode when I am just working on the code and I need the code be synchronized with the pods directly like in Docker when I use volumes to keep the state.
Is there any chance to make it work with the Kubernetes?
I would be thankful for any help with Kubernetes...
From the view of Cloud native(or kubernetes), the infrastructure is immutable and the Pods are the smallest deployable units. So you should replace the pod rather than change it(your code is part of the pod/image). so the correct process is: change code -> build image -> recreate pod in your env But actually, your process still could work just not follow the best practice of cloud native... –
vincent pli
Also, you can try Ksync, that allows you to synchronize application code between your local and Kubernetes cluster. Kindly ask you to refer to official documentation to read more about.

How to start pods on demand in Kubernetes?

I need to create pods on demand in order to run a program. it will run according to the needs, so it could be that for 5 hours there will be nothing running, and then 10 requests will be needed to process, and I might need to limit that only 5 will run simultaneously because of resources limitations.
I am not sure how to build such a thing in kubernetes.
Also worth noting is that I would like to create a new docker container for each run and exit the container when it ends.
There are many options and you’ll need to try them out. The core tool is HorizontalPodAutoscaler. Systems like KEDA build on top of that to manage metrics more easily. There’s also Serverless tools like knative or kubeless. Or workflow tools like Tekton, Dagster, or Argo.
It really depends on your specifics.

GKE automated pod recycling ideas

I'm thinking of a solution to do a rolling update on a schedule without really releasing something. I was thinking of an ENV variable change through kubectl patch to kick off the update in GKE.
The context is we have containers that don't do garbage collection, and the temporary fix and easiest path forward and is cycling out pods frequently that we can schedule on a nightly.
Our naive approach would be to schedule this through our build pipeline, but seems like there's a lot of moving parts.
I haven't looked at Cloud Functions, but I'm sure there's an API that could do this and I'm leaning towards automating this with Cloud Functions.
Or is there already a GKE solution to do this?
Any other approaches to this problem?
I know AWS's ec2 has this feature for ASG, I was looking for the same thing so I don't to do a hacky ENV var change on manifest.
I can think of two possibilities:
Cronjobs. You can use CronJobs to run tasks at a specific time or interval. Suggested for automatic tasks, such as backups, reporting, sending emails, or cleanup tasks. More details here.
CI/CD with CloudBuild. When you push a change to your repository, Cloud Build automatically builds and deploys the container to your GKE cluster.

GKE Node Upgrade "Out of Resources"

I had left my GKE cluster running 3 minor versions behind the latest and decided to finally upgrade. The master upgrade went well but then my node upgrades kept failing. I used the Cloud Shell console to manually start an upgrade and view the output, which said something along the lines of "Zone X is out of resources, try Y instead." Unfortunately,I can't just spin up a new node pool in a new zone and have my pipeline work because I am using GitLab's AutoDevOps pipeline and they make certain assumptions about node pool naming and such that I can't find any way to override. I also don't want to potentially lose the data stored in my persistent volumes if I end up needing to re-create everything in a new node-pool.
I just solved this issue but couldn't find any questions posed on this particular problem, so I wanted to post the answer here in case someone else comes looking for it.
My particular problem was that I had a non-autoscaling node pool with a single node. For my purposes, that's enough for the application stack to run smoothly and I don't want to incur unforeseen charges with additional nodes automatically being added to the pool. However, this meant that the upgrade had to apparently share resources with everything else running on that node to perform the upgrade, which it didn't have enough of. The solution was simple: add more nodes temporarily.
Because this is specifically GKE, I was able to use a beta feature called "surge upgrade", which allows you to set the maximum number of "surge" nodes to add when performing an upgrade. Once this was enabled, I started the upgrade process again and it temporarily added an extra node, performed the upgrade, and then scaled back down to a single node.
If you aren't on GKE, or don't wish to use a beta feature (or can't), then simply resize the node pool with the node(s) that needs upgrading. I would add a single node unless you are positive you need more.

Can a ReplicaSet be configured to allow in progress updates to complete?

I currently have a kubernetes setup where we are running a decoupled drupal/gatsby app. The drupal acts as a content repository that gatsby pulls from when building. Drupal is also configured through a custom module to connect to the k8s api and patch the deployment gatsby runs under. Gatsby doesn't run persistently, instead this deployment uses gatsby as an init container to build the site so that it can then be served by a nginx container. By patching the deployment(modifying a label) a new replicaset is created which forces a new gatsby build, ultimately replacing the old build.
This seems to work well and I'm reasonably happy with it except for one aspect. There is currently an issue with the default scaling behaviour of replica sets when it comes to multiple subsequent content edits. When you make a subsequent content edit within drupal it will still contact the k8s api and patch the deployment. This results in a new replicaset being created, the original replicaset being left as is, the previous replicaset being scaled down and any pods that are currently being created(gatsby building) are killed. I can see why this is probably desirable in most situations but for me this increases the amount of time that it takes for you to be able to see these changes on the site. If multiple people are using drupal at the same time making edits this will be compounded and could become problematic.
Ideally I would like the containers that are currently building to be able to complete and for those replicasets to finish scaling up, queuing another replicaset to be created once this is completed. This would allow any updates in the first build to be deployed asap, whilst queueing up another build immediately after to include any subsequent content, and this could continue for as long as the load is there to require it and no longer. Is there any way to accomplish this?
It is the regular behavior of Kubernetes. When you update a Deployment it creates new ReplicaSet and respectively a Pod according to new settings. Kubernetes keeps some old ReplicatSets in case of possible roll-backs.
If I understand your question correctly. You cannot change this behavior, so you need to do something with architecture of your application.