kubectl replace -f creates pod indefinitely in Pending state - kubernetes

I have a k8s deployment - I often deploy a new version to the docker repo - change the image tag - and try to replace the deployment using kubectl replace -f file.yaml. My replicas are set to 1 - I have only 1 pod of the deployment running at a time.
When I change the image tag (e.g. changing v1 to v2) and try to replace it - it creates a new pod, but it remains in the 'Pending' state indefinitely, while the old pod stays in the 'Running' state.
I think the new pod waits for the old pod to be terminated - but it won't terminate by itself. I need it to be deleted by k8s so the new pod can take its place.
Using replace --force fixes this issue - but I'd like it to work using just replace -f. Any ideas how to achieve this?

The issue you see has nothing to do with kubectl replace/apply. The real reason is that Deployments use the RollingUpdate strategy by default, and by default it waits for the new pod to be Running before it kills the old pod. Why the new pod is in the Pending state is unclear from your question, but in most cases this indicates a lack of compute resources for the new pod.
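To see why a single-replica Deployment keeps the old pod around, here is a minimal sketch of the arithmetic. It assumes the documented defaults of 25% for both maxSurge and maxUnavailable, with maxSurge rounded up and maxUnavailable rounded down. The function name rollout_bounds is made up for illustration and is not part of kubectl.

```shell
# rollout_bounds REPLICAS SURGE_PCT UNAVAILABLE_PCT
# Hypothetical helper: converts percentage settings into absolute pod
# counts, following the rounding rules described in the Deployment docs.
rollout_bounds() {
  replicas=$1
  surge_pct=$2
  unavailable_pct=$3
  # maxSurge percentages round up
  surge=$(( (replicas * surge_pct + 99) / 100 ))
  # maxUnavailable percentages round down
  unavailable=$(( replicas * unavailable_pct / 100 ))
  echo "maxSurge=$surge maxUnavailable=$unavailable"
}

rollout_bounds 1 25 25   # prints: maxSurge=1 maxUnavailable=0
```

With one replica the controller may add one extra pod but may not take the old one down until the new one is available, so a new pod that cannot be scheduled stalls the rollout.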
You may do two different things:
Use the RollingUpdate strategy with maxUnavailable=1. This will do what you want: it will kill the old pod and then create a new one.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
OR you can specify the Recreate strategy, which effectively does the same:
spec:
  strategy:
    type: Recreate
Read more about deployment strategies here: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
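Since the answer can only guess at why the new pod is Pending, a quick way to check is to list the Pending pods and read the recent events. The following is a sketch only: the helper name why_pending is hypothetical, and the namespace argument is whatever namespace the Deployment lives in.

```shell
# why_pending NAMESPACE
# Hypothetical helper: lists Pending pods, then prints events in
# chronological order. Scheduling failures (for example insufficient
# CPU or memory) appear there as FailedScheduling events.
why_pending() {
  ns=$1
  kubectl get pods -n "$ns" --field-selector=status.phase=Pending
  kubectl get events -n "$ns" --sort-by=.metadata.creationTimestamp
}
```

Calling why_pending default would inspect the default namespace. Running kubectl describe pod on a single Pending pod shows the same events for that pod only.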

Related

How rollout undo works during rollingUpdate strategy when deployment has crossed 'progressDeadlineSeconds'?

I have created a Kubernetes deployment with the specs below:
replicas: 2
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
Algorithm of the rollout and undo that I'm experimenting with:
kubectl apply -f deployment.yaml
# my-deployment is a placeholder for the Deployment's name
if ! kubectl rollout status deployment/my-deployment; then
  kubectl rollout undo deployment/my-deployment
fi
I'm trying to test the failure case.
Initially we have 2 pods, old-pod-1 and old-pod-2. We initiate a rolling deployment with the above configuration to maintain at least 2 pods at all times. So, when we initiate the deployment, an additional pod is created with the new code (new-pod-1). Let us say this succeeds. At this point, one old pod is brought down (old-pod-1 is down). Now if the 2nd new pod deployment fails, what would happen?
Adding some background:
In a test where new-pod-1 creation is unsuccessful, new-pod-1 is killed after progressDeadlineSeconds, and after rollout undo we are left with old-pod-1 and old-pod-2.
But in the above case, old-pod-1 is already down. Thus, what is the end state of the pods? Will we have a pod created with the old deployment version?

How to scale up all OpenShift pods before scaling down old ones

I have a basic OpenShift deployment configuration:
kind: DeploymentConfig
spec:
  replicas: 3
  strategy:
    type: Rolling
Additionally I've put:
maxSurge: 3
maxUnavailable: 0%
because I want to scale up all new pods first and only after that scale down the old pods (so there will be 6 pods running during deployment, which is why I decided to set maxSurge).
I want to have all old pods running until all new pods are up but with this set of parameters there is something wrong. During deployment:
all 3 new pods are initialized at once and are trying to start, old pods are running (as expected)
if the first new pod has started successfully then an old one is terminated
if second new pod is ready then another old pod is terminated
I want to terminate all old pods ONLY if all new pods are ready to handle requests, otherwise all the old pods should handle requests.
What did I miss in this configuration?
The behavior you document is expected for a deployment rollout (OpenShift will shut down each old pod as a new pod becomes ready). It will also start routing traffic to the new pods as they become available, which you say that you don't want either.
A service is pretty much by definition going to route to pods as they are available. And a deployment pretty much handles pods independently, so I don't believe that anything will really give you the behavior you are looking for there either.
If you want a blue-green style deployment like you describe, you are essentially going to have to deploy the new pods as a separate deployment. Then once the new deployment is completely up, you can change the corresponding service to point at the new pods. Then you can shut down the old deployment.
Service Mesh can help with some of that. So could an operator. Or you could do it manually.
You can combine the rollout strategy with readiness checks with an initial delay to ensure that all the new pods have time to start up before the old ones are all shut down at the same time.
In the case below, the new 3 pods will be spun up (for a total of 6 pods) and then after 60 seconds, the readiness check will occur and the old pods will be shut down. You would just want to adjust your readiness delay to a large enough timeframe to give all of your new pods time to start up.
apiVersion: v1
kind: DeploymentConfig
spec:
  replicas: 3
  strategy:
    rollingParams:
      maxSurge: 3
      maxUnavailable: 0
    type: Rolling
  template:
    spec:
      containers:
      - readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8099
          initialDelaySeconds: 60

Orphan replicasets when running "kubectl apply" with a new image tag

My deployment yml file tags my image with the build version.
So every time I run kubectl apply from my release pipeline, it pulls the image and deploys it properly.
My question is about the replicaset: when I run kubectl get all, I see orphan replicasets from the pods that were terminated from the previous images. (At least, that's my understanding.) The desired, current and ready properties of these orphan replicasets are 0.
Will this lead to some sort of memory leak? Should I run any other command before kubectl apply?
When you upgrade your deployments from version 1 to version 2, the Deployment creates a new ReplicaSet and increases the count of replicas while the previous count goes to 0. Details here
If you try to execute another rolling update from version 2 to version 3, you might notice that at the end of the upgrade, you have two ReplicaSets with a count of 0.
How does this benefit us?
Imagine that the current version of the pod introduces a problem and you want to roll back to the previous version. If you still have the old ReplicaSet, you can scale the current one to 0 and increase the old ReplicaSet's count. See Rolling Back to a Previous Revision.
By default Kubernetes stores the last 10 ReplicaSets and lets you rollback to any of them. But you can change this by changing the spec.revisionHistoryLimit in your Deployment. Ref: Clean up Policy
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  revisionHistoryLimit: 3
  ...
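The retained ReplicaSets are what kubectl rollout undo works from. As a rough sketch, a rollback to a specific retained revision could look like this. The helper name rollback_to is hypothetical, and the deployment name and revision number passed to it are whatever applies in your cluster.

```shell
# rollback_to DEPLOYMENT REVISION
# Hypothetical helper: prints the revisions that are still retained
# (bounded by revisionHistoryLimit), then rolls back to the given one.
rollback_to() {
  deployment=$1
  revision=$2
  kubectl rollout history "deployment/$deployment" || return 1
  kubectl rollout undo "deployment/$deployment" --to-revision="$revision"
}
```

For example, rollback_to my-deployment 2 would return to revision 2, provided its ReplicaSet has not been cleaned up yet.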

Gitlab Autodevops How to always keep one pod alive

I'm using Gitlab Autodevops to deploy app on my kubernetes cluster. That app should always have only one instance running.
The problem is that during the update process, Helm kills the currently running pod before the new pod is ready. This causes a period of downtime, when the old version is already killed and the new one isn't ready yet. To make it worse, the app needs significant time to start (2+ minutes).
I have tried to set minAvailable: 1 in a PodDisruptionBudget, but it did not help.
Any idea how I can tell Helm to wait for the readiness of the updated pod before killing the old one? (Having 2 instances running simultaneously for several seconds is not such a problem for me.)
You can release a new application version in a few ways; it's necessary to choose the one that fits your needs.
I would recommend one of the following:
Ramped - slow rollout
A ramped deployment updates pods in a rolling update fashion, a secondary ReplicaSet is created with the new version of the application, then the number of replicas of the old version is decreased and the new version is increased until the correct number of replicas is reached.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # how many pods we can add at a time
      maxUnavailable: 0  # maxUnavailable defines how many pods can be unavailable
                         # during the rolling update
Full example and steps can be found here.
Blue/Green - best to avoid API versioning issues
A blue/green deployment differs from a ramped deployment because the “green” version of the application is deployed alongside the “blue” version. After testing that the new version meets the requirements, we update the Kubernetes Service object that plays the role of load balancer to send traffic to the new version by replacing the version label in the selector field.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  type: NodePort
  ports:
  - name: http
    port: 8080
    targetPort: 8080
  # Note here that we match both the app and the version.
  # When switching traffic, we update the label “version” with
  # the appropriate value, i.e. v2.0.0
  selector:
    app: my-app
    version: v1.0.0
Full example and steps can be found here.
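The traffic switch described for blue/green can be sketched as a small shell function. This is an illustration under assumptions, not the linked example: the helper name switch_traffic, the 300s timeout, and the green Deployment name my-app-v2 in the usage line are all made up here. Only the Service name my-app and the version label come from the manifest above.

```shell
# switch_traffic SERVICE NEW_DEPLOYMENT NEW_VERSION
# Hypothetical helper: waits until the new ("green") Deployment has
# rolled out, then points the Service selector at the new version label.
# If the rollout does not complete, the Service is left untouched.
switch_traffic() {
  service=$1
  new_deployment=$2
  new_version=$3
  kubectl rollout status "deployment/$new_deployment" --timeout=300s || return 1
  kubectl patch service "$service" \
    -p "{\"spec\":{\"selector\":{\"version\":\"$new_version\"}}}"
}
```

A call such as switch_traffic my-app my-app-v2 v2.0.0 would move traffic once the green pods are ready. The old Deployment keeps running, so switching back is a matter of patching the selector to the previous version.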
Canary - for testing
A canary deployment consists of routing a subset of users to a new functionality. In Kubernetes, a canary deployment can be done using two Deployments with common pod labels. One replica of the new version is released alongside the old version. Then after some time and if no error is detected, scale up the number of replicas of the new version and delete the old deployment.
Using this ReplicaSet technique requires spinning-up as many pods as necessary to get the right percentage of traffic. That said, if you want to send 1% of traffic to version B, you need to have one pod running with version B and 99 pods running with version A. This can be pretty inconvenient to manage so if you are looking for a better managed traffic distribution, look at load balancers such as HAProxy or service meshes like Linkerd, which provide greater controls over traffic.
Manifest for version A:
spec:
  replicas: 3
Manifest for version B:
spec:
  replicas: 1
Full example and steps can be found here.
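The replica counts above translate into a traffic split by simple proportion. A minimal sketch, assuming the Service spreads requests evenly across all ready pods (the function name canary_share is made up for illustration):

```shell
# canary_share A_REPLICAS B_REPLICAS
# Approximate percentage of requests that reach version B, assuming an
# even spread over all ready pods. Integer division truncates the result.
canary_share() {
  a=$1
  b=$2
  echo $(( 100 * b / (a + b) ))
}

canary_share 3 1    # prints 25
canary_share 99 1   # prints 1
```

So the two manifests above send roughly a quarter of the traffic to version B, and reaching 1% with a single version B pod takes 99 pods of version A.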
You can also play with Interactive Tutorial - Updating Your App on Kubernetes.
I recommend reading Deploy, Scale And Upgrade An Application On Kubernetes With Helm.

How to force Kubernetes to update deployment with a pod in every node

I would like to know if there is a way to force Kubernetes, during a deploy, to use every node in the cluster.
The question comes from some attempts I made, where I noticed a situation like this:
a cluster of 3 nodes
I update a deployment with a command like: kubectl set image deployment/deployment_name my-container=my_repo:v2.1.2
Kubernetes updates the cluster
At the end I execute kubectl get pod and I notice that 2 pods have been deployed in the same node.
So after the update, the cluster has this configuration:
one node with 2 pods
one node with 1 pod
one node without any pod (totally without any workload)
The scheduler tries to find the most reasonable placement at a given point in time, which can change later on and result in situations like the one you describe. Two simple ways to manage this are:
use a DaemonSet instead of a Deployment: this makes sure you have one and only one pod per node (matching nodeSelector / tolerations etc.)
use PodAntiAffinity: you can make sure that two pods of the same deployment in the same version are never deployed on the same node. This is what I personally prefer for many apps (unless I want more than one to be scheduled per node). Note that you will run into trouble if you decide to scale your deployment to more replicas than you have nodes.
Example for versioned PodAntiAffinity I use :
metadata:
  labels:
    app: {{ template "fullname" . }}
    version: {{ .Values.image.tag }}
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["{{ template "fullname" . }}"]
          - key: version
            operator: In
            values: ["{{ .Values.image.tag }}"]
        topologyKey: kubernetes.io/hostname
consider fiddling with Descheduler, which is like an evil twin of the Kubernetes scheduler component: it deletes pods so that they get rescheduled differently
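Whichever option you pick, it helps to check how the pods actually ended up distributed. A small sketch that counts pods per node for a label selector (the helper name pods_per_node is hypothetical, and app=webpod in the usage note is only an example label):

```shell
# pods_per_node SELECTOR
# Hypothetical helper: prints "<node> <count>" for every node that runs
# at least one pod matching the label selector. Nodes with no matching
# pods do not appear in the output.
pods_per_node() {
  selector=$1
  kubectl get pods -l "$selector" \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' |
    sort | uniq -c | awk '{print $2, $1}'
}
```

Running pods_per_node app=webpod after a rollout shows at a glance whether any node carries two pods. A node that is missing from the list has none.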
I tried some solutions, and what is working at the moment is simply changing the version inside my deployment.yaml while using a DaemonSet controller.
I mean:
1) I have to deploy my application for the first time, based on a pod with some containers. These pods should be deployed on every cluster node (I have 3 nodes). I have set up the deployment in the yaml file with the option replicas equal to 3:
apiVersion: apps/v1beta2 # for versions before 1.8.0 use apps/v1beta1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: webpod
spec:
  replicas: 3
  ....
I have set up the daemonset (or ds) in the yaml file with the option updateStrategy equal to RollingUpdate:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: my-daemonset
spec:
  updateStrategy:
    type: RollingUpdate
  ...
The version used for one of my containers is 2.1 for example
2) I execute the deployment with the command: kubectl apply -f my-deployment.yaml
I execute the deployment with the command: kubectl apply -f my-daemonset.yaml
3) I get one pod for every node without problem
4) Now I want to update the deployment changing the version of the image that I use for one of my containers. So I simply change the yaml file editing 2.1 with 2.2. Then I re-launch the command: kubectl apply -f my-deployment.yaml
So I can simply change the version of the image (2.1 -> 2.2) with this command:
kubectl set image ds/my-daemonset my-container=my-repository:v2.2
5) Again, I obtain one pod for every node without problem
The behavior is very different if I instead use the command:
kubectl set image deployment/my-deployment my-container=xxxx:v2.2
In this case I get a wrong result, where one node has 2 pods, one node has 1 pod and the last node has no pod at all...
To see how the deployment evolves, I can launch the command:
kubectl rollout status ds/my-daemonset
which prints something like this:
Waiting for rollout to finish: 0 out of 3 new pods have been updated...
Waiting for rollout to finish: 0 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 of 3 updated pods are available...
daemon set "my-daemonset" successfully rolled out