Orphan ReplicaSets when running "kubectl apply" with a new image tag

My deployment YAML file tags my image with the build version.
So every time I run kubectl apply from my release pipeline, it pulls the image and deploys it properly.
My question is about the ReplicaSets: when I run kubectl get all, I see orphan ReplicaSets left over from the pods that were terminated when the previous images were replaced. (At least, that's my understanding.) The desired, current and ready counts of these orphan ReplicaSets are 0.
Will this lead to some sort of memory leak? Should I run any other command before kubectl apply?

When you upgrade your Deployment from version 1 to version 2, the Deployment controller creates a new ReplicaSet and scales it up while the previous ReplicaSet is scaled down to 0. Details here
If you execute another rolling update from version 2 to version 3, you will notice that at the end of the upgrade you have two ReplicaSets with a count of 0.
How does this benefit us?
Imagine that the current version of the pod introduces a problem and you want to roll back to the previous version. Because the old ReplicaSet is still there, the Deployment can scale the current one to 0 and scale the old one back up. See Rolling Back to a Previous Revision.
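For example, a rollback with kubectl might look like this (a sketch; the deployment name matches the sample manifest below, adjust it to your own):
kubectl rollout history deployment/my-deployment               # list the revisions that are still retained
kubectl rollout undo deployment/my-deployment --to-revision=2  # roll back to a specific revision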
By default Kubernetes keeps the last 10 ReplicaSets and lets you roll back to any of them, but you can change this by setting spec.revisionHistoryLimit in your Deployment. Ref: Clean up Policy
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  revisionHistoryLimit: 3
  ...

Related

Helm --force option

I read the following about the --force option in a book written by the Helm creators:
Sometimes, though, Helm users want to make sure that the pods are restarted. That’s
where the --force flag comes in. Instead of modifying the Deployment (or similar
object), it will delete and re-create it. This forces Kubernetes to delete the old pods
and create new ones.
What I understand from that is: if I install a chart, then change the number of replicas (= number of pods) and upgrade the chart, it should recreate all the pods. This is not what happens in my case, and I want to understand what I am missing here.
Let's take a hypothetical minimal Deployment (many required details omitted):
spec:
  replicas: 3
  template:
    spec:
      containers:
        - image: abc:123
and you change this to only increase the replica count
spec:
  replicas: 5 # <-- this is the only change
  template:
    spec:
      containers:
        - image: abc:123
The Kubernetes Deployment controller looks at this change and says "I already have 3 Pods running abc:123; if I leave those alone, and start 2 more, then I will have 5, and the system will look like what the Deployment spec requests". So absent any change to the embedded Pod spec, the existing Pods will be left alone and the cluster will just scale up.
deployment-12345-aaaaa      deployment-12345-aaaaa
deployment-12345-bbbbb      deployment-12345-bbbbb
deployment-12345-ccccc ---> deployment-12345-ccccc
                            deployment-12345-ddddd
                            deployment-12345-eeeee
(replicas: 3)               (replicas: 5)
Usually this is fine, since you're running the same image version and the same code. If you do need to forcibly restart things, I'd suggest using kubectl rollout restart deployment/its-name rather than trying to convince Helm to do it.
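For reference, the restart approach is just two commands (the deployment name is the placeholder from above):
# Restart all Pods managed by the Deployment without changing its spec
kubectl rollout restart deployment/its-name
# Follow the progress of the restart
kubectl rollout status deployment/its-name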

Deleting deployment leaves trailing replicasets and pods

I am running Kubernetes in GCP, and since updating a few months ago (I am now running 1.17.13-gke.2600) I have been observing trailing ReplicaSets and Pods after deleting a Deployment. Consider the state before deletion:
$ k get deployment | grep parser
parser-devel 1/1 1 1 38d
$ k get replicaset | grep parser
parser-devel-66bfc86ddb 0 0 0 27m
parser-devel-77898d9b9d 1 1 1 5m49s
$ k get pod | grep parser
parser-devel-77898d9b9d-4w48w 1/1 Running 0 6m2s
Then I delete the deployment:
$ k delete deployment parser-devel
deployment.apps "parser-devel" deleted
$ k get replicaset | grep parser
parser-devel-66bfc86ddb 0 0 0 28m
parser-devel-77898d9b9d 1 1 1 7m1s
$ k get pod | grep parser
parser-devel-77898d9b9d-4w48w 1/1 Running 0 7m6s
Then I try to delete the replicasets:
$ k delete replicaset parser-devel-66bfc86ddb parser-devel-77898d9b9d
replicaset.apps "parser-devel-66bfc86ddb" deleted
replicaset.apps "parser-devel-77898d9b9d" deleted
$ k get pod | grep parser
parser-devel-77898d9b9d-4w48w 1/1 Running 0 8m14s
As far as I understand Kubernetes, this is not correct behaviour, so why is it happening?
How about checking the ownerReferences of the ReplicaSets created by your Deployment? See Owners and dependents for more details. For cascading deletion to work, the Deployment's name and uid must match exactly the ones in the ReplicaSet's ownerReferences. I have also seen a similar issue happen when the Kubernetes API server was misbehaving, so restarting the API service may help to resolve it.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  ...
  ownerReferences:
  - apiVersion: apps/v1
    controller: true
    blockOwnerDeletion: true
    kind: Deployment
    name: your-deployment
    uid: xxx
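A quick way to inspect those ownerReferences on one of the ReplicaSets from the question (shown as a sketch):
kubectl get replicaset parser-devel-77898d9b9d -o jsonpath='{.metadata.ownerReferences}'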
The trailing ReplicaSets that you see after deployment deletion depend on the revision history limit that you have in your Deployment.
.spec.revisionHistoryLimit is an optional field that specifies the number of old ReplicaSets to retain to allow rollback.
By default, 10 old ReplicaSets will be kept.
You could see the number of ReplicaSets with the following command:
kubectl get deployment DEPLOYMENT -o yaml | grep revisionHistoryLimit
But you can modify this value with:
kubectl edit deployment DEPLOYMENT
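If you prefer a non-interactive change, kubectl patch can set the field directly (the value 3 here is just an example):
kubectl patch deployment DEPLOYMENT -p '{"spec":{"revisionHistoryLimit":3}}'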
Edit 1
I created a GKE cluster on the same version (1.17.13-gke.2600) in order to check whether it deletes the dependent resources when the parent object (the Deployment) is deleted.
For testing purposes, I created an nginx Deployment and then deleted it with kubectl delete deployment DEPLOYMENT_NAME; the Deployment and all its dependents (the Pods and ReplicaSets it created) were deleted.
Then I tested it again, this time adding the flag --cascade=false, as in kubectl delete deployment DEPLOYMENT_NAME --cascade=false, and all the dependent resources remained while the Deployment itself was removed. In that situation (orphaned resources left behind), the kube-controller-manager (specifically the garbage collector) should delete these resources sooner or later.
From these tests it seems that this GKE version is fine, since I was able to delete the trailing resources created by the Deployment object in my first test.
The cascade option defaults to true for several command verbs such as delete; you can also check this other documentation. Even so, I would like to know if you can create a Deployment and then delete it with kubectl delete deployment DEPLOYMENT_NAME --cascade=true, to see whether explicitly forcing cascade deletion helps in this case.
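For reference, the two forms compared above look like this (deployment name taken from the question; newer kubectl versions replace the boolean with --cascade=background|foreground|orphan):
# Default: delete the Deployment and let the garbage collector remove its ReplicaSets and Pods
kubectl delete deployment parser-devel --cascade=true
# Delete only the Deployment object and leave its ReplicaSets and Pods behind
kubectl delete deployment parser-devel --cascade=false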

Changing image of kubernetes job

I'm working on the manifest of a Kubernetes Job.
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-job
spec:
  template:
    spec:
      containers:
      - name: hello
        image: hello-image:latest
I then apply the manifest using kubectl apply -f <deployment.yaml> and the job runs without any issue.
The problem comes when I change the image of the running container from latest to something else.
At that point I get a "field is immutable" error when applying the manifest.
I get the same error whether the job is running or completed. The only workaround I have found so far is to delete the job manually before applying the new manifest.
How can I update the current job without having to delete it manually first?
I guess you are probably using the wrong Kubernetes resource. A Job runs its Pods to completion and its template is immutable; you cannot update it. As per the Kubernetes documentation:
Say Job old is already running. You want existing Pods to keep running, but you want the rest of the Pods it creates to use a different pod template and for the Job to have a new name. You cannot update the Job because these fields are not updatable. Therefore, you delete Job old but leave its pods running, using kubectl delete jobs/old --cascade=false.
If you intend to update the image, you should use a Deployment or a ReplicationController instead, which support updates.
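If you do need to re-run the Job from a pipeline, a common workaround (a sketch, assuming the manifest file is named job.yaml) is to delete it before applying:
# Remove the previous Job if it exists, then recreate it from the updated manifest
kubectl delete job hello-job --ignore-not-found
kubectl apply -f job.yaml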

kubectl replace -f creates pod indefinitely in Pending state

I have a k8s Deployment. I often push a new version to the Docker repo, change the image tag, and try to replace the Deployment using kubectl replace -f file.yaml. My replicas are set to 1, so I only have one pod of the deployment running at a time.
When I change the image tag (e.g. from v1 to v2) and try to replace it, a new pod is created, but it remains in the Pending state indefinitely while the old pod stays in the Running state.
I think the new pod waits for the old pod to be terminated, but the old pod won't terminate by itself. I need it to be deleted by k8s so the new pod can take its place.
Using replace --force fixes this, but I'd like it to work with just replace -f. Any ideas how to achieve this?
The issue you see has nothing to do with kubectl replace/apply. The real reason is that Deployments use the RollingUpdate strategy by default, which waits for the new pod to be Running before killing the old one. Why the new pod is stuck in Pending is unclear from your question, but in most cases this indicates a lack of compute resources for the new pod.
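To confirm that, inspect the Pending pod (the pod name is a placeholder):
# The Events section at the bottom explains why the pod cannot be scheduled,
# e.g. a message along the lines of "0/3 nodes are available: 3 Insufficient cpu."
kubectl describe pod <new-pod-name>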
You may do two different things:
Use the RollingUpdate strategy with maxUnavailable=1. This does what you want: it kills the old pod and then creates a new one.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
OR you can specify the Recreate strategy, which effectively does the same:
spec:
  strategy:
    type: Recreate
Read more about deployment strategies here: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
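For context, these are the RollingUpdate defaults that produce the behavior described above (the values below are the Kubernetes defaults, written out explicitly):
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # allows one extra pod to be created during the update
      maxUnavailable: 25%  # rounds down to 0 when replicas is 1, so the old pod is kept until the new one is Ready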

How to force Kubernetes to update deployment with a pod in every node

I would like to know if there is a way to force Kubernetes, during a deploy, to use every node in the cluster.
The question comes from some attempts I have made where I noticed a situation like this:
a cluster of 3 nodes
I update a deployment with a command like: kubectl set image deployment/deployment_name my_repo:v2.1.2
Kubernetes updates the cluster
At the end I run kubectl get pod and notice that 2 pods have been deployed on the same node.
So after the update, the cluster has this configuration:
one node with 2 pods
one node with 1 pod
one node without any pod (totally without any workload)
The scheduler tries to figure out the most reasonable way of scheduling at a given point in time, which can change later on and result in situations like the one you described. Two simple ways to manage this in one way or another are:
use a DaemonSet instead of a Deployment: it will make sure you have one and only one pod per node (matching nodeSelector / tolerations etc.)
use PodAntiAffinity: you can make sure that two pods of the same deployment in the same version are never deployed on the same node. This is what I personally prefer for many apps (unless I want more than one scheduled per node). Note that it will run into trouble if you decide to scale your deployment to more replicas than you have nodes.
Example of the versioned PodAntiAffinity I use:
metadata:
  labels:
    app: {{ template "fullname" . }}
    version: {{ .Values.image.tag }}
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["{{ template "fullname" . }}"]
          - key: version
            operator: In
            values: ["{{ .Values.image.tag }}"]
        topologyKey: kubernetes.io/hostname
consider fiddling with the Descheduler, which is like an evil twin of the Kubernetes scheduler: it deletes pods so that they get rescheduled differently
I tried some solutions, and what is working at the moment is simply based on changing the image version inside my YAML file and using a DaemonSet controller.
I mean:
1) I first have to deploy my application, based on a pod with some containers. These pods should be deployed on every cluster node (I have 3 nodes). Originally I had set up the Deployment in the YAML file with the option replicas equal to 3:
apiVersion: apps/v1beta2 # for versions before 1.8.0 use apps/v1beta1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: webpod
spec:
  replicas: 3
  ....
Instead, I have now set up the DaemonSet (or ds) in the YAML file with the option updateStrategy equal to RollingUpdate:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: my-daemonset
spec:
  updateStrategy:
    type: RollingUpdate
  ...
The version used for one of my containers is 2.1, for example.
2) I used to execute the deployment with the command: kubectl apply -f my-deployment.yaml
Now I deploy the DaemonSet with the command: kubectl apply -f my-daemonset.yaml
3) I get one pod for every node without problem
4) Now I want to update the deployment, changing the version of the image that I use for one of my containers. So I simply edit the YAML file, replacing 2.1 with 2.2, and re-run the command: kubectl apply -f my-deployment.yaml
So I can simply change the version of the image (2.1 -> 2.2) with this command:
kubectl set image ds/my-daemonset my-container=my-repository:v2.2
5) Again, I obtain one pod for every node without problem
The behavior is very different if instead I use the command:
kubectl set image deployment/my-deployment my-container=xxxx:v2.2
In this case I get the wrong result: one node with 2 pods, one node with 1 pod, and the last node without any pod...
To see how the deployment evolves, I can launch the command:
kubectl rollout status ds/my-daemonset
getting something like this:
Waiting for rollout to finish: 0 out of 3 new pods have been updated...
Waiting for rollout to finish: 0 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 of 3 updated pods are available...
daemon set "my-daemonset" successfully rolled out