Deleting deployment leaves trailing replicasets and pods - kubernetes

I am running Kubernetes in GCP and since updating few months ago (now I am running 1.17.13-gke.2600) I am observing trailing replicasets and pods after deployment deletion. Consider state before deletion:
$ k get deployment | grep parser
parser-devel 1/1 1 1 38d
$ k get replicaset | grep parser
parser-devel-66bfc86ddb 0 0 0 27m
parser-devel-77898d9b9d 1 1 1 5m49s
$ k get pod | grep parser
parser-devel-77898d9b9d-4w48w 1/1 Running 0 6m2s
Then I delete the deployment:
$ k delete deployment parser-devel
deployment.apps "parser-devel" deleted
$ k get replicaset | grep parser
parser-devel-66bfc86ddb 0 0 0 28m
parser-devel-77898d9b9d 1 1 1 7m1s
$ k get pod | grep parser
parser-devel-77898d9b9d-4w48w 1/1 Running 0 7m6s
Then I try to delete the replicasets:
$ k delete replicaset parser-devel-66bfc86ddb parser-devel-77898d9b9d
replicaset.apps "parser-devel-66bfc86ddb" deleted
replicaset.apps "parser-devel-77898d9b9d" deleted
$ k get pod | grep parser
parser-devel-77898d9b9d-4w48w 1/1 Running 0 8m14s
As far as I understand Kubernetes, this is not a correct behaviour, so why it is happening?

How about check ownerReference of the ReplicaSet created by you Deployment ? Refer Owners and dependents for more details. For example, for removing dependencies of the Deployment, the Deployment name and uid should be matched exactly in the ownerReference ones. Or I have experience that similar issue happened, if Kuerbetenes API was somthing wrong. So API service restart may help to resovle it.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
...
ownerReferences:
- apiVersion: apps/v1
controller: true
blockOwnerDeletion: true
kind: Deployment
name: your-deployment
uid: xxx

The trailing ReplicaSets that you can see after deployment deletion depends of the Revision History Limit that you have in your Deployment.
.spec.revisionHistoryLimit is an optional field that specifies the number of old ReplicaSets to retain to allow rollback.
By default, 10 old ReplicaSets will be kept.
You could see the number of ReplicaSets with the following command:
kubectl get deployment DEPLOYMENT -o yaml | grep revisionHistoryLimit
But you can modify this value with:
kubectl edit deployment DEPLOYMENT
Edit 1
I created a GKE cluster on the same version (1.17.13-gke.2600) in order to know if it is deleting trailing resources when I delete the parent object (Deployment).
With testing purposes, I created an nginx Deployment and then deleted it with kubectl delete Deployment DEPLOYMENT_NAME, Deployment and all its dependents (Pods created and Replicasets) were deleted.
Then I tested it again, but this time by adding kubectl flag --cascade=false like kubectl delete Deployment DEPLOYMENT_NAME --cascade=false and all the dependent resources remained but the Deployment. With this situation (leaving orphaned resources) kube-controller manager (Specifically garbage collector) should delete these resources soon or later.
With the tests I made, seems that GKE version is OK as I was able to delete the trailed resources made by the Deployment object I created since my first test.
Cascade option is set by default as true with different and several command verbs like delete, you also could check this other documentation. Even so, I would like to know if you can create a Deployment, and then try to delete it with command kubectl delete Deployment DEPLOYMENT_NAME --cascade=true in order to know if by trying to force cascade deletion helps on this case.

Related

Kubernetes resource quota, have non schedulable pod staying in pending state

So I wish to limit resources used by pod running for each of my namespace, and therefor want to use resource quota.
I am following this tutorial.
It works well, but I wish something a little different.
When trying to schedule a pod which will go over the limit of my quota, I am getting a 403 error.
What I wish is the request to be scheduled, but waiting in a pending state until one of the other pod end and free some resources.
Any advice?
Instead of using straight pod definitions (kind: Pod) use deployment.
Why?
Pods in Kubernetes are designed as relatively ephemeral, disposable entities:
You'll rarely create individual Pods directly in Kubernetes—even singleton Pods. This is because Pods are designed as relatively ephemeral, disposable entities. When a Pod gets created (directly by you, or indirectly by a controller), the new Pod is scheduled to run on a Node in your cluster. The Pod remains on that node until the Pod finishes execution, the Pod object is deleted, the Pod is evicted for lack of resources, or the node fails.
Kubernetes assumes that for managing pods you should a workload resources instead of creating pods directly:
Pods are generally not created directly and are created using workload resources. See Working with Pods for more information on how Pods are used with workload resources.
Here are some examples of workload resources that manage one or more Pods:
Deployment
StatefulSet
DaemonSet
By using deployment you will get very similar behaviour to the one you want.
Example below:
Let's suppose that I created pod quota for a custom namespace, set to "2" as in this example and I have two pods running in this namespace:
kubectl get pods -n quota-demo
NAME READY STATUS RESTARTS AGE
quota-demo-1 1/1 Running 0 75s
quota-demo-2 1/1 Running 0 6s
Third pod definition:
apiVersion: v1
kind: Pod
metadata:
name: quota-demo-3
spec:
containers:
- name: quota-demo-3
image: nginx
ports:
- containerPort: 80
Now I will try to apply this third pod in this namespace:
kubectl apply -f pod.yaml -n quota-demo
Error from server (Forbidden): error when creating "pod.yaml": pods "quota-demo-3" is forbidden: exceeded quota: pod-demo, requested: pods=1, used: pods=2, limited: pods=2
Not working as expected.
Now I will change pod definition into deployment definition:
apiVersion: apps/v1
kind: Deployment
metadata:
name: quota-demo-3-deployment
labels:
app: quota-demo-3
spec:
selector:
matchLabels:
app: quota-demo-3
template:
metadata:
labels:
app: quota-demo-3
spec:
containers:
- name: quota-demo-3
image: nginx
ports:
- containerPort: 80
I will apply this deployment:
kubectl apply -f deployment-v3.yaml -n quota-demo
deployment.apps/quota-demo-3-deployment created
Deployment is created successfully, but there is no new pod, Let's check this deployment:
kubectl get deploy -n quota-demo
NAME READY UP-TO-DATE AVAILABLE AGE
quota-demo-3-deployment 0/1 0 0 12s
We can see that a pod quota is working, deployment is monitoring resources and waiting for the possibility to create a new pod.
Let's now delete one of the pod and check deployment again:
kubectl delete pod quota-demo-2 -n quota-demo
pod "quota-demo-2" deleted
kubectl get deploy -n quota-demo
NAME READY UP-TO-DATE AVAILABLE AGE
quota-demo-3-deployment 1/1 1 1 2m50s
The pod from the deployment is created automatically after deletion of the pod:
kubectl get pods -n quota-demo
NAME READY STATUS RESTARTS AGE
quota-demo-1 1/1 Running 0 5m51s
quota-demo-3-deployment-7fd6ddcb69-nfmdj 1/1 Running 0 29s
It works the same way for memory and CPU quotas for namespace - when the resources are free, deployment will automatically create new pods.

Helm 3 Deployment Order of Kubernetes Service Catalog Resources

I am using Helm v3.3.0, with a Kubernetes 1.16.
The cluster has the Kubernetes Service Catalog installed, so external services implementing the Open Service Broker API spec can be instantiated as K8S resources - as ServiceInstances and ServiceBindings.
ServiceBindings reflect as K8S Secrets and contain the binding information of the created external service. These secrets are usually mapped into the Docker containers as environment variables or volumes in a K8S Deployment.
Now I am using Helm to deploy my Kubernetes resources, and I read here that...
The [Helm] install order of Kubernetes types is given by the enumeration InstallOrder in kind_sorter.go
In that file, the order does neither mention ServiceInstance nor ServiceBinding as resources, and that would mean that Helm installs these resource types after it has installed any of its InstallOrder list - in particular Deployments. That seems to match the output of helm install --dry-run --debug run on my chart, where the order indicates that the K8S Service Catalog resources are applied last.
Question: What I cannot understand is, why my Deployment does not fail to install with Helm.
After all my Deployment resource seems to be deployed before the ServiceBinding is. And it is the Secret generated out of the ServiceBinding that my Deployment references. I would expect it to fail, since the Secret is not there yet, when the Deployment is getting installed. But that is not the case.
Is that just a timing glitch / lucky coincidence, or is this something I can rely on, and why?
Thanks!
As said in the comment I posted:
In fact your Deployment is failing at the start with Status: CreateContainerConfigError. Your Deployment is created before Secret from the ServiceBinding. It's only working as it was recreated when the Secret from ServiceBinding was available.
I wanted to give more insight with example of why the Deployment didn't fail.
What is happening (simplified in order):
Deployment -> created and spawned a Pod
Pod -> failing pod with status: CreateContainerConfigError by lack of Secret
ServiceBinding -> created Secret in a background
Pod gets the required Secret and starts
Previously mentioned InstallOrder will leave ServiceInstace and ServiceBinding as last by comment on line 147.
Example
Assuming that:
There is a working Kubernetes cluster
Helm3 installed and ready to use
Following guides:
Kubernetes.io: Instal Service Catalog using Helm
Magalix.com: Blog: Kubernetes Service Catalog
There is a Helm chart with following files in templates/ directory:
ServiceInstance
ServiceBinding
Deployment
Files:
ServiceInstance.yaml:
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
name: example-instance
spec:
clusterServiceClassExternalName: redis
clusterServicePlanExternalName: 5-0-4
ServiceBinding.yaml:
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceBinding
metadata:
name: example-binding
spec:
instanceRef:
name: example-instance
Deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: ubuntu
spec:
selector:
matchLabels:
app: ubuntu
replicas: 1
template:
metadata:
labels:
app: ubuntu
spec:
containers:
- name: ubuntu
image: ubuntu
command:
- sleep
- "infinity"
# part below responsible for getting secret as env variable
env:
- name: DATA
valueFrom:
secretKeyRef:
name: example-binding
key: host
Applying above resources to check what is happening can be done in 2 ways:
First method is to use timestamp from $ kubectl get RESOURCE -o yaml
Second method is to use $ kubectl get RESOURCE --watch-only=true
First method
As said previously the Pod from the Deployment couldn't start as Secret was not available when the Pod tried to spawn. After the Secret was available to use, the Pod started.
The statuses this Pod had were the following:
Pending
ContainerCreating
CreateContainerConfigError
Running
This is a table with timestamps of Pod and Secret:
| Pod | Secret |
|-------------------------------------------|-------------------------------------------|
| creationTimestamp: "2020-08-23T19:54:47Z" | - |
| - | creationTimestamp: "2020-08-23T19:54:55Z" |
| startedAt: "2020-08-23T19:55:08Z" | - |
You can get this timestamp by invoking below commands:
$ kubectl get pod pod_name -n namespace -o yaml
$ kubectl get secret secret_name -n namespace -o yaml
You can also get get additional information with:
$ kubectl get event -n namespace
$ kubectl describe pod pod_name -n namespace
Second method
This method requires preparation before running Helm chart. Open another terminal window (for this particular case 2) and run:
$ kubectl get pod -n namespace --watch-only | while read line ; do echo -e "$(gdate +"%H:%M:%S:%N")\t $line" ; done
$ kubectl get secret -n namespace --watch-only | while read line ; do echo -e "$(gdate +"%H:%M:%S:%N")\t $line" ; done
After that apply your Helm chart.
Disclaimer!
Above commands will watch for changes in resources and display them with a timestamp from the OS. Please remember that this command is only for example purposes.
The output for Pod:
21:54:47:534823000 NAME READY STATUS RESTARTS AGE
21:54:47:542107000 ubuntu-65976bb789-l48wz 0/1 Pending 0 0s
21:54:47:553799000 ubuntu-65976bb789-l48wz 0/1 Pending 0 0s
21:54:47:655593000 ubuntu-65976bb789-l48wz 0/1 ContainerCreating 0 0s
-> 21:54:52:001347000 ubuntu-65976bb789-l48wz 0/1 CreateContainerConfigError 0 4s
21:55:09:205265000 ubuntu-65976bb789-l48wz 1/1 Running 0 22s
The output for Secret:
21:54:47:385714000 NAME TYPE DATA AGE
21:54:47:393145000 sh.helm.release.v1.example.v1 helm.sh/release.v1 1 0s
21:54:47:719864000 sh.helm.release.v1.example.v1 helm.sh/release.v1 1 0s
21:54:51:182609000 understood-squid-redis Opaque 1 0s
21:54:52:001031000 understood-squid-redis Opaque 1 0s
-> 21:54:55:686461000 example-binding Opaque 6 0s
Additional resources:
Stackoverflow.com: Answer: Helm install in certain order
Alibabacloud.com: Helm charts and templates hooks and tests part 3
So to answer my own question (and thanks to #dawid-kruk and the folks on Service Catalog Sig on Slack):
In fact, the initial start of my Pods (the ones referencing the Secret created out of the ServiceBinding) fails! It fails because the Secret is actually not there the moment K8S tries to start the pods.
Kubernetes has a self-healing mechanism, in the sense that it tries (and retries) to reach the target state of the cluster as described by the various deployed resources.
By Kubernetes retrying to get the pods running, eventually (when the Secret is finally there) all conditions will be satisfied to make the pods start up nicely. Therefore, eventually, evth. is running as it should.
How could this be streamlined? One possibility would be for Helm to include the custom resources ServiceBinding and ServiceInstance into its ordered list of installable resources and install them early in the installation phase.
But even without that, Kubernetes actually deals with it just fine. The order of installation (in this case) really does not matter. And that is a good thing!

How to force Kubernetes to update deployment with a pod in every node

I would like to know if there is a way to force Kubernetes, during a deploy, to use every node in the cluster.
The question is due some attempts that I have done where I noticed a situation like this:
a cluster of 3 nodes
I update a deployment with a command like: kubectl set image deployment/deployment_name my_repo:v2.1.2
Kubernetes updates the cluster
At the end I execute kubectl get pod and I notice that 2 pods have been deployed in the same node.
So after the update, the cluster has this configuration:
one node with 2 pods
one node with 1 pod
one node without any pod (totally without any workload)
The scheduler will try to figure out the most reasonable way of scheduling at given point in time, which can change later on and results in situations like you described. Two simple ways to manage this in one way or another are :
use DaemonSet instead of Deployment : will make sure you have one and only one pod per node (matching nodeSelector / tolerations etc.)
use PodAntiAffinity : you can make sure that two pods of the same deployment in the same version are never deployed on the same node. This is what I personally prefer for many apps (unless I want more then one to be scheduled per node). Note that it will be in a bit of trouble if you decide to scale your deployment to more replicas then you have nodes.
Example for versioned PodAntiAffinity I use :
metadata:
labels:
app: {{ template "fullname" . }}
version: {{ .Values.image.tag }}
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values: ["{{ template "fullname" . }}"]
- key: version
operator: In
values: ["{{ .Values.image.tag }}"]
topologyKey: kubernetes.io/hostname
consider fiddling with Descheduler which is like an evil twin of Kubes Scheduler component which will cause deleting of pods for them tu reschedule differently
I tried some solutions and what is working at the moment is simply based on the change of version inside my deployment.yaml on DaemonSet controller.
I mean:
1) I have to deploy for the 1' time my application based on a pod with some containers. These pods should be deployed on every cluster node (I have 3 nodes). I have set up the deployment setting in the yaml file with the option replicas equal to 3:
apiVersion: apps/v1beta2 # for versions before 1.8.0 use apps/v1beta1
kind: Deployment
metadata:
name: my-deployment
labels:
app: webpod
spec:
replicas: 3
....
I have set up the daemonset (or ds) in the yaml file with the option updateStrategy equal to RollingUpdate:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: my-daemonset
spec:
updateStrategy:
type: RollingUpdate
...
The version used for one of my containers is 2.1 for example
2) I execute the deployment with the command: kubectl apply -f my-deployment.yaml
I execute the deployment with the command: kubectl apply -f my-daemonset.yaml
3) I get one pod for every node without problem
4) Now I want to update the deployment changing the version of the image that I use for one of my containers. So I simply change the yaml file editing 2.1 with 2.2. Then I re-launch the command: kubectl apply -f my-deployment.yaml
So I can simply change the version of the image (2.1 -> 2.2) with this command:
kubectl set image ds/my-daemonset my-container=my-repository:v2.2
5) Again, I obtain one pod for every node without problem
Behavior very different if instead I use the command:
kubectl set image deployment/my-deployment my-container=xxxx:v2.2
In this case I get a wrong result where a node has 2 pod, a node 1 pod and last node without any pod...
To see how the deployment evolves, I can launch the command:
kubectl rollout status ds/my-daemonset
getting something like that
Waiting for rollout to finish: 0 out of 3 new pods have been updated...
Waiting for rollout to finish: 0 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 1 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 out of 3 new pods have been updated...
Waiting for rollout to finish: 2 of 3 updated pods are available...
daemon set "my-daemonset" successfully rolled out

fluentd-es-v1.22 DaemonSet doesn't create any pod

I went through both daemonset doesn't create any pods and DaemonSet doesn't create any pods: v1.1.2 before asking this question. Here is my problem.
Kubernetes cluster is running on CoreOS
NAME=CoreOS
ID=coreos
VERSION=1185.3.0
VERSION_ID=1185.3.0
BUILD_ID=2016-11-01-0605
PRETTY_NAME="CoreOS 1185.3.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
I refer to https://coreos.com/kubernetes/docs/latest/getting-started.html guide and created 3 etcd, 2 masters and 42 nodes. All applications running in the cluster without issue.
I got a requirement of setting up logging with fluentd-elasticsearch and downloaded yaml files in https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch deployed fluentd deamonset.
kubectl create -f fluentd-es-ds.yaml
I could see it got created but none of pod created.
kubectl --namespace=kube-system get ds -o wide
NAME DESIRED CURRENT NODE-SELECTOR AGE CONTAINER(S) IMAGE(S) SELECTOR
fluentd-es-v1.22 0 0 alpha.kubernetes.io/fluentd-ds-ready=true 4h fluentd-es gcr.io/google_containers/fluentd-elasticsearch:1.22 k8s-app=fluentd-es,kubernetes.io/cluster-service=true,version=v1.22
kubectl --namespace=kube-system describe ds fluentd-es-v1.22
Name: fluentd-es-v1.22
Image(s): gcr.io/google_containers/fluentd-elasticsearch:1.22
Selector: k8s-app=fluentd-es,kubernetes.io/cluster-service=true,version=v1.22
Node-Selector: alpha.kubernetes.io/fluentd-ds-ready=true
Labels: k8s-app=fluentd-es
kubernetes.io/cluster-service=true
version=v1.22
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Misscheduled: 0
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
No events.
I verified below details according to the comments in above SO questions.
kubectl api-versions
apps/v1alpha1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1beta1
autoscaling/v1
batch/v1
batch/v2alpha1
certificates.k8s.io/v1alpha1
extensions/v1beta1
policy/v1alpha1
rbac.authorization.k8s.io/v1alpha1
storage.k8s.io/v1beta1
v1
I could see below logs in one kube-controller-manager after restart.
I0116 20:48:25.367335 1 controllermanager.go:326] Starting extensions/v1beta1 apis
I0116 20:48:25.367368 1 controllermanager.go:328] Starting horizontal pod controller.
I0116 20:48:25.367795 1 controllermanager.go:343] Starting daemon set controller
I0116 20:48:25.367969 1 horizontal.go:127] Starting HPA Controller
I0116 20:48:25.369795 1 controllermanager.go:350] Starting job controller
I0116 20:48:25.370106 1 daemoncontroller.go:236] Starting Daemon Sets controller manager
I0116 20:48:25.371637 1 controllermanager.go:357] Starting deployment controller
I0116 20:48:25.374243 1 controllermanager.go:364] Starting ReplicaSet controller
The other one has below log.
I0116 23:16:23.033707 1 leaderelection.go:295] lock is held by {master.host.name} and has not yet expired
Am I missing something? Appreciate your help on figure out the issue.
I found the solution after studying https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml
There is nodeSelector: set as alpha.kubernetes.io/fluentd-ds-ready: "true"
But nodes doesn't have a label like that. What I did is add the label as below to one node to check whether it's working.
kubectl label nodes {node_name} alpha.kubernetes.io/fluentd-ds-ready="true"
After that, I could see fluentd pod started to run
kubectl --namespace=kube-system get pods
NAME READY STATUS RESTARTS AGE
fluentd-es-v1.22-x1rid 1/1 Running 0 6m
Thanks.

kubernetes hidden replica set?

I'm learning Kubernetes and just came across an issue and would like to check if anyone else has come across it,
user#ubuntu:~/rc$ kubectl get rs ### don’t see any replica set
user#ubuntu:~/rc$
user#ubuntu:~/rc$
user#ubuntu:~/rc$ kubectl get pod
NAME READY STATUS RESTARTS AGE
bigwebstuff-673k9 1/1 Running 0 7m
bigwebstuff-cs7i3 1/1 Running 0 7m
bigwebstuff-egbqd 1/1 Running 0 7m
user#ubuntu:~/rc$
user#ubuntu:~/rc$
user#ubuntu:~/rc$ kubectl delete pod bigwebstuff-673k9 bigwebstuff-cs7i3 #### delete pods
pod "bigwebstuff-673k9" deleted
pod "bigwebstuff-cs7i3" deleted
user#ubuntu:~/rc$
user#ubuntu:~/rc$ kubectl get pod #### the deleted pods regenerated
NAME READY STATUS RESTARTS AGE
bigwebstuff-910m9 1/1 Running 0 6s
bigwebstuff-egbqd 1/1 Running 0 8m
bigwebstuff-fksf6 1/1 Running 0 6s
You see the deleted pods are regenerated, though I can’t find the replica set, as if a hidden replicate set exist somewhere.
The 3 pods are started from rc.yaml file as follows,
user#ubuntu:~/rc$ cat webrc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
name: bigwebstuff
labels:
name: bigwebstuff
spec:
replicas: 3
selector:
run: testweb
template:
metadata:
labels:
run: testweb
spec:
containers:
- name: podweb
image: nginx
ports:
- containerPort: 80
But it didn’t show up after I use the yams file to create the pods.
Any idea on how to find the hidden replica set? Or why the pods gets regenerated?
A "ReplicaSet" is not the same thing as a "ReplicationController" (although they are similar). The kubectl get rs command lists replica sets, whereas the manifest file in your question creates a replication controller. Instead, use the kubectl get rc command to list replication controllers (or alternatively, change your manifest file to create a ReplicaSet instead of a ReplicationController).
On the difference between ReplicaSets and ReplicationControllers, let me quote the documentation:
Replica Set is the next-generation Replication Controller. The only difference between a Replica Set and a Replication Controller right now is the selector support. Replica Set supports the new set-based selector requirements as described in the labels user guide whereas a Replication Controller only supports equality-based selector requirements.
Replica sets and replication controllers are not the same thing. Try the following:
kubectl get rc
And then delete accordingly.