Failed pods of previous helm release are not removed automatically - kubernetes

I have an application Helm chart with two deployments:
app (2 pod replicas)
app-dep (1 pod replica)
app-dep has an init container that waits for the app pods (matched by their labels) to be ready:
initContainers:
  - name: wait-for-app-pods
    image: groundnuty/k8s-wait-for:v1.5.1
    imagePullPolicy: Always
    args:
      - "pod"
      - "-l app.kubernetes.io/component=app"
I am using helm to deploy an application:
helm upgrade --install --wait --create-namespace --timeout 10m0s app ./app
Revision 1 of the release app is deployed:
helm ls
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
app default 1 2023-02-03 01:10:18.796554241 +1100 AEDT deployed app-0.1.0 1.0.0
Everything initially goes fine.
After some time, one of the app pods is evicted because the node runs low on memory.
These are some lines from the pod's description details:
Status: Failed
Reason: Evicted
Message: The node was low on resource: memory. Container app was using 2513780Ki, which exceeds its request of 0.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Evicted 12m kubelet The node was low on resource: memory. Container app was using 2513780Ki, which exceeds its request of 0.
Normal Killing 12m kubelet Stopping container app
Warning ExceededGracePeriod 12m kubelet Container runtime did not kill the pod within specified grace period.
Later, a new pod is created automatically to match the deployment's replica count.
But the Failed pod still remains in the namespace.
Now comes the next helm upgrade. The app pods for release revision 2 become ready.
But the init container of the latest app-dep revision keeps waiting for all pods with the label app.kubernetes.io/component=app to become ready. After the 10-minute timeout, release revision 2 is declared failed.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
app-7595488c8f-4v42n 1/1 Running 0 7m37s
app-7595488c8f-xt4qt 1/1 Running 0 6m17s
app-86448b6cd-7fq2w 0/1 Error 0 36m
app-dep-546d897d6c-q9sw6 1/1 Running 0 38m
app-dep-cd9cfd975-w2fzn 0/1 Init:0/1 0 7m37s
ANALYSIS FOR SOLUTION:
In order to address this issue, we can try two approaches:
Approach 1:
Find and remove all the failed pods of the previous revision first, just before doing a helm upgrade.
kubectl get pods --field-selector status.phase=Failed -n default
You can do this as part of the CD pipeline, or add that task to the Helm chart as a pre-install hook Job (a sketch follows below).
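For illustration, here is a rough sketch of that cleanup, first as a one-off command and then as a Helm hook Job. The image, ServiceAccount name, and RBAC setup are assumptions, not part of the original chart:

# Delete every pod in the Failed phase just before upgrading (e.g. from the CD pipeline)
kubectl delete pods --field-selector status.phase=Failed -n default

# Rough pre-install/pre-upgrade hook Job; assumes a ServiceAccount "pod-cleaner"
# that is allowed to list and delete pods in the release namespace
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-failed-pods
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      serviceAccountName: pod-cleaner
      restartPolicy: Never
      containers:
        - name: cleanup
          image: bitnami/kubectl:latest
          command: ["kubectl", "delete", "pods", "--field-selector", "status.phase=Failed"]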
Approach 2:
Add one more label to the pods that changes on every helm upgrade (something like helm/release-revision=2).
Add that label to the init container's selector as well, so that it waits only for pods that carry both labels.
It will then ignore the Failed pods of the previous release, which have a different revision label.
initContainers:
  - name: wait-for-app-pods
    image: groundnuty/k8s-wait-for:v1.5.1
    imagePullPolicy: Always
    args:
      - "pod"
      - "-l app.kubernetes.io/component=app, helm/release-revision=2"
This approach changes the pod labels on every helm upgrade and therefore recreates the pods each time. Also, it is better to update these labels only in the Deployment's pod template, not in its selector, because as per the official Kubernetes documentation for the Deployment resource:
It is generally discouraged to make label selector updates
Also, there is no need to add the revision label to the selector field in the service manifest.
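For illustration, a minimal sketch of how Approach 2 could look in the chart templates (the label key and the template excerpts are assumptions, not taken from the original chart):

# In the app Deployment's pod template (not in its selector):
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/component: app
        helm/release-revision: "{{ .Release.Revision }}"

# In the app-dep init container, build the selector from the same built-in value:
args:
  - "pod"
  - "-l app.kubernetes.io/component=app,helm/release-revision={{ .Release.Revision }}"

Because the revision label lives only in the pod template, the Deployment selector and the Service selector stay unchanged, which avoids the discouraged selector updates quoted above.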
QUESTION:
Which approach would be better practice?
What would be the caveats and benefits of each method?
Is there any other approach to fix this issue?

Related

AWS Kubernetes Persistent Volumes EFS

I have deployed an EFS file system in an AWS EKS cluster. After the deployment, my storage pod is up and running.
kubectl get pod -n storage
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner-968445d79-g8wjr 1/1 Running 0 136m
When I try to deploy the application, the pod does not come up and stays in Pending state (0/1); at the same time, the PVC is not bound and also stays Pending.
Here are the provisioner logs from after the actual application deployment.
I0610 13:26:11.875109 1 controller.go:987] provision "default/logs" class "efs": started
E0610 13:26:11.894816 1 controller.go:1004] provision "default/logs" class "efs": unexpected error getting claim reference: selfLink was empty, can't make reference
I'm using Kubernetes version 1.20. Could someone please help me with this?
Kubernetes 1.20 stopped propagating selfLink.
There is a workaround available, but it does not always work.
After the lines
spec:
  containers:
  - command:
    - kube-apiserver
add
    - --feature-gates=RemoveSelfLink=false
then reapply the API server configuration:
kubectl apply -f /etc/kubernetes/manifests/kube-apiserver.yaml
This workaround will not work after version 1.20 (1.21 and up), as selfLink will be completely removed.
Another solution is to use a newer NFS provisioner image:
gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
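If you go the newer-image route, a minimal sketch of swapping the image on the existing provisioner Deployment (the deployment and container names are assumed from the output above):

kubectl -n storage set image deployment/nfs-client-provisioner \
  nfs-client-provisioner=gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
kubectl -n storage rollout status deployment/nfs-client-provisioner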

kubernetes replace deployment sometimes doesn't work, rollout error

We have a kubernetes deployment and on version release we build a new image and run:
kubectl replace -f deployment.yml
This is always the same deployment file except for the name of the image it uses.
It usually works, but every once in a while it doesn't update the pods.
Running describe on the deployment reveals there's no new replica set:
OldReplicaSets: appname-675551529 (1/1 replicas created)
NewReplicaSet: <none>
and viewing the rollout status returns the following:
Waiting for deployment spec update to be observed...
Waiting for deployment spec update to be observed...
error: watch closed before Until timeout

Kubernetes monitoring service heapster keeps restarting

I am running a Kubernetes cluster using Azure's container engine. I have an issue with one of the Kubernetes services, the one that does resource monitoring: heapster. The pod is relaunched every minute or so. I have tried removing the heapster deployment, replicaset and pods, and recreating the deployment. It goes back to the same behaviour instantly.
When I look at the resources with the heapster label it looks a little bit weird:
$ kubectl get deploy,rs,po -l k8s-app=heapster --namespace=kube-system
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/heapster 1 1 1 1 17h
NAME DESIRED CURRENT READY AGE
rs/heapster-2708163903 1 1 1 17h
rs/heapster-867061013 0 0 0 17h
NAME READY STATUS RESTARTS AGE
po/heapster-2708163903-vvs1d 2/2 Running 0 0s
For some reason there are two replica sets. The one called rs/heapster-867061013 keeps reappearing even when I delete all of the resources and redeploy them. The output above also shows that the pod has just started, and this is the issue: it keeps getting created, runs for a few seconds, and then a new one is created. I am new to running Kubernetes, so I am unsure which log files are relevant to this issue.
Logs from heapster container
heapster.go:72] /heapster source=kubernetes.summary_api:""
heapster.go:73] Heapster version v1.3.0
configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version v1
configs.go:62] Using kubelet port 10255
heapster.go:196] Starting with Metric Sink
heapster.go:106] Starting heapster on port 8082
Logs from heapster-nanny container
pod_nanny.go:56] Invoked by [/pod_nanny --cpu=80m --extra-cpu=0.5m --memory=140Mi --extra-memory=4Mi --threshold=5 --deployment=heapster --container=heapster --poll-period=300000 --estimator=exponential]
pod_nanny.go:68] Watching namespace: kube-system, pod: heapster-2708163903-mqlsq, container: heapster.
pod_nanny.go:69] cpu: 80m, extra_cpu: 0.5m, memory: 140Mi, extra_memory: 4Mi, storage: MISSING, extra_storage: 0Gi
pod_nanny.go:110] Resources: [{Base:{i:{value:80 scale:-3} d:{Dec:<nil>} s:80m Format:DecimalSI} ExtraPerNode:{i:{value:5 scale:-4} d:{Dec:<nil>} s: Format:DecimalSI} Name:cpu} {Base:{i:{value:146800640 scale:0} d:{Dec:<nil>} s:140Mi Format:BinarySI} ExtraPerNode:{i:{value:4194304 scale:0} d:{Dec:<nil>} s:4Mi Format:BinarySI} Name:memory}]
It is completely normal and important that the Deployment Controller keeps old ReplicaSet resources in order to do fast rollbacks.
A Deployment resource manages ReplicaSet resources. Your heapster Deployment is configured to run 1 pod - this means it will always try to have one ReplicaSet with 1 pod. If you update the Deployment (say, to a new heapster version), the Deployment creates a new ReplicaSet that schedules a pod with the new version. At the same time, the old ReplicaSet has its desired pod count set to 0, but the resource itself is kept for easy rollbacks. As you can see, the old ReplicaSet rs/heapster-867061013 has 0 pods running. If you roll back, the Deployment deploy/heapster will increase the number of pods in rs/heapster-867061013 to 1 and decrease the number in rs/heapster-2708163903 back to 0. You should also check out the documentation about the Deployment Controller, in case you haven't done so yet.
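For completeness, a rollback like the one described above can be triggered with the standard rollout commands, something like:

kubectl -n kube-system rollout history deployment/heapster
kubectl -n kube-system rollout undo deployment/heapster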
Still, it seems odd to me that your newly created Deployment would instantly create 2 ReplicaSets. Did you wait a few seconds (say, 20) after deleting the Deployment and before creating a new one? For me it sometimes takes a while before deletions propagate throughout the whole cluster, and if I recreate too quickly, the same resource is reused.
Concerning the heapster pod recreation you mentioned: pods have a restartPolicy. If it is set to Never, the pod will be recreated by its ReplicaSet if it exits (meaning a new pod resource is created and the old one is deleted). My guess is that your heapster pod has this Never policy set. It might exit due to some error and reach a Failed state (you need to check that in the logs). Then, after a short while, the ReplicaSet creates a new pod.
OK, so it turned out to be a problem in the Azure Container Service default Kubernetes configuration. I got some help from an Azure supporter.
The problem is fixed by adding the label addonmanager.kubernetes.io/mode: EnsureExists to the heapster deployment. Here is the pull request that the supporter referenced: https://github.com/Azure/acs-engine/pull/1133
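For reference, a minimal sketch of applying that label to the existing Deployment (name and namespace taken from the output above):

kubectl -n kube-system label deployment heapster addonmanager.kubernetes.io/mode=EnsureExists --overwrite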

How do I check if a Kubernetes pod was killed for OOM or DEADLINE EXCEEDED?

I have some previously run pods that I think were killed by Kubernetes for OOM or DEADLINE EXCEEDED. What's the most reliable way to confirm that, especially if the pods weren't recent?
If the pods are still showing up when you type kubectl get pods -a, then you can type kubectl describe pod PODNAME and look at the reason for termination. The output will look similar to the following (I have extracted the parts of the output that are relevant to this discussion):
Containers:
  somename:
    Container ID:  docker://5f0d9e4c8e0510189f5f209cb09de27b7b114032cc94db0130a9edca59560c11
    Image:         ubuntu:latest
    ...
    State:         Terminated
      Reason:      Completed
      Exit Code:   0
In the sample output above, my pod's termination reason is Completed, but you may see other reasons there, such as OOMKilled.
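If you prefer to script this check, the same field can be read directly with a jsonpath query (PODNAME is a placeholder):

kubectl get pod PODNAME -o jsonpath='{.status.containerStatuses[*].state.terminated.reason}'
# for the previous run of a restarted container:
kubectl get pod PODNAME -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'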
If the pod has already been deleted, you can also check kubernetes events and see what's going on:
$ kubectl get events
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
59m 59m 1 my-pod-7477dc76c5-p49k4 Pod spec.containers{my-service} Normal Killing kubelet Killing container with id docker://my-service:Need to kill Pod

A pending status after command "kubectl create -f busybox.yaml"

Here is my setup (originally attached as a picture taken on my Mac):
K8S Cluster(on VirtualBox, 1*master, 2*workers)
OS Ubuntu 15.04
K8S version 1.1.1
When I try to create a pod from "busybox.yaml", it goes into Pending status.
How can I resolve it?
I pasted the status below for context (the kubectl describe node output was attached as a picture).
Status
kubectl get nodes
192.168.56.11 kubernetes.io/hostname=192.168.56.11 Ready 7d
192.168.56.12 kubernetes.io/hostname=192.168.56.12 Ready 7d
kubectl get ev
1h 39s 217 busybox Pod FailedScheduling {scheduler } no nodes available to schedule pods
kubectl get pods
NAME READY STATUS RESTARTS AGE
busybox 0/1 Pending 0 1h
And I also added one more status.
"kubectl describe pod busybox" or "kubectl get pod busybox -o yaml" output could be useful.
Since you didn't specify, I assume that the busybox pod was created in the default namespace, and that no resource requirements nor nodeSelectors were specified.
In many cluster setups, including vagrant, we create a LimitRange for the default namespace to request a nominal amount of CPU for each pod (.1 cores). You should be able to confirm that this is the case using "kubectl get pod busybox -o yaml".
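For example, a quick way to look at that default LimitRange and at what the pod actually ended up requesting (a sketch; object names may differ in your setup):

kubectl describe limitrange --namespace=default
kubectl get pod busybox --namespace=default -o yaml | grep -A 4 resources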
We also create a number of system pods automatically. You should be able to see them using "kubectl get pods --all-namespaces -o wide".
It is possible for nodes with sufficiently small capacity to fill up with just system pods, though I wouldn't expect this to happen with 2-core nodes.
If the busybox pod were created before the nodes were registered, that could be another reason for that event, though I would expect to see a subsequent event for the reason that the pod remained pending even after nodes were created.
Please take a look at the troubleshooting guide for more troubleshooting tips, and follow up here or on Slack (slack.k8s.io) with more information.
http://kubernetes.io/v1.1/docs/troubleshooting.html