AWS Kubernetes Persistent Volumes EFS - kubernetes

I have deployed an EFS file system in my AWS EKS cluster, and after the deployment the storage pod is up and running.
kubectl get pod -n storage
NAME                                     READY   STATUS    RESTARTS   AGE
nfs-client-provisioner-968445d79-g8wjr   1/1     Running   0          136m
When I try to deploy my application, the pod does not come up and stays in the Pending state (0/1); at the same time the PVC is not bound and also stays Pending.
Here are the provisioner logs after the application deployment.
I0610 13:26:11.875109 1 controller.go:987] provision "default/logs" class "efs": started
E0610 13:26:11.894816 1 controller.go:1004] provision "default/logs" class "efs": unexpected error getting claim reference: selfLink was empty, can't make reference
I'm using Kubernetes version 1.20. Could someone please help me with this?

Kubernetes 1.20 stopped setting the selfLink field by default (the RemoveSelfLink feature gate is enabled by default from that version).
There is a workaround available, but it does not always work.
After the lines
spec:
containers:
- command:
- kube-apiserver
add
- --feature-gates=RemoveSelfLink=false
then reapply the API server configuration:
kubectl apply -f /etc/kubernetes/manifests/kube-apiserver.yaml
This workaround will not work after version 1.20 (1.21 and up), as selfLink will be completely removed.
Another solution is to use a newer NFS provisioner image:
gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
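With the newer image, the provisioner Deployment might look roughly like this. This is a minimal sketch: the namespace, provisioner name, and the EFS DNS name/path are illustrative placeholders, not values from the question.

```yaml
# Sketch of a provisioner Deployment using the newer image.
# PROVISIONER_NAME must match the "provisioner" field of the storage class;
# the EFS DNS name below is a placeholder - substitute your own.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  namespace: storage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
          env:
            - name: PROVISIONER_NAME
              value: example.com/nfs                             # placeholder
            - name: NFS_SERVER
              value: fs-12345678.efs.eu-west-1.amazonaws.com     # placeholder
            - name: NFS_PATH
              value: /
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
      volumes:
        - name: nfs-client-root
          nfs:
            server: fs-12345678.efs.eu-west-1.amazonaws.com      # placeholder
            path: /
```

The v4.x provisioner builds object references itself instead of relying on selfLink, so no API server flag is needed.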

Related

Failed pods of previous helm release are not removed automatically

I have an application Helm chart with two deployments:
app (2 pod replicas)
app-dep (1 pod replica)
app-dep has an init container that waits for the app pods (using its labels) to be ready:
initContainers:
- name: wait-for-app-pods
image: groundnuty/k8s-wait-for:v1.5.1
imagePullPolicy: Always
args:
- "pod"
- "-l app.kubernetes.io/component=app"
I am using helm to deploy an application:
helm upgrade --install --wait --create-namespace --timeout 10m0s app ./app
Revision 1 of the release app is deployed:
helm ls
NAME   NAMESPACE   REVISION   UPDATED                                    STATUS     CHART       APP VERSION
app    default     1          2023-02-03 01:10:18.796554241 +1100 AEDT   deployed   app-0.1.0   1.0.0
Initially, everything goes fine.
After some time, one of the app pods is evicted because the node runs low on memory.
These are some lines from the pod's description details:
Status: Failed
Reason: Evicted
Message: The node was low on resource: memory. Container app was using 2513780Ki, which exceeds its request of 0.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Evicted 12m kubelet The node was low on resource: memory. Container app was using 2513780Ki, which exceeds its request of 0.
Normal Killing 12m kubelet Stopping container app
Warning ExceededGracePeriod 12m kubelet Container runtime did not kill the pod within specified grace period.
Later a new pod was added automatically to match the deployment's replica count too.
But the Failed pod still remains in the namespace.
Now comes the next helm upgrade. The app pods for release revision 2 become ready.
But the init-container of app-dep of the latest revision keeps waiting for all pods with the label app.kubernetes.io/component=app to become ready. After the 10-minute timeout, release revision 2 is declared failed.
$ kubectl get pods
NAME                       READY   STATUS     RESTARTS   AGE
app-7595488c8f-4v42n       1/1     Running    0          7m37s
app-7595488c8f-xt4qt       1/1     Running    0          6m17s
app-86448b6cd-7fq2w        0/1     Error      0          36m
app-dep-546d897d6c-q9sw6   1/1     Running    0          38m
app-dep-cd9cfd975-w2fzn    0/1     Init:0/1   0          7m37s
ANALYSIS FOR SOLUTION:
In order to address this issue, we can try two approaches:
Approach 1:
Find and remove all the Failed pods of the previous revision just before doing a helm upgrade. List them with
kubectl get pods --field-selector status.phase=Failed -n default
and delete them with
kubectl delete pods --field-selector status.phase=Failed -n default
You can do this as part of the CD pipeline, or add the task as a pre-install hook job in the Helm chart.
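The hook variant could be sketched as the following Job. The image and the service account name are illustrative assumptions (the service account needs RBAC permission to list and delete pods):

```yaml
# Sketch of a Helm pre-install/pre-upgrade hook Job that cleans up
# Failed pods before the new revision rolls out.
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-failed-pods
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      serviceAccountName: pod-cleanup     # placeholder; must be allowed to delete pods
      restartPolicy: Never
      containers:
        - name: kubectl
          image: bitnami/kubectl:latest   # placeholder image with kubectl installed
          command:
            - kubectl
            - delete
            - pods
            - --field-selector=status.phase=Failed
```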
Approach 2:
Add one more label to the pods, one that changes on every helm upgrade (something like helm/release-revision=2).
Add that label also in the init-container so that it waits for the pods that have both labels.
It will then ignore the Failed pods of the previous release that have a different label.
initContainers:
- name: wait-for-app-pods
image: groundnuty/k8s-wait-for:v1.5.1
imagePullPolicy: Always
args:
- "pod"
- "-l app.kubernetes.io/component=app,helm/release-revision=2"
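In the chart templates, such a revision label can be rendered from Helm's built-in Release object. A sketch (the label key helm/release-revision is just the example name used in this answer):

```yaml
# Fragment of the deployment manifest's pod template (sketch).
# {{ .Release.Revision }} is Helm's built-in release revision number,
# so the label changes on every upgrade.
template:
  metadata:
    labels:
      app.kubernetes.io/component: app
      helm/release-revision: "{{ .Release.Revision }}"
```

The same templated value would then be used in the init-container's -l argument.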
This approach causes the pod labels to change on every helm upgrade, which recreates the pods each time. Also, it is better to update the labels only in the Deployment's pod template, not in its selector, because as per the official Kubernetes documentation for the Deployment resource:
It is generally discouraged to make label selector updates
Also, there is no need to add the revision label to the selector field in the service manifest.
QUESTION:
Which approach would be better practice?
What would be the caveats and benefits of each method?
Is there any other approach to fix this issue?

Azure AKS Redis deployments

I want to run a Redis instance for my namespaces. I use Azure AKS and have default, dev, qa and stg namespaces. I have already deployed Redis in the default namespace, but after that it was impossible to do the same for the others. I then tried creating a separate namespace (redis), but the result was the same: the pods stay Pending.
PS D:\Code\Infrastructure> kubectl -n redis get pods
NAME         READY   STATUS    RESTARTS   AGE
redis-0      0/1     Pending   0          34s
sentinel-0   0/1     Pending   0          12s
Here is the link to the resources that I use: GITHUB
I deleted all the existing PVCs, then created a new storage class named development and referenced it with storageClassName: development.
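On AKS, such a storage class might look roughly like this. This is a sketch: the provisioner is the Azure disk CSI driver and the SKU is an assumed value; only the name development comes from the answer above.

```yaml
# Sketch of an AKS storage class named "development".
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: development
provisioner: disk.csi.azure.com          # Azure disk CSI driver
parameters:
  skuName: StandardSSD_LRS               # assumed disk SKU
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer  # bind when a pod is scheduled
```

The PVCs then reference it via storageClassName: development.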

Kubernetes pod failed to update

We use GitLab CI/CD to deploy pods via Kubernetes. However, the updated pod always stays Pending and the deleted pod is always stuck in Terminating.
The controller and scheduler are both okay.
If I describe the pending pod, it only shows that it has been scheduled, nothing else.
This is the pending pod's logs:
$ kubectl logs -f robo-apis-dev-7b79ccf74b-nr9q2 -n xxx
Error from server (BadRequest): container "robo-apis-dev" in pod "robo-apis-dev-7b79ccf74b-nr9q2" is waiting to start: ContainerCreating
What could be the issue? Our Kubernetes cluster never had this issue before.
Okay, it turned out we used to have an NFS server backing our PVCs. We recently moved to AWS EKS and cleaned up the NFS servers, but some resources on the nodes apparently still referenced the old NFS server. Once we temporarily rolled the NFS server back, the pods moved to the Running state.
The issue was discussed here: Orphaned pod https://github.com/kubernetes/kubernetes/issues/60987

Istio Installation successful but not able to deploy POD

I have successfully installed Istio in my Kubernetes cluster.
Istio version is 1.9.1
Kubernetes CNI plugin used: Calico version 3.18 (Calico POD is up and running)
kubectl get pod -A
NAMESPACE      NAME                                    READY   STATUS    RESTARTS   AGE
istio-system   istio-egressgateway-bd477794-8rnr6      1/1     Running   0          124m
istio-system   istio-ingressgateway-79df7c789f-fjwf8   1/1     Running   0          124m
istio-system   istiod-6dc55bbdd-89mlv                  1/1     Running   0          124m
When I try to deploy a sample nginx app, I get the error below:
failed calling webhook "sidecar-injector.istio.io": Post "https://istiod.istio-system.svc:443/inject?timeout=30s": context deadline exceeded
When I disable automatic sidecar injection, the pod deploys without any errors:
kubectl label namespace default istio-injection-
I am not sure how to fix this issue. Could someone please help me with it?
In this case, adding hostNetwork: true under spec.template.spec in the istiod Deployment may help.
This seems to be a workaround when using the Calico CNI for pod networking (see: failed calling webhook "sidecar-injector.istio.io").
As we can find in the Kubernetes Host namespaces documentation:
HostNetwork - Controls whether the pod may use the node network namespace. Doing so gives the pod access to the loopback device, services listening on localhost, and could be used to snoop on network activity of other pods on the same node.
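A sketch of the change, showing only the relevant fragment of the istiod Deployment in the istio-system namespace:

```yaml
# Fragment of the istiod Deployment; only the added field is shown.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: istiod
  namespace: istio-system
spec:
  template:
    spec:
      hostNetwork: true   # workaround for the Calico sidecar-injector timeout
```

Keep the security trade-off quoted above in mind: with hostNetwork: true, the istiod pod shares the node's network namespace.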

Kubernetes monitoring service heapster keeps restarting

I am running a Kubernetes cluster using Azure's container engine. I have an issue with one of the Kubernetes services, the resource-monitoring service heapster: its pod is relaunched roughly every minute. I have tried removing the heapster deployment, replicaset, and pods and recreating the deployment, but it instantly goes back to the same behaviour.
When I look at the resources with the heapster label it looks a little bit weird:
$ kubectl get deploy,rs,po -l k8s-app=heapster --namespace=kube-system
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/heapster   1         1         1            1           17h
NAME                     DESIRED   CURRENT   READY   AGE
rs/heapster-2708163903   1         1         1       17h
rs/heapster-867061013    0         0         0       17h
NAME                           READY   STATUS    RESTARTS   AGE
po/heapster-2708163903-vvs1d   2/2     Running   0          0s
For some reason there are two replica sets. The one called rs/heapster-867061013 keeps reappearing even when I delete all of the resources and redeploy them. The output above also shows that the pod has just started, and this is the issue: a pod keeps getting created, runs for a few seconds, and then a new one is created. I am new to running Kubernetes, so I am unsure which log files are relevant to this issue.
Logs from heapster container
heapster.go:72] /heapster source=kubernetes.summary_api:""
heapster.go:73] Heapster version v1.3.0
configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version v1
configs.go:62] Using kubelet port 10255
heapster.go:196] Starting with Metric Sink
heapster.go:106] Starting heapster on port 8082
Logs from heapster-nanny container
pod_nanny.go:56] Invoked by [/pod_nanny --cpu=80m --extra-cpu=0.5m --memory=140Mi --extra-memory=4Mi --threshold=5 --deployment=heapster --container=heapster --poll-period=300000 --estimator=exponential]
pod_nanny.go:68] Watching namespace: kube-system, pod: heapster-2708163903-mqlsq, container: heapster.
pod_nanny.go:69] cpu: 80m, extra_cpu: 0.5m, memory: 140Mi, extra_memory: 4Mi, storage: MISSING, extra_storage: 0Gi
pod_nanny.go:110] Resources: [{Base:{i:{value:80 scale:-3} d:{Dec:<nil>} s:80m Format:DecimalSI} ExtraPerNode:{i:{value:5 scale:-4} d:{Dec:<nil>} s: Format:DecimalSI} Name:cpu} {Base:{i:{value:146800640 scale:0} d:{Dec:<nil>} s:140Mi Format:BinarySI} ExtraPerNode:{i:{value:4194304 scale:0} d:{Dec:<nil>} s:4Mi Format:BinarySI} Name:memory}]
It is completely normal and important that the Deployment Controller keeps old ReplicaSet resources in order to do fast rollbacks.
A Deployment resource manages ReplicaSet resources. Your heapster Deployment is configured to run 1 pod - this means it will always try to create one ReplicaSet with 1 pod. In case you make an update to the Deployment (say, a new heapster version), then the Deployment resource creates a new ReplicaSet which will schedule a pod with the new version. At the same time, the old ReplicaSet resource sets its desired pods to 0, but the resource itself is still kept for easy rollbacks. As you can see, the old ReplicaSet rs/heapster-867061013 has 0 pods running. In case you make a rollback, the Deployment deploy/heapster will increase the number of pods in rs/heapster-867061013 to 1 and decrease the number in rs/heapster-2708163903 back to 0. You should also checkout the documentation about the Deployment Controller (in case you haven't done it yet).
Still, it seems odd to me why your newly created Deployment Controller would instantly create 2 ReplicaSets. Did you wait a few seconds (say, 20) after deleting the Deployment Controller and before creating a new one? For me it sometimes takes some time before deletions propagate throughout the whole cluster and if I recreate too quickly, then the same resource is reused.
Concerning the heapster pod recreation you mentioned: pods have a restartPolicy. If set to Never, the pod will be recreated by its ReplicaSet in case it exits (this means a new pod resource is created and the old one is being deleted). My guess is that your heapster pod has this Never policy set. It might exit due to some error and reach a Failed state (you need to check that with the logs). Then after a short while the ReplicaSet creates a new pod.
OK, so it turned out to be a problem with the Azure Container Service default Kubernetes configuration. I got some help from an Azure support engineer.
The problem is fixed by adding the label addonmanager.kubernetes.io/mode: EnsureExists to the heapster deployment. Here is the pull request that the support engineer referenced: https://github.com/Azure/acs-engine/pull/1133
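For reference, the fix amounts to adding this label to the heapster Deployment's metadata (a fragment only; the API version shown is the one in use for Deployments at the time, and the rest of the manifest is unchanged):

```yaml
# Fragment of the heapster Deployment in kube-system.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
  labels:
    k8s-app: heapster
    # EnsureExists tells the addon manager only to ensure the resource
    # exists, instead of reconciling it back to the shipped manifest
    # (which is what kept recreating the second ReplicaSet).
    addonmanager.kubernetes.io/mode: EnsureExists
```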