Node has no available volume zone in AWS EKS - kubernetes

I'm trying to create a pod but I get the following error:
0/3 nodes are available: 1 node(s) had no available volume zone.
I tried attaching more volumes, but the error is still the same.
Warning FailedScheduling 2s (x14 over 42s) default-scheduler 0/3 nodes are available: 1 node(s) had no available volume zone, 2 node(s) didn't have free ports for the requested pod ports.

My problem was that the AWS EC2 Volume and Kubernetes PersistentVolume (PV) state got somehow out of sync / corrupted. Kubernetes believed there was a bound PV while the EC2 Volume showed as "available", not mounted to a worker node.
Update: The volume was in a different availability zone than either of the two EC2 nodes and thus could not be attached to them.
The solution was to delete all relevant resources - StatefulSet, PVC (crucial!), PV. Then I was able to apply them again and Kubernetes succeeded in creating a new EC2 Volume and attaching it to the instance.
As you can see in my configuration, I have a StatefulSet with a "volumeClaimTemplate" (=> PersistentVolumeClaim, PVC) (and a matching StorageClass definition) so Kubernetes should dynamically provision an EC2 Volume, attach it to a worker and expose it as a PersistentVolume.
See kubectl get pvc, kubectl get pv, and the AWS Console under EC2 - Volumes.
NOTE: "Bound" = the PV is bound to the PVC.
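A minimal sketch of that delete-and-reapply sequence, using hypothetical names (my-app for the StatefulSet, data for the volumeClaimTemplate) that you would replace with your own:

kubectl get pvc                           # note the PVC and PV names first
kubectl get pv
kubectl delete statefulset my-app         # 1. the StatefulSet
kubectl delete pvc data-my-app-0          # 2. the PVC (crucial!)
kubectl delete pv <pv-name-from-get-pv>   # 3. the PV, if it was not deleted automatically
kubectl apply -f my-app.yaml              # re-apply; dynamic provisioning creates a fresh EBS volume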
Here is a description of a laborious way to restore a StatefulSet on AWS if you have a snapshot of the EBS volume (5/2018): https://medium.com/@joatmon08/kubernetes-statefulset-recovery-from-aws-snapshots-8a6159cda6f1

Related

Kubernetes with Cluster Autoscaler & Statefulset: node(s) had volume node affinity conflict

I have a Kubernetes cluster on AWS with the Cluster Autoscaler (a component that automatically adjusts the desired number of nodes based on usage). The cluster previously had node A in AZ-1 and node B in AZ-2. When I deploy my StatefulSet with a dynamic PVC, the PVC and PV are created in AZ-2, and the pods are created on node B.
I deleted the StatefulSet to perform some testing. The Cluster Autoscaler decided that one node was now enough and adjusted the desired number down to 1. Now that node B is deleted, when I redeploy my StatefulSet, the pods are stuck in Pending and can't be created on node A, with the following error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m8s (x997 over 18h) default-scheduler 0/1 nodes are available: 1 node(s) had volume node affinity conflict.
Normal NotTriggerScaleUp 95s (x6511 over 18h) cluster-autoscaler pod didn't trigger scale-up: 1 node(s) had volume node affinity conflict
I know it is because the PVs are created in AZ-2 and can't be attached to pods in AZ-1, but how do I overcome this issue?
Use multiple node groups, each scoped to a single Availability Zone. In addition, you should enable the --balance-similar-node-groups feature.
https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html
Important
If you are running a stateful application across multiple Availability Zones that is backed by Amazon EBS volumes and using the Kubernetes Cluster Autoscaler, you should configure multiple node groups, each scoped to a single Availability Zone. In addition, you should enable the --balance-similar-node-groups feature.
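A hedged sketch of that layout with eksctl (the cluster name, node group names, and zones below are illustrative, not taken from the question): each managed node group is pinned to one Availability Zone, and the Cluster Autoscaler gets the balancing flag.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster            # hypothetical
  region: eu-west-1
managedNodeGroups:
  - name: ng-az1              # one node group per AZ
    availabilityZones: ["eu-west-1a"]
    minSize: 1
    maxSize: 3
  - name: ng-az2
    availabilityZones: ["eu-west-1b"]
    minSize: 1
    maxSize: 3

Then add --balance-similar-node-groups=true to the cluster-autoscaler container's arguments so the per-AZ groups are scaled evenly.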

Unable to launch new pods despite resources seemingly available

I'm unable to launch new pods despite resources seemingly being available.
Judging from the screenshot below, there should be room for about 40 new pods.
And judging from the following screenshot, the nodes seem fairly underutilized.
However, I'm currently facing the error message below:
0/3 nodes are available: 1 Insufficient cpu, 2 node(s) had volume node affinity conflict.
And last night it was the following
0/3 nodes are available: 1 Too many pods, 2 node(s) had volume node affinity conflict.
Most of my services require very little memory and CPU, and therefore their resources are configured as seen below:
resources:
  limits:
    cpu: 100m
    memory: 64Mi
  requests:
    cpu: 100m
    memory: 32Mi
Why can't I deploy more pods, and how do I fix this?
Your problem is "volume node affinity conflict".
From Kubernetes Pod Warning: 1 node(s) had volume node affinity conflict:
The error "volume node affinity conflict" happens when the persistent volume claims that the pod is using are scheduled on different zones, rather than on one zone, and so the actual pod was not able to be scheduled because it cannot connect to the volume from another zone.
First, try to investigate exactly where the problem is. You can find a detailed guide here. You will need commands like:
kubectl get pv
kubectl describe pv
kubectl get pvc
kubectl describe pvc
Then you can delete the PV and PVC and move pods to the same zone along with the PV and PVC.
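If it helps, the following commands should show which zone the PV is pinned to versus where your nodes are (the PV name is a placeholder):

kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity}'   # zone constraint on the volume
kubectl get pv <pv-name> --show-labels                        # older clusters use failure-domain.beta.kubernetes.io/zone
kubectl get nodes -L topology.kubernetes.io/zone              # zone of each node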
Volume node affinity conflict means the volume you tried to mount is not available on any of the nodes. You can resolve this yourself, or paste your volumes section into the question for further examination.

PVC behavior with Dynamic Provision and Replicaset

I have a question, please: for a PVC that is bound to one PV through a dynamic StorageClass, used by a pod created by a ReplicaSet, if that pod gets terminated and restarted on another host, will it get the same PV?
What I saw is that the pod could not be rescheduled until the same PV was available, but I am not able to understand what the standard behavior should be and how a PVC should behave differently between a ReplicaSet and a StatefulSet.
Does "another host" mean another Kubernetes node?
If a pod gets restarted or terminated and is scheduled again on another node, then as long as the PVC and PV still exist, the disk will be mounted to that specific node and the pod will start running again. So yes, the PVC and PV will be the same here, but it still depends on the reclaim policy.
You can read more about it here: https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes#deployments_vs_statefulsets
PersistentVolumes can have various reclaim policies, including "Retain", "Recycle", and "Delete". For dynamically provisioned PersistentVolumes, the default reclaim policy is "Delete". This means that a dynamically provisioned volume is automatically deleted when a user deletes the corresponding PersistentVolumeClaim.
Read more at: https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/
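For illustration, that page changes the reclaim policy of an existing, dynamically provisioned PV with a patch like this (the PV name is a placeholder):

kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl get pv   # the RECLAIM POLICY column should now show Retain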
If your pod merely gets terminated or restarted, that means you are not deleting the PVC; in that case the PV will still be there, the pod will attach to the PVC again, and it will start running on the respective node.

0/2 nodes are available....2 pod has unbound immediate PersistentVolumeClaims

I installed Thanos using Bitnami's Helm chart, after installing Prometheus with its Helm chart. MinIO was deployed alongside Thanos as part of that chart. The deployed MinIO pod is stuck in Pending state with:
0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims.
kubectl get pvc shows the claim in Pending status with:
no persistent volumes available for this claim and no storage class is set
How do I create a PV in this situation?
Basically, you used a PVC in the pod but that PVC is not bound to any PV. You can either use dynamic provisioning (in which case the PVC will create a PV for you according to its requirements), or you can manually create a PV.
For dynamic provisioning, see this doc.
For manually creating a PV, see the PV doc.
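If you are on AWS/EKS like the original question, one way to get dynamic provisioning going is a default StorageClass backed by EBS; treat the following as a sketch and adapt the provisioner and parameters to your cluster:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-default
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # picked up by PVCs with no storageClassName
provisioner: kubernetes.io/aws-ebs    # or ebs.csi.aws.com if the EBS CSI driver is installed
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer   # helps avoid the cross-AZ issues discussed above

With a default StorageClass in place, the pending MinIO PVC should bind (you may need to delete and recreate it), or you can reference the class explicitly in the chart's persistence settings.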

How to identify pod eviction policy?

I have a Kubernetes cluster deployed on GCP with a single node, 4 CPUs and 15 GB memory. There are a few pods, all bound to a persistent volume by a persistent volume claim. I have observed that the pods restarted automatically and the data in the persistent volume was lost.
After some research, I suspect that this could be because of the pod eviction policy. When I used kubectl describe pod, I noticed the error below:
0/1 nodes are available: 1 node(s) were not ready, 1 node(s) were out of disk space, 1 node(s) were unschedulable.
The restart policy of my pods is "Always", so I think the pods restarted after being deprived of resources.
How do I identify the pod eviction policy of my cluster and change it, so that this does not happen in the future?
"How do I identify the pod eviction policy of my cluster and change it?"
These pod-eviction thresholds are kubelet flags, and you can tune their values according to your requirements by editing the kubelet config file; see the detailed config-file documentation. Dynamic Kubelet Configuration also allows you to edit these values in a live cluster.
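For reference, a hedged sketch of what those eviction thresholds look like in a KubeletConfiguration file (the values are illustrative only, not recommendations):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"     # hard-evict pods when free memory drops below this
  nodefs.available: "10%"       # hard-evict pods when node disk space runs low
  imagefs.available: "15%"
evictionSoft:
  memory.available: "500Mi"     # soft threshold, honoured after the grace period below
evictionSoftGracePeriod:
  memory.available: "1m30s"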
The restart policy of my pods is "always". So I think that the pods have restarted after being resource deprived.
Your pod has been rescheduled due to a node issue (not enough disk space).
The restart policy of my pods is "always".
It means that if the pod is not up and running, Kubernetes will try to restart it.
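For reference, the field lives on the pod spec; a minimal, hypothetical example:

apiVersion: v1
kind: Pod
metadata:
  name: demo                  # hypothetical
spec:
  restartPolicy: Always       # the default for pods; Never and OnFailure are the alternatives
  containers:
    - name: app
      image: nginx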