Kubernetes - All PVCs Bound, yet "pod has unbound immediate PersistentVolumeClaims"

Unfortunately I am unable to paste configs or kubectl output, but please bear with me.
Using helm to deploy a series of containers to K8s 1.14.6, all containers are deploying successfully except for those that have initContainer sections defined within them.
In these failing deployments, their templates define container and initContainer stanzas that reference the same persistent-volume (and associated persistent-volume-claim, both defined elsewhere).
The purpose of the initContainer is to copy persisted files from a mounted drive location into the appropriate place before the main container is established.
Other containers (without initContainer stanzas) mount properly and run as expected.
These pods which have initContainer stanzas, however, report "failed to initialize" or "CrashLoopBackOff" as they continually try to start up. The kubectl describe pod of these pods gives only a Warning in the events section that "pod has unbound immediate PersistentVolumeClaims." The initContainer section of the pod description says it has failed because "Error" with no further elaboration.
When looking at the associated pv and pvc entries from kubectl, however, none are left pending, and all report "Bound" with no Events to speak of in the description.
I have been able to find plenty of articles suggesting fixes when your pvc list shows Pending claims, yet none so far that address this particular set of circumstances where all PVCs are bound.

When a PVC is "Bound", this means that you do have a PersistentVolume object in your cluster whose claimRef refers to that PVC (and usually that your storage provisioner has finished creating the corresponding volume in your storage backend).
When a volume is "not bound" in one of your Pods, this means the node where your Pod was scheduled is unable to attach your persistent volume. If you're sure there's no mistake in your Pod's volumes, you should then check the logs of your CSI attacher pod when using CSI, or the node logs directly when using an in-tree driver.
The CrashLoopBackOff is a separate issue. You should check the logs of your initContainer: kubectl logs <pod-name> -c <init-container-name> --previous. From your explanation, I would suspect there is a permissions issue when copying the files over.
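If the initContainer is indeed failing on permissions, a common fix is to set a pod-level fsGroup so the mounted volume is group-writable by the containers. The sketch below is a minimal, hypothetical example - the pod name, images, paths, and PVC name are placeholders, not taken from the question:

```yaml
# Hypothetical sketch: initContainer copies seed files into a PVC-backed
# volume; fsGroup makes the volume writable by both containers' group.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  securityContext:
    fsGroup: 2000            # volume files become group-owned by gid 2000
  initContainers:
  - name: copy-persisted-files
    image: busybox:1.36
    command: ["sh", "-c", "cp -r /seed/. /data/"]
    volumeMounts:
    - name: data
      mountPath: /data
  containers:
  - name: app
    image: my-app:latest     # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-pvc      # placeholder PVC name
```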

Related

NFS mount fails during kubernetes pod initialization if NFS client is set on container level, not on pod level

Setup:
Linux VM where Pod (containing 3 containers) is started.
Only 1 of the containers needs the NFS mount to the remote NFS server.
This "app" container is based on Alpine Linux.
Remote NFS server is up & running. If I create a separate yaml file for persistent volume with that server info - it's up & available.
In my pod yaml file I define Persistent Volume (with that remote NFS server info), Persistent Volume Claim and associate my "app" container's volume with that claim.
Everything works as a charm if on the hosting linux VM I install the NFS library, like:
sudo apt install nfs-common.
(That's why I don't share my kubernetes yaml file. Looks like problem is not there.)
But that's a development environment. I'm not sure how/where those containers would be used in production. For example they would be used in AWS EKS.
I hoped to install something like
apk add --no-cache nfs-utils in the "app" container's Dockerfile.
I.e. on container level, not on a pod level - could it work?
So far getting the pod initialization error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 35s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 22s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 20s default-scheduler Successfully assigned default/delphix-masking-0 to masking-kubernetes
Warning FailedMount 4s (x6 over 20s) kubelet MountVolume.SetUp failed for volume "nfs-pv" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o hard,nfsvers=4.1 maxTestNfs1.dlpxdc.co:/var/tmp/masking-mount /var/snap/microk8s/common/var/lib/kubelet/pods/2e6b7aeb-5d0d-4002-abba-88de032c12dc/volumes/kubernetes.io~nfs/nfs-pv
Output: mount: /var/snap/microk8s/common/var/lib/kubelet/pods/2e6b7aeb-5d0d-4002-abba-88de032c12dc/volumes/kubernetes.io~nfs/nfs-pv: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
And the process is stuck in that step forever.
Looks like it happens before even trying to initialize containers.
So I wonder if approach of enabling NFS-client on the container's level is valid.
Thanks in advance for any insights!
I hoped to install something like apk add --no-cache nfs-utils in the "app" container's Dockerfile. I.e. on container level, not on a pod level - could it work?
Yes, this could work. This is normally what you would do when you have no control over the node (e.g. you can't be sure the host is ready for NFS calls). You need to ensure your pod can reach the NFS server and that all required ports are open in between. You also need to ensure the required NFS programs (e.g. rpcbind) are started before your own program in the container.
...For example they would be used in AWS EKS.
The EKS-optimized AMI comes with NFS support; you can leverage K8S PV/PVC support using this image for your worker nodes, so there's no need to set up NFS client support in your container.
TL;DR - no, it is not best practice (and not the right way) to mount NFS volumes from inside the container. There is no use-case for it. And it is a huge security risk as well (allowing applications direct access to cluster-wide storage without any security control).
It appears your objective is to provide storage to your container that is backed by NFS, right? Then doing it by creating a PersistentVolume and then using a PersistentVolumeClaim to attach it to your Pod is the correct approach. That you're already doing.
No, you don't have to worry about how the storage will be provided to the container. Due to the way k8s runs applications, certain conditions MUST be met before a Pod can be scheduled on a Node. One of those conditions is that the volumes the Pod mounts MUST be available. If a volume doesn't exist and you mount it in a Pod, that Pod will never get scheduled and will likely be stuck in Pending state. That's what you see as the error as well.
You don't have to worry about the NFS connectivity, because in this case, the Kubernetes PersistentVolume resource you created will technically act like an NFS client for your NFS server. This provides a uniform storage interface (applications don't have to care where the volume is coming from, the application code will be independent of storage type) as well as better security and permission control.
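The PV/PVC approach described above can be sketched as follows. This is an illustrative fragment only - the server address, export path, and sizes are placeholders for your environment:

```yaml
# Sketch of an NFS-backed PV plus a PVC that binds to it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany
  nfs:
    server: nfs.example.com   # placeholder NFS server
    path: /exports/app        # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""        # bind to the pre-created PV, not a provisioner
  resources:
    requests:
      storage: 5Gi
```

The pod then references only the claim, so the application never needs NFS tooling inside its image.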
Another note: when dealing with Kubernetes, it is recommended to consider the Pod, not the container, as your 'smallest unit' of infrastructure. So the recommended way is to run only one application container per Pod, for simplicity of design and to achieve the micro-service architecture in the true sense.

When emptyDir deleted for a completed pod of a job in kubernetes

Per Doc for Kubernetes Volumes, "(emptyDir) exists as long as that Pod is running on that node". So when actually is emptyDir deleted for a "Completed" pod of a job?
I thought it's deleted when the pod is removed (kubectl get pod can no longer find the pod). But I'm wrong. Experiments show that an emptyDir is deleted as soon as the pod is completed.
Has anybody noticed this? Or is this version-dependent?
Yes, and it is strongly tied to the Pod. This is not version-dependent.
emptyDir: emptyDir volume’s lifetime is tied to that of the pod, the volume’s contents are lost when the pod is deleted.
Use cases: An emptyDir volume is especially useful for sharing files between containers running in the same pod. But it can also be used by a single container for when a container needs to write data to disk temporarily, such as when performing a sort operation on a large dataset.
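The file-sharing use case can be sketched with a minimal two-container Pod. This is a generic illustration (names and images are placeholders), not a spec from the question:

```yaml
# Sketch: two containers in one Pod sharing a scratch emptyDir volume.
# The emptyDir is created when the Pod starts and removed when the Pod ends.
apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch
spec:
  containers:
  - name: writer
    image: busybox:1.36
    command: ["sh", "-c", "echo hello > /scratch/msg; sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  - name: reader
    image: busybox:1.36
    command: ["sh", "-c", "sleep 5; cat /scratch/msg; sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir: {}
```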

Kubernetes Persistent Volume Claim FileSystemResizePending

I have a persistent volume claim for a kubernetes pod which shows the message "Waiting for user to (re-)start a pod to finish file system resize of volume on node." when I check it with kubectl describe pvc ...
The resizing itself worked (it was done with Terraform in our deployments), but this message still shows up and I'm not really sure how to get it fixed. The pod was already restarted several times - I tried kubectl delete pod and scaling it down with kubectl scale deployment.
Does anyone have an idea how to get rid of this message?
There are a few things to consider:
Instead of using Terraform, try resizing the PVC by editing it manually. After that, wait for the underlying volume to be expanded by the storage provider and verify whether the FileSystemResizePending condition is present by executing kubectl get pvc <pvc_name> -o yaml. Then make sure that all the associated pods are restarted so the whole process can be completed. Once file system resizing is done, the PVC will automatically be updated to reflect the new size.
Make sure that your volume type is supported for expansion. You can expand the following types of volumes:
gcePersistentDisk
awsElasticBlockStore
Cinder
glusterfs
rbd
Azure File
Azure Disk
Portworx
FlexVolumes
CSI
Check if in your StorageClass the allowVolumeExpansion field is set to true.
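A StorageClass that permits expansion looks like the sketch below; the name and provisioner are placeholders you would replace with your own:

```yaml
# Sketch: a StorageClass with volume expansion enabled.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable
provisioner: kubernetes.io/gce-pd   # example in-tree provisioner; use your own
allowVolumeExpansion: true
```

With this in place, increasing spec.resources.requests.storage on a bound PVC triggers the expansion flow described above.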

kubectl get pod status always ContainerCreating

k8s version: 1.12.1
I created a pod via the API on a node, and an IP was allocated (through flanneld). When I used the kubectl describe pod command, I could not get the pod IP, and there was no such IP in etcd storage.
Only a few minutes later could the IP be obtained, and then kubectl get pod STATUS was Running.
Has anyone ever encountered this problem?
As MatthiasSommer mentioned in a comment, the process of creating a pod might take a while.
If a pod stays in ContainerCreating status for a longer time, you can check what is stopping it from changing to Running with the command:
kubectl describe pod <pod_name>
Why might creating a pod take a longer time?
Depending on what is included in the manifest, a pod can share namespaces, storage volumes, secrets, resource assignments, configmaps, etc.
kube-apiserver validates and configures data for API objects.
kube-scheduler needs to check and collect resource requirements, constraints, etc., and assign the pod to a node.
kubelet runs on each node and ensures that all containers fulfill the pod specification and are healthy.
kube-proxy also runs on each node and is responsible for pod networking.
As you can see, there are many requests, validations, and syncs, so it takes a while to create a pod that fulfills all requirements.

How to attach an OpenStack volume to a Kubernetes static pod?

Suppose I bootstrap a single master node with kubelet v1.10.3 in OpenStack cloud and I would like to have a "self-hosted" single etcd node for k8s necessities as a pod.
Before starting the kube-apiserver component you need a working etcd instance, but of course you can't just perform kubectl apply -f or put a manifest in the addon-manager folder, because the cluster is not ready at all.
There is a way to start pods by kubelet without having a ready apiserver. It is called static pods (yaml Pod definitions usually located at /etc/kubernetes/manifests/). And it is the way I start "system" pods like apiserver, scheduler, controller-manager and etcd itself. Previously I just mounted a directory from node to persist etcd data, but now I would like to use OpenStack blockstorage resource. And here is the question: how can I attach, mount and use OpenStack cinder volume to persist etcd data from static pod?
As I learned today there are at least 3 ways to attach OpenStack volumes:
The CSI OpenStack cinder driver, which is the pretty new way of managing volumes. It won't fit my requirements, because in static pod manifests I can only declare Pods and not other resources like PVC/PV, while the CSI docs say:
The csi volume type does not support direct reference from Pod and may only be referenced in a Pod via a PersistentVolumeClaim object.
The pre-CSI way to attach volumes: FlexVolume.
FlexVolume driver binaries must be installed in a pre-defined volume plugin path on each node (and in some cases master).
Ok, I added those binaries to my node (using this DS as a reference), added volume to pod manifest like this:
volumes:
- name: test
  flexVolume:
    driver: "cinder.io/cinder-flex-volume-driver"
    fsType: "ext4"
    options:
      volumeID: "$VOLUME_ID"
      cinderConfig: "/etc/kubernetes/cloud-config"
and got the following error from kubelet logs:
driver-call.go:258] mount command failed, status: Failure, reason: Volume 2c21311b-7329-4cf4-8230-f3ce2f23cf1a is not available
which is weird because I am sure this Cinder volume is already attached to my CoreOS compute instance.
And the last way to mount volumes I know of is in-tree cinder support, which should work since at least k8s 1.5 and does not have any special requirements besides the --cloud-provider=openstack and --cloud-config kubelet options.
The yaml manifest part for declaring volume for static pod looks like this:
volumes:
- name: html-volume
  cinder:
    # Enter the volume ID below
    volumeID: "$VOLUME_ID"
    fsType: ext4
Unfortunately when I try this method I get the following error from kubelet:
Volume has not been added to the list of VolumesInUse in the node's volume status for volume.
I do not know what it means, but it sounds like the node status could not be updated (of course, there is no etcd and apiserver yet). Sad - it was the most promising option for me.
Are there any other ways to attach an OpenStack cinder volume to a static pod relying on kubelet only (when the cluster is not actually ready)? Any ideas on what I might have missed or why I got the above errors?
The message Volume has not been added to the list of VolumesInUse in the node's volume status for volume. says that attach/detach operations for that node are delegated to the controller-manager only. The kubelet waits for the attachment to be made by the controller, but the volume never reaches the appropriate state because the controller isn't up yet.
The solution is to set the kubelet flag --enable-controller-attach-detach=false to let the kubelet attach, mount, and so on. This flag is set to true by default for the following reasons:
If a node is lost, volumes that were attached to it can be detached by the controller and reattached elsewhere.
Credentials for attaching and detaching do not need to be made present on every node, improving security.
In your case, setting this flag to false is reasonable, as this is the only way to achieve what you want.
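If the kubelet is started with a config file rather than CLI flags, the same setting can be expressed there. A sketch, assuming the kubelet is launched with --config pointing at this file:

```yaml
# Equivalent of --enable-controller-attach-detach=false in the kubelet
# configuration file (kubelet.config.k8s.io API).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enableControllerAttachDetach: false
```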