K8s - an NFS share is only mounted read-only in the pod

Environment: external NFS share for persistent storage, accessible to all, R/W, CentOS 7 VMs (NFS server and K8s cluster), nfs-utils installed on all workers.
Mounting the share on a VM, e.g. a K8s worker node, works correctly; the share is R/W.
Deployed in the K8s cluster: PV, PVC, Deployment (volumes referencing the PVC, volumeMounts).
The structure of the YAML files follows the various guides and postings, including those here on the site.
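For illustration, a minimal sketch of such an NFS PV/PVC pair (server address, path, sizes and names are placeholders, not the actual values):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv                          # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany                     # must allow writing; ReadOnlyMany would force R/O
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 172.16.6.10                 # placeholder NFS server address
    path: /nfsshare
    readOnly: false                     # explicit; false is the default
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc                         # placeholder name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""                  # bind to the pre-created PV, no dynamic provisioning
  volumeName: nfs-pv
  resources:
    requests:
      storage: 10Gi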
The pod starts and the share is mounted. Unfortunately, it is read-only. None of the suggestions from the postings I have found on this have worked so far.
Any idea what else I could check, what else I could try?
Thanks. Thomas

After digging deeper, I found the cause of the problem. Apparently, the syntax for the NFS export is very sensitive; a single extra space can be a problem.
On the NFS server, two export entries were stored in the kernel tables: the first R/O and the second R/W. I don't know whether this is a CentOS bug triggered by the syntax in /etc/exports.
On another CentOS machine I was able to mount the share without any problems (R/W). In the container (Debian-based image), however, it was mounted R/O only. I have not investigated whether this is due to Kubernetes or whether Debian simply behaves differently.
After correcting the /etc/exports file and restarting the NFS server, there was only one, correct, entry in the kernel table. After that, mounting R/W worked on a CentOS machine as well as in the Debian-based container inside K8s.
Here are the files / table:
previous /etc/exports:
/nfsshare 172.16.6.* (rw,sync,no_root_squash,no_all_squash,no_acl)
==> kernel:
/nfsshare 172.16.6.*(ro, ...
/nfsshare *(rw, ...
corrected /etc/exports (without the extra space):
/nfsshare *(rw,sync,no_root_squash,no_all_squash,no_acl)

In principle, the idea of using an init container is a good one. Thank you for reminding me of it. I have tried it.
Unfortunately, it doesn't change the basic problem: the file system is mounted read-only by Kubernetes. The init container returns the following error message (from the log):
chmod: /var/opt/dilax/enumeris: Read-only file system
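For context, the init container attempt looked roughly like this (image and names are assumptions; the chmod fails because the share itself is exported read-only, not because of in-pod permissions):
initContainers:
  - name: fix-permissions               # illustrative name
    image: busybox                      # any small image with chmod will do
    command: ["sh", "-c", "chmod -R 0775 /var/opt/dilax/enumeris"]
    volumeMounts:
      - name: nfs-data                  # placeholder; must match the pod's volume name
        mountPath: /var/opt/dilax/enumeris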

Related

Kubernetes: fsGroup has different impact on hostPath versus pvc and different impact on nfs versus cifs

Many of my workflows use pod IAM roles. As documented here, I must include fsGroup in order for non-root containers to read the generated identity token. The problem is that when I additionally include PVCs that point to CIFS PVs, the volumes fail to mount because they time out. Seemingly this is because the kubelet tries to chown all of the files on the volume, which takes too much time and causes the timeout. Questions…
Why doesn't Kubernetes try to chown all of the files when hostPath is used instead of a PVC? All of the workflows were fine until I switched from hostPath to PVCs, and now the timeout issue happens.
Why does this problem happen on CIFS PVCs but not NFS PVCs? I have noticed that NFS PVCs continue to mount just fine and the fsGroup seemingly doesn't take effect, as I don't see the group id change on any of the files. However, the CIFS PVCs can no longer be mounted, seemingly due to the timeout issue. If it matters, I am using the native NFS PV lego and this CIFS flexVolume plugin that has worked great up until now.
Overall, the goal of this post is to better understand how Kubernetes decides when to chown all of the files on a volume when fsGroup is included, in order to make a good design decision going forward. Thanks for any help you can provide!
Kubernetes Chowning Files References
https://learn.microsoft.com/en-us/azure/aks/troubleshooting
Since gid and uid are mounted as root or 0 by default. If gid or uid are set as non-root, for example 1000, Kubernetes will use chown to change all directories and files under that disk. This operation can be time consuming and may make mounting the disk very slow.
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods
By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup.
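The same Kubernetes docs page also describes fsGroupChangePolicy, which lets you skip the recursive chown when the ownership of the volume root already matches. A minimal sketch (names and values are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo                    # illustrative name
spec:
  securityContext:
    fsGroup: 1000
    fsGroupChangePolicy: "OnRootMismatch"   # only chown when the volume root doesn't already match
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-claim             # placeholder PVC name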
I posted this question on the Kubernetes repo a while ago and it was recently answered in the comments.
The gist is that fsGroup support is implemented and decided per volume plugin. It is ignored for NFS, which is why I have never seen the kubelet chown files on NFS PVCs. For FlexVolume plugins, a plugin can opt out of fsGroup-based permission changes by returning FSGroup false. So that is why the kubelet was trying to chown the CIFS PVCs -- the FlexVolume plugin I am using does not return FSGroup false.
So, in the end, you don't need to worry about this for NFS, and if you are using a FlexVolume plugin for a shared file system, you should make sure it returns FSGroup false if you don't want the kubelet to chown all of the files.

Ubuntu 20.04 qemu-kvm sharing folders

I am using minikube on Ubuntu and start it with --driver=none. Since kvm2 is recommended for Linux hosts and --driver=none requires running as root, I am now switching to kvm2.
I have a few Kubernetes deployments using persistent volumes and claims.
When minikube is started with --driver=none, it mounts the host paths into these volumes and everything runs as expected.
Now with kvm2, these mounts are not working because the host path cannot be found.
Is there any way to mount the host path into all the deployments automatically?
The idea is to have a central location, say /data, as central persistent storage. Each deployment will have its own subfolder under /data. In the persistent volume, I will define the mount path to use this subfolder. Is there any way to mount this host /data folder into the pods automatically through configuration, as a one-time setup?
If there is a better way to achieve persistent storage, please let me know.
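For illustration, one common pattern is a hostPath PV per subfolder, bound to a matching PVC (names, class and sizes are placeholders); note that with the kvm2 driver the path must exist inside the minikube VM, e.g. shared in with minikube mount:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app1-data                       # placeholder, one PV per deployment subfolder
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual              # illustrative class name used only for matching
  hostPath:
    path: /data/app1                    # subfolder under the central /data location
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app1-data                       # placeholder
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 5Gi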

How to mimic Docker ability to pre-populate a volume from a container directory with Kubernetes

I am migrating my previous deployment made with docker-compose to Kubernetes.
In my previous deployment, some containers have data created at build time in certain paths, and these paths are mounted in persistent volumes.
Therefore, as the Docker volume documentation states, the persistent volume (not a bind mount) is pre-populated with the container directory's content.
I'd like to achieve this behavior with Kubernetes and its persistent volumes. How can I do that? Do I need to add some kind of logic, using scripts, to copy my container's files to the mounted path when the data is not present the first time the container starts?
Possibly related question: Kubernetes mount volume on existing directory with files inside the container
I think your options are:
ConfigMap (is the "some data" just configuration files?)
Init containers (as mentioned; see the sketch after this list)
CSI Volume Cloning (cloning combined with an init container or your first app container)
There used to be a gitRepo volume; it was deprecated in favour of init containers, from which you can clone your config and data.
A hostPath volume mount is an option too.
An NFS volume is probably a very reasonable option, and similar in approach to your Docker volumes.
Storage types: NFS, iSCSI, awsElasticBlockStore, gcePersistentDisk and others can be pre-populated. There are constraints. NFS is probably the most flexible for sharing bits and bytes.
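As a rough sketch of the init-container option above (image, paths and names are assumptions, not from your setup): the init container carries the build-time data and copies it into the volume only when the volume is still empty:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                          # placeholder
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      initContainers:
        - name: seed-data
          image: my-app-image:latest    # assumed image that contains the build-time data
          command: ["sh", "-c", "[ -n \"$(ls -A /seed-target)\" ] || cp -a /opt/app/data/. /seed-target/"]
          volumeMounts:
            - name: app-data
              mountPath: /seed-target   # mounted at a different path so the image's data stays visible
      containers:
        - name: my-app
          image: my-app-image:latest
          volumeMounts:
            - name: app-data
              mountPath: /opt/app/data  # the path that was populated at build time
      volumes:
        - name: app-data
          persistentVolumeClaim:
            claimName: my-app-data      # placeholder PVC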
FYI
The subPath field might be of interest too depending on your use case, and PodPreset might help in streamlining the operation across your fleet of pods.
HTH

Trying to mount a Lustre filesystem in a pod: tries and problems

For our data science platform we have user directories on a Lustre filesystem (HPC world) that we want to present inside Jupyter notebooks running on OpenShift (they are spawned on demand by JupyterHub after authentication and other custom actions).
We tried the following approaches:
Mounting the Lustre filesystem in a privileged sidecar container in the same pod as the notebook, sharing the mount through an emptyDir volume (that way the notebook container does not need to be privileged). We have the mount/umount of Lustre occur in the sidecar as post-start and pre-stop hooks in the pod lifecycle. The problem is that sometimes, when we delete the pod, the umount does not work or hangs for whatever reason, and as emptyDirs are "destroyed" this flushes everything in Lustre because the fs is still mounted. A really bad result...
Mounting the Lustre filesystem directly on the node and creating a hostPath volume in the pod plus a volumeMount in the notebook container. It kind of works, but only if the container is run as privileged, which of course we don't want.
We tried more specific SCCs which authorize the hostPath volume (hostaccess, hostmount-anyuid), and also made a custom one with a hostPath volume and 'allowHostDirVolumePlugin: true', but with no success. We can see the mount from the container, but even with everything wide open (777), we get a "permission denied" on whatever we try to do on the mount (ls, touch, ...). Again, it only works if the container is privileged. It does not seem to be SELinux related; at least we have no alerts.
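For reference, a custom SCC of the kind described might look roughly like this (name and exact volume list are assumptions):
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: lustre-hostpath                 # illustrative name
allowHostDirVolumePlugin: true          # required for hostPath volumes
allowPrivilegedContainer: false
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
volumes:
  - hostPath
  - emptyDir
  - secret
  - configMap
  - downwardAPI
  - persistentVolumeClaim
users: []                               # grant to the notebook service account, e.g. via oc adm policy
groups: []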
Does anyone see where the problem is, or has another suggestion of solution we can try?

Where to store files in GKE container?

I'm having trouble understanding where to store files in a GKE container. I've seen the following documentation of the filesystem layout:
https://cloud.google.com/kubernetes-engine/docs/concepts/node-images#file_system_layout
But then there are also Dockerfile examples on the web that copy executable files to other paths not listed in the layout, such as /usr or /go. One of these examples is here:
https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/blob/master/hello-app/Dockerfile
Another question: if I have runtime code that needs to download certain configuration information after the container starts, can I write the configuration file to the same directory as my executable? Or do I have to choose /etc or /tmp?
And finally, the layout documentation states that /home and /var store data for the lifetime of the boot disk. What does that mean? How does that compare to the lifetime of the pod or the node?
When you want to store something in a container, it can be either ephemeral or permanent.
For ephemeral storage, just choose a path such as /tmp, /var or /opt (this depends on the container setup as well). Once the container is restarted, the information you have is the same as at the moment the container was created, for instance your binary files and initial config files.
For permanent storage you have to mount a volume, i.e. a path in the container is linked to external storage. With this, if your container is restarted, the volume is mounted again once the container is ready and you don't lose anything.
In Kubernetes this is called a Persistent Volume, and you can leverage it even if you are with another cloud provider.
Steps to use it:
Define a path where you will mount the volume in your source code, for example /myfiles/private
Create a storage class in your GKE cluster: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/ssd-pd
Create a Persistent Volume Claim in your GKE cluster: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/ssd-pd
Reference this storage class from your Kubernetes deployment
Example
Link the volume with your container:
volumeMounts:
  - mountPath: /myfiles/private
    name: any-name-you-want
Relate the persistent volume claim with your deployment:
volumes:
  - name: any-name-you-want
    persistentVolumeClaim:
      claimName: my-claim-name
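And, roughly, the storage class and claim those snippets refer to (name, disk type and size are illustrative; this uses the in-tree GCE persistent disk provisioner):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-storage                     # illustrative name
provisioner: kubernetes.io/gce-pd       # in-tree GCE persistent disk provisioner
parameters:
  type: pd-ssd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim-name                   # matches the claimName above
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ssd-storage
  resources:
    requests:
      storage: 10Gi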
This is really up to you. By default most base images will leave /tmp writeable as per normal, but anything written inside the container will be gone if/when the container restarts for any reason. For something like config data that might be fine; for a database, probably less so. To get more stable storage you need to use a volume. The exact type to use depends on your environment and how long the data should live. An emptyDir volume lives only as long as the pod but can be shared between containers in the same pod. Beyond that, you would probably use a PersistentVolumeClaim to dynamically provision a new Google Cloud disk, which will last until the claim is deleted (or forever, depending on the PV's reclaim policy).
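For completeness, a minimal emptyDir shared between two containers in one pod (all names are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-demo
spec:
  containers:
    - name: writer
      image: busybox
      command: ["sh", "-c", "echo hello > /cache/hello.txt && sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache
    - name: reader
      image: busybox
      command: ["sh", "-c", "sleep 5; cat /cache/hello.txt; sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache
  volumes:
    - name: cache
      emptyDir: {}                      # lives as long as the pod, not an individual container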