This question concerns our production environment. We deploy our application to Kubernetes as Pods, and the Pods need some storage for files.
We use a PersistentVolume and a PersistentVolumeClaim to represent the real backend storage server, which is currently NFS. However, the NFS server is not controlled by us and we cannot change its configuration.
The volume mounted into the Pod is always owned by root:root. Because the process in the Pod runs as a non-root user, it cannot read or write the mounted volume.
Our current solution is an initContainer that runs as root and uses the command 'chown [uid]:[gid] [folder]' to change the ownership. The limitation is that the initContainer must run as root.
Now we are trying to deploy our application on OpenShift. By default, OpenShift does not allow Pods (containers) to run as root, and Pods that try to do so fail to be created.
So my question is: is there a Kubernetes or OpenShift way to define or change the uid and gid of the mounted volume?
I have tried mountOptions, as discussed in Kubernetes Persistent Volume Claim mounted with wrong gid:
mountOptions: # these options
  - uid=1000
  - gid=1000
But the mount failed with the error message below. It seems that NFS mounts do not support the uid and gid options.
Warning FailedMount 11s kubelet, [xxxxx.net] MountVolume.SetUp failed for volume "nfs-gid-pv" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /opt/kubernetes/data/kubelet/pods/3c75930a-d3f7-4d55-9996-4d10dcac9549/volumes/kubernetes.io~nfs/nfs-gid-pv --scope -- mount -t nfs -o gid=1999,uid=1999 shc-sma-cd74.hpeswlab.net:/var/vols/itom/itsma/tzhong /opt/kubernetes/data/kubelet/pods/3c75930a-d3f7-4d55-9996-4d10dcac9549/volumes/kubernetes.io~nfs/nfs-gid-pv
Output: Running scope as unit run-22636.scope.
mount.nfs: an incorrect mount option was specified
Warning FailedMount 7s kubelet, [xxxxx.net] MountVolume.SetUp failed for volume "nfs-gid-pv" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /opt/kubernetes/data/kubelet/pods/3c75930a-d3f7-4d55-9996-4d10dcac9549/volumes/kubernetes.io~nfs/nfs-gid-pv --scope -- mount -t nfs -o gid=1999,uid=1999 shc-sma-cd74.hpeswlab.net:/var/vols/itom/itsma/tzhong /opt/kubernetes/data/kubelet/pods/3c75930a-d3f7-4d55-9996-4d10dcac9549/volumes/kubernetes.io~nfs/nfs-gid-pv
Output: Running scope as unit run-22868.scope.
mount.nfs: an incorrect mount option was specified
In Kubernetes, you can set the group ID that owns the volume by using fsGroup, a field of the Pod security context.
As for OpenShift, I do not know.
apiVersion: v1
kind: Pod
metadata:
  name: hello-world
spec:
  containers:
  # specification of the pod's containers
  # ...
  securityContext:
    fsGroup: 1000
The security context for a Pod applies to the Pod's Containers and also to the Pod's Volumes when applicable. Specifically fsGroup and seLinuxOptions are applied to Volumes as follows:
fsGroup: Volumes that support ownership management are modified to be owned and writable by the GID specified in fsGroup. See the Ownership Management design document for more details.
You can also read more about it here and follow the steps posted by @rajdeepbs29 here.
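A fuller sketch of the same idea, assuming the application runs as UID 1000 and the NFS volume is reached through a claim named nfs-pvc (a hypothetical name). fsGroup also adds GID 1000 as a supplementary group to the container processes. Whether the volume itself gets re-owned depends on the volume type supporting ownership management, so for NFS the exported directory may still need to be group-writable by that GID on the server side. OpenShift's security context constraints may additionally restrict which runAsUser and fsGroup values are accepted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-world
spec:
  securityContext:
    runAsUser: 1000        # assumed non-root UID of the application
    fsGroup: 1000          # supplementary GID; ownership-managed volumes are chowned to it
  containers:
    - name: app
      image: busybox:1.36  # placeholder image
      # print the effective ids, then try to write to the mounted volume
      command: ["sh", "-c", "id && touch /data/probe && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: nfs-pvc # hypothetical claim bound to the NFS PersistentVolume
```

Running `id` in the container should list 1000 among the groups, which is how group write access to the volume is granted.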
Setup:
Linux VM where Pod (containing 3 containers) is started.
Only 1 of the containers needs the NFS mount to the remote NFS server.
This "app" container is based on Alpine Linux.
The remote NFS server is up and running. If I create a separate YAML file for a PersistentVolume with that server's details, the PV is up and available.
In my pod YAML file I define a PersistentVolume (with the remote NFS server's details) and a PersistentVolumeClaim, and associate my "app" container's volume with that claim.
Everything works like a charm if I install the NFS client package on the host Linux VM:
sudo apt install nfs-common
(That's why I don't share my Kubernetes YAML file; the problem doesn't seem to be there.)
But that's a development environment. I'm not sure how or where these containers will be used in production; for example, they might run on AWS EKS.
I hoped to install something like
apk add --no-cache nfs-utils
in the "app" container's Dockerfile, i.e. at the container level rather than the pod level. Could that work?
So far I'm getting this pod initialization error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 35s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 22s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 20s default-scheduler Successfully assigned default/delphix-masking-0 to masking-kubernetes
Warning FailedMount 4s (x6 over 20s) kubelet MountVolume.SetUp failed for volume "nfs-pv" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o hard,nfsvers=4.1 maxTestNfs1.dlpxdc.co:/var/tmp/masking-mount /var/snap/microk8s/common/var/lib/kubelet/pods/2e6b7aeb-5d0d-4002-abba-88de032c12dc/volumes/kubernetes.io~nfs/nfs-pv
Output: mount: /var/snap/microk8s/common/var/lib/kubelet/pods/2e6b7aeb-5d0d-4002-abba-88de032c12dc/volumes/kubernetes.io~nfs/nfs-pv: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
And the process is stuck in that step forever.
It looks like this happens before the containers are even initialized.
So I wonder whether enabling the NFS client at the container level is a valid approach.
Thanks in advance for any insights!
I hoped to install something like apk add --no-cache nfs-utils in the "app" container's Dockerfile. I.e. on container level, not on a pod level - could it work?
Yes, this could work. It is what you would normally do if you have no control over the node (e.g. you can't be sure the host is ready for NFS calls). You need to ensure your pod can reach the NFS server and that all required ports are open in between. You also need to ensure the required NFS programs (e.g. rpcbind) are started before your own program in the container.
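A rough sketch of what the container-level approach involves, assuming a hypothetical image my-alpine-nfs-app built with nfs-utils and reusing the server and export path from the question. The mount system call needs elevated privileges (CAP_SYS_ADMIN, and default container security profiles often block it anyway), so such containers are commonly run privileged. The Pod must also stop referencing the NFS PersistentVolumeClaim; otherwise the kubelet still attempts the node-level mount that failed in the question.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-inner-nfs-mount              # hypothetical name
spec:
  containers:
    - name: app
      image: my-alpine-nfs-app:latest    # hypothetical image with nfs-utils installed
      securityContext:
        privileged: true                 # mounting inside the container needs elevated privileges
      command: ["sh", "-c"]
      args:
        - |
          # start rpcbind (it daemonizes by default) before mounting
          rpcbind
          mkdir -p /mnt/masking
          mount -t nfs -o hard,nfsvers=4.1 maxTestNfs1.dlpxdc.co:/var/tmp/masking-mount /mnt/masking
          # placeholder for the real application process
          exec tail -f /dev/null
```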
...For example they would be used in AWS EKS.
The EKS-optimized AMI comes with NFS support, so you can use Kubernetes PV/PVC support with this image on your worker nodes; there's no need to initialize NFS client support in your container.
TL;DR - no, it is not best practice (and not the right way) to mount NFS volumes from inside the container. There is no use-case for it. And it is a huge security risk as well (allowing applications direct access to cluster-wide storage without any security control).
It appears your objective is to provide NFS-backed storage to your container, right? Then creating a PersistentVolume and using a PersistentVolumeClaim to attach it to your Pod is the correct approach, and you're already doing that.
You don't have to worry about how the storage will be provided to the container. Because of the way Kubernetes runs applications, certain conditions MUST be met before a Pod can run on a Node, and one of them is that the volumes the Pod mounts MUST be available. If a volume doesn't exist and you mount it in a Pod, that Pod will never start running and may stay stuck in the Pending state. That is what the error you see reflects as well.
You don't have to worry about the NFS connectivity, because in this case, the Kubernetes PersistentVolume resource you created will technically act like an NFS client for your NFS server. This provides a uniform storage interface (applications don't have to care where the volume is coming from, the application code will be independent of storage type) as well as better security and permission control.
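A minimal static NFS PersistentVolume and claim, reusing the server, export path and mount options visible in the question's error output; the capacity, access mode and the claim name nfs-pvc are assumptions. Note that the kubelet performs this mount on the node itself, which is why installing nfs-common on the host VM made it work, and why node images that already ship NFS support (such as the EKS-optimized AMI) need nothing extra.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi                  # assumed size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""             # static binding, no dynamic provisioner
  mountOptions:                    # passed to "mount -t nfs -o ..." on the node
    - hard
    - nfsvers=4.1
  nfs:
    server: maxTestNfs1.dlpxdc.co
    path: /var/tmp/masking-mount
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc                    # hypothetical claim name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: nfs-pv               # bind explicitly to the PV above
  resources:
    requests:
      storage: 10Gi
```

The Pod then refers to the claim through a persistentVolumeClaim volume whose claimName is nfs-pvc.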
Another note: when dealing with Kubernetes, it is recommended to treat the Pod, not the container, as your smallest unit of infrastructure. So the recommended approach is to run only one application container per pod, for simplicity of design and to achieve a true microservice architecture.
I just created the following PersistentVolume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sql-pv
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/var/lib/sqldata"
Then I SSHed into the node and navigated to /var/lib, but I cannot see the sqldata directory anywhere in it.
Where is the real directory created?
I created a Pod that mounts this volume at a path inside the container. When I open a shell in the container, I can see the files in the mount path. Where are these files stored?
You have set up your cluster on Google Kubernetes Engine, which means the nodes are virtual machine instances on GCP. You've probably been connecting to the cluster using the Kubernetes Engine dashboard and the Connect to the cluster option. That does not SSH you into any of the nodes; it just starts a GCP Cloud Shell terminal instance and runs a command like:
gcloud container clusters get-credentials {your-cluster} --zone {your-zone} --project {your-project-name}
That command configures kubectl on Cloud Shell by setting the proper cluster name, certificates, etc. in the ~/.kube/config file, so you have access to the cluster (by communicating with the cluster endpoint), but you are not SSHed into any node. That's why you can't access the path defined in hostPath.
To find a hostPath directory, you need to:
find which node the pod is running on
SSH into that node
Finding a node:
Run the following kubectl get pod {pod-name} command with the -o wide flag (replace {pod-name} with your pod name):
user@cloudshell:~ (project)$ kubectl get pod task-pv-pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
task-pv-pod 1/1 Running 0 53m xx.xx.x.xxx gke-test-v-1-21-default-pool-82dbc10b-8mvx <none> <none>
SSH to the node:
Run the following gcloud compute ssh {node-name} command (replace {node-name} with the node name from the previous command):
user@cloudshell:~ (project)$ gcloud compute ssh gke-test-v-1-21-default-pool-82dbc10b-8mvx
Welcome to Kubernetes v1.21.3-gke.2001!
You can find documentation for Kubernetes at:
http://docs.kubernetes.io/
The source for this release can be found at:
/home/kubernetes/kubernetes-src.tar.gz
Or you can download it at:
https://storage.googleapis.com/kubernetes-release-gke/release/v1.21.3-gke.2001/kubernetes-src.tar.gz
It is based on the Kubernetes source at:
https://github.com/kubernetes/kubernetes/tree/v1.21.3-gke.2001
For Kubernetes copyright and licensing information, see:
/home/kubernetes/LICENSES
user@gke-test-v-1-21-default-pool-82dbc10b-8mvx ~ $
Now you will find the hostPath directory (in your case /var/lib/sqldata), along with any files the pod has created.
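To see files appear there, a claim and Pod along the following lines can be used; sql-pvc, sql-pod, the busybox image and the /data mount path are hypothetical. volumeName pins the claim to sql-pv explicitly. Whatever the container writes under /data then ends up in /var/lib/sqldata on the node the Pod was scheduled to.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sql-pvc                 # hypothetical claim name
spec:
  storageClassName: standard    # same class as sql-pv
  volumeName: sql-pv            # bind to the hostPath PV above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: sql-pod                 # hypothetical pod name
spec:
  containers:
    - name: app
      image: busybox:1.36       # placeholder image
      # writes /data/hello.txt, i.e. /var/lib/sqldata/hello.txt on the node
      command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
      volumeMounts:
        - name: sql-data
          mountPath: /data
  volumes:
    - name: sql-data
      persistentVolumeClaim:
        claimName: sql-pvc
```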
Avoid hostPath if possible
Using hostPath is not recommended. As mentioned in the comments, it causes issues when a pod is created on a different node (though you have a single-node cluster), and it also presents many security risks:
Warning:
HostPath volumes present many security risks, and it is a best practice to avoid the use of HostPaths when possible. When a HostPath volume must be used, it should be scoped to only the required file or directory, and mounted as ReadOnly.
If restricting HostPath access to specific directories through AdmissionPolicy, volumeMounts MUST be required to use readOnly mounts for the policy to be effective.
In your case it's much better to use the gcePersistentDisk volume type; check this article.
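A sketch of sql-pv rewritten for the gcePersistentDisk volume type, assuming a Compute Engine persistent disk named sql-disk (a hypothetical name) already exists in the same zone as the node. Unlike hostPath, the data lives on that disk rather than on the node's own filesystem, so it can follow the Pod to another node in that zone.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sql-pv
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: sql-disk   # hypothetical name of an existing Compute Engine disk
    fsType: ext4
```

Alternatively, on GKE a PersistentVolumeClaim alone, using the default StorageClass, lets Kubernetes create the disk for you.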
I have tried connecting an unencrypted EFS file system and it works fine, but with an encrypted EFS file system the pod throws the error below:
Normal Scheduled 10m default-scheduler Successfully assigned default/jenkins-efs-test-8ffb4dc86-xnjdj to ip-10-100-4-249.ap-south-1.compute.internal
Warning FailedMount 6m33s (x2 over 8m49s) kubelet, ip-10-100-4-249.ap-south-1.compute.internal Unable to attach or mount volumes: unmounted volumes=[jenkins-home], unattached volumes=[sc-config-volume tmp jenkins-home jenkins-config secrets-dir plugins plugin-dir jenkins-efs-test-token-7nmkz]: timed out waiting for the condition
Warning FailedMount 4m19s kubelet, ip-10-100-4-249.ap-south-1.compute.internal Unable to attach or mount volumes: unmounted volumes=[jenkins-home], unattached volumes=[plugins plugin-dir jenkins-efs-test-token-7nmkz sc-config-volume tmp jenkins-home jenkins-config secrets-dir]: timed out waiting for the condition
Warning FailedMount 2m2s kubelet, ip-10-100-4-249.ap-south-1.compute.internal Unable to attach or mount volumes: unmounted volumes=[jenkins-home], unattached volumes=[tmp jenkins-home jenkins-config secrets-dir plugins plugin-dir jenkins-efs-test-token-7nmkz sc-config-volume]: timed out waiting for the condition
Warning FailedMount 35s (x13 over 10m) kubelet, ip-10-100-4-249.ap-south-1.compute.internal MountVolume.SetUp failed for volume "efs-pv" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Internal desc = Could not mount "" at "/var/lib/kubelet/pods/354800a1-dcf5-4812-aa91-0e84ca6fba59/volumes/kubernetes.io~csi/efs-pv/mount": mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs /var/lib/kubelet/pods/354800a1-dcf5-4812-aa91-0e84ca6fba59/volumes/kubernetes.io~csi/efs-pv/mount
Output: mount: /var/lib/kubelet/pods/354800a1-dcf5-4812-aa91-0e84ca6fba59/volumes/kubernetes.io~csi/efs-pv/mount: can't find in /etc/fstab.
What am I missing here?
You didn't share your K8s manifests or any configuration. There shouldn't be any difference between encrypted and non-encrypted volumes when it comes to mounting from the client side; in essence, AWS manages the encryption keys for you using KMS.
The error you are seeing is basically because the mount command does not specify what to mount (the source is an empty string), so some configuration on the K8s side must differ from what you use with unencrypted EFS volumes. Also, is the EFS mount helper available on the Kubernetes node where you are trying to mount the EFS volume?
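For comparison, a statically provisioned PersistentVolume for the EFS CSI driver (efs.csi.aws.com) looks roughly like this; fs-12345678 is a placeholder for your file system ID and the storage class name efs-sc is an assumption. The Could not mount "" in the error shows that no source reached the mount command, so the volumeHandle and related settings are worth checking against a manifest like this one. The tls mount option turns on encryption in transit and is optional; encryption at rest needs nothing on the client side.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi                 # required by Kubernetes, not enforced by EFS
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc       # assumed name; must match the claim
  mountOptions:
    - tls                        # optional: encryption in transit via the EFS mount helper
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678    # placeholder EFS file system ID
```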
Check the logs of the cloud-init agent (/var/log/cloud-init.log and /var/log/cloud-init-output.log) if the EFS file system mount does not work as expected. Also check the /etc/fstab file.
Try updating the efs-csi-node DaemonSet from the amazon/aws-efs-csi-driver:v0.3.0 image to amazon/aws-efs-csi-driver:latest.
Here is an example script for mounting EFS. Compare it to yours and note the following:
Dependencies for this script:
Default ECS cluster configuration (Amazon Linux ECS AMI).
The ECS instance must have an IAM role that gives it at least read access to EFS (in order to locate the EFS file system ID).
The ECS instance must be in a security group that allows port tcp/2049 (NFS) inbound/outbound.
The security group that the ECS instance belongs to must be associated with the target EFS filesystem.
Notes on this script:
The EFS mount path is calculated on a per-instance basis as the EFS endpoint varies depending upon the region and availability zone where the instance is launched.
The EFS mount is added to /etc/fstab so that if the ECS instance is rebooted, the mount point will be re-created.
Docker is restarted to ensure it correctly detects the EFS filesystem mount.
Restart Docker after mounting EFS with the command: service docker restart. Finally, try rebooting the EKS worker node.
Take a look: mounting-efs-in-eks-cluster-example-deployment-fails, efs-provisioner, dynamic-ip-in-etc-fstab.
My overall issue is that my pod, which has a PVC, is stuck in ContainerCreating after it was deleted. My guess as to why is the following:
So, I have a pod with a mounted PVC. I did a:
kubectl exec -it "name" bash
I navigated to the path of the mounted PVC and created a tar.gz file of several directories, because I wanted to copy the folders to my local machine but they were quite big. I managed to create the tar file, but someone else released to our dev environment and the pod was killed. After that, when our environment was recreated, the pod with the PVC that holds the tar file got stuck in ContainerCreating. Is it because I created that file on the PVC? Based on the warnings, it seems like the PVC still points to the previous pod?
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
graphite-pvc Bound xxxx 256Gi RWO managed-premium 12
And if I run the following, I get these warnings:
kubectl describe pod xxx
Warning FailedAttachVolume 22m (x8 over 24m) attachdetach-controller
AttachVolume.Attach failed for volume "pvc-f65cb358-014b-11ea-b698-000d3a556597" : Attach volume "kubernetes-dynamic-pvc-f65cb358-014b-11ea-b698-000d3a556597" to instance "/subscriptions/1405bf18-bf7d-4a2f-9aa7-25ff73ba58a6/resourceGroups/cie-dev-2-1-eastus/providers/Microsoft.Compute/virtualMachineScaleSets/k8s-dev-nodes-2002/virtualMachines/6" failed with compute.VirtualMachineScaleSetVMsClient#Update: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status= Code="ConflictingUserInput" Message="Disk '/subscriptions/1405bf18-bf7d-4a2f-9aa7-25ff73ba58a6/resourceGroups/cie-dev-2-1-eastus/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-f65cb358-014b-11ea-b698-000d3a556597' cannot be attached as the disk is already owned by VM '/subscriptions/1405bf18-bf7d-4a2f-9aa7-25ff73ba58a6/resourceGroups/cie-dev-2-1-eastus/providers/Microsoft.Compute/virtualMachineScaleSets/k8s-dev-nodes-2002/virtualMachines/k8s-dev-nodes-2002_111'."
and
Warning FailedMount 48s (x13 over 28m) kubelet, k8s-dev-nodes-2002000006 Unable to mount volumes for pod "xxxx": timeout expired waiting for volumes to attach or mount for pod "xxxxxx". list of unmounted volumes=[pvc_name]. list of unattached volumes=[pvc_name default-token-6tmkm]
So, first, do you think this is related to the fact that I was inside the PVC and created a file when the pod was killed, or is it pure coincidence? (It can't be, right?)
Suppose I bootstrap a single master node with kubelet v1.10.3 in an OpenStack cloud, and I would like to have a "self-hosted" single etcd node for k8s needs, running as a pod.
Before starting the kube-apiserver component you need a working etcd instance, but of course you can't just run kubectl apply -f or put a manifest into the addon-manager folder, because the cluster is not ready at all.
There is a way to have the kubelet start pods without a ready apiserver: static pods (YAML Pod definitions usually located in /etc/kubernetes/manifests/). This is how I start "system" pods like the apiserver, scheduler, controller-manager and etcd itself. Previously I just mounted a directory from the node to persist etcd data, but now I would like to use an OpenStack block storage resource. So here is the question: how can I attach, mount and use an OpenStack Cinder volume to persist etcd data from a static pod?
As I learned today, there are at least three ways to attach OpenStack volumes:
The CSI OpenStack Cinder driver, which is a fairly new way of managing volumes. It won't fit my requirements, because in static pod manifests I can only declare Pods, not other resources like PVC/PV, while the CSI docs say:
The csi volume type does not support direct reference from Pod and may only be referenced in a Pod via a PersistentVolumeClaim object.
The pre-CSI way to attach volumes: FlexVolume.
FlexVolume driver binaries must be installed in a pre-defined volume plugin path on each node (and in some cases master).
OK, I added those binaries to my node (using this DaemonSet as a reference) and added a volume to the pod manifest like this:
volumes:
  - name: test
    flexVolume:
      driver: "cinder.io/cinder-flex-volume-driver"
      fsType: "ext4"
      options:
        volumeID: "$VOLUME_ID"
        cinderConfig: "/etc/kubernetes/cloud-config"
and got the following error in the kubelet logs:
driver-call.go:258] mount command failed, status: Failure, reason: Volume 2c21311b-7329-4cf4-8230-f3ce2f23cf1a is not available
which is weird because I am sure this Cinder volume is already attached to my CoreOS compute instance.
The last way to mount volumes that I know of is the in-tree Cinder support, which should work since at least k8s 1.5 and has no special requirements besides the --cloud-provider=openstack and --cloud-config kubelet options.
The yaml manifest part for declaring volume for static pod looks like this:
volumes:
  - name: html-volume
    cinder:
      # Enter the volume ID below
      volumeID: "$VOLUME_ID"
      fsType: ext4
Unfortunately when I try this method I get the following error from kubelet:
Volume has not been added to the list of VolumesInUse in the node's volume status for volume.
I do not know what it means, but it sounds like the node status could not be updated (of course, there is no etcd or apiserver yet). Sadly, this was the most promising option for me.
Are there any other ways to attach an OpenStack Cinder volume to a static pod relying on the kubelet only (when the cluster is not actually ready)? Any ideas on what I could be missing, or on the errors above?
The message Volume has not been added to the list of VolumesInUse in the node's volume status for volume means that attach/detach operations for that node are delegated to the controller-manager only. The kubelet waits for the controller to perform the attachment, but the volume never reaches the expected state because the controller isn't up yet.
The solution is to set the kubelet flag --enable-controller-attach-detach=false so that the kubelet attaches and mounts the volume itself. This flag is set to true by default for the following reasons:
If a node is lost, volumes that were attached to it can be detached by the controller and reattached elsewhere.
Credentials for attaching and detaching do not need to be made present on every node, improving security.
In your case, setting this flag to false is reasonable, as it is the only way to achieve what you want.
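If the kubelet is driven by a configuration file (passed with --config) rather than by flags, the equivalent setting is the enableControllerAttachDetach field of KubeletConfiguration. The sketch below assumes the kubelet.config.k8s.io/v1beta1 API, so check that your kubelet version supports these fields; the cloud-provider settings from the question are deliberately left out.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Directory the kubelet watches for static Pod manifests (etcd, apiserver, ...)
staticPodPath: /etc/kubernetes/manifests
# Let the kubelet attach/detach volumes itself instead of waiting for the
# attach/detach controller in kube-controller-manager, which is not running yet
enableControllerAttachDetach: false
```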