RStudio Job Launcher - NFS mounting issue - kubernetes

I am trying to integrate RStudio Workbench with Kubernetes as described in the official documentation https://docs.rstudio.com/rsw/integration/launcher-kubernetes/. In step 9 the Launcher starts a Kubernetes Job. The job is successfully assigned to a pod, but the pod gets stuck in 'ContainerCreating' status and shows the following events:
Mounting command: mount
Mounting arguments: -t nfs MY.NFS.SERVER.IP:/home/MY_USER_DIR /var/lib/kubelet/pods/SOME_UUID/volumes/kubernetes.io~nfs/mount0
Output: mount.nfs: Connection timed out
Warning FailedMount 13m (x6 over 54m) kubelet Unable to attach or mount volumes: unmounted volumes=[mount0], unattached volumes=[kube-api-access-dllcd mount0]: timed out waiting for the condition
Warning FailedMount 2m29s (x26 over 74m) kubelet Unable to attach or mount volumes: unmounted volumes=[mount0], unattached volumes=[mount0 kube-api-access-dllcd]: timed out waiting for the condition
Configuration details:
Kubernetes is successfully installed on Amazon EKS, and I control the cluster from an admin EC2 instance outside the EKS cluster, which also runs the NFS server and RStudio
I can deploy an RStudio test job, but only without volume mounts
Both the nfs-kernel-server service and RStudio are running
Our RStudio users are able to launch jobs in Local mode
The file /etc/exports contains:
/nfsexport 127.0.0.1(rw,sync,no_subtree_check)
/home/MY_USER_DIR MY.IP.SUBNET.RANGE/16(rw,sync,no_subtree_check)
Inbound traffic to the NFS server from the Kubernetes worker nodes is allowed via port 2049
What I have tried:
Mounting a folder locally on the same machine as the NFS server - that works
Mounting using different IPs for the NFS server: localhost, the public IPv4, and the private IPv4 of the EC2 instance (with and without specifying port 2049) - that did not work
Connecting to a client machine and trying to mount the share from the server manually; this resulted in:
mount.nfs: rpc.statd is not running but is required for
remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: mount.nfs: Operation not permitted
Pinging the admin instance from a worker node does not work, even though all EKS add-ons (coredns, kube-proxy, vpc-cni) are active.
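For reference, a minimal set of commands to repeat these checks directly from a worker node (IPs and paths are placeholders; showmount and rpcinfo require the NFS client utilities on the node):
showmount -e MY.NFS.SERVER.IP        # list the exports visible from the node
rpcinfo -p MY.NFS.SERVER.IP          # confirm rpcbind, mountd and nfs respond
nc -vz MY.NFS.SERVER.IP 2049         # TCP reachability of the NFS port
sudo mount -t nfs MY.NFS.SERVER.IP:/home/MY_USER_DIR /mnt/test   # create /mnt/test first
# NFSv4 only needs port 2049; NFSv3 additionally needs rpcbind (111) and the mountd port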
Question: What could be the root problem causing the mounting issue? Thank you in advance!

I have faced a similar issue with the RStudio Job Launcher. The cause was a non-existent directory on the NFS server. The home directory in the pod gets mounted to the user's home directory on the NFS share.
For example, if you logged into RStudio as user1, the system expects a home directory for user1 on the NFS share. With the default configuration, this directory is /home/user1. If it does not exist on the NFS share, the mount fails and the command times out.
The simplest way to debug this is to try mounting the directory manually:
mkdir -p /tmp/home/user1
mount -v -t nfs ipaddress:/home/user1 /tmp/home/user1
The above command will fail if the /home/user1 directory does not exist on the NFS share. The solution is to ensure that the user home directories exist on the NFS server.
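As a sketch, the corresponding check on the NFS server side could be (user1 stands for the actual RStudio user, and the uid:gid should match that user):
ls -ld /home/user1                                                 # the home directory must already exist
sudo mkdir -p /home/user1 && sudo chown user1:user1 /home/user1    # create it if missing
sudo exportfs -ra                                                  # re-read /etc/exports
showmount -e localhost                                             # confirm the path is exported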

Related

cannot mount NFS share on my Mac PC to minikube cluster

Problem
To set up a multi-node k8s dev environment, I was trying to use an NFS persistent volume in multi-node minikube and cannot run pods properly. It seems there is something wrong with the NFS setup. So I ran minikube ssh and tried to mount the NFS volume manually with the mount command first, but it doesn't work, which brings me here.
When I run
sudo mount -t nfs 192.168.xx.xx(=macpc's IP):/PATH/TO/EXPORTED/DIR/ON/MACPC /PATH/TO/MOUNT/POINT/IN/MINIKUBE/NODE
in minikube master node, the output is
mount.nfs: requested NFS version or transport protocol is not supported
Some relevant info:
NFS client: minikube nodes
NFS server: my Mac PC
minikube driver: docker
The cluster comprises 3 nodes (1 master and 2 worker nodes).
Currently there are no k8s resources (such as deployments, PVs, and PVCs) in the cluster.
The minikube nodes' OS is Ubuntu, so I guess "nfs-utils" is not relevant and not installed; "nfs-common" is preinstalled in minikube.
Please see the following sections for more detail.
Goal
The goal is for the mount command in the minikube nodes to succeed and for the NFS share on my Mac to mount properly.
What I've done so far:
On the NFS server side,
created an /etc/exports file on the Mac. The content is like:
/PATH/TO/EXPORTED/DIR/ON/MACPC -mapall=user:group 192.168.xx.xx(=the output of "minikube ip")
and ran nfsd update; then the showmount -e command outputs:
Exports list on localhost:
/PATH/TO/EXPORTED/DIR/ON/MACPC 192.168.xx.xx(=the output of "minikube ip")
rpcinfo -p shows that rpcbind (= portmapper on Linux), status, nlockmgr, rquotad, nfs, and mountd are all up over TCP and UDP
ping 192.168.xx.xx(=the output of "minikube ip") says
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
and continues
It seems I can't reach minikube from host.
On the NFS client side,
started the nfs-common and rpcbind services with systemctl on all minikube nodes. By running sudo systemctl status rpcbind and sudo systemctl status nfs-common, I confirmed that rpcbind and nfs-common are running.
minikube ssh output
Last login: Mon Mar 28 09:18:38 2022 from 192.168.xx.xx(=I guess my macpc's IP seen from minikube cluster)
so I run
sudo mount -t nfs 192.168.xx.xx(=macpc's IP):/PATH/TO/EXPORTED/DIR/ON/MACPC /PATH/TO/MOUNT/POINT/IN/MINIKUBE/NODE
in minikube master node.
The output is
mount.nfs: requested NFS version or transport protocol is not supported
rpcinfo -p shows only portmapper and status are running. I am not sure this is ok.
ping 192.168.xx.xx(=macpc's IP) works properly.
ping host.minikube.internal works properly.
nc -vz 192.168.xx.xx(=macpc's IP) 2049 outputs connection refused
nc -vz host.minikube.internal 2049 outputs succeeded!
Thanks in advance!
I decided to use another type of volume instead.
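For reference, given that nc reached port 2049 via host.minikube.internal but not via the Mac's LAN IP, one further experiment (a sketch only, assuming the same export path and that the export allows the client address the Mac actually sees) would be to mount via the name that answered and request NFSv3 explicitly, which is what macOS nfsd serves by default:
$ getent hosts host.minikube.internal   # check which IP the node resolves this name to
$ sudo mount -t nfs -o vers=3 host.minikube.internal:/PATH/TO/EXPORTED/DIR/ON/MACPC /PATH/TO/MOUNT/POINT/IN/MINIKUBE/NODE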

What are the differences between a 9P and hostPath mount in Kubernetes?

I am looking to do local dev of an app that is running in Kubernetes on minikube. I want to mount a local directory to speed up development, so I can make code changes to my app (python) without rebuilding the container.
If I understand correctly, I have two out of the box options:
9P mount which is provided by minikube
hostPath mount which comes directly from Kubernetes
What are the differences between these, and in what cases would one be appropriate over the other?
A 9P mount and hostPath are two different concepts. You cannot mount a directory into a pod using a 9P mount.
A 9P mount is used to mount a host directory into the minikube VM.
hostPath is a volume that mounts a file or directory from the host node's (in your case, the minikube VM's) filesystem into your Pod.
Take a look also at the types of Persistent Volumes: pv-types-k8s.
If you want to mount a local directory into a pod:
First, you need to mount your directory, for example $HOME/your/path, into your minikube VM using 9P. Execute the command:
$ minikube start --mount-string="$HOME/your/path:/data"
Then, if you mount /data into your Pod using hostPath, you will get your local directory's data in the Pod.
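For illustration, a minimal Pod along these lines might look like the following (the image name and mount paths are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: dev-app
spec:
  containers:
  - name: app
    image: python:3.11-slim          # placeholder image
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: src
      mountPath: /app                # where the code appears inside the container
  volumes:
  - name: src
    hostPath:
      path: /data                    # the target of --mount-string inside the minikube VM
      type: Directory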
Another solution:
Mount the host's $HOME directory into minikube's /hosthome directory. Get your data:
$ ls -la /hosthome/your/path
To mount this directory, you just have to change your Pod's hostPath:
hostPath:
  path: /hosthome/your/path
Take a look: minikube-mount-data-into-pod.
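If the cluster is already running, minikube also has a standalone mount command that keeps a 9P share active in the foreground while it runs; a sketch with placeholder paths:
$ minikube mount $HOME/your/path:/data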
Also you need to know that:
Minikube is configured to persist files stored under the following directories, which are made in the Minikube VM (or on your localhost if running on bare metal). You may lose data from other directories on reboots.
More: note-persistence-minikube.
See driver-mounts as an alternative.

Mount error while trying to connect encrypted AWS EFS with efs-csi-node in EKS

I have tried connecting an unencrypted EFS and it works fine, but with an encrypted EFS the pod throws the error below:
Normal Scheduled 10m default-scheduler Successfully assigned default/jenkins-efs-test-8ffb4dc86-xnjdj to ip-10-100-4-249.ap-south-1.compute.internal
Warning FailedMount 6m33s (x2 over 8m49s) kubelet, ip-10-100-4-249.ap-south-1.compute.internal Unable to attach or mount volumes: unmounted volumes=[jenkins-home], unattached volumes=[sc-config-volume tmp jenkins-home jenkins-config secrets-dir plugins plugin-dir jenkins-efs-test-token-7nmkz]: timed out waiting for the condition
Warning FailedMount 4m19s kubelet, ip-10-100-4-249.ap-south-1.compute.internal Unable to attach or mount volumes: unmounted volumes=[jenkins-home], unattached volumes=[plugins plugin-dir jenkins-efs-test-token-7nmkz sc-config-volume tmp jenkins-home jenkins-config secrets-dir]: timed out waiting for the condition
Warning FailedMount 2m2s kubelet, ip-10-100-4-249.ap-south-1.compute.internal Unable to attach or mount volumes: unmounted volumes=[jenkins-home], unattached volumes=[tmp jenkins-home jenkins-config secrets-dir plugins plugin-dir jenkins-efs-test-token-7nmkz sc-config-volume]: timed out waiting for the condition
Warning FailedMount 35s (x13 over 10m) kubelet, ip-10-100-4-249.ap-south-1.compute.internal MountVolume.SetUp failed for volume "efs-pv" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Internal desc = Could not mount "" at "/var/lib/kubelet/pods/354800a1-dcf5-4812-aa91-0e84ca6fba59/volumes/kubernetes.io~csi/efs-pv/mount": mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs /var/lib/kubelet/pods/354800a1-dcf5-4812-aa91-0e84ca6fba59/volumes/kubernetes.io~csi/efs-pv/mount
Output: mount: /var/lib/kubelet/pods/354800a1-dcf5-4812-aa91-0e84ca6fba59/volumes/kubernetes.io~csi/efs-pv/mount: can't find in /etc/fstab.
What am I missing here?
You didn't specify what the K8s manifests or configuration look like. There shouldn't be any difference between encrypted and non-encrypted volumes when it comes to mounting from the client side; in essence, AWS manages the encryption keys for you using KMS.
The error you are seeing is basically because the mount command has no source (the EFS file system ID), which is why mount falls back to looking the path up in /etc/fstab. So there must be some other default configuration on the K8s side that changed when you switched from the un-encrypted EFS volume. Also, is the EFS mount helper available on the Kubernetes node where you are trying to mount the EFS volume?
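For comparison, a statically provisioned PersistentVolume for the EFS CSI driver normally carries the file system ID in volumeHandle; an empty or missing handle could explain the empty source in the mount command above. A minimal sketch (the name, storage class, and fs-12345678 are placeholders):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi                     # required by the API, effectively ignored by EFS
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc           # placeholder storage class name
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678        # placeholder EFS file system ID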
Check the logs of the cloud-init agent (/var/log/cloud-init.log and /var/log/cloud-init-output.log) if the EFS filesystem mount does not work as expected. Check the /etc/fstab file.
Try updating the efs-csi-node daemonset from the amazon/aws-efs-csi-driver:v0.3.0 image to amazon/aws-efs-csi-driver:latest.
Here is an example EFS mounting script. Compare it to yours and note the following.
Dependencies of this script:
Default ECS cluster configuration (Amazon Linux ECS AMI).
The ECS instance must have an IAM role that gives it at least read access to EFS (in order to locate the EFS filesystem ID).
The ECS instance must be in a security group that allows port tcp/2049 (NFS) inbound/outbound.
The security group that the ECS instance belongs to must be associated with the target EFS filesystem.
Notes on this script:
The EFS mount path is calculated on a per-instance basis, as the EFS endpoint varies depending on the region and availability zone where the instance is launched.
The EFS mount is added to /etc/fstab so that if the ECS instance is rebooted, the mount point is re-created (an example fstab entry is sketched after these notes).
Docker is restarted to ensure it correctly detects the EFS filesystem mount.
Restart Docker after mounting EFS with the command $ service docker restart. Finally, try rebooting the EKS worker node.
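For reference, an /etc/fstab entry of the kind such a script adds usually looks like one of these (file system ID, region, and mount point are placeholders); the first form uses the amazon-efs-utils mount helper, the second plain NFSv4.1:
fs-12345678:/ /mnt/efs efs defaults,_netdev 0 0
fs-12345678.efs.ap-south-1.amazonaws.com:/ /mnt/efs nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0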
Take a look: mounting-efs-in-eks-cluster-example-deployment-fails, efs-provisioner, dynamic-ip-in-etc-fstab.

How to define the uid, gid of a mounted volume in Pod

This is a question about our production environment. We use Kubernetes to deploy our application through Pods. The Pods may need some storage to store files.
We use a PersistentVolume and PersistentVolumeClaim to represent the real backend storage server. Currently, the real backend storage server is NFS, but the NFS is not controlled by us and we cannot change its configuration.
Every time, the uid and gid of the volume mounted into the Pod are 'root root'. But the process in the Pod runs as a non-root user, so it cannot read/write the mounted volume.
Our current solution is to define an initContainer which runs as root and uses 'chown [uid]:[gid] [folder]' to change the ownership. The limitation is that the initContainer must run as root.
Now we are trying to deploy our application on OpenShift. By default, Pods (containers) cannot run as root; otherwise the Pod fails to be created.
So my question is: is there a k8s or OpenShift way to define/change the uid and gid of the mounted volume?
I have tried mountOptions, which is discussed in Kubernetes Persistent Volume Claim mounted with wrong gid:
mountOptions: # these options
  - uid=1000
  - gid=1000
But it failed with the error message below. It seems that the NFS server does not support the uid and gid mount options.
Warning FailedMount 11s kubelet, [xxxxx.net] MountVolume.SetUp failed for volume "nfs-gid-pv" : mount failed: exit status 32 Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /opt/kubernetes/data/kubelet/pods/3c75930a-d3f7-4d55-9996-4d10dcac9549/volumes/kubernetes.io~nfs/nfs-gid-pv --scope -- mount -t nfs -o gid=1999,uid=1999 shc-sma-cd74.hpeswlab.net:/var/vols/itom/itsma/tzhong /opt/kubernetes/data/kubelet/pods/3c75930a-d3f7-4d55-9996-4d10dcac9549/volumes/kubernetes.io~nfs/nfs-gid-pv
Output: Running scope as unit run-22636.scope.
mount.nfs: an incorrect mount option was specified
Warning FailedMount 7s kubelet, [xxxxx.net] MountVolume.SetUp failed for volume "nfs-gid-pv" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /opt/kubernetes/data/kubelet/pods/3c75930a-d3f7-4d55-9996-4d10dcac9549/volumes/kubernetes.io~nfs/nfs-gid-pv --scope -- mount -t nfs -o gid=1999,uid=1999 shc-sma-cd74.hpeswlab.net:/var/vols/itom/itsma/tzhong /opt/kubernetes/data/kubelet/pods/3c75930a-d3f7-4d55-9996-4d10dcac9549/volumes/kubernetes.io~nfs/nfs-gid-pv
Output: Running scope as unit run-22868.scope.
mount.nfs: an incorrect mount option was specified
Speaking about Kubernetes, you can set the group ID that owns the volume by using fsGroup, a field of the Pod security context.
As for OpenShift, I do not know.
apiVersion: v1
kind: Pod
metadata:
  name: hello-world
spec:
  containers:
  # specification of the pod's containers
  # ...
  securityContext:
    fsGroup: 1000
The security context for a Pod applies to the Pod's Containers and also to the Pod's Volumes when applicable. Specifically fsGroup and seLinuxOptions are applied to Volumes as follows:
fsGroup: Volumes that support ownership management are modified to be owned and writable by the GID specified in fsGroup. See the Ownership Management design document for more details.
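A quick way to check whether fsGroup took effect is to look at the process's supplementary groups and at the group ownership of the mounted path inside the Pod (the mount path below is a placeholder):
$ kubectl exec hello-world -- id                      # 1000 should appear among the groups
$ kubectl exec hello-world -- ls -ln /path/to/mount   # shows whether the volume's group was changed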
You can also read more about it here and follow the steps posted by #rajdeepbs29 here.

hyperledger fabric set up on k8s fails to deploy containers

I am using AWS and kops to deploy a Kubernetes cluster and kubectl to manage it, following this tutorial: https://medium.com/#zhanghenry/how-to-deploy-hyperledger-fabric-on-kubernetes-2-751abf44c807.
But when I try to deploy pods, I get the following error:
MountVolume.SetUp failed for volume "org1-pv" : mount failed: exit status 32 Mounting command: systemd-run Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/73568350-cfc0-11e8-ad99-0a84e5efbfb6/volumes/kubernetes.io~nfs/org1-pv --scope -- mount -t nfs nfs-server-IP:/opt/share/crypto-config/peerOrganizations/org1 /var/lib/kubelet/pods/73568350-cfc0-11e8-ad99-0a84e5efbfb6/volumes/kubernetes.io~nfs/org1-pv Output: Running as unit run-24458.scope. mount.nfs: Connection timed out
I have configured an external NFS server with the following export:
/opt/share *(rw,sync,no_root_squash,no_subtree_check)
Any kind of help is appreciated.
I think you should check the following things to verify whether NFS is mounted successfully or not.
Run this command on the node where you want to mount:
$ showmount -e nfs-server-ip
In my case, $ showmount -e 172.16.10.161 gives:
Export list for 172.16.10.161:
/opt/share *
Use the $ df -hT command to see whether NFS is mounted or not. In my case it gives the output: 172.16.10.161:/opt/share nfs4 91G 32G 55G 37% /opt/share
If it is not mounted, use the following command:
$ sudo mount -t nfs 172.16.10.161:/opt/share /opt/share
If the above commands show an error, check whether the firewall is allowing NFS:
$ sudo ufw status
If not, allow it using the command:
$ sudo ufw allow from nfs-server-ip to any port nfs
I made the same setup and didn't face any issues. My Fabric cluster is running successfully.
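For completeness, once the manual mount works from the node, the corresponding NFS PersistentVolume definition is usually as simple as the following sketch (server address and size are placeholders taken from the example above):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: org1-pv
spec:
  capacity:
    storage: 1Gi                  # placeholder size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 172.16.10.161         # placeholder NFS server IP
    path: /opt/share/crypto-config/peerOrganizations/org1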