How to create a Kubernetes NFS volume on Google Container Engine - kubernetes

I am trying to create a Kubernetes NFS volume on Google Container Engine (GKE) and have it used by a Deployment.
I did this in several steps, as shown in the GitHub repository kubernetes-nfs-volume-on-gke (a sketch of the NFS PV and PVC manifests follows the step list):
Create a GKE cluster and a GCE persistent disk
Configure the kubectl context to talk to the GKE cluster
Create the PersistentVolume (PV) and the PersistentVolumeClaim (PVC)
Create an NFS server
Create a service for the NFS server to expose it (the IP address of that service is used for the creation of the NFS PV and NFS PVC)
Create the NFS volume (the NFS PV and PVC)
Create a Deployment of a busybox pod to check that the NFS volume is accessible.
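For reference (as mentioned above), the NFS PV and PVC from the "Create the NFS volume" step look roughly like this. This is only a sketch: the server field is the ClusterIP of the NFS service from the previous step (in my case the 10.247.250.208 that shows up in the mount output below), and the 1Mi size simply mirrors the upstream example.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 1Mi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.247.250.208   # ClusterIP of the nfs-server Service
    path: "/exports"         # directory exported by the NFS server pod
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""       # avoid binding to the cluster's default StorageClass
  resources:
    requests:
      storage: 1Mi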
After following these steps, this is the error I get:
$ kubectl describe pods nfs-busybox-2762569073-lhb5p
Name: nfs-busybox-2762569073-lhb5p
Namespace: default
Node: gke-mappedinn-cluster-default-pool-f94cb0d4-fmfb/10.240.0.3
Start Time: Wed, 12 Apr 2017 04:12:20 +0400
Labels: name=nfs-busybox
pod-template-hash=2762569073
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nfs-busybox-2762569073","uid":"b1e523ae-1f14-11e7-a084-42010a8e0...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container busybox
Status: Pending
IP:
Controllers: ReplicaSet/nfs-busybox-2762569073
Containers:
busybox:
Container ID:
Image: busybox
Image ID:
Port:
Command:
sh
-c
while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/mnt from my-pvc-nfs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-20n4b (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
my-pvc-nfs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs
ReadOnly: false
default-token-20n4b:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-20n4b
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
5m 5m 1 default-scheduler Normal Scheduled Successfully assigned nfs-busybox-2762569073-lhb5p to gke-mappedinn-cluster-default-pool-f94cb0d4-fmfb
3m 48s 2 kubelet, gke-mappedinn-cluster-default-pool-f94cb0d4-fmfb Warning FailedMount Unable to mount volumes for pod "nfs-busybox-2762569073-lhb5p_default(b1e7c901-1f14-11e7-a084-42010a8e0116)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-2762569073-lhb5p". list of unattached/unmounted volumes=[my-pvc-nfs]
3m 48s 2 kubelet, gke-mappedinn-cluster-default-pool-f94cb0d4-fmfb Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-2762569073-lhb5p". list of unattached/unmounted volumes=[my-pvc-nfs]
37s 37s 1 kubelet, gke-mappedinn-cluster-default-pool-f94cb0d4-fmfb Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/nfs/b1e7c901-1f14-11e7-a084-42010a8e0116-nfs" (spec.Name: "nfs") pod "b1e7c901-1f14-11e7-a084-42010a8e0116" (UID: "b1e7c901-1f14-11e7-a084-42010a8e0116") with: mount failed: exit status 32
Mounting command: /home/kubernetes/bin/mounter
Mounting arguments: 10.247.250.208:/exports /var/lib/kubelet/pods/b1e7c901-1f14-11e7-a084-42010a8e0116/volumes/kubernetes.io~nfs/nfs nfs []
Output: Running mount using a rkt fly container
run: group "rkt" not found, will use default gid when rendering images
In the kubernetes dashboard, the error is as follows:
Unable to mount volumes for pod "nfs-busybox-2762569073-lhb5p_default(b1e7c901-1f14-11e7-a084-42010a8e0116)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-2762569073-lhb5p". list of unattached/unmounted volumes=[my-pvc-nfs]
Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-2762569073-lhb5p". list of unattached/unmounted volumes=[my-pvc-nfs]
Have I missed something?
Thanks,

This comment on the kubernetes issue seems to solve this NFS issue on GKE.
Quoting that comment:
Edit examples/volumes/nfs/nfs-pv.yaml change the last line to path: "/".
Edit examples/volumes/nfs/nfs-server-rc.yaml change the image to the one that enabled NFSv4 image: gcr.io/google_containers/volume-nfs:0.8
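After those two edits, nfs-pv.yaml ends up roughly like this (a sketch; use whatever ClusterIP your nfs-server Service actually has):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 1Mi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.247.250.208   # your nfs-server Service ClusterIP
    path: "/"                # was "/exports" before the fix
The image change goes into nfs-server-rc.yaml, not into this file.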
Also there are other issues where this is tracked here and here.

Related

RookIO AttachVolume.Attach failed for volume

I have Kubernetes 1.18 with a Rook setup, and this pod had been running for some time. One of the nodes went out of Ready status for some reason. I rebooted the node, and now it is in Ready status again.
But the pod is stuck in ContainerCreating status; it is waiting to mount the Rook PVC.
Pod status
# kgp |grep -v Running
NAME READY STATUS RESTARTS AGE
redis-slave-0 0/1 ContainerCreating 0 14h
PodEvents
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 23m (x69 over 13h) kubelet, node05 Unable to attach or mount volumes: unmounted volumes=[redis-data], unattached volumes=[redis-data config redis-tmp-conf default-token-lqpgm health]: timed out waiting for the condition
Warning FailedMount 19m (x95 over 13h) kubelet, node05 Unable to attach or mount volumes: unmounted volumes=[redis-data], unattached volumes=[health redis-data config redis-tmp-conf default-token-lqpgm]: timed out waiting for the condition
Warning FailedMount 14m (x79 over 13h) kubelet, node05 Unable to attach or mount volumes: unmounted volumes=[redis-data], unattached volumes=[default-token-lqpgm health redis-data config redis-tmp-conf]: timed out waiting for the condition
Warning FailedMount 5m45s (x66 over 13h) kubelet, node05 Unable to attach or mount volumes: unmounted volumes=[redis-data], unattached volumes=[config redis-tmp-conf default-token-lqpgm health redis-data]: timed out waiting for the condition
Warning FailedAttachVolume 2m44s (x101 over 6h32m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-e854eee7-0a36-4a92-ba61-f9e6e976f64c" : attachdetachment timeout for volume 0001-0009-rook-ceph-0000000000000002-0c4a5173-e8a7-11ea-9bd1-0637030c9151
PVC attach status set to false
kubectl get volumeattachment |grep -v true
NAME ATTACHER PV NODE ATTACHED AGE
csi-3424d1bdc5212aeef30e681c9d99df38dd68fdabb47e5f820125c90d54d61d7b rook-ceph.rbd.csi.ceph.com pvc-e854eee7-0a36-4a92-ba61-f9e6e976f64c node05 false 14h
I tried moving the pod to a different node; still the same issue.
PV and PVC status
# k describe pv pvc-e854eee7-0a36-4a92-ba61-f9e6e976f64c
Name: pvc-e854eee7-0a36-4a92-ba61-f9e6e976f64c
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: rook-ceph.rbd.csi.ceph.com
Finalizers: [kubernetes.io/pv-protection]
StorageClass: rook-ceph-block
Status: Bound
Claim: default/redis-data-redis-slave-0
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 287Mi
Node Affinity: <none>
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: rook-ceph.rbd.csi.ceph.com
FSType: ext4
VolumeHandle: 0001-0009-rook-ceph-0000000000000002-0c4a5173-e8a7-11ea-9bd1-0637030c9151
ReadOnly: false
VolumeAttributes: clusterID=rook-ceph
imageFeatures=layering
imageFormat=2
imageName=csi-vol-0c4a5173-e8a7-11ea-9bd1-0637030c9151
journalPool=replicapool
pool=replicapool
radosNamespace=
storage.kubernetes.io/csiProvisionerIdentity=1598460149789-8081-rook-ceph.rbd.csi.ceph.com
k describe pvc redis-data-redis-slave-0
Name: redis-data-redis-slave-0
Namespace: default
StorageClass: rook-ceph-block
Status: Bound
Volume: pvc-e854eee7-0a36-4a92-ba61-f9e6e976f64c
Labels: app=redis
component=slave
heritage=Helm
release=redis
role=slave
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 287Mi
Access Modes: RWO
VolumeMode: Filesystem
Mounted By: redis-slave-0
Events: <none>
How to fix this mount issue?
Thanks
SR
Please check whether you have the PV and the Node in the same zone (availability zones in AWS). If they are in different zones, the PV will not attach to the node.
To resolve this, simply delete the PV and PVC and recreate them; the new PV will be created in the same zone as the node.
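A quick way to check the zones (a sketch; it assumes your nodes and PVs carry the standard topology labels):
# zone label on each node
kubectl get nodes -L topology.kubernetes.io/zone
# zone/affinity the PV is pinned to, if any
kubectl describe pv pvc-e854eee7-0a36-4a92-ba61-f9e6e976f64c | grep -A3 'Node Affinity'
For dynamically provisioned volumes, a StorageClass with volumeBindingMode: WaitForFirstConsumer delays provisioning until the pod is scheduled, so the new PV is created in the same zone as the node it will run on.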

Pulling an image from gcr.io fails

I am able to create a Kubernetes cluster, and I followed the steps in the links below to pull a private image from the GCR repository:
https://cloud.google.com/container-registry/docs/advanced-authentication
https://cloud.google.com/container-registry/docs/access-control
I am still unable to pull the image from GCR. I have used the command below:
gcloud auth login
I have authenticated the service accounts.
The connection between the local machine and GCR works as well.
Below is the error
$ kubectl describe pod test-service-55cc8f947d-5frkl
Name: test-service-55cc8f947d-5frkl
Namespace: default
Priority: 0
Node: gke-test-gke-clus-test-node-poo-c97a8611-91g2/10.128.0.7
Start Time: Mon, 12 Oct 2020 10:01:55 +0530
Labels: app=test-service
pod-template-hash=55cc8f947d
tier=test-service
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container test-service
Status: Pending
IP: 10.48.0.33
IPs:
IP: 10.48.0.33
Controlled By: ReplicaSet/test-service-55cc8f947d
Containers:
test-service:
Container ID:
Image: gcr.io/test-256004/test-service:v2
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Requests:
cpu: 100m
Environment:
test_SERVICE_BUCKET: test-pt-prod
COPY_FILES_DOCKER_IMAGE: gcr.io/test-256004/test-gcs-copy:latest
test_GCP_PROJECT: test-256004
PIXALATE_GCS_DATASET: test_pixalate
PIXALATE_BQ_TABLE: pixalate
APP_ADS_TXT_GCS_DATASET: test_appadstxt
APP_ADS_TXT_BQ_TABLE: appadstxt
Mounts:
/test/output from test-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-6g7nl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
test-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: test-pvc
ReadOnly: false
default-token-6g7nl:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-6g7nl
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 42s default-scheduler Successfully assigned default/test-service-55cc8f947d-5frkl to gke-test-gke-clus-test-node-poo-c97a8611-91g2
Normal SuccessfulAttachVolume 38s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-25025b4c-2e89-4400-8e0e-335298632e74"
Normal SandboxChanged 31s kubelet, gke-test-gke-clus-test-node-poo-c97a8611-91g2 Pod sandbox changed, it will be killed and re-created.
Normal Pulling 15s (x2 over 32s) kubelet, gke-test-gke-clus-test-node-poo-c97a8611-91g2 Pulling image "gcr.io/test-256004/test-service:v2"
Warning Failed 15s (x2 over 32s) kubelet, gke-test-gke-clus-test-node-poo-c97a8611-91g2 Failed to pull image "gcr.io/test-256004/test-service:v2": rpc error: code = Unknown desc = Error response from daemon: pull access denied for gcr.io/test-256004/test-service, repository does not exist or may require 'docker login': denied: Permission denied for "v2" from request "/v2/test-256004/test-service/manifests/v2".
Warning Failed 15s (x2 over 32s) kubelet, gke-test-gke-clus-test-node-poo-c97a8611-91g2 Error: ErrImagePull
Normal BackOff 3s (x4 over 29s) kubelet, gke-test-gke-clus-test-node-poo-c97a8611-91g2 Back-off pulling image "gcr.io/test-256004/test-service:v2"
Warning Failed 3s (x4 over 29s) kubelet, gke-test-gke-clus-test-node-poo-c97a8611-91g2 Error: ImagePullBackOff
If you don't use Workload Identity, the default service account of your pod is the one of the nodes, and the nodes, by default, use the Compute Engine default service account.
Make sure to grant it the correct permission to access GCR.
If you use another service account, grant it the Storage Object Viewer role (when you pull an image, you read a blob stored in Cloud Storage, so it's at least the same permission).
Note: even if it's the default, I don't recommend using the Compute Engine service account, with or without changes to its roles. It is Project Editor by default, which is a lot of responsibility.
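A sketch of what granting that role could look like; the project ID is taken from the image path above, and the service-account e-mail is a placeholder:
# give the node/pod service account read access to the GCR storage bucket
gcloud projects add-iam-policy-binding test-256004 \
  --member="serviceAccount:YOUR_SA@test-256004.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"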

CockroachDB Cluster on Kubernetes Pods Crashing

I'm trying to install a CockroachDB Helm chart on a 2 node Kubernetes cluster using this command:
helm install my-release --set statefulset.replicas=2 stable/cockroachdb
I have already created 2 persistent volumes:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv00001 100Gi RWO Recycle Bound default/datadir-my-release-cockroachdb-0 11m
pv00002 100Gi RWO Recycle Bound default/datadir-my-release-cockroachdb-1 11m
I'm getting a weird error and I'm new to Kubernetes so I'm not sure what I'm doing wrong. I've tried creating a StorageClass and using it with my PVs but then the CockroachDB PVCs won't bind to them. I suspect there may be something wrong with my PV setup?
I've tried using kubectl logs but the only error I'm seeing is this:
standard_init_linux.go:211: exec user process caused "exec format error"
and the pods are crashing over and over:
NAME READY STATUS RESTARTS AGE
my-release-cockroachdb-0 0/1 Pending 0 11m
my-release-cockroachdb-1 0/1 CrashLoopBackOff 7 11m
my-release-cockroachdb-init-tfcks 0/1 CrashLoopBackOff 5 5m29s
Any idea why the pods are crashing?
Here's kubectl describe for the init pod:
Name: my-release-cockroachdb-init-tfcks
Namespace: default
Priority: 0
Node: axon/192.168.1.7
Start Time: Sat, 04 Apr 2020 00:22:19 +0100
Labels: app.kubernetes.io/component=init
app.kubernetes.io/instance=my-release
app.kubernetes.io/name=cockroachdb
controller-uid=54c7c15d-eb1c-4392-930a-d9b8e9225a45
job-name=my-release-cockroachdb-init
Annotations: <none>
Status: Running
IP: 10.44.0.1
IPs:
IP: 10.44.0.1
Controlled By: Job/my-release-cockroachdb-init
Containers:
cluster-init:
Container ID: docker://82a062c6862a9fd5047236feafe6e2654ec1f6e3064fd0513341a1e7f36eaed3
Image: cockroachdb/cockroach:v19.2.4
Image ID: docker-pullable://cockroachdb/cockroach@sha256:511b6d09d5bc42c7566477811a4e774d85d5689f8ba7a87a114b96d115b6149b
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
while true; do initOUT=$(set -x; /cockroach/cockroach init --insecure --host=my-release-cockroachdb-0.my-release-cockroachdb:26257 2>&1); initRC="$?"; echo $initOUT; [[ "$initRC" == "0" ]] && exit 0; [[ "$initOUT" == *"cluster has already been initialized"* ]] && exit 0; sleep 5; done
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 04 Apr 2020 00:28:04 +0100
Finished: Sat, 04 Apr 2020 00:28:04 +0100
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-cz2sn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-cz2sn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-cz2sn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/my-release-cockroachdb-init-tfcks to axon
Normal Pulled 5m9s (x5 over 6m45s) kubelet, axon Container image "cockroachdb/cockroach:v19.2.4" already present on machine
Normal Created 5m8s (x5 over 6m45s) kubelet, axon Created container cluster-init
Normal Started 5m8s (x5 over 6m44s) kubelet, axon Started container cluster-init
Warning BackOff 92s (x26 over 6m42s) kubelet, axon Back-off restarting failed container
When Pods crash, the most important things to look at when troubleshooting are their descriptions (kubectl describe) and logs.
The logs of the failed Pod show that the architecture of the cockroach image doesn't match that of the nodes.
Run kubectl get po -o wide to see which nodes cockroach runs on and check their architecture.
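A quick check could look like this (a sketch; docker manifest inspect may need the experimental CLI enabled):
# which node each cockroach pod landed on
kubectl get po -o wide | grep cockroachdb
# CPU architecture of that node (listed under System Info)
kubectl describe node axon | grep -i architecture
# architectures the image manifest actually provides
docker manifest inspect cockroachdb/cockroach:v19.2.4 | grep architecture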
A 2-node CockroachDB cluster is an anti-pattern. You need 3 or more nodes to avoid data or cluster-wide unavailability when a single node fails. Consider checking out these videos explaining how data in CockroachDB is organized and then how the nodes in a cluster work together to keep data available in the face of node failure.
Only if you have 3 nodes (or more) will you not risk losing data if any of the nodes gets corrupted. Apart from that, it's easier to explain how to do it right than to find out what went wrong, and to find out what went wrong, one must go through the logs.
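In practice that is just the replica count when installing the chart, e.g. (the same chart as above, only the count changed):
helm install my-release --set statefulset.replicas=3 stable/cockroachdb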
If you attach the log, I can take a look.
I also wrote a detailed guide that may address the "doing it right" part of my answer. I elaborated even more about the entire process here.

Minikube running out of space and failing despite --disk-size flag

I am trying to run a docker container registry in Minikube for testing a CSI driver that I am writing.
I am running minikube on mac and am trying to use the following minikube start command: minikube start --vm-driver=hyperkit --disk-size=40g. I have tried with both kubeadm and localkube bootstrappers and with the virtualbox vm-driver.
This is the resource definition I am using for the registry pod deployment.
---
apiVersion: v1
kind: Pod
metadata:
name: registry
labels:
app: registry
namespace: docker-registry
spec:
containers:
- name: registry
image: registry:2
imagePullPolicy: Always
ports:
- containerPort: 5000
volumeMounts:
- mountPath: /var/lib/registry
name: registry-data
volumes:
- hostPath:
path: /var/lib/kubelet/plugins/csi-registry
type: DirectoryOrCreate
name: registry-data
I attempt to create it using kubectl apply -f registry-setup.yaml. Before running this my minikube cluster reports itself as ready and with all the normal minikube containers running.
However, this fails to run and upon running kubectl describe pod, I see the following message:
Name: registry
Namespace: docker-registry
Node: minikube/192.168.64.43
Start Time: Wed, 08 Aug 2018 12:24:27 -0700
Labels: app=registry
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"app":"registry"},"name":"registry","namespace":"docker-registry"},"spec":{"cont...
Status: Running
IP: 172.17.0.2
Containers:
registry:
Container ID: docker://42e5193ac563c2b2e2a2b381c91350d30f7e7c5009a30a5977d33b403a374e7f
Image: registry:2
...
TRUNCATED FOR SPACE
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned registry to minikube
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "registry-data"
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "default-token-kq5mq"
Normal Pulling 1m kubelet, minikube pulling image "registry:2"
Normal Pulled 1m kubelet, minikube Successfully pulled image "registry:2"
Normal Created 1m kubelet, minikube Created container
Normal Started 1m kubelet, minikube Started container
...
TRUNCATED
...
Name: storage-provisioner
Namespace: kube-system
Node: minikube/192.168.64.43
Start Time: Wed, 08 Aug 2018 12:24:38 -0700
Labels: addonmanager.kubernetes.io/mode=Reconcile
integration-test=storage-provisioner
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile","integration-test":"storage-provis...
Status: Pending
IP: 192.168.64.43
Containers:
storage-provisioner:
Container ID:
Image: gcr.io/k8s-minikube/storage-provisioner:v1.8.1
Image ID:
Port: <none>
Host Port: <none>
Command:
/storage-provisioner
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from storage-provisioner-token-sb5hz (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
tmp:
Type: HostPath (bare host directory volume)
Path: /tmp
HostPathType: Directory
storage-provisioner-token-sb5hz:
Type: Secret (a volume populated by a Secret)
SecretName: storage-provisioner-token-sb5hz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned storage-provisioner to minikube
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "tmp"
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "storage-provisioner-token-sb5hz"
Normal Pulling 23s (x3 over 1m) kubelet, minikube pulling image "gcr.io/k8s-minikube/storage-provisioner:v1.8.1"
Warning Failed 21s (x3 over 1m) kubelet, minikube Failed to pull image "gcr.io/k8s-minikube/storage-provisioner:v1.8.1": rpc error: code = Unknown desc = failed to register layer: Error processing tar file(exit status 1): write /storage-provisioner: no space left on device
Warning Failed 21s (x3 over 1m) kubelet, minikube Error: ErrImagePull
Normal BackOff 7s (x3 over 1m) kubelet, minikube Back-off pulling image "gcr.io/k8s-minikube/storage-provisioner:v1.8.1"
Warning Failed 7s (x3 over 1m) kubelet, minikube Error: ImagePullBackOff
------------------------------------------------------------
...
So while the registry container starts up correctly, a few of the other minikube services (including dns, the http ingress service, etc.) begin to fail with reasons such as the following: write /storage-provisioner: no space left on device. Despite allocating a 40GB disk size to minikube, it seems as though minikube is trying to write to rootfs or devtmpfs (depending on the vm-driver), which has only about 1GB of space.
$ df -h
Filesystem Size Used Avail Use% Mounted on
rootfs 919M 713M 206M 78% /
devtmpfs 919M 0 919M 0% /dev
tmpfs 996M 0 996M 0% /dev/shm
tmpfs 996M 8.9M 987M 1% /run
tmpfs 996M 0 996M 0% /sys/fs/cgroup
tmpfs 996M 8.0K 996M 1% /tmp
/dev/sda1 34G 1.3G 30G 4% /mnt/sda1
Is there a way to make minikube actually use the 34GB of space that was allocated to /mnt/sda1 instead of rootfs when pulling images and creating containers?
Thanks in advance for any help!
You need to configure your Minikube virtual machine to use /dev/sda1 instead of / for Docker. To log in to it, use the minikube ssh command.
Then you have two options:
Mount /dev/sda1 at /var/lib/docker, but don't forget to copy the content of the original /var/lib/docker to /mnt/sda1 before that.
Reconfigure Docker to use /mnt/sda1 instead of /var/lib/docker for storing images. Look through this link for more information about it.
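A sketch of the second option, run from inside minikube ssh (the paths are assumptions, and it relies on Docker 17.05+ with a systemd-managed dockerd inside the VM):
# inside `minikube ssh`
sudo mkdir -p /mnt/sda1/docker
sudo cp -a /var/lib/docker/. /mnt/sda1/docker/            # keep existing images/layers
echo '{ "data-root": "/mnt/sda1/docker" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker                             # dockerd picks up the new data-root
Note that changes made inside the VM may not survive a minikube restart; the --docker-opt approach in the next answer is the persistent way to do the same thing.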
You can also use the minikube --docker-opt option to set the --data-root option of the dockerd daemon running inside minikube. --docker-opt can be used as a pass-through for any parameter to dockerd.
For example, in the case you describe above it would look like:
minikube start --vm-driver=hyperkit --disk-size=40g --docker-opt="--data-root /mnt/sda1"
Keep in mind that if you try to modify an existing minikube cluster, you either have to copy /var/lib/docker to /mnt/sda1 (as the previous answer also suggested) before restarting, or delete and rebuild the cluster.
update:
After experimentation, I noticed that the above solution will not work the first time you run minikube start as it somehow interferes with minikube's own core-system build and boot-up process.
In practice this means that you need to run minikube start at least once without the --docker-opt to build the core system and then re-run it with --docker-opt.

MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32

This is the 2nd question, following my 1st question at
PersistentVolumeClaim is not bound: "nfs-pv-provisioning-demo"
I am setting up a Kubernetes lab using only one node and learning to set up Kubernetes NFS. I am following the Kubernetes NFS example step by step from the following link: https://github.com/kubernetes/examples/tree/master/staging/volumes/nfs
Based on feedback provided by 'helmbert', I modified the content of
https://github.com/kubernetes/examples/blob/master/staging/volumes/nfs/provisioner/nfs-server-gce-pv.yaml
It works and I don't see the event "PersistentVolumeClaim is not bound: “nfs-pv-provisioning-demo”" anymore.
$ cat nfs-server-local-pv01.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv01
labels:
type: local
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/tmp/data01"
$ cat nfs-server-local-pvc01.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nfs-pv-provisioning-demo
labels:
demo: nfs-pv-provisioning
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 5Gi
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv01 10Gi RWO Retain Available 4s
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nfs-pv-provisioning-demo Bound pv01 10Gi RWO 2m
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
nfs-server-nlzlv 1/1 Running 0 1h
$ kubectl describe pods nfs-server-nlzlv
Name: nfs-server-nlzlv
Namespace: default
Node: lab-kube-06/10.0.0.6
Start Time: Tue, 21 Nov 2017 19:32:21 +0000
Labels: role=nfs-server
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"nfs-server","uid":"b1b00292-cef2-11e7-8ed3-000d3a04eb...
Status: Running
IP: 10.32.0.3
Created By: ReplicationController/nfs-server
Controlled By: ReplicationController/nfs-server
Containers:
nfs-server:
Container ID: docker://1ea76052920d4560557cfb5e5bfc9f8efc3af5f13c086530bd4e0aded201955a
Image: gcr.io/google_containers/volume-nfs:0.8
Image ID: docker-pullable://gcr.io/google_containers/volume-nfs@sha256:83ba87be13a6f74361601c8614527e186ca67f49091e2d0d4ae8a8da67c403ee
Ports: 2049/TCP, 20048/TCP, 111/TCP
State: Running
Started: Tue, 21 Nov 2017 19:32:43 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/exports from mypvc (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-grgdz (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
mypvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs-pv-provisioning-demo
ReadOnly: false
default-token-grgdz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-grgdz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
I continued with the rest of the steps, reached the "Setup the fake backend" section, and ran the following command:
$ kubectl create -f examples/volumes/nfs/nfs-busybox-rc.yaml
Both nfs-busybox pods show the status 'ContainerCreating' and never change to 'Running'. Is this because the container image is for Google Cloud, as shown in the yaml?
https://github.com/kubernetes/examples/blob/master/staging/volumes/nfs/nfs-server-rc.yaml
containers:
- name: nfs-server
image: gcr.io/google_containers/volume-nfs:0.8
ports:
- name: nfs
containerPort: 2049
- name: mountd
containerPort: 20048
- name: rpcbind
containerPort: 111
securityContext:
privileged: true
volumeMounts:
- mountPath: /exports
name: mypvc
Do I have to replace that 'image' line with something else because I don't use Google Cloud for this lab? I only have a single node in my lab. Do I have to rewrite the definition of 'containers' above? What should I replace the 'image' line with? Do I need to download a dockerized 'nfs image' from somewhere?
$ kubectl describe pvc nfs
Name: nfs
Namespace: default
StorageClass:
Status: Bound
Volume: nfs
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed=yes
pv.kubernetes.io/bound-by-controller=yes
Capacity: 1Mi
Access Modes: RWX
Events: <none>
$ kubectl describe pv nfs
Name: nfs
Labels: <none>
Annotations: pv.kubernetes.io/bound-by-controller=yes
StorageClass:
Status: Bound
Claim: default/nfs
Reclaim Policy: Retain
Access Modes: RWX
Capacity: 1Mi
Message:
Source:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 10.111.29.157
Path: /
ReadOnly: false
Events: <none>
$ kubectl get rc
NAME DESIRED CURRENT READY AGE
nfs-busybox 2 2 0 25s
nfs-server 1 1 1 1h
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
nfs-busybox-lmgtx 0/1 ContainerCreating 0 3m
nfs-busybox-xn9vz 0/1 ContainerCreating 0 3m
nfs-server-nlzlv 1/1 Running 0 1h
$ kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 20m
nfs-server ClusterIP 10.111.29.157 <none> 2049/TCP,20048/TCP,111/TCP 9s
$ kubectl describe services nfs-server
Name: nfs-server
Namespace: default
Labels: <none>
Annotations: <none>
Selector: role=nfs-server
Type: ClusterIP
IP: 10.111.29.157
Port: nfs 2049/TCP
TargetPort: 2049/TCP
Endpoints: 10.32.0.3:2049
Port: mountd 20048/TCP
TargetPort: 20048/TCP
Endpoints: 10.32.0.3:20048
Port: rpcbind 111/TCP
TargetPort: 111/TCP
Endpoints: 10.32.0.3:111
Session Affinity: None
Events: <none>
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
nfs 1Mi RWX Retain Bound default/nfs 38m
pv01 10Gi RWO Retain Bound default/nfs-pv-provisioning-demo 1h
I see repeating events - MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
$ kubectl describe pod nfs-busybox-lmgtx
Name: nfs-busybox-lmgtx
Namespace: default
Node: lab-kube-06/10.0.0.6
Start Time: Tue, 21 Nov 2017 20:39:35 +0000
Labels: name=nfs-busybox
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"nfs-busybox","uid":"15d683c2-cefc-11e7-8ed3-000d3a04e...
Status: Pending
IP:
Created By: ReplicationController/nfs-busybox
Controlled By: ReplicationController/nfs-busybox
Containers:
busybox:
Container ID:
Image: busybox
Image ID:
Port: <none>
Command:
sh
-c
while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/mnt from nfs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-grgdz (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
nfs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs
ReadOnly: false
default-token-grgdz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-grgdz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned nfs-busybox-lmgtx to lab-kube-06
Normal SuccessfulMountVolume 17m kubelet, lab-kube-06 MountVolume.SetUp succeeded for volume "default-token-grgdz"
Warning FailedMount 17m kubelet, lab-kube-06 MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs --scope -- mount -t nfs 10.111.29.157:/ /var/lib/kubelet/pods/15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs
Output: Running scope as unit run-43641.scope.
mount: wrong fs type, bad option, bad superblock on 10.111.29.157:/,
missing codepage or helper program, or other error
(for several filesystems (e.g. nfs, cifs) you might
need a /sbin/mount.<type> helper program)
In some cases useful info is found in syslog - try
dmesg | tail or so.
Warning FailedMount 9m (x4 over 15m) kubelet, lab-kube-06 Unable to mount volumes for pod "nfs-busybox-lmgtx_default(15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-lmgtx". list of unattached/unmounted volumes=[nfs]
Warning FailedMount 4m (x8 over 15m) kubelet, lab-kube-06 (combined from similar events): Unable to mount volumes for pod "nfs-busybox-lmgtx_default(15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-lmgtx". list of unattached/unmounted volumes=[nfs]
Warning FailedSync 2m (x7 over 15m) kubelet, lab-kube-06 Error syncing pod
$ kubectl describe pod nfs-busybox-xn9vz
Name: nfs-busybox-xn9vz
Namespace: default
Node: lab-kube-06/10.0.0.6
Start Time: Tue, 21 Nov 2017 20:39:35 +0000
Labels: name=nfs-busybox
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"nfs-busybox","uid":"15d683c2-cefc-11e7-8ed3-000d3a04e...
Status: Pending
IP:
Created By: ReplicationController/nfs-busybox
Controlled By: ReplicationController/nfs-busybox
Containers:
busybox:
Container ID:
Image: busybox
Image ID:
Port: <none>
Command:
sh
-c
while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/mnt from nfs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-grgdz (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
nfs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs
ReadOnly: false
default-token-grgdz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-grgdz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 59m (x6 over 1h) kubelet, lab-kube-06 Unable to mount volumes for pod "nfs-busybox-xn9vz_default(15d7fb5e-cefc-11e7-8ed3-000d3a04ebcd)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-xn9vz". list of unattached/unmounted volumes=[nfs]
Warning FailedMount 7m (x32 over 1h) kubelet, lab-kube-06 (combined from similar events): MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/15d7fb5e-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs --scope -- mount -t nfs 10.111.29.157:/ /var/lib/kubelet/pods/15d7fb5e-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs
Output: Running scope as unit run-59365.scope.
mount: wrong fs type, bad option, bad superblock on 10.111.29.157:/,
missing codepage or helper program, or other error
(for several filesystems (e.g. nfs, cifs) you might
need a /sbin/mount.<type> helper program)
In some cases useful info is found in syslog - try
dmesg | tail or so.
Warning FailedSync 2m (x31 over 1h) kubelet, lab-kube-06 Error syncing pod
Had the same problem; running
sudo apt install nfs-kernel-server
directly on the nodes fixed it for an Ubuntu 18.04 server.
NFS server running on AWS EC2.
My pod was stuck in the ContainerCreating state.
I was facing this issue because the Kubernetes cluster node CIDR range was not present in the inbound rules of the Security Group of my AWS EC2 instance (where my NFS server was running).
Solution:
Added my Kubernetes cluster node CIDR range to the inbound rules of the Security Group.
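For reference, a sketch of such an inbound rule with the AWS CLI (the security-group ID and CIDR are placeholders; port 2049 covers NFSv4, add 111 and 20048 as well for NFSv3):
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 2049 \
  --cidr 10.0.0.0/16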
Installing the following NFS libraries on the CentOS node machines worked for me:
yum install -y nfs-utils nfs-utils-lib
Installing the nfs-common library on Ubuntu worked for me.
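Presumably the Ubuntu equivalent of the CentOS command above, run on every node that has to mount the NFS volume:
sudo apt-get update && sudo apt-get install -y nfs-common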