Kubernetes pull from local registry not working, how to fix that? - kubernetes

I followed this link and created a secret as below.
kubectl create secret docker-registry regsec --docker-server=192.168.56.106:5000 --docker-username=osboxes --docker-password=osboxes.org --insecure-skip-tls-verify=true
And the deployment as below.
kubectl create deploy nginx1 --image 192.168.56.106:5000/todoapp:1.0
And edited it using
kubectl edit deploy nginx1
And added imagePullSecrets to it
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
creationTimestamp: "2021-07-21T10:23:23Z"
generation: 2
labels:
app: nginx1
name: nginx1
namespace: default
resourceVersion: "6872"
uid: 0b6917f0-10ac-4206-82a8-c49ae8ffa2b3
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: nginx1
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: nginx1
spec:
containers:
- image: 192.168.56.106:5000/todoapp:1.0
imagePullPolicy: IfNotPresent
name: todoapp
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: regsec
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
conditions:
- lastTransitionTime: "2021-07-21T10:23:23Z"
lastUpdateTime: "2021-07-21T10:23:23Z"
message: Deployment does not have minimum availability.
reason: MinimumReplicasUnavailable
status: "False"
type: Available
- lastTransitionTime: "2021-07-21T10:23:23Z"
lastUpdateTime: "2021-07-21T10:28:36Z"
message: ReplicaSet "nginx1-75df7fd466" is progressing.
reason: ReplicaSetUpdated
status: "True"
type: Progressing
observedGeneration: 2
replicas: 2
unavailableReplicas: 2
updatedReplicas: 1
But still getting below error.
osboxes#osboxes:~/Desktop$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-6799fc88d8-vvzjf 1/1 Running 0 19m
nginx1-65d848d94f-dd4ck 0/1 ImagePullBackOff 0 12m
nginx1-75df7fd466-kn5mf 0/1 ImagePullBackOff 0 6m50s
osboxes#osboxes:~/Desktop$ kubectl describe pod nginx1-75df7fd466-kn5mf
Name: nginx1-75df7fd466-kn5mf
Namespace: default
Priority: 0
Node: samples-control-plane/172.19.0.3
Start Time: Wed, 21 Jul 2021 06:28:36 -0400
Labels: app=nginx1
pod-template-hash=75df7fd466
Annotations: <none>
Status: Pending
IP: 10.244.0.8
IPs:
IP: 10.244.0.8
Controlled By: ReplicaSet/nginx1-75df7fd466
Containers:
todoapp:
Container ID:
Image: 192.168.56.106:5000/todoapp:1.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-88clq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-88clq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 73s default-scheduler Successfully assigned default/nginx1-75df7fd466-kn5mf to samples-control-plane
Normal Pulling 33s (x3 over 73s) kubelet Pulling image "192.168.56.106:5000/todoapp:1.0"
Warning Failed 33s (x3 over 73s) kubelet Failed to pull image "192.168.56.106:5000/todoapp:1.0": rpc error: code = Unknown desc = failed to pull and unpack image "192.168.56.106:5000/todoapp:1.0": failed to resolve reference "192.168.56.106:5000/todoapp:1.0": failed to do request: Head "https://192.168.56.106:5000/v2/todoapp/manifests/1.0": http: server gave HTTP response to HTTPS client
Warning Failed 33s (x3 over 73s) kubelet Error: ErrImagePull
Normal BackOff 6s (x4 over 72s) kubelet Back-off pulling image "192.168.56.106:5000/todoapp:1.0"
Warning Failed 6s (x4 over 72s) kubelet Error: ImagePullBackOff
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-13T02:40:46Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-21T23:01:33Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Adding docker insecure registry is not useful as the container runtime is not docker now.It is containerd.
/etc/default/docker.json
And for reference, I have setup the local container on a virtualbox VM and connecting to kubernetes on the same network using Host-only network ip address.
The setup of local registry is created using the steps mentioned in link.
And the kubernetes was setup using kind.

cri-o
If you are using cri-o as Container Runtime Provider, docker settings are of no use. You would require to configure cri-o instead: https://github.com/cri-o/cri-o#configuration
Assuming you have cri-o installed on the host node, the official documentation recommends to have a $HOME/.config/containers/registries.conf file, or a global, /etc/containers/registries.conf file to configure registries in a runtime engine agnostic way.
The file structure spec is also documented: https://github.com/containers/image/blob/main/docs/containers-registries.conf.5.md
Example:
[registries.search]
registries = ['registry1.com', 'registry2.com']
[registries.insecure]
registries = ['registry3.com']
[registries.block]
registries = ['registry.untrusted.com', 'registry.unsafe.com']
containerd
Containerd does not seem to acknowledge the /etc/containers/registries.conf settings.
As per the documentation, The main config file /etc/containerd/config.toml can specify a registry config path as:
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d"
And inside the path specified above (/etc/containerd/certs.d) create a directory for host (docker.io) and in it, create a hosts.toml file. That is /etc/containerd/certs.d/docker.io/hosts.toml:
server = "https://registry-1.docker.io"
[host."http://my-custom-registry:5000"]
capabilities = ["pull", "resolve", "push"]
skip_verify = true
plain-http = true
For my-custom-registry:5000 section you can also provide credentials as well as certificates.
And restart containerd daemon/service on the host.
Configuration file spec is here: https://github.com/containerd/containerd/blob/main/docs/hosts.md

Related

Unable to attach or mount volumes: unmounted volumes=[html], unattached volumes=[html xxxxx]: timed out waiting for the condition

Hi i'm facing this issue when my pod using a gitRepo volume, from the book Kubernetes In action example
here is my deployment file
apiVersion: apps/v1
kind: Deployment
metadata:
name: gitrepo-vol-deploy
namespace: book
spec:
replicas: 1
selector:
matchLabels:
app: git-repo-as-vol-1
type: volumes
template:
metadata:
labels:
app: git-repo-as-vol-1
type: volumes
spec:
containers:
- image: nginx
name: web-server
volumeMounts:
- name: html
mountPath: /usr/share/nginx/html
readOnly: true
ports:
- containerPort: 80
protocol: TCP
volumes:
- name: html
gitRepo:
repository: git#github.com:xxxxxx/kubia-website-example.git
revision: master
directory: .
My pods status
k8s#GSG1PM-FT1057:~$ n book
NAME READY STATUS RESTARTS AGE
deploy-kubia-app-7d6fb9cb8-rvrrv 1/1 Running 4 (14h ago) 15d
gitrepo-vol-deploy-b59f686cf-hzlts 0/1 ContainerCreating 0 30m
rs-kubia-app-6jjpm 1/1 Running 4 (14h ago) 15d
vol-1-7956876664-5f48q 2/2 Running 0 93m
Here is my pod description
Containers:
web-server:
Container ID:
Image: nginx
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/nginx/html from html (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bln48 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
html:
Type: GitRepo (a volume that is pulled from git when the pod is created)
Repository: git#github.com:xxxxxx/kubia-website-example.git
Revision: master
kube-api-access-bln48:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned book/gitrepo-vol-deploy-b59f686cf-hzlts to minikube
Warning FailedMount 7m12s kubelet Unable to attach or mount volumes: unmounted volumes=[html], unattached volumes=[kube-api-access-bln48 html]: timed out waiting for the condition
Warning FailedMount 74s (x13 over 11m) kubelet MountVolume.SetUp failed for volume "html" : failed to exec 'git clone -- git#github.com:xxxxx/kubia-website-example.git .': : executable file not found in $PATH
Warning FailedMount 22s (x4 over 9m27s) kubelet Unable to attach or mount volumes: unmounted volumes=[html], unattached volumes=[html kube-api-access-bln48]: timed out waiting for the condition
in the pod description events it stating this, can anyone help me where i'm wrong? Thanks in advance.
Warning FailedMount 7m12s kubelet Unable to attach or mount volumes: unmounted volumes=[html], unattached volumes=[kube-api-access-bln48 html]: timed out waiting for the condition
Using a Git repository as the starting point for a volume.
It seems the root cause is actually written in the Events. When doing git clone of your repo it says:
"executable file not found in $PATH"
The nginx image you are using doesn't contain the git executable, so you could instead create your own Docker image based on nginx:
FROM nginx
RUN apt update && apt install -y git

CrashLoopBackOff : Back-off restarting failed container for flask application

I am a beginner in kubernetes and was trying to deploy my flask application following this guide: https://medium.com/analytics-vidhya/build-a-python-flask-app-and-deploy-with-kubernetes-ccc99bbec5dc
I have successfully built a docker image and pushed it to dockerhub https://hub.docker.com/repository/docker/beatrix1997/kubernetes_flask_app
but am having trouble debugging a pod.
This is my yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
name: kubernetesflaskapp-deploy
labels:
app: kubernetesflaskapp
spec:
replicas: 1
selector:
matchLabels:
app: kubernetesflaskapp
template:
metadata:
labels:
app: kubernetesflaskapp
spec:
containers:
- name: kubernetesflaskapp
image: beatrix1997/kubernetes_flask_app
ports:
- containerPort: 5000
And this is the description of the pod:
Name: kubernetesflaskapp-deploy-5764bbbd44-8696k
Namespace: default
Priority: 0
Node: minikube/192.168.49.2
Start Time: Fri, 20 May 2022 11:26:33 +0100
Labels: app=kubernetesflaskapp
pod-template-hash=5764bbbd44
Annotations: <none>
Status: Running
IP: 172.17.0.12
IPs:
IP: 172.17.0.12
Controlled By: ReplicaSet/kubernetesflaskapp-deploy-5764bbbd44
Containers:
kubernetesflaskapp:
Container ID: docker://d500dc15e389190670a9273fea1d70e6bd6ab2e7053bd2480d114ad6150830f1
Image: beatrix1997/kubernetes_flask_app
Image ID: docker-pullable://beatrix1997/kubernetes_flask_app#sha256:1bfa98229f55b04f32a6b85d72860886abcc0f17295b14e173151a8e4b0f0334
Port: 5000/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 20 May 2022 11:58:38 +0100
Finished: Fri, 20 May 2022 11:58:38 +0100
Ready: False
Restart Count: 11
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zq8n7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-zq8n7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 33m default-scheduler Successfully assigned default/kubernetesflaskapp-deploy-5764bbbd44-8696k to minikube
Normal Pulled 33m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 14.783413947s
Normal Pulled 33m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 1.243534487s
Normal Pulled 32m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 1.373217701s
Normal Pulling 32m (x4 over 33m) kubelet Pulling image "beatrix1997/kubernetes_flask_app"
Normal Created 32m (x4 over 33m) kubelet Created container kubernetesflaskapp
Normal Pulled 32m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 1.239794774s
Normal Started 32m (x4 over 33m) kubelet Started container kubernetesflaskapp
Warning BackOff 3m16s (x138 over 33m) kubelet Back-off restarting failed container
I am using ubuntu as my OS if it matters at all.
Any help would be appreciated!
Many thanks!
I would check the following:
Check if your Docker image works in Docker, you can run it with the run command, find the official doc here
If it doesn't work, then you can check what is wrong in your app first.
If it does, try checking the readiness and liveness probe, here the official documentation
You can find more hints about failing pods here
The error can be due to the issue in the application as the reported reason is "Back-off restarting failed container". Please paste the following logs in the question for further clarification
kubectl logs -n <NS> pods <pod-name>

K8s tutorial fails on my local installation with i/o timeout

I'm working on a local kubernetes installation with three nodes. They are installed via geerlingguy/kubernetes Ansible role (with default settings). I've recreated the whole VMs multiple times. I try to follow the Kubernetes tutorials on https://kubernetes.io/docs/tutorials/kubernetes-basics/explore/explore-interactive/ to get services up and running inside the cluster and try to reach them now.
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
enceladus Ready <none> 162m v1.17.9
mimas Ready <none> 162m v1.17.9
titan Ready master 162m v1.17.9
I tried it with the 1.17.9 or 1.18.6, I tried it with https://github.com/geerlingguy/ansible-role-kubernetes and https://github.com/kubernetes-sigs/kubespray on fresh Debian-Buster VMs. I tried it with Flannel and Calico network plugin. There is no a firewall configured.
I can deploy the kubernetes-bootcamp and exec into it, but when I try to reach the pod via kubectl proxy and curl I'm getting an error.
# kubectl create deployment kubernetes-bootcamp --image=gcr.io/google-samples/kubernetes-bootcamp:v1
# kubectl describe pods
Name: kubernetes-bootcamp-69fbc6f4cf-nq4tj
Namespace: default
Priority: 0
Node: enceladus/192.168.10.12
Start Time: Thu, 06 Aug 2020 10:53:34 +0200
Labels: app=kubernetes-bootcamp
pod-template-hash=69fbc6f4cf
Annotations: <none>
Status: Running
IP: 10.244.1.4
IPs:
IP: 10.244.1.4
Controlled By: ReplicaSet/kubernetes-bootcamp-69fbc6f4cf
Containers:
kubernetes-bootcamp:
Container ID: docker://77eae93ca1e6b574ef7b0623844374a5b2f3054075025492b708b23fc3474a45
Image: gcr.io/google-samples/kubernetes-bootcamp:v1
Image ID: docker-pullable://gcr.io/google-samples/kubernetes-bootcamp#sha256:0d6b8ee63bb57c5f5b6156f446b3bc3b3c143d233037f3a2f00e279c8fcc64af
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 06 Aug 2020 10:53:35 +0200
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kkcvk (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-kkcvk:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-kkcvk
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10s default-scheduler Successfully assigned default/kubernetes-bootcamp-69fbc6f4cf-nq4tj to enceladus
Normal Pulled 9s kubelet, enceladus Container image "gcr.io/google-samples/kubernetes-bootcamp:v1" already present on machine
Normal Created 9s kubelet, enceladus Created container kubernetes-bootcamp
Normal Started 9s kubelet, enceladus Started container kubernetes-bootcamp
Update service list
# kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 4d20h
I can exec curl inside the deployment. It is running.
# kubectl exec -ti kubernetes-bootcamp-69fbc6f4cf-nq4tj curl http://localhost:8080/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-69fbc6f4cf-nq4tj | v=1
But, when I try to curl from master node the response is not good:
curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-69fbc6f4cf-nq4tj/proxy/
Error trying to reach service: 'dial tcp 10.244.1.4:80: i/o timeout'
The curl itself needs ca. 30sec to return. The version etc. is available. The proxy is running fine.
# curl http://localhost:8001/version
{
"major": "1",
"minor": "17",
"gitVersion": "v1.17.9",
"gitCommit": "4fb7ed12476d57b8437ada90b4f93b17ffaeed99",
"gitTreeState": "clean",
"buildDate": "2020-07-15T16:10:45Z",
"goVersion": "go1.13.9",
"compiler": "gc",
"platform": "linux/amd64"
}
The tutorial shows on kubectl describe pods that the container has open ports (in my case it's <none>):
Port: 8080/TCP
Host Port: 0/TCP
Ok, I than created an apply-file bootcamp.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: kubernetes-bootcamp
spec:
replicas: 1
selector:
matchLabels:
app: kubernetes-bootcamp
template:
metadata:
labels:
app: kubernetes-bootcamp
spec:
containers:
- name: kubernetes-bootcamp
image: gcr.io/google-samples/kubernetes-bootcamp:v1
ports:
- containerPort: 8080
protocol: TCP
I removed the previous deployment
# kubectl delete deployments.apps kubernetes-bootcamp --force
# kubectl apply -f bootcamp.yaml
But after that I'm getting still the same i/o timeout on the new deployment.
So, what is my problem?

Kubernetes master doesn't attach FlexVolume

I'm trying to attach the dummy-attachable FlexVolume sample for Kubernetes which seems to initialize normally according to my logs on both the nodes and master:
Loaded volume plugin "flexvolume-k8s/dummy-attachable
But when I try to attach the volume to a pod, the attach method never gets called from the master. The logs from the node read:
flexVolume driver k8s/dummy-attachable: using default GetVolumeName for volume dummy-attachable
operationExecutor.VerifyControllerAttachedVolume started for volume "dummy-attachable"
Operation for "\"flexvolume-k8s/dummy-attachable/dummy-attachable\"" failed. No retries permitted until 2019-04-22 13:42:51.21390334 +0000 UTC m=+4814.674525788 (durationBeforeRetry 500ms). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"dummy-attachable\" (UniqueName: \"flexvolume-k8s/dummy-attachable/dummy-attachable\") pod \"nginx-dummy-attachable\"
Here's how I'm attempting to mount the volume:
apiVersion: v1
kind: Pod
metadata:
name: nginx-dummy-attachable
namespace: default
spec:
containers:
- name: nginx-dummy-attachable
image: nginx
volumeMounts:
- name: dummy-attachable
mountPath: /data
ports:
- containerPort: 80
volumes:
- name: dummy-attachable
flexVolume:
driver: "k8s/dummy-attachable"
Here is the ouput of kubectl describe pod nginx-dummy-attachable:
Name: nginx-dummy-attachable
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: [node id]
Start Time: Wed, 24 Apr 2019 08:03:21 -0400
Labels: <none>
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container nginx-dummy-attachable
Status: Pending
IP:
Containers:
nginx-dummy-attachable:
Container ID:
Image: nginx
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/data from dummy-attachable (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hcnhj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
dummy-attachable:
Type: FlexVolume (a generic volume resource that is provisioned/attached using an exec based plugin)
Driver: k8s/dummy-attachable
FSType:
SecretRef: nil
ReadOnly: false
Options: map[]
default-token-hcnhj:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hcnhj
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 41s (x6 over 11m) kubelet, [node id] Unable to mount volumes for pod "nginx-dummy-attachable_default([id])": timeout expired waiting for volumes to attach or mount for pod "default"/"nginx-dummy-attachable". list of unmounted volumes=[dummy-attachable]. list of unattached volumes=[dummy-attachable default-token-hcnhj]
I added debug logging to the FlexVolume, so I was able to verify that the attach method was never called on the master node. I'm not sure what I'm missing here.
I don't know if this matters, but the cluster is being launched with KOPS. I've tried with both k8s 1.11 and 1.14 with no success.
So this is a fun one.
Even though kubelet initializes the FlexVolume plugin on master, kube-controller-manager, which is containerized in KOPs, is the application that's actually responsible for attaching the volume to the pod. KOPs doesn't mount the default plugin directory /usr/libexec/kubernetes/kubelet-plugins/volume/exec into the kube-controller-manager pod, so it doesn't know anything about your FlexVolume plugins on master.
There doesn't appear to be a non-hacky way to do this other than to use a different Kubernetes deployment tool until KOPs addresses this problem.

MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32

This is 2nd question following 1st question at
PersistentVolumeClaim is not bound: "nfs-pv-provisioning-demo"
I am setting up a kubernetes lab using one node only and learning to setup kubernetes nfs. I am following kubernetes nfs example step by step from the following link: https://github.com/kubernetes/examples/tree/master/staging/volumes/nfs
Based on feedback provided by 'helmbert', I modified the content of
https://github.com/kubernetes/examples/blob/master/staging/volumes/nfs/provisioner/nfs-server-gce-pv.yaml
It works and I don't see the event "PersistentVolumeClaim is not bound: “nfs-pv-provisioning-demo”" anymore.
$ cat nfs-server-local-pv01.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv01
labels:
type: local
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/tmp/data01"
$ cat nfs-server-local-pvc01.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nfs-pv-provisioning-demo
labels:
demo: nfs-pv-provisioning
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 5Gi
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv01 10Gi RWO Retain Available 4s
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nfs-pv-provisioning-demo Bound pv01 10Gi RWO 2m
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
nfs-server-nlzlv 1/1 Running 0 1h
$ kubectl describe pods nfs-server-nlzlv
Name: nfs-server-nlzlv
Namespace: default
Node: lab-kube-06/10.0.0.6
Start Time: Tue, 21 Nov 2017 19:32:21 +0000
Labels: role=nfs-server
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"nfs-server","uid":"b1b00292-cef2-11e7-8ed3-000d3a04eb...
Status: Running
IP: 10.32.0.3
Created By: ReplicationController/nfs-server
Controlled By: ReplicationController/nfs-server
Containers:
nfs-server:
Container ID: docker://1ea76052920d4560557cfb5e5bfc9f8efc3af5f13c086530bd4e0aded201955a
Image: gcr.io/google_containers/volume-nfs:0.8
Image ID: docker-pullable://gcr.io/google_containers/volume-nfs#sha256:83ba87be13a6f74361601c8614527e186ca67f49091e2d0d4ae8a8da67c403ee
Ports: 2049/TCP, 20048/TCP, 111/TCP
State: Running
Started: Tue, 21 Nov 2017 19:32:43 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/exports from mypvc (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-grgdz (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
mypvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs-pv-provisioning-demo
ReadOnly: false
default-token-grgdz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-grgdz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
I continued the rest of steps and reached the "Setup the fake backend" section and ran the following command:
$ kubectl create -f examples/volumes/nfs/nfs-busybox-rc.yaml
I see status 'ContainerCreating' and never change to 'Running' for both nfs-busybox pods. Is this because the container image is for Google Cloud as shown in the yaml?
https://github.com/kubernetes/examples/blob/master/staging/volumes/nfs/nfs-server-rc.yaml
containers:
- name: nfs-server
image: gcr.io/google_containers/volume-nfs:0.8
ports:
- name: nfs
containerPort: 2049
- name: mountd
containerPort: 20048
- name: rpcbind
containerPort: 111
securityContext:
privileged: true
volumeMounts:
- mountPath: /exports
name: mypvc
Do I have to replace that 'image' line to something else because I don't use Google Cloud for this lab? I only have a single node in my lab. Do I have to rewrite the definition of 'containers' above? What should I replace the 'image' line with? Do I need to download dockerized 'nfs image' from somewhere?
$ kubectl describe pvc nfs
Name: nfs
Namespace: default
StorageClass:
Status: Bound
Volume: nfs
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed=yes
pv.kubernetes.io/bound-by-controller=yes
Capacity: 1Mi
Access Modes: RWX
Events: <none>
$ kubectl describe pv nfs
Name: nfs
Labels: <none>
Annotations: pv.kubernetes.io/bound-by-controller=yes
StorageClass:
Status: Bound
Claim: default/nfs
Reclaim Policy: Retain
Access Modes: RWX
Capacity: 1Mi
Message:
Source:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 10.111.29.157
Path: /
ReadOnly: false
Events: <none>
$ kubectl get rc
NAME DESIRED CURRENT READY AGE
nfs-busybox 2 2 0 25s
nfs-server 1 1 1 1h
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
nfs-busybox-lmgtx 0/1 ContainerCreating 0 3m
nfs-busybox-xn9vz 0/1 ContainerCreating 0 3m
nfs-server-nlzlv 1/1 Running 0 1h
$ kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 20m
nfs-server ClusterIP 10.111.29.157 <none> 2049/TCP,20048/TCP,111/TCP 9s
$ kubectl describe services nfs-server
Name: nfs-server
Namespace: default
Labels: <none>
Annotations: <none>
Selector: role=nfs-server
Type: ClusterIP
IP: 10.111.29.157
Port: nfs 2049/TCP
TargetPort: 2049/TCP
Endpoints: 10.32.0.3:2049
Port: mountd 20048/TCP
TargetPort: 20048/TCP
Endpoints: 10.32.0.3:20048
Port: rpcbind 111/TCP
TargetPort: 111/TCP
Endpoints: 10.32.0.3:111
Session Affinity: None
Events: <none>
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
nfs 1Mi RWX Retain Bound default/nfs 38m
pv01 10Gi RWO Retain Bound default/nfs-pv-provisioning-demo 1h
I see repeating events - MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
$ kubectl describe pod nfs-busybox-lmgtx
Name: nfs-busybox-lmgtx
Namespace: default
Node: lab-kube-06/10.0.0.6
Start Time: Tue, 21 Nov 2017 20:39:35 +0000
Labels: name=nfs-busybox
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"nfs-busybox","uid":"15d683c2-cefc-11e7-8ed3-000d3a04e...
Status: Pending
IP:
Created By: ReplicationController/nfs-busybox
Controlled By: ReplicationController/nfs-busybox
Containers:
busybox:
Container ID:
Image: busybox
Image ID:
Port: <none>
Command:
sh
-c
while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/mnt from nfs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-grgdz (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
nfs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs
ReadOnly: false
default-token-grgdz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-grgdz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned nfs-busybox-lmgtx to lab-kube-06
Normal SuccessfulMountVolume 17m kubelet, lab-kube-06 MountVolume.SetUp succeeded for volume "default-token-grgdz"
Warning FailedMount 17m kubelet, lab-kube-06 MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs --scope -- mount -t nfs 10.111.29.157:/ /var/lib/kubelet/pods/15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs
Output: Running scope as unit run-43641.scope.
mount: wrong fs type, bad option, bad superblock on 10.111.29.157:/,
missing codepage or helper program, or other error
(for several filesystems (e.g. nfs, cifs) you might
need a /sbin/mount.<type> helper program)
In some cases useful info is found in syslog - try
dmesg | tail or so.
Warning FailedMount 9m (x4 over 15m) kubelet, lab-kube-06 Unable to mount volumes for pod "nfs-busybox-lmgtx_default(15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-lmgtx". list of unattached/unmounted volumes=[nfs]
Warning FailedMount 4m (x8 over 15m) kubelet, lab-kube-06 (combined from similar events): Unable to mount volumes for pod "nfs-busybox-lmgtx_default(15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-lmgtx". list of unattached/unmounted volumes=[nfs]
Warning FailedSync 2m (x7 over 15m) kubelet, lab-kube-06 Error syncing pod
$ kubectl describe pod nfs-busybox-xn9vz
Name: nfs-busybox-xn9vz
Namespace: default
Node: lab-kube-06/10.0.0.6
Start Time: Tue, 21 Nov 2017 20:39:35 +0000
Labels: name=nfs-busybox
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"nfs-busybox","uid":"15d683c2-cefc-11e7-8ed3-000d3a04e...
Status: Pending
IP:
Created By: ReplicationController/nfs-busybox
Controlled By: ReplicationController/nfs-busybox
Containers:
busybox:
Container ID:
Image: busybox
Image ID:
Port: <none>
Command:
sh
-c
while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/mnt from nfs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-grgdz (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
nfs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs
ReadOnly: false
default-token-grgdz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-grgdz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 59m (x6 over 1h) kubelet, lab-kube-06 Unable to mount volumes for pod "nfs-busybox-xn9vz_default(15d7fb5e-cefc-11e7-8ed3-000d3a04ebcd)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-xn9vz". list of unattached/unmounted volumes=[nfs]
Warning FailedMount 7m (x32 over 1h) kubelet, lab-kube-06 (combined from similar events): MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/15d7fb5e-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs --scope -- mount -t nfs 10.111.29.157:/ /var/lib/kubelet/pods/15d7fb5e-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs
Output: Running scope as unit run-59365.scope.
mount: wrong fs type, bad option, bad superblock on 10.111.29.157:/,
missing codepage or helper program, or other error
(for several filesystems (e.g. nfs, cifs) you might
need a /sbin/mount.<type> helper program)
In some cases useful info is found in syslog - try
dmesg | tail or so.
Warning FailedSync 2m (x31 over 1h) kubelet, lab-kube-06 Error syncing pod
Had the same problem,
sudo apt install nfs-kernel-server
directly on the nodes fixed it for ubuntu 18.04 server.
NFS server running on AWS EC2.
My pod was stuck in ContainerCreating state
I was facing this issue because of the Kubernetes cluster node CIDR range was not present in the inbound rule of Security Group of my AWS EC2 instance(where my NFS server was running )
Solution:
Added my Kubernetes cluser Node CIDR range to inbound rule of Security Group.
Installed the following nfs libraries on node machine of CentOS worked for me.
yum install -y nfs-utils nfs-utils-lib
Installing the nfs-common library in ubuntu worked for me.