Kubernetes cannot mount Glusterfs volumes

I have a GlusterFS cluster and a k8s cluster. On any of the nodes in the k8s cluster, I can execute the following command:
mount -t glusterfs data1:/sbnstore /mnt/data/ -o xlator-option="transport.address-family=inet6"
and this works perfectly. I can create and access files under /mnt/data, and they appear in the GlusterFS cluster and replicate among the peers. Note that the GlusterFS peers do not have IPv4 addresses, only IPv6.
However, creating a Kubernetes Pod hangs in the container creation stage. The following is the manifest:
apiVersion: v1
kind: Pod
metadata:
  name: shell-testg
  labels:
    alan: testg
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    command: ["/bin/sleep", "3650d"]
    volumeMounts:
    - name: glusterfs-volume
      mountPath: /data
  volumes:
  - name: glusterfs-volume
    glusterfs:
      endpoints: glusterfs-cluster
      path: sbnstore
      readOnly: no
When this manifest is applied, the pod that is created is permanently stuck in ContainerCreating status.
$ kc get pod
NAME          READY   STATUS              RESTARTS   AGE
shell-testg   0/1     ContainerCreating   0          12m
Digging for more info (extraneous lines omitted):
$ kc describe pod shell-testg
Name:         shell-testg
Namespace:    default
...
    Mounts:
      /data from glusterfs-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kxktp (ro)
...
Volumes:
  glusterfs-volume:
    Type:           Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:  glusterfs-cluster
    Path:           sbnstore
    ReadOnly:       false
  kube-api-access-kxktp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                 From     Message
  ----     ------       ----                ----     -------
  Warning  FailedMount  109s (x6 over 17m)  kubelet  Unable to attach or mount volumes: unmounted volumes=[glusterfs-volume], unattached volumes=[glusterfs-volume kube-api-access-kxktp]: timed out waiting for the condition
  Warning  FailedMount  84s                 kubelet  MountVolume.SetUp failed for volume "glusterfs-volume" : mount failed: mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/2ef51502-7327-443c-abc1-2a0f1ee1b177/volumes/kubernetes.io~glusterfs/glusterfs-volume --scope -- mount -t glusterfs -o auto_unmount,backup-volfile-servers=2001:470:1:999:1:0:30:100:2001:470:1:999:1:0:30:200:2001:470:1:999:1:0:30:300,log-file=/var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/glusterfs/glusterfs-volume/shell-testg-glusterfs.log,log-level=ERROR 2001:470:1:841:1:0:30:200:sbnstore /var/snap/microk8s/common/var/lib/kubelet/pods/2ef51502-7327-443c-abc1-2a0f1ee1b177/volumes/kubernetes.io~glusterfs/glusterfs-volume
Output: Running scope as unit: run-rd7adcd2329cb4ab2a694d02925df1a2f.scope
[2022-06-23 18:32:35.342172] E [glusterfsd.c:833:gf_remember_backup_volfile_server] 0-glusterfs: failed to set volfile server: File exists
...
[2022-06-23 18:32:35.342443] E [glusterfsd.c:833:gf_remember_backup_volfile_server] 0-glusterfs: failed to set volfile server: File exists
Mounting glusterfs on /var/snap/microk8s/common/var/lib/kubelet/pods/2ef51502-7327-443c-abc1-2a0f1ee1b177/volumes/kubernetes.io~glusterfs/glusterfs-volume failed.
So now I am kind of stuck. Poking through the GitHub issues board, this kind of error seems to appear when there is a version incompatibility. My k8s is 1.23 and Gluster is 7.2.
Any suggestions of what to try next?
UPDATE
As it turns out, the issue is that the glusterfs volume plugin within Kubernetes is not able to use IPv6. So my workaround is to convert all the Endpoints to IPv4 addresses.
Not working:
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
- addresses:
  - ip: 2001:470:1:999:1:0:30:100
  ports:
  - port: 49152
- addresses:
  - ip: 2001:470:1:999:1:0:30:200
  ports:
  - port: 49152
- addresses:
  - ip: 2001:470:1:999:1:0:30:300
  ports:
  - port: 49152
Working:
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
- addresses:
  - ip: 10.2.12.1
  ports:
  - port: 49152
- addresses:
  - ip: 10.2.12.2
  ports:
  - port: 49152
- addresses:
  - ip: 10.2.12.3
  ports:
  - port: 49152
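One detail worth noting: the in-tree glusterfs plugin looks the Endpoints object up by name, and the upstream GlusterFS examples pair the Endpoints with a Service of the same name so that the endpoints persist. A minimal sketch of that companion Service (the port simply mirrors the Endpoints above):
apiVersion: v1
kind: Service
metadata:
  name: glusterfs-cluster
spec:
  ports:
  - port: 49152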
It is interesting to note that automatically provisioned PV/PVCs using the Heketi package do work over IPv6.
I haven't been able to find any way to pass configuration parameters to the glusterfs module that might provide a real solution, so if anyone has pointers to that, please provide them in an answer to this posting.
Hopefully this update will help someone else avoid this learning curve.

Related

ImagePullBackOff error while deploying on minikube

Hey there, so I was trying to deploy my first simple webapp (no database) on minikube, but this ImagePullBackOff error keeps coming up in the pod.
Yes, I have checked the name and tag of the image several times.
Here are the logs and YAML files.
Namespace:        default
Priority:         0
Service Account:  default
Labels:           app=nodeapp1
                  pod-template-hash=589c6bd468
Annotations:      <none>
Status:           Pending
Controlled By:    ReplicaSet/nodeapp1-deployment-589c6bd468
Containers:
  nodeserver:
    Container ID:
    Image:          ayushftw/nodeapp1:latest
    Image ID:
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ErrImagePull
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k6mkb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-k6mkb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  2m3s  default-scheduler  Successfully assigned default/nodeapp1-deployment-589c6bd468-5lg2n to minikube
  Normal   Pulling    2m2s  kubelet            Pulling image "ayushftw/nodeapp1:latest"
  Warning  Failed     3s    kubelet            Failed to pull image "ayushftw/nodeapp1:latest": rpc error: code = Unknown desc = context deadline exceeded
  Warning  Failed     3s    kubelet            Error: ErrImagePull
  Normal   BackOff    2s    kubelet            Back-off pulling image "ayushftw/nodeapp1:latest"
  Warning  Failed     2s    kubelet            Error: ImagePullBackOff
deployment.yml file
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodeapp1-deployment
  labels:
    app: nodeapp1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nodeapp1
  template:
    metadata:
      labels:
        app: nodeapp1
    spec:
      containers:
      - name: nodeserver
        image: ayushftw/nodeapp1:latest
        ports:
        - containerPort: 3000
service.yml file
apiVersion: v1
kind: Service
metadata:
  name: nodeapp1-service
spec:
  selector:
    app: nodeapp1
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 3000
    targetPort: 3000
    nodePort: 31011
Please help if anybody knows anything about this.
I think your internet connection is slow. The timeout to pull an image is 120 seconds, so the kubelet could not pull the image in under 120 seconds.
First, pull the image via Docker
docker image pull ayushftw/nodeapp1:latest
Then load the downloaded image to minikube
minikube image load ayushftw/nodeapp1:latest
And then everything will work, because now the kubelet will use the image that is stored locally.
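One hedged addition: because the tag is :latest, Kubernetes defaults imagePullPolicy to Always, so the kubelet may still try to contact the registry even after the image has been loaded into minikube. Pinning the policy in the deployment's container spec avoids that:
    containers:
    - name: nodeserver
      image: ayushftw/nodeapp1:latest
      imagePullPolicy: IfNotPresent   # or Never, to use only the locally loaded image
      ports:
      - containerPort: 3000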
It seems to be an issue with reaching the container registry in use for your images. Can you try to pull the image manually from the node?

Kubernetes hostPath volume always freezes pod

I have this pod definition file using the basic nginx container image. All I am doing in this Pod is attempting to mount a local directory so that it can be accessed by the pod.
apiVersion: v1
kind: Pod
metadata:
  name: empty-pod
  labels:
    name: empty-pod
spec:
  containers:
  - name: empty
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: db-persistence
      mountPath: /data/db
  volumes:
  - name: db-persistence
    hostPath:
      path: /c/MongoData/
      type: Directory
I have two different minikube environments, both on Windows machines, one using Docker Desktop and the other VirtualBox. Using the definition above, attempting to create the pod gives a pod that never actually starts:
d:\Kubernetes\exercise>kubectl get all
NAME            READY   STATUS              RESTARTS   AGE
pod/empty-pod   0/1     ContainerCreating   0          8m30s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   7d7h
The folder it is mounting is empty at this point. I have also tried with files in there. It just seems to freeze up. Deleting the pod takes a long time (several minutes) as well. As far as I can tell, this is the textbook example of how to mount a file system from the host into the pod/container. Any idea what I am doing wrong?
UPDATE: describe on the pod gives:
Name:          empty-pod
Namespace:     default
Priority:      0
Node:          minikube/192.168.59.100
Start Time:    Mon, 10 Jan 2022 21:49:37 -0800
Labels:        app=photegrity
               name=empty-pod
Annotations:   <none>
Status:        Pending
IP:
IPs:           <none>
Containers:
  empty:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data/db from mongodb-persistence (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6xwsc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  mongodb-persistence:
    Type:          HostPath (bare host directory volume)
    Path:          /c/MongoData/
    HostPathType:  Directory
  kube-api-access-6xwsc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                   From               Message
  ----     ------       ----                  ----               -------
  Normal   Scheduled    44m                   default-scheduler  Successfully assigned default/empty-pod to minikube
  Warning  FailedMount  41m                   kubelet            Unable to attach or mount volumes: unmounted volumes=[mongodb-persistence], unattached volumes=[kube-api-access-6xwsc mongodb-persistence]: timed out waiting for the condition
  Warning  FailedMount  13m (x23 over 44m)    kubelet            MountVolume.SetUp failed for volume "mongodb-persistence" : hostPath type check failed: /c/MongoData/ is not a directory
  Warning  FailedMount  3m29s (x15 over 39m)  kubelet            Unable to attach or mount volumes: unmounted volumes=[mongodb-persistence], unattached volumes=[mongodb-persistence kube-api-access-6xwsc]: timed out waiting for the condition
This is a Windows path, C:\MongoData, and in Docker I have used the unixized path /c/MongoData. Any idea what Kubernetes would like to call this path?
This is one place where I spent a lot of time before finally figuring out the way to mount Windows-based paths on pods in Kubernetes.
If you are using Docker Desktop on Windows, you need to prefix '/run/desktop/mnt/host/' to the value in the path attribute. So the YAML would look something like this:
apiVersion: v1
kind: Pod
metadata:
  name: empty-pod
  labels:
    name: empty-pod
spec:
  containers:
  - name: empty
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: db-persistence
      mountPath: /data/db
  volumes:
  - name: db-persistence
    hostPath:
      path: /run/desktop/mnt/host/c/MongoData/
      type: Directory
Honestly, I feel this should have been a part of the documentation but somehow it is not.
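A hedged note for the VirtualBox environment from the question: the /run/desktop/mnt/host/ prefix is specific to Docker Desktop's VM. With a VirtualBox-based minikube, one option is to mount the host folder into the minikube VM first and keep that mount process running, for example:
minikube mount C:\MongoData:/c/MongoData
after which a hostPath of /c/MongoData/ should resolve inside the VM.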

How to deploy Mongodb replicaset on microk8s cluster

I'm trying to deploy a MongoDB replica set on a microk8s cluster, which I have installed in a VM running Ubuntu 20.04. After the deployment, the mongo pods do not run but crash. I've enabled the microk8s storage, dns and rbac add-ons, but the problem persists. Can anyone help me find the reason behind it? Below is my manifest file:
apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
  labels:
    name: mongo
spec:
  ports:
  - port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    role: mongo
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      role: mongo
      environment: test
  serviceName: mongodb-service
  replicas: 3
  template:
    metadata:
      labels:
        role: mongo
        environment: test
        replicaset: MainRepSet
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: replicaset
                  operator: In
                  values:
                  - MainRepSet
              topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 10
      volumes:
      - name: secrets-volume
        secret:
          secretName: shared-bootstrap-data
          defaultMode: 256
      containers:
      - name: mongod-container
        #image: pkdone/mongo-ent:3.4
        image: mongo
        command:
        - "numactl"
        - "--interleave=all"
        - "mongod"
        - "--wiredTigerCacheSizeGB"
        - "0.1"
        - "--bind_ip"
        - "0.0.0.0"
        - "--replSet"
        - "MainRepSet"
        - "--auth"
        - "--clusterAuthMode"
        - "keyFile"
        - "--keyFile"
        - "/etc/secrets-volume/internal-auth-mongodb-keyfile"
        - "--setParameter"
        - "authenticationMechanisms=SCRAM-SHA-1"
        resources:
          requests:
            cpu: 0.2
            memory: 200Mi
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: secrets-volume
          readOnly: true
          mountPath: /etc/secrets-volume
        - name: mongodb-persistent-storage-claim
          mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: mongodb-persistent-storage-claim
    spec:
      storageClassName: microk8s-hostpath
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 5Gi
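For reference, a hedged sketch of how the shared-bootstrap-data Secret mounted above could be created, assuming the standard MongoDB keyfile procedure (the file name must match the --keyFile path in the command):
# generate a keyfile and wrap it in the Secret the StatefulSet mounts
openssl rand -base64 741 > internal-auth-mongodb-keyfile
microk8s kubectl create secret generic shared-bootstrap-data \
  --from-file=internal-auth-mongodb-keyfile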
Also, here are the pv, pvc and sc outputs:
yyy#xxx:$ kubectl get pvc
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
mongodb-persistent-storage-claim-mongo-0   Bound    pvc-1b3de8f7-e416-4a1a-9c44-44a0422e0413   5Gi        RWO            microk8s-hostpath   13m
yyy#xxx:$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                              STORAGECLASS        REASON   AGE
pvc-5b75ddf6-abbd-4ff3-a135-0312df1e6703   20Gi       RWX            Delete           Bound    container-registry/registry-claim                  microk8s-hostpath            38m
pvc-1b3de8f7-e416-4a1a-9c44-44a0422e0413   5Gi        RWO            Delete           Bound    default/mongodb-persistent-storage-claim-mongo-0   microk8s-hostpath            13m
yyy#xxx:$ kubectl get sc
NAME                          PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
microk8s-hostpath (default)   microk8s.io/hostpath   Delete          Immediate           false                  108m
yyy#xxx:$ kubectl get pods -n kube-system
NAME                                         READY   STATUS    RESTARTS   AGE
metrics-server-8bbfb4bdb-xvwcw               1/1     Running   1          148m
dashboard-metrics-scraper-78d7698477-4qdhj   1/1     Running   0          146m
kubernetes-dashboard-85fd7f45cb-6t7xr        1/1     Running   0          146m
hostpath-provisioner-5c65fbdb4f-ff7cl        1/1     Running   0          113m
coredns-7f9c69c78c-dr5kt                     1/1     Running   0          65m
calico-kube-controllers-f7868dd95-wtf8j      1/1     Running   0          150m
calico-node-knzc2                            1/1     Running   0          150m
I have installed the cluster using this command:
sudo snap install microk8s --classic --channel=1.21
Output of mongodb deployment:
yyy#xxx:$ kubectl get all
NAME          READY   STATUS             RESTARTS   AGE
pod/mongo-0   0/1     CrashLoopBackOff   5          4m18s

NAME                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)     AGE
service/kubernetes        ClusterIP   10.152.183.1   <none>        443/TCP     109m
service/mongodb-service   ClusterIP   None           <none>        27017/TCP   4m19s

NAME                     READY   AGE
statefulset.apps/mongo   0/3     4m19s
Pod logs:
yyy#xxx:$ kubectl logs pod/mongo-0
{"t":{"$date":"2021-09-07T16:21:13.191Z"},"s":"F", "c":"CONTROL", "id":20574, "ctx":"-","msg":"Error during global initialization","attr":{"error":{"code":2,"codeName":"BadValue","errmsg":"storage.wiredTiger.engineConfig.cacheSizeGB must be greater than or equal to 0.25"}}}
yyy#xxx:$ kubectl describe pod/mongo-0
Name:         mongo-0
Namespace:    default
Priority:     0
Node:         citest1/192.168.9.105
Start Time:   Tue, 07 Sep 2021 16:17:38 +0000
Labels:       controller-revision-hash=mongo-66bd776569
              environment=test
              replicaset=MainRepSet
              role=mongo
              statefulset.kubernetes.io/pod-name=mongo-0
Annotations:  cni.projectcalico.org/podIP: 10.1.150.136/32
              cni.projectcalico.org/podIPs: 10.1.150.136/32
Status:       Running
IP:           10.1.150.136
IPs:
  IP:           10.1.150.136
Controlled By:  StatefulSet/mongo
Containers:
  mongod-container:
    Container ID:  containerd://458e21fac3e87dcf304a9701da0eb827b2646efe94cabce7f283cd49f740c15d
    Image:         mongo
    Image ID:      docker.io/library/mongo@sha256:58ea1bc09f269a9b85b7e1fae83b7505952aaa521afaaca4131f558955743842
    Port:          27017/TCP
    Host Port:     0/TCP
    Command:
      numactl
      --interleave=all
      mongod
      --wiredTigerCacheSizeGB
      0.1
      --bind_ip
      0.0.0.0
      --replSet
      MainRepSet
      --auth
      --clusterAuthMode
      keyFile
      --keyFile
      /etc/secrets-volume/internal-auth-mongodb-keyfile
      --setParameter
      authenticationMechanisms=SCRAM-SHA-1
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 07 Sep 2021 16:24:03 +0000
      Finished:     Tue, 07 Sep 2021 16:24:03 +0000
    Ready:          False
    Restart Count:  6
    Requests:
      cpu:     200m
      memory:  200Mi
    Environment:  <none>
    Mounts:
      /data/db from mongodb-persistent-storage-claim (rw)
      /etc/secrets-volume from secrets-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-b7nf8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  mongodb-persistent-storage-claim:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mongodb-persistent-storage-claim-mongo-0
    ReadOnly:   false
  secrets-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  shared-bootstrap-data
    Optional:    false
  kube-api-access-b7nf8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  7m53s                   default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  7m52s                   default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled         7m50s                   default-scheduler  Successfully assigned default/mongo-0 to citest1
  Normal   Pulled            7m25s                   kubelet            Successfully pulled image "mongo" in 25.215669443s
  Normal   Pulled            7m21s                   kubelet            Successfully pulled image "mongo" in 1.192994197s
  Normal   Pulled            7m6s                    kubelet            Successfully pulled image "mongo" in 1.203239709s
  Normal   Pulled            6m38s                   kubelet            Successfully pulled image "mongo" in 1.213451175s
  Normal   Created           6m38s (x4 over 7m23s)   kubelet            Created container mongod-container
  Normal   Started           6m37s (x4 over 7m23s)   kubelet            Started container mongod-container
  Normal   Pulling           5m47s (x5 over 7m50s)   kubelet            Pulling image "mongo"
  Warning  BackOff           2m49s (x23 over 7m20s)  kubelet            Back-off restarting failed container
The logs you provided show that you have an incorrectly set parameter wiredTigerCacheSizeGB. In your case it is 0.1, and according to the message
"code":2,"codeName":"BadValue","errmsg":"storage.wiredTiger.engineConfig.cacheSizeGB must be greater than or equal to 0.25"
it should be at least 0.25.
In the section containers:
containers:
- name: mongod-container
  #image: pkdone/mongo-ent:3.4
  image: mongo
  command:
  - "numactl"
  - "--interleave=all"
  - "mongod"
  - "--wiredTigerCacheSizeGB"
  - "0.1"
  - "--bind_ip"
  - "0.0.0.0"
  - "--replSet"
  - "MainRepSet"
  - "--auth"
  - "--clusterAuthMode"
  - "keyFile"
  - "--keyFile"
  - "/etc/secrets-volume/internal-auth-mongodb-keyfile"
  - "--setParameter"
  - "authenticationMechanisms=SCRAM-SHA-1"
you should change, in this place,
- "--wiredTigerCacheSizeGB"
- "0.1"
the value "0.1" to any other value greater than or equal to "0.25".
Additionally, I have seen another error:
1 pod has unbound immediate PersistentVolumeClaims
It should be related to what I wrote earlier. However, you may find alternative ways to solve it here, here and here.

Some Kubernetes pods consistently not able to resolve internal DNS on only one node

I have just moved my first cluster from minikube up to AWS EKS. Everything went pretty smoothly so far, except I think I'm running into some DNS issues, and only on one of the cluster nodes.
I have two nodes in the cluster running v1.14, with 4 pods of one type and 4 of another. 3 of each work, but 1 of each (both on the same node) starts and then errors (CrashLoopBackOff), because the script inside the container can't resolve the hostname for the database. Deleting the errored pod, or even all pods, results in one pod on the same node failing every time.
The database is in its own pod and has a service assigned; none of the other pods of the same type have problems resolving the name or connecting. The database pod is on the same node as the pods that can't resolve the hostname. I'm not sure how to migrate the pod to a different node, but that might be worth trying to see if the problem follows; a sketch of one way to do that is below.
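If I understand correctly, one way would be to cordon the suspect node so nothing new schedules onto it, delete the failing pod, and let the Deployment recreate it elsewhere (node and pod names taken from the outputs below):
kubectl cordon ip-192-168-87-230.us-east-2.compute.internal
kubectl delete pod pod1-85f7968f7-k9xv2
kubectl uncordon ip-192-168-87-230.us-east-2.compute.internal   # once done observing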
No errors in the coredns pods. I'm not sure where to start looking to discover the issue from here, and any help or suggestions would be appreciated.
Providing the configs below. As mentioned, they all work on Minikube, and also they work on one node.
kubectl get pods (note the ages: all pod1's were deleted at the same time and recreated themselves; 3 work fine, the 4th does not):
NAME                          READY   STATUS             RESTARTS   AGE
pod1-85f7968f7-2cjwt          1/1     Running            0          34h
pod1-85f7968f7-cbqn6          1/1     Running            0          34h
pod1-85f7968f7-k9xv2          0/1     CrashLoopBackOff   399        34h
pod1-85f7968f7-qwcrz          1/1     Running            0          34h
postgresql-865db94687-cpptb   1/1     Running            0          3d14h
rabbitmq-667cfc4cc-t92pl      1/1     Running            0          34h
pod2-94b9bc6b6-6bzf7          1/1     Running            0          34h
pod2-94b9bc6b6-6nvkr          1/1     Running            0          34h
pod2-94b9bc6b6-jcjtb          0/1     CrashLoopBackOff   140        11h
pod2-94b9bc6b6-t4gfq          1/1     Running            0          34h
postgresql service
apiVersion: v1
kind: Service
metadata:
  name: postgresql
spec:
  ports:
  - port: 5432
  selector:
    app: postgresql
pod1 deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod1
spec:
  replicas: 4
  selector:
    matchLabels:
      app: pod1
  template:
    metadata:
      labels:
        app: pod1
    spec:
      containers:
      - name: pod1
        image: us.gcr.io/gcp-project-8888888/pod1:latest
        env:
        - name: rabbitmquser
          valueFrom:
            secretKeyRef:
              name: rabbitmq-secrets
              key: rmquser
        volumeMounts:
        - mountPath: /data/files
          name: datafiles
      volumes:
      - name: datafiles
        persistentVolumeClaim:
          claimName: datafiles-pv-claim
      imagePullSecrets:
      - name: container-readonly
pod2 deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod2
spec:
  replicas: 4
  selector:
    matchLabels:
      app: pod2
  template:
    metadata:
      labels:
        app: pod2
    spec:
      containers:
      - name: pod2
        image: us.gcr.io/gcp-project-8888888/pod2:latest
        env:
        - name: rabbitmquser
          valueFrom:
            secretKeyRef:
              name: rabbitmq-secrets
              key: rmquser
        volumeMounts:
        - mountPath: /data/files
          name: datafiles
      volumes:
      - name: datafiles
        persistentVolumeClaim:
          claimName: datafiles-pv-claim
      imagePullSecrets:
      - name: container-readonly
CoreDNS config map to forward DNS to an external service if it doesn't resolve internally. This is the only place I can think of that could be causing the issue, but as mentioned it works for one node.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . 8.8.8.8
        cache 30
        loop
        reload
        loadbalance
    }
Errored pod output. It is the same for both pods, as the error occurs in library code common to both. As mentioned, this does not occur for all pods, so the issue likely doesn't lie with the code.
Error connecting to database (psycopg2.OperationalError) could not translate host name "postgresql" to address: Try again
Errored Pod1 description:
Name:           xyz-94b9bc6b6-jcjtb
Namespace:      default
Priority:       0
Node:           ip-192-168-87-230.us-east-2.compute.internal/192.168.87.230
Start Time:     Tue, 15 Oct 2019 19:43:11 +1030
Labels:         app=pod1
                pod-template-hash=94b9bc6b6
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Running
IP:             192.168.70.63
Controlled By:  ReplicaSet/xyz-94b9bc6b6
Containers:
  pod1:
    Container ID:   docker://f7dc735111bd94b7c7b698e69ad302ca19ece6c72b654057627626620b67d6de
    Image:          us.gcr.io/xyz/xyz:latest
    Image ID:       docker-pullable://us.gcr.io/xyz/xyz@sha256:20110cf126b35773ef3a8656512c023b1e8fe5c81dd88f19a64c5bfbde89f07e
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 16 Oct 2019 07:21:40 +1030
      Finished:     Wed, 16 Oct 2019 07:21:46 +1030
    Ready:          False
    Restart Count:  139
    Environment:
      xyz:  <set to the key 'xyz' in secret 'xyz-secrets'>  Optional: false
    Mounts:
      /data/xyz from xyz (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-m72kz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  xyz:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  xyz-pv-claim
    ReadOnly:   false
  default-token-m72kz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-m72kz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                     From                                                   Message
  ----     ------   ----                    ----                                                   -------
  Warning  BackOff  2m22s (x3143 over 11h)  kubelet, ip-192-168-87-230.us-east-2.compute.internal  Back-off restarting failed container
Errored Pod 2 description:
Name:           xyz-85f7968f7-k9xv2
Namespace:      default
Priority:       0
Node:           ip-192-168-87-230.us-east-2.compute.internal/192.168.87.230
Start Time:     Mon, 14 Oct 2019 21:19:42 +1030
Labels:         app=pod2
                pod-template-hash=85f7968f7
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Running
IP:             192.168.84.69
Controlled By:  ReplicaSet/pod2-85f7968f7
Containers:
  pod2:
    Container ID:   docker://f7c7379f92f57ea7d381ae189b964527e02218dc64337177d6d7cd6b70990143
    Image:          us.gcr.io/xyz-217300/xyz:latest
    Image ID:       docker-pullable://us.gcr.io/xyz-217300/xyz@sha256:b9cecdbc90c5c5f7ff6170ee1eccac83163ac670d9df5febd573c2d84a4d628d
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 16 Oct 2019 07:23:35 +1030
      Finished:     Wed, 16 Oct 2019 07:23:41 +1030
    Ready:          False
    Restart Count:  398
    Environment:
      xyz:  <set to the key 'xyz' in secret 'xyz-secrets'>  Optional: false
    Mounts:
      /data/xyz from xyz (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-m72kz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  xyz:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  xyz-pv-claim
    ReadOnly:   false
  default-token-m72kz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-m72kz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                     From                                                   Message
  ----     ------   ----                    ----                                                   -------
  Warning  BackOff  3m28s (x9208 over 34h)  kubelet, ip-192-168-87-230.us-east-2.compute.internal  Back-off restarting failed container
At the suggestion of a k8s community member, I applied the following change to my coredns configuration to be more in line with the best practice:
The line proxy . 8.8.8.8 was changed to forward . /etc/resolv.conf 8.8.8.8.
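With that single line changed, the relevant Corefile block from the question becomes:
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       upstream
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf 8.8.8.8
    cache 30
    loop
    reload
    loadbalance
}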
I then deleted the pods, and after they were recreated by k8s, the issue did not appear again.
EDIT:
Turns out, that was not the issue at all, as shortly afterwards the issue re-occurred and persisted. In the end, it was this: https://github.com/aws/amazon-vpc-cni-k8s/issues/641
Rolled back to 1.5.3 as recommended by Amazon, restarted the cluster, and the issue was resolved.
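For anyone hitting the same thing, a hedged sketch of the rollback: point the aws-node DaemonSet at the older CNI image (the ECR account and region vary per cluster, so check the image currently set on your DaemonSet first):
kubectl -n kube-system set image daemonset aws-node \
  aws-node=602401143452.dkr.ecr.us-east-2.amazonaws.com/amazon-k8s-cni:v1.5.3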

Why is my persistent disk failing to mount to my pod?

I've been trying to find out why my pod won't start up; when I go to describe it, I get this:
Events:
  Type     Reason                  Age               From                     Message
  ----     ------                  ----              ----                     -------
  Normal   Scheduled               2m                default-scheduler        Successfully assigned my-namespace/nfs to gke-default-pool
  Normal   SuccessfulAttachVolume  2m                attachdetach-controller  AttachVolume.Attach succeeded for volume "nfspvc"
  Warning  FailedMount             50s               kubelet, default-pool    Unable to mount volumes for pod "nfs-8496cc5fd5-wjkm2_sxdb-branch161666(28ff323d-8839-11e9-a080-42010a8400bb)": timeout expired waiting for volumes to attach or mount for pod "my-namespace"/"nfs-8496cc5fd5-wjkm2". list of unmounted volumes=[nfspvc]. list of unattached volumes=[nfspvc default-token-ntmfv]
  Warning  FailedMount             28s (x9 over 2m)  kubelet, default-pool    MountVolume.MountDevice failed for volume "nfspvc" : executable file not found in $PATH
When I describe the compute disk, all looks good:
creationTimestamp: '2019-06-06T01:56:31.079-07:00'
id: '5701286856735681489'
kind: compute#disk
labelFingerprint: 42WmSpB8rSM=
lastAttachTimestamp: '2019-06-06T01:57:51.852-07:00'
name: nfs-pd
physicalBlockSizeBytes: '4096'
selfLink: <omitted>
sizeGb: '10'
status: READY
type: <omitted>
users:
- <omitted>
zone: <omitted>
And here is my deployment manifest:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    role: nfs
  name: nfs
  namespace: my-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs
  template:
    metadata:
      labels:
        role: nfs
    spec:
      containers:
      - image: gcr.io/google_containers/volume-nfs:0.8
        name: nfs
        ports:
        - containerPort: 2049
          name: nfs
          protocol: TCP
        - containerPort: 20048
          name: mountd
          protocol: TCP
        - containerPort: 111
          name: rpcbind
          protocol: TCP
        volumeMounts:
        - mountPath: /exports
          name: nfspvc
      restartPolicy: Always
      volumes:
      - gcePersistentDisk:
          fsType: ext4i
          pdName: nfs-pd
        name: nfspvc
I'm not really sure what 'MountVolume.MountDevice failed for volume "nfspvc" : executable file not found in $PATH' means, or what else I should look into to investigate the source of the issue.
If it makes any difference, this is created by a script, the ordering of which is:
1. Create the compute disk.
2. Run helm, which in turn creates the above deployment, service, persistent volume, and persistent volume claim.
fsType: ext4i
Try removing that 'i'.
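With that fix applied, the volumes section of the deployment would read:
volumes:
- name: nfspvc
  gcePersistentDisk:
    fsType: ext4
    pdName: nfs-pd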