I am trying to deploy a stateful set of IPFS replicas on three Kubernetes worker nodes (based on this repo). The first three replicas work properly, but when it comes to the fourth one, it appears that the persistentVolumeClaims point to the shared physical memory. Therefore, the fourth node cannot acquire the lock. What would be the standard way to deploy many IPFS replicas in Kubernetes?
The fourth node printed the following log:
08:44:19.785 DEBUG cmd/ipfs: config path is /data/ipfs main.go:257
08:44:19.785 INFO cmd/ipfs: IPFS_PATH /data/ipfs main.go:301
08:44:19.785 DEBUG cmd/ipfs: Command cannot run on daemon. Checking if daemon is locked main.go:434
08:44:19.785 DEBUG lock: Checking lock lock.go:32
08:44:19.785 DEBUG lock: Can't lock file: /data/ipfs/repo.lock.
reason: cannot acquire lock: Lock FcntlFlock of /data/ipfs/repo.lock failed: resource temporarily unavailable lock.go:44
08:44:19.785 DEBUG fsrepo: (true)<->Lock is held at /data/ipfs fsrepo.go:302
Error: ipfs daemon is running. please stop it to run this command
Use 'ipfs daemon --help' for information about this command
Here is the yaml file for the stateful set:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ipfs
  namespace: ipfs
spec:
  selector:
    matchLabels:
      app: ipfs
  serviceName: ipfs
  replicas: 6
  template:
    metadata:
      labels:
        app: ipfs
    spec:
      initContainers:
        - name: init-repo
          image: ipfs/go-ipfs:v0.4.11@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command: ['/bin/sh', '/etc/ipfs-config/init.sh']
          volumeMounts:
            - name: www
              mountPath: /data/ipfs
            - name: secrets
              mountPath: /etc/ipfs-secrets
            - name: config
              mountPath: /etc/ipfs-config
        - name: init-peers
          image: ipfs/go-ipfs:v0.4.11@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054
          command: ['/bin/sh', '/etc/ipfs-config/peers-kubernetes-refresh.sh']
          volumeMounts:
            - name: www
              mountPath: /data/ipfs
            - name: config
              mountPath: /etc/ipfs-config
      containers:
        - name: ipfs
          image: ipfs/go-ipfs:v0.4.11@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054
          env:
            - name: IPFS_LOGGING
              value: debug
          command:
            - ipfs
            - daemon
          ports:
            - containerPort: 4001
              name: swarm
            - containerPort: 5001
              name: api
            - containerPort: 8080
              name: readonly
          volumeMounts:
            - name: www
              mountPath: /data/ipfs
      volumes:
        - name: secrets
          secret:
            secretName: ipfs
        - name: config
          configMap:
            name: ipfs-config
        - name: www
          persistentVolumeClaim:
            claimName: ipfs-pvc
Here is the persistent volume definition
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ipfs-pv
  namespace: ipfs
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 200Mi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
And the persistent volume claim definition:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ipfs-pvc
  namespace: ipfs
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Mi
The kubectl describe output for the failing pod (ipfs-3) is as follows:
Name: ipfs-3
Namespace: ipfs
Priority: 0
Node: swift-153/10.70.20.153
Start Time: Tue, 27 Oct 2020 14:38:11 -0400
Labels: app=ipfs
controller-revision-hash=ipfs-74bb88dbb6
statefulset.kubernetes.io/pod-name=ipfs-3
Annotations: <none>
Status: Running
IP: 10.244.3.43
IPs:
IP: 10.244.3.43
Controlled By: StatefulSet/ipfs
Containers:
ipfs:
Container ID: docker://81349e969be9ffcafeb4d65adf9d0b2de7311e46068e36dd4f227f169f6dfcab
Image: ipfs/go-ipfs:v0.4.11@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054
Image ID: docker-pullable://ipfs/go-ipfs@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054
Ports: 4001/TCP, 5001/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
ipfs
daemon
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 27 Oct 2020 14:39:51 -0400
Finished: Tue, 27 Oct 2020 14:39:51 -0400
Ready: False
Restart Count: 4
Environment:
IPFS_LOGGING: debug
Mounts:
/data/ipfs from www (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hb785 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
secrets:
Type: Secret (a volume populated by a Secret)
SecretName: ipfs
Optional: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: ipfs-config
Optional: false
www:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: ipfs-pvc
ReadOnly: false
default-token-hb785:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hb785
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m24s default-scheduler Successfully assigned ipfs/ipfs-3 to swift-153
Normal Pulled 2m2s (x3 over 2m21s) kubelet Container image "ipfs/go-ipfs:v0.4.11@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054" already present on machine
Normal Created 2m (x3 over 2m19s) kubelet Created container ipfs
Normal Started 2m (x3 over 2m19s) kubelet Started container ipfs
Warning DNSConfigForming 103s (x10 over 2m24s) kubelet Search Line limits were exceeded, some search paths have been omitted, the applied search line is: ipfs.svc.cluster.local svc.cluster.local cluster.local search syslab.sandbox cs.toronto.edu
Warning BackOff 103s (x6 over 2m15s) kubelet Back-off restarting failed container
Related
Hey everyone Maybe you can help me :)
I created a Kubernetes test environment on GKE and try to deploy a monitoring solution on Prometheus platform.
I created a StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gce-pd-retained
reclaimPolicy: Retain
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: none
Then, created the PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo-disk
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gce-pd-retained
  resources:
    requests:
      storage: 10Gi
I saw that the disk is created on the UI
Then I created the Pod:
apiVersion: v1
kind: Pod
metadata:
  name: promepod
spec:
  containers:
    - name: prome
      image: prom/prometheus
      args:
        - "--storage.tsdb.retention.time=12h"
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus/"
      volumeMounts:
        - name: testvlm
          mountPath: /etc/prometheus/
        - name: testvlm
          mountPath: /prometheus/
  volumes:
    - name: testvlm
      persistentVolumeClaim:
        claimName: pvc-demo-disk
But it keeps failing with this error:
ts=2022-11-05T11:58:00.825Z caller=main.go:455 level=error msg="Error loading config (--config.file=/etc/prometheus/prometheus.yml)" file=/etc/prometheus/prometheus.yml err="open /etc/prometheus/prometheus.yml: no such file or directory"
Can anyone tell me what I am doing wrong?
This is from the describe:
Port: <none>
Host Port: <none>
Args:
--storage.tsdb.retention.time=12h
--config.file=/etc/prometheus/prometheus.yml
--storage.tsdb.path=/prometheus/
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Sat, 05 Nov 2022 13:58:00 +0200
Finished: Sat, 05 Nov 2022 13:58:00 +0200
Ready: False
Restart Count: 11
Environment: <none>
Mounts:
/etc/prometheus/ from testvlm (rw)
/prometheus/ from testvlm (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qtbwh (ro)
I will appreciate any help!
Thanks!
I have included all the details above.
I am trying to get a volume mounted as a non-root user in one of my containers. I'm trying an approach from this SO post that uses an initContainer to set the correct user, but when I try to start the configuration I get an "unbound immediate PersistentVolumeClaims" error. I suspect it's because the volume is mounted in both my initContainer and my container, but I'm not sure why that would be an issue: I can see the initContainer taking the claim, but I would have thought that when it exited it would release the claim, letting the normal container take it. Any ideas or alternatives for getting the directory mounted as a non-root user? I did try using securityContext/fsGroup, but that seemed to have no effect. The /var/rdf4j directory below is the one that is being mounted as root.
Configuration:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: triplestore-data-storage-dir
  labels:
    type: local
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  storageClassName: local-storage
  volumeMode: Filesystem
  persistentVolumeReclaimPolicy: Delete
  hostPath:
    path: /run/desktop/mnt/host/d/workdir/k8s-data/triplestore
    type: DirectoryOrCreate
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: triplestore-data-storage
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: local-storage
  volumeName: "triplestore-data-storage-dir"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triplestore
  labels:
    app: demo
    role: triplestore
spec:
  selector:
    matchLabels:
      app: demo
      role: triplestore
  replicas: 1
  template:
    metadata:
      labels:
        app: demo
        role: triplestore
    spec:
      containers:
        - name: triplestore
          image: eclipse/rdf4j-workbench:amd64-3.5.0
          imagePullPolicy: Always
          ports:
            - name: http
              protocol: TCP
              containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: storage
              mountPath: /var/rdf4j
      initContainers:
        - name: take-data-dir-ownership
          image: eclipse/rdf4j-workbench:amd64-3.5.0
          command:
            - chown
            - -R
            - 100:65533
            - /var/rdf4j
          volumeMounts:
            - name: storage
              mountPath: /var/rdf4j
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: "triplestore-data-storage"
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
triplestore-data-storage Bound triplestore-data-storage-dir 10Gi RWX local-storage 13s
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
triplestore-data-storage-dir 10Gi RWX Delete Bound default/triplestore-data-storage local-storage 17s
kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
21s Warning FailedScheduling pod/triplestore-6d6876f49-2s84c 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
19s Normal Scheduled pod/triplestore-6d6876f49-2s84c Successfully assigned default/triplestore-6d6876f49-2s84c to docker-desktop
3s Normal Pulled pod/triplestore-6d6876f49-2s84c Container image "eclipse/rdf4j-workbench:amd64-3.5.0" already present on machine
3s Normal Created pod/triplestore-6d6876f49-2s84c Created container take-data-dir-ownership
3s Normal Started pod/triplestore-6d6876f49-2s84c Started container take-data-dir-ownership
2s Warning BackOff pod/triplestore-6d6876f49-2s84c Back-off restarting failed container
46m Normal Pulled pod/triplestore-6d6876f49-9n5kt Container image "eclipse/rdf4j-workbench:amd64-3.5.0" already present on machine
79s Warning BackOff pod/triplestore-6d6876f49-9n5kt Back-off restarting failed container
21s Normal SuccessfulCreate replicaset/triplestore-6d6876f49 Created pod: triplestore-6d6876f49-2s84c
21s Normal ScalingReplicaSet deployment/triplestore Scaled up replica set triplestore-6d6876f49 to 1
kubectl describe pods/triplestore-6d6876f49-tw8r8
Name: triplestore-6d6876f49-tw8r8
Namespace: default
Priority: 0
Node: docker-desktop/192.168.65.4
Start Time: Mon, 17 Jan 2022 10:17:20 -0500
Labels: app=demo
pod-template-hash=6d6876f49
role=triplestore
Annotations: <none>
Status: Pending
IP: 10.1.2.133
IPs:
IP: 10.1.2.133
Controlled By: ReplicaSet/triplestore-6d6876f49
Init Containers:
take-data-dir-ownership:
Container ID: docker://89e7b1e3ae76c30180ee5083624e1bf5f30b55fd95bf1c24422fabe41ae74408
Image: eclipse/rdf4j-workbench:amd64-3.5.0
Image ID: docker-pullable://registry.com/publicrepos/docker_cache/eclipse/rdf4j-workbench@sha256:14621ad610b0d0269dedd9939ea535348cc6c147f9bd47ba2039488b456118ed
Port: <none>
Host Port: <none>
Command:
chown
-R
100:65533
/var/rdf4j
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 17 Jan 2022 10:22:59 -0500
Finished: Mon, 17 Jan 2022 10:22:59 -0500
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/rdf4j from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s8wdv (ro)
Containers:
triplestore:
Container ID:
Image: eclipse/rdf4j-workbench:amd64-3.5.0
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 200Mi
Environment: <none>
Mounts:
/var/rdf4j from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s8wdv (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: triplestore-data-storage
ReadOnly: false
kube-api-access-s8wdv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 6m24s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 6m13s default-scheduler Successfully assigned default/triplestore-6d6876f49-tw8r8 to docker-desktop
Normal Pulled 4m42s (x5 over 6m12s) kubelet Container image "eclipse/rdf4j-workbench:amd64-3.5.0" already present on machine
Normal Created 4m42s (x5 over 6m12s) kubelet Created container take-data-dir-ownership
Normal Started 4m42s (x5 over 6m12s) kubelet Started container take-data-dir-ownership
Warning BackOff 70s (x26 over 6m10s) kubelet Back-off restarting failed container
Solution
As it turns out, the problem was that the initContainer wasn't running as root. It was running as the default user of the image, and so didn't have permission to run the chown command. In the linked SO post, this was raised in the first comment on the answer, with the response being that initContainers run as root; this has apparently changed in newer versions of Kubernetes. There is a solution, though: you can set the securityContext on the initContainer so that it runs as root, which gives it permission to run chown, and that successfully allows the volume to be used by the non-root user. Here's the final configuration of the initContainer:
initContainers:
  - name: take-data-dir-ownership
    image: eclipse/rdf4j-workbench:amd64-3.5.0
    securityContext:
      runAsUser: 0
    command:
      - chown
      - -R
      - 100:65533
      - /var/rdf4j
    volumeMounts:
      - name: storage
        mountPath: /var/rdf4j
"1 pod has unbound immediate PersistentVolumeClaims." This error means the pod cannot bind to the PVC on the node where it has been scheduled to run. This can happen when the PVC is bound to a PV that refers to a location that is not valid on that node. It would be helpful if you could add the complete output of kubectl get nodes -o wide, kubectl describe pvc triplestore-data-storage, and kubectl describe pv triplestore-data-storage-dir to the question.
In the meantime, a PVC/PV is optional when using hostPath. Can you try the following spec and see if the pod comes online:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triplestore
  labels:
    app: demo
    role: triplestore
spec:
  selector:
    matchLabels:
      app: demo
      role: triplestore
  replicas: 1
  template:
    metadata:
      labels:
        app: demo
        role: triplestore
    spec:
      containers:
        - name: triplestore
          image: eclipse/rdf4j-workbench:amd64-3.5.0
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              protocol: TCP
              containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: storage
              mountPath: /var/rdf4j
      initContainers:
        - name: take-data-dir-ownership
          image: eclipse/rdf4j-workbench:amd64-3.5.0
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsUser: 0
          command:
            - chown
            - -R
            - 100:65533
            - /var/rdf4j
          volumeMounts:
            - name: storage
              mountPath: /var/rdf4j
      volumes:
        - name: storage
          hostPath:
            path: /run/desktop/mnt/host/d/workdir/k8s-data/triplestore
            type: DirectoryOrCreate
While learning Kubernetes with the book Kubernetes for Developers, I got stuck at this point.
I am trying to start a RabbitMQ pod. After a lot of troubleshooting I have got this far, but I have no clue what to fix to get rid of the permission denied error.
# kubectl get pods
NAME READY STATUS RESTARTS AGE
rabbitmq-56c67d8d7d-s8vp5 0/1 CrashLoopBackOff 5 5m40s
When I look at the logs of this container, this is what I find:
# kubectl logs rabbitmq-56c67d8d7d-s8vp5
21:22:58.49
21:22:58.50 Welcome to the Bitnami rabbitmq container
21:22:58.51 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-rabbitmq
21:22:58.51 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-rabbitmq/issues
21:22:58.52 Send us your feedback at containers@bitnami.com
21:22:58.52
21:22:58.52 INFO ==> ** Starting RabbitMQ setup **
21:22:58.54 INFO ==> Validating settings in RABBITMQ_* env vars..
21:22:58.56 INFO ==> Initializing RabbitMQ...
21:22:58.57 INFO ==> Generating random cookie
mkdir: cannot create directory ‘/bitnami/rabbitmq’: Permission denied
Here is my rabbitmq-deployment.yml
---
# EXPORT SERVICE INTERFACE
kind: Service
apiVersion: v1
metadata:
  name: message-queue
  labels:
    app: rabbitmq
    role: master
    tier: queue
spec:
  ports:
    - port: 5672
      targetPort: 5672
  selector:
    app: rabbitmq
    role: master
    tier: queue
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rabbitmq-pv-claim
  labels:
    app: rabbitmq
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq
      role: master
      tier: queue
  template:
    metadata:
      labels:
        app: rabbitmq
        role: master
        tier: queue
    spec:
      nodeSelector:
        boardType: x86vm
      containers:
        - name: rabbitmq
          image: bitnami/rabbitmq:3.7
          envFrom:
            - configMapRef:
                name: bitnami-rabbitmq-config
          ports:
            - name: queue
              containerPort: 5672
            - name: queue-mgmt
              containerPort: 15672
          livenessProbe:
            exec:
              command:
                - rabbitmqctl
                - status
            initialDelaySeconds: 120
            timeoutSeconds: 5
            failureThreshold: 6
          readinessProbe:
            exec:
              command:
                - rabbitmqctl
                - status
            initialDelaySeconds: 10
            timeoutSeconds: 3
            periodSeconds: 5
          volumeMounts:
            - name: rabbitmq-storage
              mountPath: /bitnami
      volumes:
        - name: rabbitmq-storage
          persistentVolumeClaim:
            claimName: rabbitmq-pv-claim
This is the rabbitmq-storage-class.yml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: rabbitmq-storage-class
  labels:
    app: rabbitmq
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
and persistant-volume.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-pv-claim
  labels:
    app: rabbitmq
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /bitnami
Logs:
# kubectl describe pods rabbitmq-5f7f787479-fpg6g
Name: rabbitmq-5f7f787479-fpg6g
Namespace: default
Priority: 0
Node: kube-worker-vm2/192.168.1.36
Start Time: Mon, 03 May 2021 12:29:17 +0100
Labels: app=rabbitmq
pod-template-hash=5f7f787479
role=master
tier=queue
Annotations: cni.projectcalico.org/podIP: 192.168.222.4/32
cni.projectcalico.org/podIPs: 192.168.222.4/32
Status: Running
IP: 192.168.222.4
IPs:
IP: 192.168.222.4
Controlled By: ReplicaSet/rabbitmq-5f7f787479
Containers:
rabbitmq:
Container ID: docker://bbdbb9c5d4b6737519d3dcf4bdda242a7fe904f2336334afe686e9b204fd6d5c
Image: bitnami/rabbitmq:3.7
Image ID: docker-pullable://bitnami/rabbitmq@sha256:8b6057997b74ebc81e934dd6c94e9da745635faa2d79b382cfda27b9176e0e6d
Ports: 5672/TCP, 15672/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 03 May 2021 12:30:48 +0100
Finished: Mon, 03 May 2021 12:30:48 +0100
Ready: False
Restart Count: 4
Liveness: exec [rabbitmqctl status] delay=120s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [rabbitmqctl status] delay=10s timeout=3s period=5s #success=1 #failure=3
Environment Variables from:
bitnami-rabbitmq-config ConfigMap Optional: false
Environment: <none>
Mounts:
/bitnami from rabbitmq-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4qmxr (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
rabbitmq-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: rabbitmq-pv-claim
ReadOnly: false
default-token-4qmxr:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-4qmxr
Optional: false
QoS Class: BestEffort
Node-Selectors: boardType=x86vm
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m20s default-scheduler Successfully assigned default/rabbitmq-5f7f787479-fpg6g to kube-worker-vm2
Normal Created 96s (x4 over 2m18s) kubelet Created container rabbitmq
Normal Started 95s (x4 over 2m17s) kubelet Started container rabbitmq
Warning BackOff 65s (x12 over 2m16s) kubelet Back-off restarting failed container
Normal Pulled 50s (x5 over 2m18s) kubelet Container image "bitnami/rabbitmq:3.7" already present on machine
When creating an image, the image creator often chooses to use a user other than root to run the process. This is the case for your image, and the user does not have write permissions on the /bitnami directory. You can verify this by commenting out the volume.
To fix the issue, you need to set a security context for your pod: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
I'm not sure about the exact syntax, but something like this should do the trick:
spec:
  securityContext:
    fsGroup: 1001 # the ID that is used in the image; fsGroup applies it to the volume as a group ID
  nodeSelector:
    boardType: x86vm
  containers:
    - name: rabbitmq
      image: bitnami/rabbitmq:3.7
      envFrom:
        - configMapRef:
            name: bitnami-rabbitmq-config
This makes the directory writeable by the user in the image.
Another thing: a Deployment is designed for stateless services. If you have state to keep, always use a StatefulSet. It's very similar to a Deployment from a configuration point of view, but Kubernetes treats it very differently. See https://www.youtube.com/watch?v=Vrxr-7rjkvM for a good explanation.
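As a rough sketch of what that could look like for the manifest in the question: a StatefulSet declares its storage under volumeClaimTemplates, and Kubernetes then creates one PersistentVolumeClaim per replica (named after the template and the pod, e.g. rabbitmq-storage-rabbitmq-0) instead of every pod sharing a single claim. The headless Service name rabbitmq-headless is an assumption and does not appear in the question; probes, envFrom, and the security context are left out for brevity.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  # Assumed name of a headless Service that must exist separately.
  serviceName: rabbitmq-headless
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      containers:
        - name: rabbitmq
          image: bitnami/rabbitmq:3.7
          volumeMounts:
            # Must match the name of the claim template below.
            - name: rabbitmq-storage
              mountPath: /bitnami
  # One PVC is created from this template for each replica.
  volumeClaimTemplates:
    - metadata:
        name: rabbitmq-storage
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: manual
        resources:
          requests:
            storage: 1Gi
```

With a storage class that has no dynamic provisioner, such as the manual class used in the question, each generated claim still needs a matching PersistentVolume created by hand.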
As per the Bitnami documentation, it depends on the Kubernetes distribution.
Quote from the documentation:
Adjust permissions of persistent volume mountpoint
As the image run as non-root by default, it is necessary to adjust the ownership of the persistent volume so that the container can write data into it.
By default, the chart is configured to use Kubernetes Security Context to automatically change the ownership of the volume. However, this feature does not work in all Kubernetes distributions. As an alternative, this chart supports using an initContainer to change the ownership of the volume before mounting it in the final destination.
You can enable this initContainer by setting volumePermissions.enabled to true.
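In a Helm values file, that setting would look roughly like the fragment below. This is only a sketch: it assumes RabbitMQ is installed through the Bitnami chart rather than the hand-written Deployment from the question, and it omits all other chart values.

```yaml
# values.yaml (fragment) for the Bitnami RabbitMQ chart
volumePermissions:
  # Runs an initContainer that changes ownership of the volume
  # before the main container mounts it.
  enabled: true
```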
I have provisioned NFS over DigitalOcean block storage to get the ReadWriteMany access mode. I am now able to share a PV between deployments, but I am unable to share it within a deployment when I have multiple mount paths with the same claim name. Can someone explain why this is happening, and whether this is the right way to use a PV? If NFS doesn't support this, what else can I use to share volumes between pods with multiple mount paths?
Manifest
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-data
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 18Gi
  storageClassName: nfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: web
    spec:
      containers:
        - image: nginx:latest
          name: nginx
          resources: {}
          volumeMounts:
            - mountPath: /data
              name: data
            - mountPath: /beta
              name: beta
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: nfs-data
        - name: beta
          persistentVolumeClaim:
            claimName: nfs-data
PV DESCRIPTION
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-nfs-server-nfs-server-provisioner-0 Bound pvc-442af801-0b76-444d-afea-382a12380926 20Gi RWO do-block-storage 24h
nfs-data Bound pvc-0ae84fe2-025b-450d-8973-b74c80275cb7 18Gi RWX nfs 1h
Name: nfs-data
Namespace: default
StorageClass: nfs
Status: Bound
Volume: pvc-0ae84fe2-025b-450d-8973-b74c80275cb7
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: cluster.local/nfs-server-nfs-server-provisioner
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 18Gi
Access Modes: RWX
VolumeMode: Filesystem
Mounted By: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 2m16s (x2 over 2m16s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "cluster.local/nfs-server-nfs-server-provisioner" or manually created by system administrator
Normal Provisioning 2m16s cluster.local/nfs-server-nfs-server-provisioner_nfs-server-nfs-server-provisioner-0_8dd7b303-b9a1-4a07-8c6b-906b81c07402 External provisioner is provisioning volume for claim "default/nfs-data"
Normal ProvisioningSucceeded 2m16s cluster.local/nfs-server-nfs-server-provisioner_nfs-server-nfs-server-provisioner-0_8dd7b303-b9a1-4a07-8c6b-906b81c07402 Successfully provisioned volume pvc-0ae84fe2-025b-450d-8973-b74c80275cb7
ERROR
Name: web-85f9fbf54-hfcvn
Namespace: default
Priority: 0
Node: pool-db4v93z2h-3yg9e/10.132.113.175
Start Time: Thu, 25 Jun 2020 19:25:40 +0500
Labels: app=web
pod-template-hash=85f9fbf54
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/web-85f9fbf54
Containers:
nginx:
Container ID:
Image: nginx:latest
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/beta from beta (rw)
/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-pdsgk (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs-data
ReadOnly: false
beta:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs-data
ReadOnly: false
default-token-pdsgk:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-pdsgk
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/web-85f9fbf54-hfcvn to pool-db4v93z2h-3yg9e
Warning FailedMount 22s kubelet, pool-db4v93z2h-3yg9e Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[default-token-pdsgk data beta]: timed out waiting for the condition
As I mentioned in comments you could try to use subPath, take a look at kubernetes and openshift documentation about it.
Sometimes, it is useful to share one volume for multiple uses in a single Pod. The volumeMounts.subPath property can be used to specify a sub-path inside the referenced volume instead of its root.
Here is an example of a Pod with a LAMP stack (Linux Apache Mysql PHP) using a single, shared volume. The HTML contents are mapped to its html folder, and the databases will be stored in its mysql folder:
apiVersion: v1
kind: Pod
metadata:
  name: my-lamp-site
spec:
  containers:
    - name: mysql
      image: mysql
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: "rootpasswd"
      volumeMounts:
        - mountPath: /var/lib/mysql
          name: site-data
          subPath: mysql
    - name: php
      image: php:7.0-apache
      volumeMounts:
        - mountPath: /var/www/html
          name: site-data
          subPath: html
  volumes:
    - name: site-data
      persistentVolumeClaim:
        claimName: my-lamp-site-data
Databases are stored in the mysql folder.
HTML content is stored in the html folder.
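Applied to the Deployment from the question, the same idea might look like the sketch below: the nfs-data claim is declared once under volumes and mounted twice with different subPath values. The sub-directory names data and beta are hypothetical, chosen only to mirror the two mount paths.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          volumeMounts:
            # Both mounts use the same volume, each with its own sub-directory.
            - name: data
              mountPath: /data
              subPath: data   # hypothetical sub-directory name
            - name: data
              mountPath: /beta
              subPath: beta   # hypothetical sub-directory name
      volumes:
        # The claim is referenced only once.
        - name: data
          persistentVolumeClaim:
            claimName: nfs-data
```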
If that won't work for you, I would say you have to use a separate PVC for every mount path.
For example, like here:
apiVersion: v1
kind: Pod
metadata:
  name: nfs-web
spec:
  volumes:
    # List of volumes to use, i.e. *what* to mount
    - name: myvolume
      < volume details, see below >
    - name: mysecondvolume
      < volume details, see below >
  containers:
    - name: mycontainer
      volumeMounts:
        # List of mount directories, i.e. *where* to mount
        # We want to mount 'myvolume' into /usr/share/nginx/html
        - name: myvolume
          mountPath: /usr/share/nginx/html/
        # We want to mount 'mysecondvolume' into /var/log
        - name: mysecondvolume
          mountPath: /var/log/
I tried to start Fabric on Kubernetes.
Then I got a CrashLoopBackOff. After searching a bit, I can see the following in the logs:
2019-06-05 07:30:19.216 UTC [main] main -> ERRO 001 Cannot run peer because error when setting up MSP from directory /etc/hyperledger/fabric/msp: err Could not load a valid signer certificate from directory /etc/hyperledger/fabric/msp/signcerts, err stat /etc/hyperledger/fabric/msp/signcerts: no such file or directory
How can I check whether I am mounting the correct folder?
I want to access my crashed container to check whether my msp folder is there.
Any help is appreciated!
edit 1: kubectl pod describe for peer1 org 1
Name: peer1-org1-7b9cf7fbd4-74b7q
Namespace: org1
Priority: 0
PriorityClassName: <none>
Node: minikube/10.0.2.15
Start Time: Wed, 05 Jun 2019 17:48:21 +0900
Labels: app=hyperledger
org=org1
peer-id=peer1
pod-template-hash=7b9cf7fbd4
role=peer
Annotations: <none>
Status: Running
IP: 172.17.0.9
Controlled By: ReplicaSet/peer1-org1-7b9cf7fbd4
Containers:
couchdb:
Container ID: docker://7b5e80103491476843d365dc234316ae55a92d66f2ea009cf9162583a76907fb
Image: hyperledger/fabric-couchdb:x86_64-1.0.0
Image ID: docker-pullable://hyperledger/fabric-couchdb@sha256:e89b0f95f6ff674fd043795090dd65a11d727ec005d925545cf0b4fc48aa221d
Port: 5984/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 05 Jun 2019 17:49:49 +0900
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-sjp8t (ro)
peer1-org1:
Container ID: docker://95e743dceafbd78f7e29476302ac86d7eb48f97c9a50db3d174dc6684511c97b
Image: hyperledger/fabric-peer:x86_64-1.0.0
Image ID: docker-pullable://hyperledger/fabric-peer@sha256:b7c1c2a6b356996c3dbe2b9554055cd2b63194cd7a492a83de2dbabf7f7e3c65
Ports: 7051/TCP, 7052/TCP, 7053/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
peer
Args:
node
start
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 05 Jun 2019 17:50:58 +0900
Finished: Wed, 05 Jun 2019 17:50:58 +0900
Ready: False
Restart Count: 3
Environment:
CORE_LEDGER_STATE_STATEDATABASE: CouchDB
CORE_LEDGER_STATE_COUCHDBCONFIG_COUCHDBADDRESS: localhost:5984
CORE_VM_ENDPOINT: unix:///host/var/run/docker.sock
CORE_LOGGING_LEVEL: DEBUG
CORE_PEER_TLS_ENABLED: false
CORE_PEER_GOSSIP_USELEADERELECTION: true
CORE_PEER_GOSSIP_ORGLEADER: false
CORE_PEER_PROFILE_ENABLED: true
CORE_PEER_TLS_CERT_FILE: /etc/hyperledger/fabric/tls/server.crt
CORE_PEER_TLS_KEY_FILE: /etc/hyperledger/fabric/tls/server.key
CORE_PEER_TLS_ROOTCERT_FILE: /etc/hyperledger/fabric/tls/ca.crt
CORE_PEER_ID: peer1.org1
CORE_PEER_ADDRESS: peer1.org1:7051
CORE_PEER_GOSSIP_EXTERNALENDPOINT: peer1.org1:7051
CORE_PEER_LOCALMSPID: Org1MSP
Mounts:
/etc/hyperledger/fabric/msp from certificate (rw,path="peers/peer1.org1/msp")
/etc/hyperledger/fabric/tls from certificate (rw,path="peers/peer1.org1/tls")
/host/var/run/ from run (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-sjp8t (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
certificate:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: org1-pv
ReadOnly: false
run:
Type: HostPath (bare host directory volume)
Path: /run
HostPathType:
default-token-sjp8t:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-sjp8t
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type     Reason     Age                From                Message
----     ------     ----               ----                -------
Normal   Scheduled  2m58s              default-scheduler   Successfully assigned org1/peer1-org1-7b9cf7fbd4-74b7q to minikube
Normal   Pulling    2m55s              kubelet, minikube   Pulling image "hyperledger/fabric-couchdb:x86_64-1.0.0"
Normal   Pulled     90s                kubelet, minikube   Successfully pulled image "hyperledger/fabric-couchdb:x86_64-1.0.0"
Normal   Created    90s                kubelet, minikube   Created container couchdb
Normal   Started    90s                kubelet, minikube   Started container couchdb
Normal   Pulling    90s                kubelet, minikube   Pulling image "hyperledger/fabric-peer:x86_64-1.0.0"
Normal   Pulled     71s                kubelet, minikube   Successfully pulled image "hyperledger/fabric-peer:x86_64-1.0.0"
Normal   Created    21s (x4 over 70s)  kubelet, minikube   Created container peer1-org1
Normal   Started    21s (x4 over 70s)  kubelet, minikube   Started container peer1-org1
Normal   Pulled     21s (x3 over 69s)  kubelet, minikube   Container image "hyperledger/fabric-peer:x86_64-1.0.0" already present on machine
Warning  BackOff    5s (x6 over 68s)   kubelet, minikube   Back-off restarting failed container
edit 2:
kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                        STORAGECLASS   REASON   AGE
org1-artifacts-pv                          500Mi      RWX            Retain           Available                                                        39m
org1-pv                                    500Mi      RWX            Retain           Available                                                        39m
org2-artifacts-pv                          500Mi      RWX            Retain           Available                                                        39m
org2-pv                                    500Mi      RWX            Retain           Available                                                        39m
orgorderer1-pv                             500Mi      RWX            Retain           Available                                                        39m
pvc-aa87a86f-876e-11e9-99ef-080027f6ce3c   10Mi       RWX            Delete           Bound       orgorderer1/orgorderer1-pv   standard                39m
pvc-aadb69ff-876e-11e9-99ef-080027f6ce3c   10Mi       RWX            Delete           Bound       org2/org2-pv                 standard                39m
pvc-ab2e4d8e-876e-11e9-99ef-080027f6ce3c   10Mi       RWX            Delete           Bound       org2/org2-artifacts-pv       standard                39m
pvc-abb04335-876e-11e9-99ef-080027f6ce3c   10Mi       RWX            Delete           Bound       org1/org1-pv                 standard                39m
pvc-abfaaf76-876e-11e9-99ef-080027f6ce3c   10Mi       RWX            Delete           Bound       org1/org1-artifacts-pv       standard                39m
kubectl get pvc
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
org1-artifacts-pv   Bound    pvc-abfaaf76-876e-11e9-99ef-080027f6ce3c   10Mi       RWX            standard       40m
org1-pv             Bound    pvc-abb04335-876e-11e9-99ef-080027f6ce3c   10Mi       RWX            standard       40m
edit 3: org1-cli.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: org1-artifacts-pv
spec:
  capacity:
    storage: 500Mi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/opt/share/channel-artifacts"
#  nfs:
#    path: /opt/share/channel-artifacts
#    server: localhost #change to your nfs server ip here
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: org1
  name: org1-artifacts-pv
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Mi
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  namespace: org1
  name: cli
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      labels:
        app: cli
    spec:
      containers:
        - name: cli
          image: hyperledger/fabric-tools:x86_64-1.0.0
          env:
            - name: CORE_PEER_TLS_ENABLED
              value: "false"
            #- name: CORE_PEER_TLS_CERT_FILE
            #  value: /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/peerOrganizations/org1/peers/peer0.org1/tls/server.crt
            #- name: CORE_PEER_TLS_KEY_FILE
            #  value: /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/peerOrganizations/org1/peers/peer0.org1/tls/server.key
            #- name: CORE_PEER_TLS_ROOTCERT_FILE
            #  value: /opt/gopath/src/github.com/hyperledger/fabric/peer/crypto/peerOrganizations/org1/peers/peer0.org1/tls/ca.crt
            - name: CORE_VM_ENDPOINT
              value: unix:///host/var/run/docker.sock
            - name: GOPATH
              value: /opt/gopath
            - name: CORE_LOGGING_LEVEL
              value: DEBUG
            - name: CORE_PEER_ID
              value: cli
            - name: CORE_PEER_ADDRESS
              value: peer0.org1:7051
            - name: CORE_PEER_LOCALMSPID
              value: Org1MSP
            - name: CORE_PEER_MSPCONFIGPATH
              value: /etc/hyperledger/fabric/msp
          workingDir: /opt/gopath/src/github.com/hyperledger/fabric/peer
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "while true; do sleep 30; done;" ]
          volumeMounts:
            # - mountPath: /opt/gopath/src/github.com/hyperledger/fabric/peer
            #   name: certificate
            #   subPath: scripts
            - mountPath: /host/var/run/
              name: run
            # - mountPath: /opt/gopath/src/github.com/hyperledger/fabric/examples/chaincode/go
            #   name: certificate
            #   subPath: chaincode
            - mountPath: /etc/hyperledger/fabric/msp
              name: certificate
              subPath: users/Admin@org1/msp
            - mountPath: /opt/gopath/src/github.com/hyperledger/fabric/peer/channel-artifacts
              name: artifacts
      volumes:
        - name: certificate
          persistentVolumeClaim:
            claimName: org1-pv
        - name: artifacts
          persistentVolumeClaim:
            claimName: org1-artifacts-pv
        - name: run
          hostPath:
            path: /var/run
org1-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: org1
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: org1-pv
spec:
  capacity:
    storage: 500Mi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /opt/share/crypto-config/peerOrganizations/org1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: org1
  name: org1-pv
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Mi
---
edit 3: peer1-org1
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  namespace: org1
  name: peer1-org1
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: hyperledger
        role: peer
        peer-id: peer1
        org: org1
    spec:
      containers:
        - name: couchdb
          image: hyperledger/fabric-couchdb:x86_64-1.0.0
          ports:
            - containerPort: 5984
        - name: peer1-org1
          image: hyperledger/fabric-peer:x86_64-1.0.0
          env:
            - name: CORE_LEDGER_STATE_STATEDATABASE
              value: "CouchDB"
            - name: CORE_LEDGER_STATE_COUCHDBCONFIG_COUCHDBADDRESS
              value: "localhost:5984"
            - name: CORE_VM_ENDPOINT
              value: "unix:///host/var/run/docker.sock"
            - name: CORE_LOGGING_LEVEL
              value: "DEBUG"
            - name: CORE_PEER_TLS_ENABLED
              value: "false"
            - name: CORE_PEER_GOSSIP_USELEADERELECTION
              value: "true"
            - name: CORE_PEER_GOSSIP_ORGLEADER
              value: "false"
            - name: CORE_PEER_PROFILE_ENABLED
              value: "true"
            - name: CORE_PEER_TLS_CERT_FILE
              value: "/etc/hyperledger/fabric/tls/server.crt"
            - name: CORE_PEER_TLS_KEY_FILE
              value: "/etc/hyperledger/fabric/tls/server.key"
            - name: CORE_PEER_TLS_ROOTCERT_FILE
              value: "/etc/hyperledger/fabric/tls/ca.crt"
            - name: CORE_PEER_ID
              value: peer1.org1
            - name: CORE_PEER_ADDRESS
              value: peer1.org1:7051
            - name: CORE_PEER_GOSSIP_EXTERNALENDPOINT
              value: peer1.org1:7051
            - name: CORE_PEER_LOCALMSPID
              value: Org1MSP
          workingDir: /opt/gopath/src/github.com/hyperledger/fabric/peer
          ports:
            - containerPort: 7051
            - containerPort: 7052
            - containerPort: 7053
          command: ["peer"]
          args: ["node","start"]
          volumeMounts:
            #- mountPath: /opt/gopath/src/github.com/hyperledger/fabric/peer/channel-artifacts
            #  name: certificate
            #  subPath: channel-artifacts
            - mountPath: /etc/hyperledger/fabric/msp
              name: certificate
              #subPath: crypto-config/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/msp
              subPath: peers/peer1.org1/msp
            - mountPath: /etc/hyperledger/fabric/tls
              name: certificate
              #subPath: crypto-config/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/tls/
              subPath: peers/peer1.org1/tls
            - mountPath: /host/var/run/
              name: run
      volumes:
        - name: certificate
          persistentVolumeClaim:
            claimName: org1-pv
        - name: run
          hostPath:
            path: /run
---
apiVersion: v1
kind: Service
metadata:
  namespace: org1
  name: peer1
spec:
  selector:
    app: hyperledger
    role: peer
    peer-id: peer1
    org: org1
  type: NodePort
  ports:
    - name: externale-listen-endpoint
      protocol: TCP
      port: 7051
      targetPort: 7051
      nodePort: 30003
    - name: chaincode-listen
      protocol: TCP
      port: 7052
      targetPort: 7052
      nodePort: 30004
---
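You can run kubectl edit deployment <deployment-name> -n <namespace> and change the peer container's command to sleep 1000000000. Edit the Deployment rather than the pod itself, because the command of an existing pod cannot be changed. The Deployment then starts a new pod that stays up, so you can exec into it and see what's going on. Or just delete the deployment, edit your YAML to replace the peer launch command with a long sleep, redeploy it, and see how the directories are laid out.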
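A minimal sketch of that change for the peer1-org1 Deployment from the question. Only the fields that differ are shown; everything else (env, ports, volumeMounts, volumes) stays as posted. It assumes the fabric-peer image ships a sleep binary, which I have not verified for this exact tag.

```yaml
# Fragment of the peer1-org1 Deployment: the peer container is kept
# alive with a long sleep instead of "peer node start", so it no longer
# crash-loops and can be inspected.
spec:
  template:
    spec:
      containers:
        - name: peer1-org1
          image: hyperledger/fabric-peer:x86_64-1.0.0
          command: ["sleep"]        # was: ["peer"]
          args: ["1000000000"]      # was: ["node","start"]
# Once the new pod is Running, list what was actually mounted, e.g.:
#   kubectl exec -it <podname> -n org1 -c peer1-org1 -- ls -R /etc/hyperledger/fabric/msp
```

If the listing comes back empty, the claim is probably not backed by the directory you expected.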
After a bit of searching, I tried mounting the volume in the nginx pod from the Kubernetes PVC sample, changing the pod's claimName to the PVC I had created. From there I ran bash in the pod with kubectl exec and explored the files, which let me check whether I had mounted the correct folder.
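For reference, a throwaway pod along those lines might look roughly like the following. The pod name pvc-inspect and the mount path /mnt/certificate are placeholders chosen for this sketch; the namespace and claimName are the ones from the question.

```yaml
# Throwaway pod that mounts the org1-pv claim so its contents can be
# inspected. The pod name and mountPath are hypothetical; any image
# with a shell would do in place of nginx.
apiVersion: v1
kind: Pod
metadata:
  namespace: org1
  name: pvc-inspect
spec:
  containers:
    - name: inspect
      image: nginx
      volumeMounts:
        - mountPath: /mnt/certificate
          name: certificate
  volumes:
    - name: certificate
      persistentVolumeClaim:
        claimName: org1-pv
# Then look around inside it:
#   kubectl exec -it pvc-inspect -n org1 -- bash
#   ls -R /mnt/certificate
```

Delete the pod when you are done (kubectl delete pod pvc-inspect -n org1).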