Unable to mount the volume to the pod in kubernetes

I have a k8s cluster in which I am facing issues while mounting an existing volume to a pod in a new deployment. I have existing deployments where I mount the same existing PV and PVC without problems, but I am facing issues only with the new deployment.
What could be the reason? How can I mount the (NFS) volume in the new deployment, given that the PV and PVC statuses are Bound and claimed respectively?

Ideally you cannot, if your access mode is set to ReadWriteOnce.
If you are planning to use NFS and want to attach multiple Pods to a single mount, you have to use ReadWriteMany.
Example:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-data
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs
A PersistentVolumeClaim (PVC) is a request for storage by a user. It
is similar to a Pod. Pods consume node resources and PVCs consume PV
resources. Pods can request specific levels of resources (CPU and
Memory). Claims can request specific size and access modes (e.g., they
can be mounted ReadWriteOnce, ReadOnlyMany or ReadWriteMany, see
AccessModes).
Access modes: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
GKE example : https://medium.com/platformer-blog/nfs-persistent-volumes-with-kubernetes-a-case-study-ce1ed6e2c266
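For completeness, a minimal sketch of a hand-created NFS PV that the PVC above could bind to; the PV name, server address, and export path are placeholders for your own NFS share:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-data-pv             # hypothetical name
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteMany             # must match the PVC's access mode
  storageClassName: nfs         # must match the PVC's storageClassName
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.10           # placeholder: your NFS server
    path: /exports/data         # placeholder: your NFS export path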

Related

Initializing a dynamically provisioned shared volume with ReadOnlyMany access mode

My GKE deployment consists of N pods (possibly on different nodes) and a shared volume, which is dynamically provisioned by pd.csi.storage.gke.io and is a Persistent Disk in GCP. I need to initialize this disk with data before the pods go live.
My problem is I need to set accessModes to ReadOnlyMany and be able to mount it to all pods across different nodes in read-only mode, which I assume effectively would make it impossible to mount it in write mode to the initContainer.
Is there a solution to this issue? Answer to this question suggests a good solution for a case when each pod has their own disk mounted, but I need to have one disk shared among all pods since my data is quite large.
With GKE 1.21 and later, you can enable the managed Filestore CSI driver in your clusters. You can enable the driver for new clusters
gcloud container clusters create CLUSTER_NAME \
--addons=GcpFilestoreCsiDriver ...
or update existing clusters:
gcloud container clusters update CLUSTER_NAME \
--update-addons=GcpFilestoreCsiDriver=ENABLED
Once you've done that, create a storage class (or have your platform admin do it):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: filestore-example
provisioner: filestore.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  tier: standard
  network: default
After that, you can use PVCs and dynamic provisioning:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: podpvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: filestore-example
  resources:
    requests:
      storage: 1Ti
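To consume that claim from a workload, each pod template mounts it by name; a minimal sketch (the Deployment name, image, and mount path are hypothetical):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shared-reader              # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: shared-reader
  template:
    metadata:
      labels:
        app: shared-reader
    spec:
      containers:
        - name: app
          image: nginx             # placeholder image
          volumeMounts:
            - name: shared-data
              mountPath: /data     # hypothetical mount path
      volumes:
        - name: shared-data
          persistentVolumeClaim:
            claimName: podpvc      # the PVC defined above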
...I need to have one disk shared among all pods
You can try Filestore. First you create a Filestore instance and save your data on a Filestore volume. Then you install the Filestore driver on your cluster. Finally, you share the data with the pods that need to read it using a PersistentVolume referring to the Filestore instance and volume above.
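Since a Filestore instance is served over NFS, one way to reference an existing instance is a plain nfs-type PersistentVolume pointing at the instance's IP and file share; a sketch, where the PV name, IP, and share name are placeholders for your instance:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: filestore-data          # hypothetical name
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadOnlyMany              # the pods only need to read the pre-loaded data
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.2            # placeholder: the Filestore instance IP
    path: /vol1                 # placeholder: the Filestore file share name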

What is the PersistentVolumeClaim policy for local PersistentVolume in Kubernetes?

Scenario 1:
I have 3 local-persistent-volumes provisioned, each pv is mounted on different node:
10.30.18.10
10.30.18.11
10.30.18.12
When I start my app with 3 replicas using:
kind: StatefulSet
metadata:
  name: my-db
spec:
  replicas: 3
  ...
  ...
  volumeClaimTemplates:
    - metadata:
        name: my-local-vol
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "my-local-sc"
        resources:
          requests:
            storage: 10Gi
Then I notice pods and pvs are on the same host:
pod1 with ip 10.30.18.10 has claimed the pv that is mounted on 10.30.18.10
pod2 with ip 10.30.18.11 has claimed the pv that is mounted on 10.30.18.11
pod3 with ip 10.30.18.12 has claimed the pv that is mounted on 10.30.18.12
(what is not happening: pod1 with ip 10.30.18.10 claiming the pv that is mounted on a different node, e.g. 10.30.18.12)
The only common config between pv and pvc is storageClassName, so I didn't configure this behavior.
Question:
So, who is responsible for this magic? Kubernetes scheduler? Kubernetes provisioner?
Scenario 2:
I have 3 local-persistent-volumes provisioned:
pv1 has capacity.storage of 10Gi
pv2 has capacity.storage of 100Gi
pv3 has capacity.storage of 100Gi
Now, I start my app with 1 replica
kind: StatefulSet
metadata:
  name: my-db
spec:
  replicas: 1
  ...
  ...
  volumeClaimTemplates:
    - metadata:
        name: my-local-vol
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "my-local-sc"
        resources:
          requests:
            storage: 10Gi
I want to ensure that this StatefulSet always claims pv1 (10Gi), even if it is on a different node, and does not claim pv2 (100Gi) or pv3 (100Gi).
Question:
Does this happen automatically?
How do I ensure the desired behavior? Should I use a separate storageClassName to ensure this?
What is the PersistentVolumeClaim policy? Where can I find more info?
EDIT:
yml used for StorageClass:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: my-local-pv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
With local Persistent Volumes, this is the expected behaviour. Let me try to explain what happens when using local storage.
The usual setup for local storage on a cluster is the following:
A local storage class, configured to be WaitForFirstConsumer
A series of local persistent volumes, linked to the local storage class
And this is all well documented with examples in the official documentation: https://kubernetes.io/docs/concepts/storage/volumes/#local
With this done, Persistent Volume Claims can request storage from the local storage class and StatefulSets can have a volumeClaimTemplate which requests storage of the local storage class.
Let me take your StatefulSet with 3 replicas as an example, where each one requires local storage via the volumeClaimTemplate.
When the Pods are first created, they request storage of the required storageClass, for example your my-local-sc.
Since this storage class is manually created and does not support dynamic provisioning of new PVs (unlike, for example, Ceph or similar), Kubernetes checks whether a PV attached to the storage class is available to be bound.
If a PV is selected, it is bound to the newly created PVC (and from then on, that PVC can be used only with that particular PV, since it is now Bound).
Since the PV is of type local, it has a required nodeAffinity which selects a node.
This forces the Pod, now bound to that PV, to be scheduled only on that particular node.
This is why each Pod was scheduled on the same node as its bound persistent volume, and it means the Pod is restricted to run on that node only.
You can test this easily by draining / cordoning one of the nodes and then trying to restart the Pod bound to the PV on that particular node: the Pod will not start, as the PV is restricted by its nodeAffinity and the node is not available.
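A quick way to run that test, assuming the StatefulSet above so the first pod is named my-db-0 (the node name is a placeholder):
kubectl cordon <node-name>        # mark the node holding the local PV as unschedulable
kubectl delete pod my-db-0        # the StatefulSet controller recreates the pod
kubectl get pod my-db-0           # stays Pending: its PV's nodeAffinity points at the cordoned node
kubectl uncordon <node-name>      # make the node schedulable again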
Once each Pod of the StatefulSet is bound to a PV, that Pod will be scheduled only on a specific node. Pods will not change the PV they are using unless the PVC is removed (which will force the Pod to request a new PV to bind to).
Since local storage is handled manually, PVs that were bound and whose related PVC has been removed from the cluster enter the Released state and cannot be claimed anymore; someone has to handle them, perhaps deleting them and recreating new ones at the same location (and maybe cleaning the filesystem as well, depending on the situation).
This means that local storage is OK to be used only:
If HA is not a concern: for example, you don't care if your app is blocked by a single node not working.
If HA is handled directly by the app itself: for example, a StatefulSet with 3 Pods running a multi-primary database (Galera, ClickHouse, Percona, for example) or Elasticsearch, Kafka, ZooKeeper or something similar; these all handle HA on their own, as they can tolerate one of their nodes being down as long as there is quorum.
UPDATE
Regarding Scenario 2 of your question: let's say you have multiple Available PVs and a single Pod which starts and wants to bind to one of them. This is normal behaviour, and the control plane will select one of those PVs on its own (provided they match the requests in the Claim).
There's a specific way to pre-bind a PV and a PVC, so that they will always bind together. This is described in the docs as "reserving a PV": https://kubernetes.io/docs/concepts/storage/persistent-volumes/#reserving-a-persistentvolume
But the problem is that this cannot be applied to volume claim templates, as it requires the claim to be created manually with special properties.
The volume claim template, though, has a selector field which can be used to restrict the selection of a PV based on labels. It can be seen in the API spec ( https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#persistentvolumeclaimspec-v1-core ).
When you create a PV, you label it however you want; for example, you could label it like the following:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-small-pv
  labels:
    size-category: small
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - example-node-1
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-big-pv
  labels:
    size-category: big
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - example-node-2
And then the claim template can select a category of volumes based on the label, as sketched below. Or, if it doesn't care, it doesn't specify a selector and can use all of them (provided the size is enough for its claim request).
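As a sketch: the 10Gi claim template from Scenario 2 could restrict binding to the small PV via that label (this assumes the labeled PVs use the same storage class as the claim template, my-local-sc):
volumeClaimTemplates:
  - metadata:
      name: my-local-vol
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-local-sc"
      selector:
        matchLabels:
          size-category: small   # only PVs carrying this label are binding candidates
      resources:
        requests:
          storage: 10Gi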
This could be useful, but it's not the only way to restrict which PVs can be selected, because when the PV is first bound, if the storage class is WaitForFirstConsumer, the following also applies:
Delaying volume binding ensures that the PersistentVolumeClaim binding
decision will also be evaluated with any other node constraints the
Pod may have, such as node resource requirements, node selectors, Pod
affinity, and Pod anti-affinity.
Which means that if the Pod has a node affinity to one of the nodes, it will certainly select a PV on that node (if the local storage class used is WaitForFirstConsumer).
Last, let me quote the official documentation for things that I think could answer your questions:
From https://kubernetes.io/docs/concepts/storage/persistent-volumes/
A user creates, or in the case of dynamic provisioning, has already
created, a PersistentVolumeClaim with a specific amount of storage
requested and with certain access modes. A control loop in the master
watches for new PVCs, finds a matching PV (if possible), and binds
them together. If a PV was dynamically provisioned for a new PVC, the
loop will always bind that PV to the PVC. Otherwise, the user will
always get at least what they asked for, but the volume may be in
excess of what was requested. Once bound, PersistentVolumeClaim binds
are exclusive, regardless of how they were bound. A PVC to PV binding
is a one-to-one mapping, using a ClaimRef which is a bi-directional
binding between the PersistentVolume and the PersistentVolumeClaim.
Claims will remain unbound indefinitely if a matching volume does not
exist. Claims will be bound as matching volumes become available. For
example, a cluster provisioned with many 50Gi PVs would not match a
PVC requesting 100Gi. The PVC can be bound when a 100Gi PV is added to
the cluster.
From https://kubernetes.io/docs/concepts/storage/volumes/#local
Compared to hostPath volumes, local volumes are used in a durable and
portable manner without manually scheduling pods to nodes. The system
is aware of the volume's node constraints by looking at the node
affinity on the PersistentVolume.
However, local volumes are subject to the availability of the
underlying node and are not suitable for all applications. If a node
becomes unhealthy, then the local volume becomes inaccessible by the
pod. The pod using this volume is unable to run. Applications using
local volumes must be able to tolerate this reduced availability, as
well as potential data loss, depending on the durability
characteristics of the underlying disk.

How persistent volume and persistence volume claim bound each other in kubernetes

I am working on creating a persistent volume & persistent volume claim in Kubernetes. Both of the configurations below work fine and I am able to store data in the persistent volume storage path.
I created a persistent volume:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-vol
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi          # Size of the volume
  accessModes:
    - ReadWriteOnce       # type of access
  hostPath:
    path: "/mnt/data"     # host location
---
and a persistent volume claim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
There is no explicit connection between the persistent volume & persistent volume claim in the above configuration files, yet they still get bound to each other.
Persistent volume & persistent volume claim
In deployment.yml we can point to the name of the persistent volume claim, so that POD -> PVC -> PV -> host machine storage location.
Could anyone help me understand how the persistent volume & persistent volume claim are bound to each other by the above configuration files?
In a nutshell, binding between a PV and a PVC is decided by matching storageClassName, capacity, and accessModes. Since you have the manual storage class, 1Gi, and ReadWriteOnce in both the PV and the PVC, the binding was successful.
From the docs here
A user creates, or in the case of dynamic provisioning, has already
created, a PersistentVolumeClaim with a specific amount of storage
requested and with certain access modes. A control loop in the master
watches for new PVCs, finds a matching PV (if possible), and binds
them together. If a PV was dynamically provisioned for a new PVC, the
loop will always bind that PV to the PVC. Otherwise, the user will
always get at least what they asked for, but the volume may be in
excess of what was requested. Once bound, PersistentVolumeClaim binds
are exclusive, regardless of how they were bound. A PVC to PV binding
is a one-to-one mapping, using a ClaimRef which is a bi-directional
binding between the PersistentVolume and the PersistentVolumeClaim.
Claims will remain unbound indefinitely if a matching volume does not
exist. Claims will be bound as matching volumes become available. For
example, a cluster provisioned with many 50Gi PVs would not match a
PVC requesting 100Gi. The PVC can be bound when a 100Gi PV is added to
the cluster
Do note that the storage classes (manual) in both the PV and the PVC are the same, which is one of the reasons they are bound. If they were different, the PVC would stay in Pending status; it is imperative that they are the same for the two to be bound.
Hope this helps. You can also refer to this thread for various ways to bind:
Can a PVC be bound to a specific PV?
PVC documentation: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
PVCs don't necessarily have to request a class. A PVC with its storageClassName set equal to "" is always interpreted to be requesting a PV with no class, so it can only be bound to PVs with no class (no annotation or one set equal to ""). A PVC with no storageClassName is not quite the same and is treated differently by the cluster, depending on whether the DefaultStorageClass admission plugin is turned on.
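As a minimal sketch, a claim that deliberately requests a PV with no class (the claim name and size are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: no-class-claim        # hypothetical name
spec:
  storageClassName: ""        # only PVs with no class can satisfy this claim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi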

Kubernetes Unable to mount volumes for pod

I'm trying to setup a volume to use with Mongo on k8s.
I use kubectl create -f pv.yaml to create the volume.
pv.yaml:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pvvolume
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/nfs"
  claimRef:
    kind: PersistentVolumeClaim
    namespace: default
    name: pvvolume
I then deploy this StatefulSet that has pods making PVCs to this volume.
My volume seems to have been created without problem, I'm expecting it to just use the storage of the host node.
When I try to deploy I get the following error:
Unable to mount volumes for pod
"mongo-0_default(2735bc71-5201-11e8-804f-02dffec55fd2)": timeout
expired waiting for volumes to attach/mount for pod
"default"/"mongo-0". list of unattached/unmounted
volumes=[mongo-persistent-storage]
Have a missed a step in setting up my persistent volume?
A persistent volume is just the declaration of availability of some storage inside your kubernetes cluster. There is no binding with your pod at this stage.
Since your pod is deployed through a StatefulSet, there should be in your cluster one or more PersistentVolumeClaims which are the objects that connect a pod with a PersistentVolume.
In order to manually bind a PV with a PVC you need to edit your PVC by adding the following in its spec section:
volumeName: "<your persistent volume name>"
Here an explanation on how this process works: https://docs.openshift.org/latest/dev_guide/persistent_volumes.html#persistent-volumes-volumes-and-claim-prebinding
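As a sketch, a hand-created claim pre-bound to the pvvolume PV above could look like this (the claim name and size are illustrative; volumeName is what pins it):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvvolume              # matches the claimRef.name set on the PV
  namespace: default
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: pvvolume        # pins this claim to the pvvolume PersistentVolume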
My case is an edge case, and I doubt that you will hit it. However, I will describe it because it cost me a lot of grey hairs, and maybe it will save yours.
The same error occurred for me despite the PV and PVC being bound. The Pod was constantly in the ContainerCreating state, yet kubectl get events showed the error asked about in this question.
$ kubectl get pv
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE
sewage-db   5Ti        RWO            Retain           Bound    global-sewage/sewage-db   nfs                     3h40m
$ kubectl get pvc -n global-sewage
NAME        STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
sewage-db   Bound    sewage-db   5Ti        RWO            nfs            3h39m
After rebooting the server it turned out that one of the 32GiB physical RAM modules was corrupted. Removing that memory fixed the issue.

Kubernetes NFS Persistent Volumes - multiple claims on same volume? Claim stuck in pending?

Use case:
I have an NFS directory available and I want to use it to persist data for multiple deployments & pods.
I have created a PersistentVolume:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: http://mynfs.com
    path: /server/mount/point
I want multiple deployments to be able to use this PersistentVolume, so my understanding of what is needed is that I need to create multiple PersistentVolumeClaims which will all point at this PersistentVolume.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-pvc-1
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Mi
I believe this creates a 50Mi claim on the PersistentVolume. When I run kubectl get pvc, I see:
NAME        STATUS   VOLUME   CAPACITY   ACCESSMODES   AGE
nfs-pvc-1   Bound    nfs-pv   10Gi       RWX           35s
I don't understand why I see 10Gi capacity, not 50Mi.
When I then change the PersistentVolumeClaim deployment yaml to create a PVC named nfs-pvc-2 I get this:
NAME        STATUS    VOLUME   CAPACITY   ACCESSMODES   AGE
nfs-pvc-1   Bound     nfs-pv   10Gi       RWX           35s
nfs-pvc-2   Pending                                     10s
PVC2 never binds to the PV. Is this expected behaviour? Can I have multiple PVCs pointing at the same PV?
When I delete nfs-pvc-1, I see the same thing:
NAME        STATUS    VOLUME   CAPACITY   ACCESSMODES   AGE
nfs-pvc-2   Pending                                     10s
Again, is this normal?
What is the appropriate way to use/re-use a shared NFS resource between multiple deployments / pods?
Basically you can't do what you want, as the relationship PVC <--> PV is one-to-one.
If NFS is the only storage you have available and would like multiple PV/PVC on one nfs export, use Dynamic Provisioning and a default storage class.
It's not in official K8s yet, but this one is in the incubator and I've tried it and it works well: https://github.com/kubernetes-incubator/external-storage/tree/master/nfs-client
This will enormously simplify your volume provisioning as you only need to take care of the PVC, and the PV will be created as a directory on the nfs export / server that you have defined.
From: https://docs.openshift.org/latest/install_config/storage_examples/shared_storage.html
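With that provisioner deployed, the only extra object you need is a StorageClass whose provisioner field matches the name the provisioner was started with; a sketch, where example.com/nfs and the class name are placeholders for your own values (the default-class annotation is optional):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs                   # hypothetical name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # optional: make this the default class
provisioner: example.com/nfs          # placeholder: must match the provisioner's configured name
reclaimPolicy: Delete
After that, every PVC that references (or defaults to) this class gets its own directory on the NFS export.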
As Baroudi Safwen mentioned, you cannot bind two pvc to the same pv, but you can use the same pvc in two different pods.
volumes:
  - name: nfsvol-2
    persistentVolumeClaim:
      claimName: nfs-pvc-1   # <-- USE THIS ONE IN BOTH PODS
A persistent volume claim is exclusively bound to a persistent volume.
You cannot bind 2 PVCs to the same PV. I guess you are interested in dynamic provisioning. I faced this issue when I was deploying StatefulSets, which require dynamic provisioning for pods. So you need to deploy an NFS provisioner in your cluster; the NFS provisioner (pod) will have access to the NFS folder (hostPath), and each time a pod requests a volume, the NFS provisioner will mount it in the NFS directory on behalf of the pod. Here is the GitHub repository to deploy it:
https://github.com/kubernetes-incubator/external-storage/tree/master/nfs/deploy/kubernetes
You have to be careful though: you must ensure the NFS provisioner always runs on the same machine where you have the NFS folder, by making use of a node selector, since the volume is of type hostPath.
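One way to pin the provisioner, as a sketch (the deployment name, namespace, and node hostname are placeholders for your setup):
kubectl patch deployment nfs-client-provisioner -n default \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"<node-with-nfs-folder>"}}}}}'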
For my future-self and everyone else looking for the official documentation:
https://kubernetes.io/docs/concepts/storage/persistent-volumes/#binding
Once bound, PersistentVolumeClaim binds are exclusive, regardless of
how they were bound. A PVC to PV binding is a one-to-one mapping,
using a ClaimRef which is a bi-directional binding between the
PersistentVolume and the PersistentVolumeClaim.
A few points on dynamic provisioning:
Using dynamic provisioning of NFS prevents you from changing any of the default NFS mount options. On my platform this uses an rsize/wsize of 1M, which can cause huge problems in some applications using small files or block reading. (I've just hit this issue in a big way.)
Dynamic provisioning is a great option if it suits your needs. I'm now stuck with creating 250 PV/PVC pairs for my application that were previously being handled by dynamic provisioning, due to the 1-1 relationship.
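For what it's worth, statically created NFS PVs do expose spec.mountOptions, so rsize/wsize can be pinned there; a sketch of one of those hand-created PVs (the name, size, server, path, and option values are placeholders):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app-data-0001          # hypothetical: one of the many hand-created PVs
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:                # static PVs let you control the NFS mount options
    - rsize=65536              # placeholder value
    - wsize=65536              # placeholder value
  nfs:
    server: 10.0.0.10          # placeholder NFS server
    path: /exports/app-data/0001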