I am trying to understand how Kubernetes handles the persistent volumes on the node's filesystem.
For example, if I have minikube as my Kubernetes cluster node, and I create multiple PVs with PVCs for my pods, and I ssh into minikube, where can I find the PVs on minikube's filesystem?
If I type
lsblk
I get
sda 8:0 0 19.5G 0 disk
but no PV disks are listed.
Thank you for your answers.
You will not see it in lsblk because a PV is an API object stored in the Kubernetes API, not a block device on the node.
I recommend reading Kubernetes documentation regarding Persistent Volumes.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).
While PersistentVolumeClaims allow a user to consume abstract storage resources, it is common that users need PersistentVolumes with varying properties, such as performance, for different problems. Cluster administrators need to be able to offer a variety of PersistentVolumes that differ in more ways than just size and access modes, without exposing users to the details of how those volumes are implemented. For these needs there is the StorageClass resource.
Please see the detailed walkthrough with working examples.
You can also have a look at the Kubernetes Volumes Guide, which explains the types of storage, how long they last, and how to use them, with examples.
Because they are hostPath, you will not see them in lsblk. Use "kubectl describe pv PV_NAME" to understand where they are located.
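As a rough illustration of why lsblk shows nothing: a hostPath PV is backed by a plain directory on the node rather than by a separate block device. The sketch below is a minimal hostPath PV; the name example-pv, the class name manual and the path /mnt/data are assumptions chosen for illustration, not values from your cluster.

```yaml
# Hypothetical hostPath PV; name, class and path are illustrative assumptions.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /mnt/data   # directory on the node's filesystem that holds the data
```

For a PV like this, the Source section of the kubectl describe pv output should show the HostPath type and the path, and that directory is where you would look after ssh-ing into the node.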
Related
I understand that a PV is the physical storage for a k8s cluster and that a PVC is just a request for storage tied to a deployment/pod that will look at available PVs and claim one.
Where I'm confused is how/if a mount will rebind to the PV when the deployment is started up. Are there cases where, if I restart my pod, a PVC will bind to a DIFFERENT PV? Will I lose my data that's mounted in the deployment or pod? Or does that bind happen when I deploy my PVC and then just remain static regardless of the state of the pod?
I haven't really found any documentation that spells this out so any clarification would be helpful!
...PVC will bind to a DIFFERENT PV?
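Once a PVC is bound to a PV, the binding is exclusive and stays in place regardless of pod restarts. To ensure your PVC always binds to the same PV in the first place, you can pre-bind the PVC and PV; the instructions are here.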
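A minimal sketch of pre-binding, assuming a statically created PV: the PV reserves itself for one claim through claimRef, and the PVC names the PV through volumeName. All names, the namespace, the size and the hostPath backing are hypothetical; the empty storageClassName is there so that no default StorageClass provisions a different volume.

```yaml
# Hypothetical pre-bound pair; all names, sizes and the path are assumptions.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  claimRef:                # reserve this PV for exactly one claim
    namespace: default
    name: example-pvc
  hostPath:
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
  namespace: default
spec:
  storageClassName: ""     # empty string: do not use a default StorageClass
  volumeName: example-pv   # ask for this specific PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```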
Can we use Kubernetes volumes for deployments? If yes, will multiple pods then share the same volume?
If that is possible, then what happens when the pods for the deployment are on different host machines?
Especially when using Amazon EBS, where an EBS volume cannot be shared across multiple hosts.
Yes, you can use a persistent volume for deployments.
Such a volume will be mounted at your desired location in all the pods.
If you use EBS block storage, all your pods will need to be scheduled on the same node where the volume is attached. This may not work if you have many replicas.
You will have to use network file storage, such as EFS, GlusterFS, Portworx, etc., with ReadWriteMany if you want your pods to be spun up on different nodes.
EBS will give you the best performance, with the aforementioned single-node limitation.
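A minimal sketch of the shared-volume setup described above, assuming your cluster has a StorageClass backed by network file storage that supports ReadWriteMany. The class name shared-fs, the object names, the image and the mount path are all hypothetical.

```yaml
# Hypothetical example; "shared-fs" must be a class whose backend supports RWX.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: shared-fs
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
      volumes:
        - name: data               # every replica mounts the same claim
          persistentVolumeClaim:
            claimName: shared-data
```

With an EBS-backed ReadWriteOnce claim in place of shared-data, the same Deployment would only work while all replicas land on one node.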
I've been working with Kubernetes for quite a while, but I still often get confused about Volume, PersistentVolume and PersistentVolumeClaim. It would be nice if someone could briefly summarize the differences between them.
Volume - For a pod to reference external storage, it needs a volume spec. This volume can come from a ConfigMap, a Secret, a PersistentVolumeClaim, a hostPath, etc.
PersistentVolume - It is a representation of storage that has been made available. The plugins for cloud providers make it possible to create this resource.
PersistentVolumeClaim - This claims specific resources. If an available PersistentVolume matches the claim's requirements, the claim is bound to that PersistentVolume.
At this point the PVC/PV are not in use yet. Then, in the Pod spec, the pod references the claim as a volume, and the storage is attached to the Pod.
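To make the three terms concrete, here is a minimal sketch of a claim and a Pod that uses it. The Pod declares two volumes: one sourced from the PersistentVolumeClaim and one from a ConfigMap. All names are hypothetical, and the ConfigMap app-settings is assumed to exist already.

```yaml
# Hypothetical names throughout; the ConfigMap "app-settings" must already exist.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
        - name: settings
          mountPath: /etc/app
  volumes:
    - name: data                  # volume backed by the claim, and through it a PV
      persistentVolumeClaim:
        claimName: app-claim
    - name: settings              # volume backed by a ConfigMap, no PV involved
      configMap:
        name: app-settings
```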
These are all in a Kubernetes application context. To keep applications portable between different Kubernetes platforms, it is good to abstract away the infrastructure from the application. Here I will explain which Kubernetes objects belong to the Application config and which to the Platform config. If your application runs on both e.g. GCP and AWS, you will need two sets of platform configs, one for GCP and one for AWS.
Application config
Volume
A pod may mount volumes. The source for volumes can be different things, e.g. a ConfigMap, a Secret or a PersistentVolumeClaim.
PersistentVolumeClaim
A PersistentVolumeClaim represents a claim of a specific PersistentVolume instance. For portability this claim can be for a specific StorageClass, e.g. SSD.
Platform config
StorageClass
A StorageClass represents a PersistentVolume type with specific properties, e.g. SSD. The StorageClass definition is different on each platform, e.g. one definition on AWS, another on Azure, GCP or Minikube.
PersistentVolume
This is a specific volume on the platform. And it may be different on platforms, e.g. awsElasticBlockStore or gcePersistentDisk. This is the instance that holds the actual data.
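The split between platform config and application config could look roughly like this. The StorageClass is an AWS variant and assumes the EBS CSI driver is installed; the class name ssd, the claim name and the size are hypothetical. On another platform only the StorageClass would change, while the claim stays the same.

```yaml
# Platform config (AWS variant; assumes the EBS CSI driver is installed).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
---
# Application config: refers only to the class name, not to the platform.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  storageClassName: ssd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```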
Minikube example
See Configure a Pod to Use a PersistentVolume for Storage for a full example on how to use PersistentVolume, StorageClass and Volume for a Pod using Minikube and a hostPath.
As per this official document, Kubernetes Persistent Volumes support three types of access modes.
ReadOnlyMany
ReadWriteOnce
ReadWriteMany
The definitions given in the document are very high-level. It would be great if someone could explain them in a little more detail, along with some examples of use cases where we should use one vs. another.
You should use ReadWriteX when you plan to have Pods that will need to write to the volume, and not only read data from the volume.
You should use XMany when you want the ability for Pods to access the given volume while those workloads are running on different nodes in the Kubernetes cluster. These Pods may be multiple replicas belonging to a Deployment, or may be completely different Pods. There are many cases where it's desirable to have Pods running on different nodes, for instance if you have multiple Pod replicas for a single Deployment, then having them run on different nodes can help ensure some amount of continued availability even if one of the nodes fails or is being updated.
If you don't use XMany, but you do have multiple Pods that need access to the given volume, that will force Kubernetes to schedule all those Pods to run on whatever node the volume gets mounted to first, which could overload that node if there are too many such pods, and can impact the availability of Deployments whose Pods need access to that volume as explained in the previous paragraph.
So putting all that together:
If you need to write to the volume, and you may have multiple Pods needing to write to the volume where you'd prefer the flexibility of those Pods being scheduled to different nodes, and ReadWriteMany is an option given the volume plugin for your K8s cluster, use ReadWriteMany.
If you need to write to the volume but either you don't have the requirement that multiple pods should be able to write to it, or ReadWriteMany simply isn't an available option for you, use ReadWriteOnce.
If you only need to read from the volume, and you may have multiple Pods needing to read from the volume where you'd prefer the flexibility of those Pods being scheduled to different nodes, and ReadOnlyMany is an option given the volume plugin for your K8s cluster, use ReadOnlyMany.
If you only need to read from the volume but either you don't have the requirement that multiple pods should be able to read from it, or ReadOnlyMany simply isn't an available option for you, use ReadWriteOnce. In this case, you want the volume to be read-only but the limitations of your volume plugin have forced you to choose ReadWriteOnce (there's no ReadOnlyOnce option). As a good practice, consider setting containers.volumeMounts.readOnly to true in your Pod specs for volume mounts corresponding to volumes that are intended to be read-only.
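A minimal sketch of that last recommendation: a Pod that mounts a claim read-only through volumeMounts.readOnly. The Pod name, image, mount path and the claim name content-claim are hypothetical, and the claim is assumed to exist already.

```yaml
# Hypothetical Pod; the claim "content-claim" is assumed to exist.
apiVersion: v1
kind: Pod
metadata:
  name: reader
spec:
  containers:
    - name: reader
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: content
          mountPath: /content
          readOnly: true        # the container cannot write through this mount
  volumes:
    - name: content
      persistentVolumeClaim:
        claimName: content-claim
```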
ReadOnlyMany – the volume can be mounted read-only by many nodes
With this mode, multiple pods running on multiple nodes can use a single volume and read data.
If a pod mounts a volume with the ReadOnlyMany access mode, other pods can also mount it, and all of them can perform only read operations.
This means the volume can be mounted on one or many nodes of your Kubernetes cluster, and you can only perform read operations.
For example, you have one pod running on a node and reading a stored file from the volume; you cannot perform writes on that volume.
As it's ReadOnlyMany, if your pod is scheduled to another node, the volume and its data will be available there for read operations as well.
ReadWriteMany – the volume can be mounted as read-write by many nodes
With this mode, multiple pods running on multiple nodes can use a single volume and read/write data.
If a pod mounts a volume with the ReadWriteMany access mode, other pods can also mount it. GCE persistent disks do not support this mode.
This means the volume can be mounted on one or many nodes of your Kubernetes cluster, and you can perform both read and write operations.
For example, you have one pod running on a node and you are reading and writing a stored file on the volume.
As it's ReadWriteMany, if your pod is scheduled to another node, the volume and its data will be available there for read/write operations as well.
For this, you can also use NFS (MinIO, GlusterFS) or an EFS filesystem.
ReadWriteOnce – the volume can be mounted as read-write by a single node
If a pod mounts a volume with the ReadWriteOnce access mode, no pod on another node can mount it. In GCE (Google Compute Engine) the only allowed modes are ReadWriteOnce and ReadOnlyMany. So either one pod mounts the volume ReadWrite, or one or more pods mount the volume ReadOnlyMany.
This means the volume can be mounted on only one node of your Kubernetes cluster, and you can perform both read and write operations.
For example, you have one pod running on a node and you are reading and writing a stored file on the volume.
As it's ReadWriteOnce, if your pod is scheduled to another node, the volume may still be attached to the original node, and you may not be able to access the data from the new node.
In Kubernetes you provision storage either statically (an administrator creates PersistentVolumes ahead of time) or dynamically (a StorageClass provisions them on demand). Once the storage is bound to a claim, you configure how your Pods and Nodes may connect to it through the access mode of the persistent volume. There are four access modes:
ReadOnlyMany (ROX)
In this mode, multiple pods running on different Nodes can connect to the storage and carry out read operations.
ReadWriteMany (RWX)
In this mode, multiple pods running on different Nodes can connect to the storage and carry out read and write operations.
ReadWriteOnce (RWO)
In this mode, the volume can be mounted by only one Node; multiple pods running on that Node can connect to the storage and carry out read and write operations.
ReadWriteOncePod (RWOP)
In this mode, the volume can be mounted as read-write by a single Pod. Use the ReadWriteOncePod access mode if you want to ensure that only one pod across the whole cluster can read that PVC or write to it. This is only supported for CSI volumes and Kubernetes version 1.22+.
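The access mode is requested on the claim. As a minimal sketch, a claim asking for ReadWriteOncePod could look like this; the claim name and the class name csi-example are hypothetical, and the class is assumed to be backed by a CSI driver.

```yaml
# Hypothetical claim; "csi-example" must be a StorageClass backed by a CSI driver.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer
spec:
  accessModes:
    - ReadWriteOncePod   # swap in ReadOnlyMany, ReadWriteMany or ReadWriteOnce as needed
  storageClassName: csi-example
  resources:
    requests:
      storage: 1Gi
```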
Follow the documentation to get more insight.
The official Kubernetes website describes PV and PVC as follows.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).
Who is the administrator here, when they mention it from the persistent volume perspective?
An administrator in this context is the admin of the cluster: whoever is deploying the PV/PVC (an operations engineer, system engineer, SysAdmin).
For example - an engineer can configure AWS Elastic File System to have space available in the Kubernetes cluster, then use a PV/PVC to make that available to a specific pod container in the cluster. This means that if the pod is destroyed for whatever reason, the data in the PVC persists and is available to other resources.