I've previously used both types, and I've also read through the docs at:
https://kubernetes.io/docs/concepts/storage/persistent-volumes/
https://kubernetes.io/docs/concepts/storage/volumes/
However, it's still not clear what the difference is: both seem to support the same storage types, and the only thing that comes to mind is that there seems to be a 'provisioning' aspect to persistent volumes.
What is the practical difference?
Are there advantages/disadvantages between the two, or for what use case would one be better suited than the other?
Is it perhaps just 'syntactic sugar'?
For example, NFS could be mounted as a volume, or as a persistent volume. Both require an NFS server, and both will have their data 'persisted' between mounts. What difference would there be in this situation?
Volume decouples the storage from the Container. Its lifecycle is coupled to a pod. It enables safe container restarts and sharing data between containers in a pod.
Persistent Volume decouples the storage from the Pod. Its lifecycle is independent. It enables safe pod restarts and sharing data between pods.
A volume exists in the context of a pod, that is, you can't create a volume on its own. A persistent volume on the other hand is a first class object with its own lifecycle, which you can either manage manually or automatically.
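To make that lifecycle difference concrete, here is a minimal sketch of an inline volume (the names and image are hypothetical placeholders): the emptyDir volume below is declared inside the Pod spec, so it is created with the Pod and deleted with it.

```yaml
# A plain volume is declared inline in the Pod spec; it cannot exist on its own.
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo        # hypothetical name
spec:
  containers:
    - name: app
      image: busybox       # placeholder image
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /data
  volumes:
    - name: scratch
      emptyDir: {}         # lives and dies with this Pod
```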
The way I understand it is that the concept of a Persistent Volume builds on that of a Volume, and the difference is that a Persistent Volume is more decoupled from the Pods using it. Or, as expressed in the introduction of the documentation page about Persistent Volumes:
PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV.
A Volume's lifecycle on the other hand depends on the lifecycle of the Pod using it:
A Kubernetes volume [...] has an explicit lifetime - the same as the Pod that encloses it.
NFS is not really relevant here. Both Volumes and Persistent Volumes are Kubernetes resources. They provide an abstraction of a data storage facility. So for using the cluster, it doesn't matter which concrete operating system resource is behind that abstraction. That's in a way the whole point of Kubernetes.
It might also be relevant here to keep in mind that Kubernetes and its API are still evolving. The Kubernetes developers sometimes choose to introduce new concepts/resources that differ only subtly from existing ones. I presume one reason for this is to maintain backwards compatibility while still being able to fine-tune basic API concepts. Another example of this are Replication Controllers and Replica Sets, which conceptually largely overlap and are therefore redundant to some extent. What is different from the Volume/Persistent Volume matter, though, is that Replication Controllers are now explicitly deprecated.
Volumes ≠ Persistent Volumes
Volumes and Persistent Volumes are related, but very different!
Volumes:
appear in Pod specifications
do not exist as API resources (cannot do kubectl get volumes)
Persistent Volumes:
are API resources (can do kubectl get persistentvolumes)
correspond to concrete volumes (e.g. on a SAN, EBS, etc.)
cannot be associated with a Pod directly
(they need a Persistent Volume Claim)
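To illustrate the last point, here is a hypothetical sketch: the Pod never names a PV directly; it references a claim, and the claim binds to whichever PV satisfies it.

```yaml
# Hypothetical claim; the Pod references the claim, never the PV itself.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo
spec:
  containers:
    - name: app
      image: busybox          # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-claim   # binds to whichever PV satisfies the claim
```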
They are two different implementations which can provide some overlapping functionality (hence a lot of confusion).
Persistent volumes:
Support storage provisioned via a StorageClass
Do not support the emptyDir volume type (https://github.com/kubernetes/kubernetes/issues/75378)
Volumes:
Are bound to a pod
Are simpler to define (less Kubernetes resources required)
To use storage inside Kubernetes Pods, I can use volumes and persistent volumes. While volumes like emptyDir are ephemeral, I could use hostPath and many other cloud-based volume plugins, which would provide a persistent solution within volumes themselves.
In that case, why should I be using a Persistent Volume at all?
It is very important to understand the main differences between Volumes and PersistentVolumes. Both are Kubernetes resources which provide an abstraction of a data storage facility.
Volumes: let your pod write to a filesystem that exists as long as the pod exists. They also let you share data between containers in the same pod, but data in that volume will be destroyed when the pod is removed. Volume decouples the storage from the Container. Its lifecycle is coupled to a pod.
PersistentVolumes: serve as long-term storage in your Kubernetes cluster. They exist beyond containers, pods, and nodes. A pod uses a persistent volume claim to get read and write access to the persistent volume. PersistentVolume decouples the storage from the Pod. Its lifecycle is independent. It enables safe pod restarts and sharing data between pods.
When it comes to hostPath:
A hostPath volume mounts a file or directory from the host node's filesystem into your Pod.
hostPath has its usage scenarios, but in general it is not recommended, for several reasons:
Pods with identical configuration (such as created from a PodTemplate) may behave differently on different nodes due to different files on the nodes
The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged Container or modify the file permissions on the host to be able to write to a hostPath volume
You don't always directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume.
If a node goes down, the pod needs to be scheduled on another node, where your locally provisioned volume will not be available.
hostPath would be a good fit, for example, for a log collector running in a DaemonSet, as sketched below.
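A rough sketch of that use case, with a placeholder image and paths (this is an assumption for illustration, not a recommendation of a particular agent):

```yaml
# Hypothetical DaemonSet mounting the node's log directory via hostPath,
# so one collector Pod per node can read that node's own logs.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: busybox            # stand-in for a real log agent
          command: ["sh", "-c", "sleep 3600"]
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log          # the node's own log directory
            type: Directory
```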
I recommend the Kubernetes Volumes Guide as a nice supplement to this topic.
PersistentVolumes are cluster-wide storage and allow you to manage storage more centrally.
When you configure a volume (either using hostPath or any of the cloud-based volume plugins), you need to do this configuration within the Pod definition file. All of the configuration information required to set up storage for the volume goes into the Pod definition file.
When you have a large environment with a lot of users and a large number of Pods, users will have to configure storage every time for each Pod they deploy. Whatever storage solution is used, the user who deploys the Pod will have to configure that storage in all of his/her Pod definition files. If a change needs to be made, the user will have to make this change in all of his/her Pods. Beyond a certain scale, this is not the most optimal way to manage storage.
Instead, you would like to manage this centrally. You would like to manage the storage in such a way that an Administrator can create a large pool of storage and users can carve out a part of this storage as required, and this is exactly what you can do using PersistentVolumes and PersistentVolumeClaims.
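A minimal sketch of that flow, with made-up names and sizes (the NFS server is assumed): the administrator pre-creates the PV, and the user only writes the short claim.

```yaml
# Admin side: a hypothetical pre-provisioned NFS-backed PV.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-pv-001
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com    # assumed NFS server
    path: /exports/shared
---
# User side: a claim that takes a PV from the admin's pool.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  storageClassName: ""         # empty string: bind statically, skip dynamic provisioning
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```

Note that a claim binds a whole PV one-to-one, so the 10Gi request above may end up bound to the entire 100Gi volume.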
Use PersistentVolumes when you need to set up a database like MongoDB, Redis, Postgres, or MySQL, because they provide long-term storage that is not deeply coupled with your pods. Perfect for database applications, because the data will not die with the pods!
Avoid plain Volumes when you need long-term storage, because they will die with the pods!
In my case, when I have to store something, I will always go for persistent volumes!
I have a GKE cluster with roughly 6-7 microservices deployed. I need a Postgres DB installed inside GKE (not Cloud SQL, because of the cost). Looking at the different types of persistent volumes, and given that multiple microservices will access the same DB: should I go with NFS, or would a PVC backed by a normal disk be enough (not local storage, in any case)?
I'd appreciate your thoughts on this.
Everything depends on your scenario. In general, you should look at the AccessMode when you are considering which Volume Plugin you want to use.
A PersistentVolume can be mounted on a host in any way supported by the resource provider. As shown in the table below, providers will have different capabilities and each PV's access modes are set to the specific modes supported by that particular volume.
In the documentation below, you will find a table with the different Volume Plugins and their supported access modes.
According to the update from your comment, you have only one node. With that setup, you can use almost every Volume that supports the RWO access mode.
ReadWriteOnce -- the volume can be mounted as read-write by a single node.
There are two other access modes which should be considered if you would like to use the volume on more than one node.
ReadOnlyMany -- the volume can be mounted read-only by many nodes
ReadWriteMany -- the volume can be mounted as read-write by many nodes
So in your case you can use gcePersistentDisk, as it supports ReadWriteOnce and ReadOnlyMany.
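A hypothetical PV for that case might look like the following (the disk name is an assumption; the GCE disk has to exist already):

```yaml
# PV backed by a pre-created GCE persistent disk.
# gcePersistentDisk supports ReadWriteOnce and ReadOnlyMany.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: my-data-disk   # assumed name of an existing GCE disk
    fsType: ext4
```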
Using NFS would be beneficial if you would like to access this PV from many nodes.
NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV's capabilities.
Just as an addition, if this is for learning purposes, you can also check out Local Persistent Volumes. An example can be found in this tutorial; however, it would require a few updates, such as the image or apiVersion.
I am currently looking into setting up Kubernetes pods for a project on GCP.
The problem: I need to set up a persistent shared volume which will be used by multiple nodes. I need all nodes to be able to read from the volume, and only one node must be able to write to it. So I need some advice on the best way to achieve that.
I have checked the Kubernetes documentation and know that GCEPersistentDisk does not support ReadWriteMany, but this access mode would probably be overkill anyway. Regarding ReadOnlyMany, I get that nodes can read from the PV, but I don't understand how, or what, can actually modify the PV in this case. Currently my best bet is setting up NFS backed by a GCE persistent disk.
Also the solution should be able to run on the cloud or on premise. Any advice will be appreciated :)
According to the official documentation:
A PVC to PV binding is a one-to-one mapping.
A volume can only be mounted using one access mode at a time, even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.
So I am afraid that it would be impossible to do it the way you described.
However, you may want to try a task queue.
Please let me know if that helped.
Assuming there is some NFS space available, it could be possible to create two persistent volume claims (PVCs) for it: one read-only, one read-write.
Then you could have two persistent volumes, each binding one-to-one to one of the PVCs.
Now create two deployments. One of them describes the pod with the writing application, and you ensure you have one replica. The other deployment describes the reading application, and you can scale it to whatever amount you like.
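A sketch of that setup, assuming an NFS export at nfs.example.com:/exports/shared (all names are made up):

```yaml
# Read-write PV/PVC pair for the single writer.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-rw
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: nfs.example.com
    path: /exports/shared
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-rw
spec:
  volumeName: nfs-rw       # pin the claim to the matching PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
# Read-only PV/PVC pair for the scalable readers.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-ro
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadOnlyMany
  nfs:
    server: nfs.example.com
    path: /exports/shared
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-ro
spec:
  volumeName: nfs-ro
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 10Gi
```

The writer deployment would then mount shared-rw, and the reader deployment would mount shared-ro with readOnly: true.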
I know that a PVC can be used as a volume in k8s. I know how to create them and how to use them, but I can't understand why there are two of them, PV and PVC.
Can someone give me the architectural reason behind the PV/PVC distinction? What kind of problem does it try to solve (or what history is behind it)?
Despite their names, they serve two different purposes: an abstraction for storage (PV) and a request for such storage (PVC). Together, they enable a clean separation of concerns (using a figure from our Kubernetes Cookbook here to illustrate this):
The storage admin focuses on provisioning PVs (ideally dynamically through defining storage classes) and the developer uses a PVC to acquire a PV and use it in a pod.
It is easy to be thrown by the names, but the Kubernetes documentation does have an explanation of the difference:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV.
And
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).
So the PVC decouples the application from the specific storage. It allows the application to say that it needs some storage satisfying certain requirements without saying specifically which piece of storage that is. This also makes it possible for cluster-level rules to be defined on how the storage requirements of apps are to be met.
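For example, a claim might only say "give me 5Gi of some fast class of storage" (the class name here is hypothetical) and leave it to the cluster to decide which PV, on which backend, satisfies it:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-storage
spec:
  storageClassName: fast   # assumed StorageClass defined by the cluster admin
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```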
What is the difference between a persistent volume (PV) and a persistent volume claim (PVC) in Kubernetes/OpenShift, according to the documentation?
What is the difference between both in simple terms?
From the docs:
PVs are resources in the cluster. PVCs are requests for those resources and also act as claim checks to the resource.
So a persistent volume (PV) is the "physical" volume on the host machine that stores your persistent data. A persistent volume claim (PVC) is a request for the platform to create a PV for you, and you attach PVs to your pods via a PVC.
Something akin to
Pod -> PVC -> PV -> Host machine
A PVC is a declaration of a need for storage that can at some point become available/satisfied, as in bound to some actual PV.
It is a bit like the asynchronous programming concept of a promise: the PVC promises that it will at some point "translate" into a storage volume that your application will be able to use, with defined characteristics like class, size, and access mode (ROX, RWO, and RWX).
This is a way to abstract thinking about a particular storage implementation away from your pods/deployments. Your application in most cases does not need to declare "give me NFS storage from server X of size Y"; it is more like "I need persistent storage of default class and size Y".
With this, deployments on different clusters can choose to differently satisfy that need. One can link an EBS device, another can provision a GlusterFS, and your core manifests are still the same in both cases.
Furthermore, you can have Volume Claim Templates defined in your deployment, so that each pod gets a reflecting PVC created automatically (i.e., supporting infrastructure-agnostic storage definition for a group of scalable pods where each needs its own dedicated storage).
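As a brief, hypothetical sketch of that: the StatefulSet below stamps out one PVC per replica (data-web-0, data-web-1, ...), so each Pod gets its own dedicated storage.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx           # placeholder image
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:          # one PVC is created from this per replica
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
```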
Short:
- Here you have the storage! PersistentVolume (PV)
- You get the storage if you really need it! PersistentVolumeClaim (PVC)
A PersistentVolume (PV) is a piece of storage in the cluster or central storage, let's say 100GB.
A PersistentVolumeClaim (PVC) is a request by a user for storage for an application to use, let's say 10GB.
In a real-life scenario, the PV is the whole cake and the PVC is a slice of it (but you can have the whole cake if there is nobody else to eat it, just as you can use the whole PV if there is no other application to use it).
Short and Simple
Persistent Volume - the available storage; let's say you have 100Gi.
Persistent Volume Claim - you request from the Persistent Volume; let's say you request 10Gi, you'll get it, but if you request 110Gi you won't get it.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by a server/storage/cluster administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster, just like a node.
A PersistentVolumeClaim (PVC) is a request for storage by a user which can be satisfied from a PV. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and memory). Claims can request specific sizes and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany, or ReadWriteMany).
A Persistent Volume Claim tells you what options you have access to in a particular cluster. Imagine a store called Smart Tech that sends out a circular with some ads about your configuration options; those ads are the Persistent Volume Claims.
Inside your config file you write out the different Persistent Volume Claims that you are going to have inside your cluster, kind of like your wish list to Santa, but of course you are going to take it to the sales guy at Smart Tech when you are done.
So you write a config file that says there should be a 600GB hard drive option available to all your clusters, and a 1TB hard drive option as well.
When you choose one of these Persistent Volume Claim options, you request that Kubernetes (the sales guy) goes and gets that option for you. Kubernetes has to look through the instances of storage options in the stock room that are readily available. These instances of hard drives can be used right away, and they are considered statically provisioned because they were created ahead of time.
On the other hand, there are dynamically provisioned options that are created on the fly, when you ask Kubernetes the sales guy; kind of like just-in-time production, they get created the moment you ask for them.
So the Persistent Volume Claim is the store's advertisement of options, and whichever one you choose, Kubernetes will go get it: either one in storage or one created on the fly.
The Persistent Volume is the actual product or option that you get back from Kubernetes. If Kubernetes does not have what you asked for, it will try to create it on the fly.
So the PVC is what Smart Tech advertises they have to offer to your cluster, which Kubernetes the sales guy will get for you, and the PV is the actual finished product delivered to you.
PersistentVolume (PV) and PersistentVolumeClaim (PVC) are resource APIs provided by Kubernetes.
A PV is a piece of storage which is supposed to be preallocated by an admin, and a PVC is a request for a piece of storage by a user.
Persistent Volume — a low-level representation of a storage volume.
Persistent Volume Claim — a request for storage that binds a Pod to a Persistent Volume.
Storage Class — allows for dynamic provisioning of Persistent Volumes, as sketched below.
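A small sketch of the dynamic case (the provisioner and names are just examples): a PVC that references a StorageClass gets a PV provisioned on demand instead of drawing from a pre-created pool.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-rwo
provisioner: kubernetes.io/gce-pd   # example in-tree provisioner
parameters:
  type: pd-standard
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-claim
spec:
  storageClassName: standard-rwo    # triggers dynamic provisioning
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```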
You can see some parallels when comparing PVs and PVCs with nodes and pods.
A PV is like a node: it defines the available resources, in this case storage.
A PVC is like a pod: it requests resources (memory and CPU for a pod; storage here) and gets them if a PV has the resources to allocate.