What's the conceptual difference between PersistentVolume and PersistentVolumeClaim in Kubernetes?

I know that a PVC can be used as a volume in k8s. I know how to create them and how to use them, but I couldn't understand why there are two of them, PV and PVC.
Can someone give me an architectural reason behind the PV/PVC distinction? What kind of problem does it try to solve (or what is the history behind it)?

Despite their names, they serve two different purposes: an abstraction for storage (PV) and a request for such storage (PVC). Together, they enable a clean separation of concerns (using a figure from our Kubernetes Cookbook here to illustrate this):
The storage admin focuses on provisioning PVs (ideally dynamically through defining storage classes) and the developer uses a PVC to acquire a PV and use it in a pod.

It is easy to be thrown by the names, but the Kubernetes documentation does explain the difference:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV.
And
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).
So the PVC decouples the application from the specific storage. It allows the application to say that it needs some storage satisfying certain requirements without saying specifically which piece of storage that is. This also makes it possible for cluster-level rules to be defined on how the storage requirements of apps are to be met.
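As a rough illustration of that decoupling, here is a minimal sketch that builds a claim and a Pod that uses it as plain Python dicts and prints them as JSON, which kubectl accepts alongside YAML. The object names, the "standard" storage class, the nginx image and the mount path are assumptions made for the example, not something from the question. Note that the Pod only names the claim; it never mentions a specific PV or storage backend.

```python
import json

# Hypothetical names ("app-data", "standard", "nginx") chosen for illustration.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data"},
    "spec": {
        # The claim states requirements only: size, access mode, storage class.
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "standard",
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "app"},
    "spec": {
        "containers": [{
            "name": "app",
            "image": "nginx",
            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
        }],
        # The Pod refers to the claim by name, not to any particular PV.
        "volumes": [{
            "name": "data",
            "persistentVolumeClaim": {"claimName": pvc["metadata"]["name"]},
        }],
    },
}


def to_manifest(*objects):
    """Wrap the objects in a v1 List so they form a single JSON document."""
    return json.dumps(
        {"apiVersion": "v1", "kind": "List", "items": list(objects)}, indent=2
    )


if __name__ == "__main__":
    print(to_manifest(pvc, pod))
```

If this were saved as a script (the file name is up to you), its output could be piped to kubectl apply -f - on a cluster that actually has a storage class with that name.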

Related

What does Kubernetes AccessMode represent?

No matter how many times I read the documentation I just don't get it, so apologies for the really basic question.
I read that once a PersistentVolume is claimed, no other Pod can claim it - claims are exclusive.
However PV accessmodes have options including *Many. These two seem to contradict each other.
What is the Once or the Many in the access mode types? Does it refer to multiple replicas of the same pod across different nodes? Or does it mean that after one claim has been released, another pod can then claim it? Or does it refer to the underlying storage, which could be referenced by a different PV? Or something else?
You wrote: "I read that once a PersistentVolume is claimed, no other Pod can claim it - claims are exclusive."
This is a misunderstanding. It should be: once a PersistentVolume is claimed, no other PersistentVolumeClaim can claim it - claims are exclusive.
But multiple Pods can use the same PersistentVolumeClaim. It is not so common, but it is typically what happens when you "upgrade" your application: both the new and old versions of your app might use the PVC for a short time.
Access Modes
Access Modes on Persistent Volumes relate to how the volumes can be mounted on nodes. This depends on how your storage system works, so you must check which access modes are available for your storage system.
The modes ending with -Once can only be mounted on a single node at a time - this is unrelated to Pods. The modes ending with -Many can be mounted on multiple nodes at the same time, which is typical for NFS-style storage systems.
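The node-level rule above can be sketched as a small check. This is a simplified model rather than how the kubelet or the attach controller is implemented; the function name and node names are made up for the example, and the newer ReadWriteOncePod mode (which does restrict access to a single Pod) is deliberately left out.

```python
# Access modes that allow a volume to be mounted on more than one node at a time.
MULTI_NODE_MODES = {"ReadOnlyMany", "ReadWriteMany"}


def can_mount(access_mode, mounted_on, new_node):
    """Return True if new_node may mount the volume.

    mounted_on is the set of nodes that already have the volume mounted.
    The check is per node: any number of Pods on an already-mounted node
    share that mount, so they are always allowed.
    """
    if not mounted_on or new_node in mounted_on:
        return True
    return access_mode in MULTI_NODE_MODES
```

For example, with ReadWriteOnce a second Pod scheduled on the same node is fine, while a Pod on a different node is not; with ReadWriteMany both are allowed.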

Kubernetes cluster Mysql Nodes Storage

We have started setting up a Kubernetes cluster. In production, we have 4 MySQL nodes (2 active masters, 2 active slaves). All servers are on-premise; there is NO cloud provider usage.
Now how do I configure storage? I mean, should I use PV / PVC? How will it work? Should I use a local PV? Can someone explain this to me?
You need to use PersistentVolumes and PersistentVolumeClaims in order to achieve that.
A PersistentVolume (PV) is a piece of storage in the cluster that has
been provisioned by an administrator or dynamically provisioned using
Storage Classes.
A PersistentVolumeClaim (PVC) is a request for storage by a user.
Claims can request specific size and access modes (e.g., they can be
mounted once read/write or many times read-only).
Containers are ephemeral: when a container is restarted, all changes made to its filesystem are lost. Databases, however, expect their data to be persistent, so you need persistent volumes. You have to create a storage claim, and the pod must be configured to mount the claimed storage.
Here you will find a simple guide showing how to deploy MySQL with a PersistentVolume. However, I strongly recommend getting familiar with the official docs that I have linked in order to fully understand the concept and adjust the access mode, class, size, etc according to your needs.
Please let me know if that helped.

Apache Kafka - Volume Mapping for Message Log files in Kubernetes (K8s)

When we deploy Apache Kafka on Linux/Windows, we have the log.dirs and broker.id properties. On bare metal, the files are saved on the individual host instances. However, when Kafka is deployed via K8s on a public cloud, there must be some form of volume mounting to make sure that the transaction log files are saved somewhere?
Has anyone done this on K8s? I am not referring to Confluent (because it's a paid subscription).
As far as I understand you are just asking how to deal with storage in Kubernetes.
Here is a great clip that talks about Kubernetes Storage that I would recommend to you.
In Kubernetes you use Volumes:
On-disk files in a Container are ephemeral, which presents some problems for non-trivial applications when running in Containers. First, when a Container crashes, kubelet will restart it, but the files will be lost - the Container starts with a clean state. Second, when running Containers together in a Pod it is often necessary to share files between those Containers. The Kubernetes Volume abstraction solves both of these problems.
There are many types of Volumes; some are cloud-specific, like awsElasticBlockStore, gcePersistentDisk, azureDisk and azureFile.
There are also other types like glusterfs, iscsi, nfs and many more that are listed here.
You can also use Persistent Volumes, which provide an API for users and administrators that abstracts details of how storage is provided from how it is consumed:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).
Here is a link to Portworx Kafka Kubernetes in production: How to Run HA Kafka on Amazon EKS, GKE and AKS which might be handy for you as well.
And if you would be interested in performance then Kubernetes Storage Performance Comparison is a great 10min read.
I hope those materials will help you understand Kubernetes storage.

What is the difference between a volume and persistent volume?

I've previously used both types, I've also read through the docs at:
https://kubernetes.io/docs/concepts/storage/persistent-volumes/
https://kubernetes.io/docs/concepts/storage/volumes/
However it's still not clear what the difference is, both seem to support the same storage types, the only thing that comes to mind is there seems to be a 'provisioning' aspect to persistent volumes.
What is the practical difference?
Are there advantages / disadvantages between the two - or for what use case would one be better suited to than the other?
Is it perhaps just 'syntactic sugar'?
For example, NFS could be mounted as a volume or as a persistent volume. Both require an NFS server, and both will have their data 'persisted' between mounts. What difference would there be in this situation?
Volume decouples the storage from the Container. Its lifecycle is coupled to a pod. It enables safe container restarts and sharing data between containers in a pod.
Persistent Volume decouples the storage from the Pod. Its lifecycle is independent. It enables safe pod restarts and sharing data between pods.
A volume exists in the context of a pod, that is, you can't create a volume on its own. A persistent volume on the other hand is a first class object with its own lifecycle, which you can either manage manually or automatically.
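To make the contrast concrete, here is a minimal sketch of the two kinds of entries that can appear in a Pod's volumes list, written as Python dicts. The names, image and mount paths are hypothetical. The inline entry is fully described inside the Pod spec, while the claim-backed entry only points at a separate PersistentVolumeClaim object.

```python
# Hypothetical names ("scratch", "data", "app-data") used only for illustration.

# A plain volume is declared inline in the Pod spec; there is no separate
# Kubernetes object behind it.
inline_volume = {"name": "scratch", "emptyDir": {}}

# A PVC-backed volume only names a claim; the claim and the PV bound to it
# are separate API objects with their own lifecycle.
claimed_volume = {"name": "data", "persistentVolumeClaim": {"claimName": "app-data"}}


def has_own_api_object(volume):
    """In this simplified model, only claim-backed volumes refer to a separate object."""
    return "persistentVolumeClaim" in volume


pod_spec = {
    "containers": [{
        "name": "app",
        "image": "nginx",
        "volumeMounts": [
            {"name": "scratch", "mountPath": "/tmp/scratch"},
            {"name": "data", "mountPath": "/var/lib/app"},
        ],
    }],
    "volumes": [inline_volume, claimed_volume],
}
```

From the container's point of view both are just mounted directories; the difference lies in which Kubernetes object owns the storage.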
The way I understand it is that the concept of a Persistent Volume builds on that of a Volume, and that the difference is that a Persistent Volume is more decoupled from the Pods using it. Or as expressed in the introduction of the documentation page about Persistent Volumes:
PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV.
A Volume's lifecycle on the other hand depends on the lifecycle of the Pod using it:
A Kubernetes volume [...] has an explicit lifetime - the same as the Pod that encloses it.
NFS is not really relevant here. Both Volumes and Persistent Volumes are Kubernetes resources. They provide an abstraction of a data storage facility. So for using the cluster, it doesn't matter which concrete operating system resource is behind that abstraction. That's in a way the whole point of Kubernetes.
It might also be relevant here to keep in mind that Kubernetes and its API are still evolving. The Kubernetes developers might sometimes choose to introduce new concepts/resources that differ only subtly from existing ones. I presume one reason for this is to maintain backwards compatibility while still being able to fine-tune basic API concepts. Another example of this is Replication Controllers and Replica Sets, which conceptually largely overlap and are therefore redundant to some extent. What is different from the Volume/Persistent Volume matter, though, is that Replication Controllers are explicitly deprecated now.
Volumes ≠ Persistent Volumes
Volumes and Persistent Volumes are related, but very different!
Volumes:
- appear in Pod specifications
- do not exist as API resources (cannot do kubectl get volumes)
Persistent Volumes:
- are API resources (can do kubectl get persistentvolumes)
- correspond to concrete volumes (e.g. on a SAN, EBS, etc.)
- cannot be associated with a Pod directly (they need a Persistent Volume Claim)
They are two different implementations which can provide some similar common functionality (hence a lot of confusion).
Persistent volumes:
- Support storage provisioned via StorageClass
- Do not support the emptyDir volume type (https://github.com/kubernetes/kubernetes/issues/75378)
Volumes:
- Are bound to a pod
- Are simpler to define (fewer Kubernetes resources required)

What is the difference between persistent volume (PV) and persistent volume claim (PVC) in simple terms?

What is the difference between persistent volume (PV) and persistent volume claim (PVC) in Kubernetes/ Openshift by referring to documentation?
What is the difference between both in simple terms?
From the docs
PVs are resources in the cluster. PVCs are requests for those resources and also act as claim checks to the resource.
So a persistent volume (PV) is the "physical" volume on the host machine that stores your persistent data. A persistent volume claim (PVC) is a request for the platform to create a PV for you, and you attach PVs to your pods via a PVC.
Something akin to
Pod -> PVC -> PV -> Host machine
PVC is a declaration of need for storage that can at some point become available / satisfied - as in bound to some actual PV.
It is a bit like the asynchronous programming concept of a promise. A PVC promises that it will at some point "translate" into a storage volume that your application will be able to use, with defined characteristics such as class, size, and access mode (ROX, RWO, or RWX).
This is a way to abstract thinking about a particular storage implementation away from your pods/deployments. Your application in most cases does not need to declare "give me NFS storage from server X of size Y"; it is more like "I need persistent storage of default class and size Y".
With this, deployments on different clusters can choose to differently satisfy that need. One can link an EBS device, another can provision a GlusterFS, and your core manifests are still the same in both cases.
Furthermore, you can have Volume Claim Templates defined in a StatefulSet (they are not available in a plain Deployment), so that each pod gets a corresponding PVC created automatically (i.e., supporting infrastructure-agnostic storage definition for a group of scalable pods where each needs its own dedicated storage).
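As a sketch of how claim templates look, the following Python dict describes a StatefulSet with a volumeClaimTemplates entry (the field lives under spec of a StatefulSet). The names "web" and "www", the nginx image and the 1Gi size are assumptions for the example. The helper lists the PVC names that would be created, following the template-statefulset-ordinal naming pattern.

```python
import json

# Hypothetical names ("web", "www", "nginx") chosen for illustration.
statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "web"},
    "spec": {
        "serviceName": "web",
        "replicas": 3,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [{
                    "name": "web",
                    "image": "nginx",
                    # Mounts the per-pod claim created from the template below.
                    "volumeMounts": [
                        {"name": "www", "mountPath": "/usr/share/nginx/html"}
                    ],
                }],
            },
        },
        # One PVC is created from this template for every replica.
        "volumeClaimTemplates": [{
            "metadata": {"name": "www"},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": "1Gi"}},
            },
        }],
    },
}


def expected_claim_names(sts):
    """List the PVC names, one per template and replica: <template>-<statefulset>-<ordinal>."""
    name = sts["metadata"]["name"]
    return [
        f"{tpl['metadata']['name']}-{name}-{ordinal}"
        for tpl in sts["spec"]["volumeClaimTemplates"]
        for ordinal in range(sts["spec"]["replicas"])
    ]


if __name__ == "__main__":
    print(json.dumps(statefulset, indent=2))
    print(expected_claim_names(statefulset))
```

Because the template names no storage class, each cluster would satisfy these claims with whatever its default class provides, which is the infrastructure-agnostic behaviour described above.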
Short:
- Here you have the storage! PersistentVolume (PV)
- You get the storage if you really need it! PersistentVolumeClaim (PVC)
A PersistentVolume (PV) is a piece of storage in the cluster or in central storage, let's say 100GB.
A PersistentVolumeClaim (PVC) is a request by a user for storage for the application to use, let's say 10GB.
In a real-life scenario, the PV is the whole cake and the PVC is a piece of the cake (but you can have the whole cake if there is nobody else to eat it, just as an application can use the whole PV if no other application needs it). Strictly speaking, though, a PV is bound to at most one PVC, so a 10GB claim bound to a 100GB PV reserves the whole volume.
Short and Simple
Persistent Volume - Available storage; let's say you have 100Gi.
Persistent Volume Claim - Your request for storage from a Persistent Volume; if you request 10Gi you'll get it, but if you request 110Gi you won't.
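The matching described in these answers can be sketched as a toy binder. This is a simplified model under stated assumptions, not the real controller: it ignores storage classes, selectors and reclaim policies, and it picks the smallest free PV that fits. The class and function names are invented for the example. It also shows that binding is one-to-one: once a claim is bound, the whole PV is taken, even if the claim asked for less.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PV:
    name: str
    capacity_gi: int
    access_modes: frozenset
    claimed_by: Optional[str] = None  # name of the bound claim, if any


@dataclass
class PVC:
    name: str
    request_gi: int
    access_modes: frozenset


def bind(claim, volumes):
    """Bind the claim to the smallest free PV that satisfies it, or return None."""
    candidates = [
        pv for pv in volumes
        if pv.claimed_by is None
        and pv.capacity_gi >= claim.request_gi
        and claim.access_modes <= pv.access_modes
    ]
    if not candidates:
        return None  # the claim would stay Pending
    chosen = min(candidates, key=lambda pv: pv.capacity_gi)
    chosen.claimed_by = claim.name  # the entire PV now belongs to this claim
    return chosen


if __name__ == "__main__":
    rwo = frozenset({"ReadWriteOnce"})
    volumes = [PV("pv-100", 100, rwo)]
    print(bind(PVC("small", 10, rwo), volumes))    # bound to pv-100
    print(bind(PVC("second", 10, rwo), volumes))   # None: pv-100 is already taken
    print(bind(PVC("too-big", 110, rwo), [PV("pv-other", 100, rwo)]))  # None
```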
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by a server/storage/cluster administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster, just like a node.
A PersistentVolumeClaim (PVC) is a request for storage by a user, which can be satisfied by a PV. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request a specific size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany or ReadWriteMany).
A Persistent Volume Claim tells you what options you have access to in a particular cluster. Imagine a store called Smart Tech that hands out a circular with ads about your configuration options; those ads are the Persistent Volume Claims.
Inside your config file you write out the different Persistent Volume Claims that you are going to have inside your cluster, kind of like your wish list to Santa, but of course you are going to take that list to the sales guy at Smart Tech when you are done.
So you write a config file that says there should be a 600GB hard drive option available to your cluster, and a 1TB hard drive option as well.
When you choose one of these Persistent Volume Claim options, you ask Kubernetes (the sales guy) to go and get that option for you. Kubernetes has to look through the instances of storage in the stock room that are readily available. These hard drives can be used right away, and they are considered statically provisioned because they were created ahead of time.
On the other hand, there are dynamically provisioned options that are created on the fly when you ask Kubernetes, the sales guy, kind of like just-in-time production: the storage is created at the moment you ask for it.
So the Persistent Volume Claim is the store's advertisement of options, and whichever one you choose, Kubernetes will go and get it, either from the stock room or by creating one on the fly.
The Persistent Volume is the actual product that you get back from Kubernetes. If Kubernetes does not have what you asked for, it will try to create it on the fly for you.
So the PVC is what Smart Tech advertises it has to offer to your cluster, which Kubernetes the sales guy will get for you, and the PV is the actual finished product delivered to you.
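The "stock room first, otherwise made to order" idea can be sketched as follows. This is a toy model, not the real controller: volumes and claims are plain dicts, storage classes are just strings, and the set of classes that can provision on demand is passed in by hand. All names and sizes are assumptions for the example.

```python
import itertools

_ids = itertools.count(1)  # used to name volumes created on demand


def satisfy(claim, volumes, dynamic_classes):
    """Return (volume, how), where how is "static", "dynamic" or "pending".

    claim: dict with "name", "size_gi" and "storage_class".
    volumes: list of dicts with "name", "size_gi", "storage_class" and "claim".
    dynamic_classes: storage class names that can create volumes on the fly.
    """
    # 1. Look in the "stock room": pre-created volumes that are still free.
    for pv in sorted(volumes, key=lambda v: v["size_gi"]):
        if (
            pv["claim"] is None
            and pv["storage_class"] == claim["storage_class"]
            and pv["size_gi"] >= claim["size_gi"]
        ):
            pv["claim"] = claim["name"]
            return pv, "static"
    # 2. Otherwise create one to order, if the class supports it.
    if claim["storage_class"] in dynamic_classes:
        pv = {
            "name": f"pv-dynamic-{next(_ids)}",
            "size_gi": claim["size_gi"],
            "storage_class": claim["storage_class"],
            "claim": claim["name"],
        }
        volumes.append(pv)
        return pv, "dynamic"
    # 3. Nothing fits and nothing can be created: the claim waits.
    return None, "pending"
```

In this model a claim against a class with no free volume and no provisioner simply stays pending until an administrator adds a matching volume.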
PersistentVolume (PV) and PersistentVolumeClaim (PVC) are API resources provided by Kubernetes.
A PV is a piece of storage that is supposed to be preallocated by an admin, and a PVC is a request for a piece of storage by a user.
Persistent Volume — low level representation of a storage volume.
Persistent Volume Claim — binding between a Pod and Persistent Volume.
Storage Class — allows for dynamic provisioning of Persistent Volumes.
You can find some common ground by comparing PVs and PVCs with nodes and pods.
A PV is like a node: it defines the storage that is available.
A PVC is like a pod: a pod requests resources (memory, CPU) and gets them if a node has them available to allocate; in the case of a PVC, the resource is storage.