What is the maximum storage capacity for Kubernetes PersistentVolumes?

Here is an example of a .yml file to create a PersistentVolume on a Kubernetes cluster:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  namespace: prisma
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: xxGi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"
Can the storage capacity be more than the available storage capacity on the node with the smallest disk in the cluster? Or is the maximum the sum of the available disk space on the cluster nodes?

Generally you bind the PV to an external storage volume your cloud provider offers (for example, AWS EBS), abstracted as a StorageClass, in a size that matches your needs. Cluster nodes come and go; you shouldn't rely on their storage.
Quick guides: GCP, AWS, Azure
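As a sketch of that pattern on AWS with the EBS CSI driver, dynamic provisioning could look like the manifests below (the class name ebs-gp3 and the 20Gi request are illustrative assumptions, not values from the question):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com          # AWS EBS CSI driver
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: ebs-gp3
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi                   # sized to your needs, independent of node disks
The requested size is then limited by what the storage backend supports, not by the disks of the cluster nodes.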

The hostPath mode is intended only for local testing, so I'm pretty sure the requested size does absolutely nothing (it isn't enforced).

Related

AzureFile Persistent volume performance is too slow

We are using this AKS cluster to host our Azure DevOps build agents as Docker containers. We followed the Microsoft documentation at https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/docker?view=azure-devops to set up the ADO agents and get them ready; however, we are facing some performance and stability issues with the ADO agent usage.
We followed the MS document to set up a file-share-based Persistent Volume to use across multiple pods of AKS agents, and pointed this PV at the Maven and Node cached repositories for the builds. But the builds are much slower than normal (4x slower). We are using a Storage account [Standard Geo-redundant storage (GRS)] file share with a Private Endpoint.
When we used an Azure Disk as the Persistent Volume, the builds were faster, but disk-based PVs can't be mounted across multiple nodes. So why is this performance issue happening for the file-share-based PV, and what would be the recommended solution?
Or can we have the Azure Disk shared between multiple nodes?
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: file.csi.azure.com
    readOnly: false
    volumeHandle: unique-volumeid  # make sure this volumeid is unique in the cluster
    volumeAttributes:
      resourceGroup: my-rg
      shareName: aksshare
    nodeStageSecretRef:
      name: azure-secret
      namespace: ado
  mountOptions:
    - dir_mode=0777
    - file_mode=0777
    - uid=0
    - gid=0
    - mfsymlinks
    - cache=strict
    - nosharesock
    - nobrl
#############################
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: my-pv
  resources:
    requests:
      storage: 100Gi
#############################
apiVersion: v1
data:
  azurestorageaccountkey: ''
  azurestorageaccountname: ''
kind: Secret
metadata:
  name: azure-secret
  namespace: aks
type: Opaque
I suggest using Premium file shares. Their performance is much better than the standard tier.
If you are using the out-of-the-box storage classes, then use the "azurefile-csi-premium" storage class.
If you are using your own storage class, then add the following to the end of the storage class definition (After creating the premium share):
parameters:
  skuName: Premium_LRS
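For instance, a complete custom storage class carrying that parameter might look roughly like this (a sketch using the Azure Files CSI provisioner; the class name and mount options are assumptions modelled on the manifests above):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-premium-custom     # placeholder name
provisioner: file.csi.azure.com      # Azure Files CSI driver
allowVolumeExpansion: true
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - mfsymlinks
  - nobrl
parameters:
  skuName: Premium_LRS               # provisions shares on a premium (SSD-backed) storage account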
References:
Azure Files scalability and performance targets
Azure File Dynamic
Azure File Static

How can I mount a PV on one node and use that same PV for pods on another node?

I have attached an EBS volume to one of the nodes in my cluster, and I want whatever pods come up, irrespective of the node they are scheduled onto, to use that EBS volume. Is this possible?
My approach was to create a PV/PVC that mounts that volume and then use that PVC in my pod, but I am not sure whether it mounts on the same host the pod comes up on or on a different host.
YAML for Storage Class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate
allowVolumeExpansion: true
reclaimPolicy: Delete
PV.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: redis-pv
  labels:
    type: local
spec:
  capacity:
    storage: 200Mi
  storageClassName: local-path
  claimRef:
    namespace: redis
    name: data-redis-0
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/mnt2/data/redis"
PVC.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-redis-0
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Mi
  storageClassName: local-path
No; when I try to schedule a pod, the storage still gets mounted on the same node.
You are using a local path, so you cannot do that.
There are different access modes for a PVC: ReadWriteMany, ReadWriteOnce, and ReadOnlyMany.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany or ReadWriteMany, see AccessModes).
Read More at : https://kubernetes.io/docs/concepts/storage/persistent-volumes/
Yes, you can mount multiple Pods to a single PVC, but in that case you have to use ReadWriteMany. Most people use NFS or EFS for this type of use case.
EBS is ReadWriteOnce, so it won't be possible to use EBS in your case; you have to use either NFS or EFS.
You can use GlusterFS on the back end; it will provision EBS volumes underneath. GlusterFS supports ReadWriteMany, and it will be faster compared to EFS as it's block storage (SSD).
For ReadWriteMany you can also check out: https://min.io/
Find access mode details here : https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
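For illustration, a ReadWriteMany setup backed by NFS might look like the sketch below (the server address and export path are placeholders, not values from the question):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.com    # placeholder NFS server (e.g. an EFS mount target)
    path: /exports/shared      # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: shared-nfs-pv
  resources:
    requests:
      storage: 5Gi
Any number of pods, on any nodes, can then mount shared-nfs-pvc at the same time.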
I have attached an EBS volume to one of the nodes in my cluster, and I want whatever pods come up, irrespective of the node they are scheduled onto, to use that EBS volume. Is this possible?
No. An EBS volume can only be attached to at most one EC2 instance, and correspondingly, one Kubernetes node. In Kubernetes terminology, it only allows the ReadWriteOnce access mode.
It looks like the volume you're trying to create is the backing store for a Redis instance. If the volume will only be attached to one pod at a time, then this isn't a problem on its own, but you need to let Kubernetes manage the volume for you. Then the cluster will know to detach the EBS volume from the node it's currently on and reattach it to the node with the new pod. Setting this up is a cluster-administration problem and not something you as a programmer can do, but it should be set up for you in environments like Amazon's EKS managed Kubernetes.
In this environment:
Don't create a StorageClass; this is cluster-level configuration.
Don't manually create a PersistentVolume; the cluster will create it for you.
You should be able to use the default storageClass: in your PersistentVolumeClaim.
You probably should use a StatefulSet to create the PersistentVolumeClaim for you.
So for example:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis        # headless Service name (required by StatefulSet)
  selector:
    matchLabels:
      app: redis
  volumeClaimTemplates:     # automatically creates PersistentVolumeClaims
    - metadata:
        name: data-redis
      spec:
        accessModes: [ReadWriteOnce]  # data won't be shared between pods
        resources:
          requests:
            storage: 200Mi
        # default storageClassName:
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis
          volumeMounts:
            - name: data-redis
              mountPath: /data

How to copy PVC between different storage classes?

I know about snapshots and have tested volume cloning, and it works when the storage class is the same.
But what if I have two storage classes, one for fast SSD and a second for cold HDD storage over the network, and I want to periodically back up to the cold storage? How do I do it?
This is not a thing Kubernetes supports since it would be entirely up to your underlying storage. The simple version would be a pod that mounts both and runs rsync I guess?
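A rough sketch of that rsync approach, assuming two existing claims named fast-ssd-pvc and cold-hdd-pvc (both names are placeholders):
apiVersion: batch/v1
kind: Job
metadata:
  name: pvc-copy
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: rsync
          image: alpine:3.19
          command: ["sh", "-c", "apk add --no-cache rsync && rsync -a /source/ /backup/"]
          volumeMounts:
            - name: source
              mountPath: /source
            - name: backup
              mountPath: /backup
      volumes:
        - name: source
          persistentVolumeClaim:
            claimName: fast-ssd-pvc   # placeholder: the SSD-backed PVC
        - name: backup
          persistentVolumeClaim:
            claimName: cold-hdd-pvc   # placeholder: the HDD-backed PVC
Wrap the same pod spec in a CronJob if you want the copy to run on a schedule.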
Cloning is supported with a different Storage Class
You need to use CSI Provisioning and apply something like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone-of-pvc-1
  namespace: myns
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: cloning
  resources:
    requests:
      storage: 5Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: pvc-1
Full documentation

kubernetes storage class node selector

I'm trying to leverage a local volume dynamic provisioner for k8s, Rancher's one, with multiple instances, each with its own storage class, so that I can provide multiple types of local volumes based on their performance (e.g. ssd, hdd, etc.).
The underlying infrastructure is not symmetric; some nodes only have ssds, some only hdds, some of them both.
I know that I can hint the scheduler to select the proper nodes by providing node affinity rules for pods.
But, is there a better way to address this problem at the level of provisioner / storage class only ? E.g., make a storage class only available for a subset of the cluster nodes.
I know that I can hint the scheduler to select the proper nodes by providing node affinity rules for pods.
There is no need to define node affinity rules on Pod level when using local persistent volumes. Node affinity can be specified in PersistentVolume definition.
But, is there a better way to address this problem at the level of provisioner / storage class only? E.g., make a storage class only available for a subset of the cluster nodes.
No, it cannot be specified at the StorageClass level. Nor can you make a StorageClass available only to a subset of nodes.
But when it comes to a provisioner, I would say yes, it should be feasible, as one of the major storage provisioner tasks is creating matching PersistentVolume objects in response to a PersistentVolumeClaim created by the user. You can read about it here:
Dynamic volume provisioning allows storage volumes to be created on-demand. Without dynamic provisioning, cluster administrators have to manually make calls to their cloud or storage provider to create new storage volumes, and then create PersistentVolume objects to represent them in Kubernetes. The dynamic provisioning feature eliminates the need for cluster administrators to pre-provision storage. Instead, it automatically provisions storage when it is requested by users.
So looking at the whole volume provisioning process from the very beginning, it goes as follows:
The user creates only a PersistentVolumeClaim object, in which they specify a StorageClass:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-storage ### 👈
and it can be used in a Pod definition:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim ### 👈
So in practice, in a Pod definition you only need to specify the proper PVC. There is no need to define any node-affinity rules here.
A Pod references a PVC, the PVC then references a StorageClass, and the StorageClass references the provisioner that should be used:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/my-fancy-provisioner ### 👈
volumeBindingMode: WaitForFirstConsumer
So in the end it is the task of the provisioner to create a matching PersistentVolume object. It can look as follows:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /var/tmp/test
  nodeAffinity: ### 👈
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - ssd-node ### 👈
So a Pod which uses the myclaim PVC -> which references the local-storage StorageClass -> which selects the proper storage provisioner will be automatically scheduled on the node selected in the PV definition created by that provisioner.

Kubernetes - How do I mention hostPath in PVC?

I need to use a PVC to specify the specs of the PV, and I also need to make sure it uses a custom local storage path in the PV.
I am unable to figure out how to specify the hostPath in a PVC.
This is the PVC config:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
And this is the mongodb deployment:
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      volumes:
        - name: mongo-volume
          persistentVolumeClaim:
            claimName: mongo-pvc
      containers:
        - name: mongo
          image: mongo
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-volume
              mountPath: /data/db
How and where do I mention the hostPath to be mounted in here?
The docs say that you set hostPath when creating a PV (the step before creating the PVC).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
After you create the PersistentVolumeClaim, the Kubernetes control plane looks for a PersistentVolume that satisfies the claim's requirements. If the control plane finds a suitable PersistentVolume with the same StorageClass, it binds the claim to the volume.
Please see https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/
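So, as a sketch, the way to get the question's PVC to use that hostPath-backed PV is to give it the same storageClassName, so the control plane can bind the two (the 1Gi request is taken from the PVC above):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-pvc
spec:
  storageClassName: manual   # matches the PV's storageClassName
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi           # must fit within the PV's 10Gi capacity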
You don't (and can't) force a specific host path in a PersistentVolumeClaim.
Typically a Kubernetes cluster will be configured with a dynamic volume provisioner and that will create the matching PersistentVolume for you. Depending on how your cluster was installed that could be an Amazon EBS volume, a Google Cloud Platform persistent disk, an iSCSI volume, or some other type of storage; as an application author you don't really control that. (You tagged this question for GKE, and the GKE documentation has a section on dynamic volume provisioning.) You don't need to specify where on the host the volume might be mounted, and there's no way to provide this detail in the PersistentVolumeClaim.
With the YAML you show, and the context of this being on GKE, I'd expect Google to automatically provision a GCE persistent disk. If the pod gets rescheduled on a different node, the persistent disk will follow the pod to the new node. You don't need to worry about what specific host directory is being used; Kubernetes will manage this for you.
In most cases you'll want to avoid hostPath storage. You don't directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume. It's appropriate for something like a log collector running in a DaemonSet, where you can guarantee that there is interesting content in that path on every node, but not for your general application database storage.
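As a sketch of that legitimate hostPath use case, a DaemonSet-based log collector might mount the node's log directory like this (the image name is an illustrative placeholder, not something from the answer above):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: fluent/fluent-bit   # placeholder log-collector image
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log           # node-local logs that exist on every node
Because a DaemonSet runs one pod per node, the hostPath is guaranteed to be local to whichever pod reads it.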