How to copy PVC between different storage classes? - kubernetes

I know about snapshots and have tested volume cloning, and it works when the storage class is the same.
But what if I have two storage classes, one for fast SSD and a second for cold HDD storage over the network, and I want to periodically back up to cold storage? How can I do that?

Kubernetes does not support this directly, since it depends entirely on your underlying storage. The simple version would be a pod that mounts both volumes and runs rsync, I guess.
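The "pod that mounts both" idea could be sketched roughly as the CronJob below. This is only an illustration under several assumptions: the source claim is called pvc-1 in namespace myns (borrowed from the other answer), the destination claim pvc-1-backup is a hypothetical PVC already created on the cold storage class, and the copy uses busybox `cp -a` in place of rsync so that no image shipping rsync has to be assumed.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pvc-backup                # hypothetical name
  namespace: myns                 # assumed namespace of both claims
spec:
  schedule: "0 2 * * *"           # every night at 02:00
  concurrencyPolicy: Forbid       # never run two copies at once
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: copy
              image: busybox:1.36 # assumed image; swap in one with rsync if preferred
              # Copy the contents of the source volume into the backup volume.
              command: ["sh", "-c", "cp -a /src/. /dst/"]
              volumeMounts:
                - name: src
                  mountPath: /src
                  readOnly: true
                - name: dst
                  mountPath: /dst
          volumes:
            - name: src
              persistentVolumeClaim:
                claimName: pvc-1          # source claim on the fast class
                readOnly: true
            - name: dst
              persistentVolumeClaim:
                claimName: pvc-1-backup   # hypothetical claim on the cold class
```

If the source claim is ReadWriteOnce and already mounted by a running pod, the backup pod has to land on the same node, and copying files while the application is writing them may produce an inconsistent backup.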

Cloning into a different storage class is supported.
You need to use CSI provisioning and apply something like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone-of-pvc-1
  namespace: myns
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: cloning
  resources:
    requests:
      storage: 5Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: pvc-1
Full documentation

Related

How can I mount a PV on one node and use that same PV for pods on another node?

I have attached an EBS volume to one of the nodes in my cluster, and I want every pod that comes up to use that EBS volume, irrespective of the node it is scheduled onto. Is this possible?
My approach was to create a PV/PVC that mounts that volume and then use that PVC in my pod, but I am not sure whether it mounts on the same host that the pod comes up on or on a different host.
YAML for Storage Class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate
allowVolumeExpansion: true
reclaimPolicy: Delete
PV.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: redis-pv
  labels:
    type: local
spec:
  capacity:
    storage: 200Mi
  storageClassName: local-path
  claimRef:
    namespace: redis
    name: data-redis-0
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/mnt2/data/redis"
PVC.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-redis-0
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Mi
  storageClassName: local-path
Now, when I try to schedule a pod, the storage gets mounted on the same node as the pod instead.
You are using a local path (hostPath), so you cannot do this.
A PVC can request different access modes: ReadWriteOnce, ReadOnlyMany, and ReadWriteMany.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It
is similar to a Pod. Pods consume node resources and PVCs consume PV
resources. Pods can request specific levels of resources (CPU and
Memory). Claims can request specific size and access modes (e.g., they
can be mounted ReadWriteOnce, ReadOnlyMany or ReadWriteMany, see
AccessModes).
Read More at : https://kubernetes.io/docs/concepts/storage/persistent-volumes/
Yes, you can mount a single PVC in multiple pods, but in that case you have to use ReadWriteMany. Most people use NFS or EFS for this type of use case.
EBS is ReadWriteOnce, so it won't be possible to use EBS in your case. You have to use either NFS or EFS.
You can also run GlusterFS on top of provisioned EBS volumes. GlusterFS supports ReadWriteMany and may be faster than EFS, since it is backed by block storage (SSD).
For ReadWriteMany you can also check out: https://min.io/
Find access mode details here : https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
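As a rough illustration of the NFS route, a statically provisioned ReadWriteMany volume might look like the sketch below. The server name nfs.example.internal, the export path, and the object names are all hypothetical and not taken from the question; an EFS setup would typically go through its CSI driver instead.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv              # hypothetical name
spec:
  capacity:
    storage: 200Mi
  accessModes:
    - ReadWriteMany                # NFS can be mounted by pods on many nodes
  storageClassName: ""             # no class: bind statically, skip dynamic provisioning
  nfs:
    server: nfs.example.internal   # hypothetical NFS server
    path: /exports/redis           # hypothetical export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data                # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""             # must match the PV above
  volumeName: shared-nfs-pv        # bind to exactly this PV
  resources:
    requests:
      storage: 200Mi
```

Pods on any node can then reference the claim shared-data, because the data lives on the NFS server and not on a single node's disk.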
I have attached an EBS volume to one of the nodes in my cluster and I want that whatever pod are coming up, irrespective of the nodes they are scheduled onto, should use that EBS volume. is this possible?
No. An EBS volume can only be attached to at most one EC2 instance, and correspondingly, one Kubernetes node. In Kubernetes terminology, it only allows the ReadWriteOnce access mode.
It looks like the volume you're trying to create is the backing store for a Redis instance. If the volume will only be attached to one pod at a time, then this isn't a problem on its own, but you need to let Kubernetes manage the volume for you. Then the cluster will know to detach the EBS volume from the node it's currently on and reattach it to the node with the new pod. Setting this up is a cluster-administration problem and not something you as a programmer can do, but it should be set up for you in environments like Amazon's EKS managed Kubernetes.
In this environment:
Don't create a StorageClass; this is cluster-level configuration.
Don't manually create a PersistentVolume; the cluster will create it for you.
You should be able to use the default storageClass: in your PersistentVolumeClaim.
You probably should use a StatefulSet to create the PersistentVolumeClaim for you.
So for example:
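apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis # assumed name of a headless Service (not in the original)
  selector:
    matchLabels:
      app: redis # assumed label; must match the pod template labels
  volumeClaimTemplates: # automatically creates PersistentVolumeClaims
    - metadata:
        name: data-redis
      spec:
        accessModes: [ReadWriteOnce] # data won't be shared between pods
        resources:
          requests:
            storage: 200Mi
        # default storageClassName:
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis # assumed image
          volumeMounts:
            - name: data-redis
              mountPath: /data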

Difference between NFS-PV, hostPath-PV on NFS and hostPath mount in deployment

I have a Kubernetes cluster setup (on-premise), that has an NFS share (my-nfs.internal.tld) mounted to /exports/backup on each node to create backups there.
Now I'm setting up my logging stack and I wanted to make the data persistent. So I figured I could start by storing the indices on the NFS.
Now I found three different ways to achieve this:
NFS-PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: my-nfs.internal.tld
    path: /path/to/exports/backup/logging-data/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logging-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          persistentVolumeClaim:
            claimName: logging-data-pvc
This would, of course, require that my cluster gets access to the NFS (instead of only the nodes, as it is currently set up).
hostPath-PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /exports/backup/logging-data/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logging-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          persistentVolumeClaim:
            claimName: logging-data-pvc
hostPath mount in deployment
As the NFS share is mounted on all my nodes, I could also just use the host path directly in the deployment without pinning anything.
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          hostPath:
            path: /exports/backup/logging-data
            type: DirectoryOrCreate
So my question is: Is there really any difference between these three? I'm pretty sure all three work. I tested the second and third already. I was not yet able to test the first though (in this specific setup at least). Especially the second and third solutions seem very similar to me. The second makes it easier to re-use deployment files on multiple clusters, I think, as you can use persistent volumes of different types without changing the volumes part of the deployment. But is there any difference beyond that? Performance maybe? Or is one of them deprecated and will be removed soon?
I found a tutorial mentioning that the hostPath-PV only works on single-node clusters, but I'm sure it also works in my case here. Maybe the comment meant: "On multi-node clusters the data changes when deployed to different nodes."
From reading a lot of documentation and how-tos, I understand that the first one is the preferred solution. I would probably also go for it, as it is the one most easily replicated in a cloud setup. But I do not really understand why it is preferred over the other two.
Thanks in advance for your input on the matter!
The NFS is indeed the preferred solution:
An nfs volume allows an existing NFS (Network File System) share to
be mounted into a Pod. Unlike emptyDir, which is erased when a Pod
is removed, the contents of an nfs volume are preserved and the
volume is merely unmounted. This means that an NFS volume can be
pre-populated with data, and that data can be shared between pods. NFS
can be mounted by multiple writers simultaneously.
So, an NFS is useful for two reasons:
Data is persistent.
It can be accessed from multiple pods at the same time and the data can be shared between pods.
See the NFS example for more details.
While the hostPath:
A hostPath volume mounts a file or directory from the host node's
filesystem into your Pod.
Pods with identical configuration (such as created from a PodTemplate)
may behave differently on different nodes due to different files on
the nodes
The files or directories created on the underlying hosts are only
writable by root. You either need to run your process as root in a
privileged Container or modify the file permissions on the host to be
able to write to a hostPath volume
hostPath is not recommended for several reasons:
You don't directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume.
You expose your cluster to security threats.
If a node goes down, the pod has to be scheduled on another node, where your locally provisioned volume will not be available.
hostPath would be a good fit if, for example, you want to use it for a log collector running in a DaemonSet. Other than that, it would be better to use NFS.
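For the DaemonSet case mentioned above, a minimal sketch might look like the following. The names, the busybox image, and the placeholder command are assumptions for illustration only; a real collector image would replace them. Because a DaemonSet runs one pod per node, each pod reads the logs of the node it happens to run on, which is exactly the per-node behavior hostPath gives you.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector              # hypothetical name
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: busybox:1.36      # placeholder for a real log collector image
          # Placeholder command: list the node's log directory once a minute.
          command: ["sh", "-c", "while true; do ls -l /host-logs; sleep 60; done"]
          volumeMounts:
            - name: host-logs
              mountPath: /host-logs
              readOnly: true       # read-only limits the exposure of the host
      volumes:
        - name: host-logs
          hostPath:
            path: /var/log         # assumed log directory on every node
            type: Directory
```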

Kubernetes - How do I mention hostPath in PVC?

I need to use a PVC to specify the specs of the PV, and I also need to make sure the PV uses a custom local storage path.
I am unable to figure out how to specify the hostPath in a PVC.
This is the PVC config:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
And this is the mongodb deployment:
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      volumes:
        - name: mongo-volume
          persistentVolumeClaim:
            claimName: mongo-pvc
      containers:
        - name: mongo
          image: mongo
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-volume
              mountPath: /data/db
How and where do I mention the hostPath to be mounted in here?
The docs say that you set hostPath when creating a PV (the step before creating the PVC).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
After you create the PersistentVolumeClaim, the Kubernetes control plane looks for a PersistentVolume that satisfies the claim's requirements. If the control plane finds a suitable PersistentVolume with the same StorageClass, it binds the claim to the volume.
Please see https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/
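To bind to a hand-made PV like the one above, the claim from the question would need the same storage class. A sketch, assuming the PV was created with storageClassName: manual as shown and reusing the claim name mongo-pvc from the question:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-pvc
spec:
  storageClassName: manual   # must match the PV; if omitted, the cluster default class is used
  accessModes:
    - ReadWriteOnce          # must be an access mode the PV offers
  resources:
    requests:
      storage: 1Gi           # any size up to the PV's 10Gi capacity can match
```

Binding is one-to-one, so the whole 10Gi volume is reserved for this claim even though only 1Gi is requested.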
You don't (and can't) force a specific host path in a PersistentVolumeClaim.
Typically a Kubernetes cluster will be configured with a dynamic volume provisioner and that will create the matching PersistentVolume for you. Depending on how your cluster was installed that could be an Amazon EBS volume, a Google Cloud Platform persistent disk, an iSCSI volume, or some other type of storage; as an application author you don't really control that. (You tagged this question for GKE, and the GKE documentation has a section on dynamic volume provisioning.) You don't need to specify where on the host the volume might be mounted, and there's no way to provide this detail in the PersistentVolumeClaim.
With the YAML you show, and the context of this being on GKE, I'd expect Google to automatically provision a GCE persistent disk. If the pod gets rescheduled on a different node, the persistent disk will follow the pod to the new node. You don't need to worry about what specific host directory is being used; Kubernetes will manage this for you.
In most cases you'll want to avoid hostPath storage. You don't directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume. It's appropriate for something like a log collector running in a DaemonSet, where you can guarantee that there is interesting content in that path on every node, but not for your general application database storage.

What is the maximum storage capacity for kubernetes PersistentVolumes

Here is an example of a .yml file to create a PersistentVolume on a Kubernetes cluster:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  namespace: prisma
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: xxGi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"
Can the storage capacity be larger than the available storage on the node with the smallest disk in the cluster? Or is the maximum the sum of the available disk space across the cluster nodes?
Generally, you bind the PV to an external storage volume that your cloud provider offers (for example, AWS EBS), abstracted as a StorageClass, in a size that matches your needs. Cluster nodes come and go, so you shouldn't rely on their storage.
Quick guides: GCP, AWS, Azure
The hostPath mode is intended only for local testing, so I'm pretty sure the requested size does absolutely nothing.
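As an illustration of the "external volume abstracted as a StorageClass" approach, the sketch below assumes an AWS cluster with the EBS CSI driver installed. The object names and the 50Gi size are hypothetical; with dynamic provisioning, the size limit comes from the storage backend and not from any node's local disk.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3                      # hypothetical name
provisioner: ebs.csi.aws.com         # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer   # create the volume in the zone where the pod lands
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prisma-data                  # hypothetical name
  namespace: prisma
spec:
  storageClassName: ebs-gp3
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi                  # the provisioner creates a volume of this size
```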

Adding a Compute Engine Disk to Container Engine as persistent volume

I have a PersistentVolumeClaim that looks like the following:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitlab-config-storage
  namespace: gitlab
  annotations:
    volume.beta.kubernetes.io/storage-class: fast
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
This created a disk in Google Compute Engine. I then deleted the claim and reapplied it, but this created a new disk. I would like to attach the original disk to my claim, as it holds data I've already created. Is there a way to force GKE to use a specific disk?
By using a persistent volume claim, you are asking GKE to provision a persistent disk and then always use that same volume.
However, by deleting the claim, you've essentially destroyed it.
Don't delete the claim, ever, if you want to continue using it.
You can attach a claim to multiple pods over its lifetime, and the disk will remain the same. As soon as you delete the claim, it will disappear.
Take a look here for more information.
You can re-attach a GCE disk to a PersistentVolumeClaim by first creating the PersistentVolume. Create a YAML file and set the proper values, e.g.:
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-gitlab-config-storage
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 25Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: gitlab-config-storage
    namespace: gitlab
  gcePersistentDisk:
    pdName: <name_of_the_gke_disk>
  persistentVolumeReclaimPolicy: Delete
  storageClassName: fast
Create this with kubectl apply -f filename.yaml and then re-create your PersistentVolumeClaim with values matching the spec and claimRef. The PVC should find the matching PV and bind to it and the existing GCE disk.
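A matching claim might look like the sketch below. It assumes the PersistentVolume above exists under the name pvc-gitlab-config-storage. It differs from the claim in the question in two ways: the access mode is ReadWriteOnce to match the PV, and the storage class is set through spec.storageClassName in place of the beta annotation.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitlab-config-storage              # must match claimRef.name on the PV
  namespace: gitlab                        # must match claimRef.namespace on the PV
spec:
  storageClassName: fast                   # same class as the PV
  volumeName: pvc-gitlab-config-storage    # bind to the pre-created PV by name
  accessModes:
    - ReadWriteOnce                        # same access mode as the PV
  resources:
    requests:
      storage: 25Gi                        # no larger than the PV's capacity
```

With persistentVolumeReclaimPolicy: Delete on the PV, deleting this claim again would also remove the disk; setting the policy to Retain on the PV is one way to guard against that.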