I'm using kubernetes v1.16.10 with a Ceph 13.2.2 Mimic cluster for dynamic volume provisioning through ceph-csi.
But then I found the in-tree Ceph RBD provisioner:
Ceph RBD (kubernetes.io/rbd)
https://kubernetes.io/docs/concepts/storage/storage-classes/#ceph-rbd
And, according to the documentation for the CSI driver:
Ceph CSI (rbd.csi.ceph.com)
https://docs.ceph.com/docs/master/rbd/rbd-kubernetes/#block-devices-and-kubernetes
You may use Ceph Block Device images with Kubernetes v1.13 and later through ceph-csi, which dynamically provisions RBD images to back Kubernetes volumes and maps these RBD images as block devices (optionally mounting a file system contained within the image) on worker nodes running pods that reference an RBD-backed volume.
So... which one should I use?
Advantages / disadvantages?
Thanks in advance.
I don't know the exact differences, but I was told by a Ceph CSI developer that Ceph RBD (kubernetes.io/rbd), i.e. the in-tree driver, will be deprecated in a few Kubernetes releases. I don't have any references to official documentation, as this was a Slack conversation.
So the CSI driver is the way forward and makes your setup more future-proof.
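If you go the CSI route, dynamic provisioning looks roughly like the sketch below, based on the ceph-csi documentation linked in the question; the clusterID, pool, and secret names/namespaces are placeholders for your own cluster.

```yaml
# Sketch of a StorageClass using the ceph-csi RBD provisioner.
# clusterID, pool, and the secret names/namespaces are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <ceph-cluster-fsid>     # fsid of your Ceph cluster
  pool: kubernetes                   # RBD pool that backs the images
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
```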
I'm running a k3s single-node cluster and have the k3s local-path-provisioner as storage. As I want to be able to add nodes in the future, I looked at MinIO to use on top of the local-path storage. But I'm not sure if it's the right choice, because my workloads primarily use MariaDB for data, and I have read that an S3-compatible bucket isn't the best fit for database applications.
I hope you can help me figure this out.
If you don't want to use object storage then here are your options for running a local storage provisioner:
GlusterFS StorageClass
There isn't a lot of documentation on how to set it up, but if you know your way around GlusterFS it's a good option.
local-path-provisioner
It provides a way for Kubernetes users to utilize the local storage in each node.
OpenEBS -> has a local volume storage engine, but I think this is not designed to work on a shared volume mount, and it ends up tying a pod to a specific node since the data "doesn't exist" on the other nodes.
longhorn [recommended]
It creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes (a minimal example follows this list).
rook
Rook is a storage operator for Kubernetes that supports multiple storage backends. Don't use the NFS one, though; we hit a wall when using it with our DBs.
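To make the Longhorn option above concrete, here is a minimal sketch of a claim against the default "longhorn" StorageClass that a Longhorn install provides; the claim name and size are just examples for the MariaDB workload you mentioned.

```yaml
# Sketch: request a replicated Longhorn volume for a database workload.
# Assumes Longhorn is installed and exposes its default "longhorn" StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
```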
I'm teaching myself Kubernetes with a 5 Rpi cluster, and I'm a bit confused by the way Kubernetes treats Persistent Volumes with respect to Pod Scheduling.
I have 4 worker nodes using ext4 formatted 64GB micro SD cards. It's not going to give GCP or AWS a run for their money, but it's a side project.
Let's say I create a Persistent volume Claim requesting 10GB of storage on worker1, and I deploy a service which relies on this PVC, is that service then forced to be scheduled on worker1?
Should I be looking into distributed file systems like Ceph or Hdfs so that Pods aren't restricted to being scheduled on a particular node?
Sorry if this seems like a stupid question, I'm self taught and still trying to figure this stuff out! (Feel free to improve my tl;dr doc for kubernetes with a pull req)
Just some examples; as already mentioned, it depends on your storage system. As I can see, you use the local storage option.
Local Storage:
Yes, the Pod needs to run on the same machine where the PV is located (your case).
ISCSI/Trident San:
No, the node where the Pod is scheduled will mount the iSCSI block device.
(As already mentioned, volume binding mode is an important keyword here; it's possible you need to set this to 'WaitForFirstConsumer'. A minimal StorageClass sketch showing this follows this list.)
NFS/Trident Nas:
No, it's NFS, mountable from everywhere as long as you can reach it and authenticate against it.
VMWare VMDK's:
No, same as iSCSI: the node on which the Pod is scheduled mounts the VMDK from the datastore.
ceph/rook.io:
No, you get three options for storage: file, block, and object storage. Every type is distributed, so you can schedule a Pod on any node.
Also, Ceph is an ideal system for running distributed, software-defined storage on commodity hardware. What I can recommend is https://rook.io/, basically open-source Ceph on 'container steroids'.
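Regarding the WaitForFirstConsumer note above, a minimal StorageClass sketch for local volumes with delayed binding (the name is arbitrary):

```yaml
# Sketch: StorageClass for local volumes with delayed binding, so the
# scheduler picks a node before the volume is bound to the claim.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local PVs are created manually
volumeBindingMode: WaitForFirstConsumer
```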
Let's say I create a Persistent volume Claim requesting 10GB of storage on worker1, and I deploy a service which relies on this PVC, is that service then forced to be scheduled on worker1?
This is a good question. How this works depends on your storage system. The StorageClass defined for your Persistent Volume Claim contains information about the Volume Binding Mode. It is common to use dynamically provisioned volumes, so the volume is only allocated when a user/consumer Pod is scheduled. Typically this volume does not exist on the local node but remotely in the same data center. Kubernetes also has support for Local Persistent Volumes, which are physical volumes located on the same node; they are typically more expensive and used when you need high disk performance and capacity.
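For completeness, a Local Persistent Volume is pinned to a node with nodeAffinity. A minimal sketch, assuming a disk mounted at /mnt/disks/ssd1 on worker1 (both placeholders) and a local-storage StorageClass with WaitForFirstConsumer binding:

```yaml
# Sketch: a Local Persistent Volume tied to worker1 via nodeAffinity.
# Path, node name, and StorageClass name are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-worker1
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker1
```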
Currently I have Bitnami Discourse in Docker with the data stored in the pod, and we scale the pods from 1 to many. Now I am facing errors with media uploads: the issue is that the pods' data is not in sync, so I have to mount a single volume and share it between the pods. But I want to do that with a Persistent Volume Claim in Kubernetes with the help of an Azure storage class, not a Docker volume.
I presume you are using your own Kubernetes manifests, as Bitnami isn't supporting a Discourse chart currently. Mainly, what I understand is that you need a volume that can be accessed by many Pods.
I think you would need a ReadWriteMany volume; this is not usually supported by cloud providers, but let's give it a try in Azure.
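Azure Files is the usual candidate for ReadWriteMany on Azure. A minimal sketch using the in-tree azure-file provisioner; the class name, claim name, and size are placeholders:

```yaml
# Sketch: Azure Files StorageClass plus a ReadWriteMany claim that several
# Discourse pods could mount at the same time for shared uploads.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-rwx
provisioner: kubernetes.io/azure-file
parameters:
  skuName: Standard_LRS
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: discourse-uploads
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-rwx
  resources:
    requests:
      storage: 20Gi
```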
I hope it helps.
I have a 3-node k8s cluster and a remote storage box with additional disks connected to it. I want to utilize these disks. Is this use case supported by OpenEBS? Also, do I have to attach the disks to the nodes before deploying OpenEBS? Is this a prerequisite?
Sure. It's supported, and you need the disks attached when you set up OpenEBS as your block storage.
After you set it up, you can essentially create volumes (PVCs, PVs) for Kubernetes and mount them on your pods for consumption.
You can set up OpenEBS on the Kubernetes cluster where you run your workloads using either Helm or kubectl.
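For consumption it then looks roughly like the sketch below; "openebs-device" is assumed to be one of the StorageClasses your OpenEBS install provides, so substitute whichever one you actually use.

```yaml
# Sketch: claim an OpenEBS-backed volume and mount it in a pod.
# "openebs-device" is an assumed StorageClass name; replace it with yours.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openebs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: openebs-device
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: openebs-pvc
```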
Yes, OpenEBS supports storage with additional disks connected. With 0.7 it has a feature called NDM (Node Disk Manager), which monitors the disks attached to the nodes. Once the disks are attached, you can create a pool on top of them and use that. For more details, see the OpenEBS documentation.
We need volumes that can be managed easily. We need to use PVs, but we want to be able to start a volume on any node, with the data not stored on the node (so if a node crashes there is no problem), so we are thinking about Flocker with a Ceph backend. What's the best solution for production?
Flocker is not required. The functionality you are seeking is what Kubernetes Volume Plugins provide.
The way to think about a Kubernetes Persistent Volume (PV) is that it is a configuration object that stores information about a specific network storage asset. When a user submits a claim, assuming it finds a match, it will bind to one of the Persistent Volumes in the pool of available Persistent Volumes. This means your claim is bound to an object that contains information about a specific network storage asset.
When a claim is specified in a Pod or RC, the runtime is able to ascertain the PV bound to the claim and then ascertain which Kubernetes Volume Plugin to use and what parameters to pass it, based on the properties of the PV.
As such, wherever your Pods run in the cluster, they will be able to perform a network mount of the storage asset described in the PV. None of this data will be local. The pod can die and be restarted on any node in the cluster and it will reconnect to the same network storage asset specified in the PV.
Any Kubernetes Volume Plugin, with the exception of EmptyDir and HostPath, can be specified in a Persistent Volume Definition. So you could create a PV that uses the Ceph RBD volume plugin and you would have the functionality that you seek.
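A minimal sketch of such a PV using the in-tree Ceph RBD plugin, together with a claim that can bind to it; the monitor addresses, pool, image, and secret name are placeholders for your Ceph cluster.

```yaml
# Sketch: a PV backed by the Ceph RBD volume plugin, plus a matching claim.
# Monitors, pool, image, and the secret holding the Ceph key are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-rbd-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  rbd:
    monitors:
      - 10.0.0.1:6789
    pool: rbd
    image: k8s-volume-1
    user: admin
    secretRef:
      name: ceph-secret
    fsType: ext4
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-rbd-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```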
Typically, Ceph will run in its own cluster. There are some examples of Ceph running in a container (with more work to be done), and in that case Ceph and your app could share Kubernetes nodes.