Kubernetes - EBS for storage

I have a question regarding the best approach to storage with K8s in AWS.
The way I see it, either I use EBS directly for my PV and PVC, or I mount the EBS volume as a regular folder on my EC2 instance and then use those mounted folders for my PV and PVC.
Which approach is better in your opinion?
It is important to note that I want my K8s to be cloud agnostic, so maybe forcing an EBS configuration is worse than using a folder, since then the EC2 instance does not care where the folder comes from.
Many thanks.

Which approach is better in your opinion?
Without question: use the PV and PVC. Half the reason will go here, and the other half below. By declaring those as managed resources, Kubernetes will cheerfully take care of attaching the volume to the Node it is scheduling the Pod on, and detaching it from the Node when the Pod is unscheduled. That will matter in a huge way if a Node reboots, for example, because the attach-detach cycle will happen transparently, no PagerDuty involved. That will not be true if you have to coordinate amongst your own instances which one is alive and should have the volume attached at any given moment.
It is important to note that I want my K8s to be cloud agnostic, so maybe forcing an EBS configuration is worse than using a folder, since then the EC2 instance does not care where the folder comes from.
It will still be cloud agnostic, because what you have told Kubernetes -- declaratively, I'll point out, using just text in a YAML file -- is that you wish for some persistent storage to be volume mounted into your container(s) before they are launched. Only drilling down into the nitty gritty will surface the fact that it's provided by an AWS EBS volume. I would almost guarantee you could move those descriptors over to GKE (or Azure's equivalent) with about 90% of the text exactly the same.
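For illustration, here is a minimal sketch of that declarative text: a PersistentVolumeClaim plus a Pod that mounts it. The names, image, and sizes are placeholders, and nothing in it mentions EBS; the cloud-specific details live behind whatever StorageClass the cluster offers.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: app-data               # placeholder name
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi            # example size
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: app
    spec:
      containers:
        - name: app
          image: nginx             # stand-in image
          volumeMounts:
            - name: data
              mountPath: /var/lib/app
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: app-data

On EKS, GKE, or AKS the same manifest applies unchanged; only the StorageClass backing the claim differs.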

Related

Problems with using volumes in kubernetes production environments

I'm a beginner in Kubernetes, and while reading a book I found that hostPath is not recommended as the volume type for production environments, because it leads to binding between pod and node. But if you use other volume types instead of hostPath, won't reading and writing files cause extra network IO? Will this have an additional performance impact?
hostPath is, as the name suggests, reading and writing from a place on the host where the pod is running. If the host goes down, or the pod gets evicted or otherwise removed from the node, that data is (normally) lost. This is why the "binding" is mentioned -- the pod must stay on that same node, otherwise it will lose that data.
Using a volume type and having volumes provisioned is better, as the disk and the pod can be reattached together on another node and you will not lose the data.
In terms of I/O, there would indeed be a minuscule difference, since you're no longer talking to the node's local disk but to a network-attached disk.
hostPath volumes are generally used for temporary files or storage that can be lost without impact to the pod, in much the same way you would use /tmp on a desktop machine.
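To make the hostPath case concrete, here is a minimal sketch of a Pod using a hostPath volume as scratch space (the path, names, and image are illustrative):

    apiVersion: v1
    kind: Pod
    metadata:
      name: scratch-demo
    spec:
      containers:
        - name: worker
          image: busybox           # stand-in image
          command: ["sh", "-c", "date > /scratch/started && sleep 3600"]
          volumeMounts:
            - name: scratch
              mountPath: /scratch
      volumes:
        - name: scratch
          hostPath:
            path: /var/tmp/scratch-demo   # lives on whichever node runs the pod
            type: DirectoryOrCreate

If the Pod is rescheduled onto another node, /scratch starts out empty there, which is exactly why hostPath is only suitable for data you can afford to lose.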
To get a local volume you can use the local volume type, but you need a local volume provisioner that can allocate and recycle volumes for you.
Since local volumes are disks on the host, there are no performance trade-offs. But it is more common to use network-attached volumes provided by a cloud provider, and those do have a latency trade-off.
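As a sketch of the local volume type: a statically created PersistentVolume is pinned to one node through node affinity. The storage class name, path, and node name below are assumptions.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: local-pv-example
    spec:
      capacity:
        storage: 50Gi                    # example size
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-storage    # assumed class name
      local:
        path: /mnt/disks/ssd1            # example path on the node
      nodeAffinity:
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                    - worker-1           # hypothetical node name

A Pod claiming this volume can only be scheduled onto worker-1, which is exactly the trade-off the local volume type makes in exchange for local-disk performance.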

Why would anyone want to use GCE Persistent Disk or EBS with Kubernetes?

These disks are accessible by only a single node.
Each node would have different data.
Also, any node can be terminated at any time, so you would have to find a way to reattach the volume to a new node that replaces the old one. How would you do that?
And after scaleup, a new node might not have one of these disks available to attach to, so you would need a new disk.
And why might anyone want to do all this? Just for temporary space? For that, they could use an EC2 instance store or GCE boot disk (though I guess that might be enough).
I'm specifically familiar with EBS; I assume GCE persistent disks work the same way. The important detail is that an EBS volume is not tied to a specific node; while it can only be attached to one node at a time, it can be moved to another node, and Kubernetes knows how to do this.
An EBS volume can be dynamically attached to an EC2 instance. In Kubernetes, generally there is a dynamic volume provisioner that's able to create PersistentVolume objects that are backed by EBS volumes in response to PersistentVolumeClaim objects. Critically, if a Pod uses a PVC that references an EBS-volume PV, the storage driver knows that, wherever the Pod is scheduled, it can dynamically attach the EBS volume to that EC2 instance.
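To make that concrete, here is a sketch of a StorageClass and a claim that uses it. The class name is made up; the provisioner shown is the AWS EBS CSI driver (ebs.csi.aws.com), which is one common choice, while older clusters used the in-tree kubernetes.io/aws-ebs provisioner instead.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: gp3-ebs                  # hypothetical class name
    provisioner: ebs.csi.aws.com     # AWS EBS CSI driver
    parameters:
      type: gp3
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data
    spec:
      storageClassName: gp3-ebs
      accessModes:
        - ReadWriteOnce              # EBS can only attach to one node at a time
      resources:
        requests:
          storage: 20Gi              # example size

When a Pod that references the claim is scheduled, the provisioner creates the volume and attaches it to whichever node the Pod lands on.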
That means that an EBS-volume PersistentVolume isn't actually "locked" to a single node. If the Pod is deleted and a new one uses the PersistentVolumeClaim, the volume can "move" to the node that runs the new Pod. If the Node is removed, all of its Pods can be rescheduled elsewhere, and the EBS volumes can go somewhere else too.
An EBS volume can only be attached to one instance at a time; in Kubernetes volume terminology, it can only have a ReadWriteOnce access mode. If it could be attached to many instances (as, for instance, an EFS NFS-based filesystem could be) it could be ReadOnlyMany or ReadWriteMany.
This makes EBS a reasonably good default choice for persistent data storage, if your application actually needs it. It's not actually host-specific and it can move around the cluster as needed. It won't work if two Pods need to share files, but that is generally a complex and fragile setup and it's better to design your application not to need it.
The best setup is if your application doesn't need persistent local storage at all. This makes it easy to scale Deployments, because the data is "somewhere else". The data could be in a database; the data could be in a managed database, such as RDS; or it could be in an object-storage system like S3. Again, this requires changes in your application to not use local files for data storage.

Are Pods forced to run on nodes where their persistent volumes exist?

I'm teaching myself Kubernetes with a 5-node Raspberry Pi cluster, and I'm a bit confused by the way Kubernetes treats Persistent Volumes with respect to Pod scheduling.
I have 4 worker nodes using ext4-formatted 64GB microSD cards. It's not going to give GCP or AWS a run for their money, but it's a side project.
Let's say I create a Persistent Volume Claim requesting 10GB of storage on worker1, and I deploy a service which relies on this PVC. Is that service then forced to be scheduled on worker1?
Should I be looking into distributed file systems like Ceph or HDFS so that Pods aren't restricted to being scheduled on a particular node?
Sorry if this seems like a stupid question; I'm self-taught and still trying to figure this stuff out! (Feel free to improve my tl;dr doc for Kubernetes with a pull request.)
Just some examples; as already mentioned, it depends on your storage system. As I see it, you use the local storage option.
Local storage:
Yes, the pod needs to run on the same machine where the PV is located (your case).
iSCSI/Trident SAN:
No, the node where the pod is scheduled will mount the iSCSI block device.
(As mentioned already, volume binding mode is an important keyword; it's possible you need to set this to 'WaitForFirstConsumer'.)
NFS/Trident NAS:
No, it's NFS, mountable from anywhere as long as you can reach and authenticate against it.
VMware VMDKs:
No, same as iSCSI: the node which gets the pod scheduled mounts the VMDK from the datastore.
Ceph/rook.io:
No. You get three storage options (file, block, and object storage); every type is distributed, so you can schedule a pod on any node.
Also, Ceph is an ideal system for running distributed software-defined storage on commodity hardware. What I can recommend is https://rook.io/, basically open-source Ceph on 'container steroids'.
Let's say I create a Persistent Volume Claim requesting 10GB of storage on worker1, and I deploy a service which relies on this PVC. Is that service then forced to be scheduled on worker1?
This is a good question. How this works depends on your storage system. The StorageClass defined for your Persistent Volume Claim contains information about the volume binding mode. It is common to use dynamically provisioned volumes, so that the volume is first allocated when a user/consumer/Pod is scheduled. And typically this volume does not exist on the local Node but is remote, in the same data center. Kubernetes also has support for Local Persistent Volumes, which are physical volumes located on the same Node, but they are typically more expensive and used when you need high disk performance and capacity.
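On a homelab cluster like this one, a common pattern is a StorageClass for statically created local volumes with binding delayed until a Pod is scheduled. A minimal sketch, assuming static local PVs rather than a dynamic provisioner (the class name is arbitrary):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-storage                        # arbitrary name
    provisioner: kubernetes.io/no-provisioner    # static local volumes, no dynamic provisioning
    volumeBindingMode: WaitForFirstConsumer

With WaitForFirstConsumer, the scheduler picks a node for the Pod first and only then binds a PV reachable from that node, instead of binding blindly and forcing the Pod onto whichever node holds the volume.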

Move resources/volumes across contexts in Kubernetes clusters

I have a Kubernetes cluster which I have started with the context "dev1.k8s.local", and it has a stateful set with EBS PVs (Persistent Volumes).
Now we are planning to start another context, "dev2.k8s.local".
Is there a way I can move the dev1 context's EBS volumes to the context "dev2.k8s.local"?
I am using K8s 1.10 and kops 1.10.
A context is simply an entry in your Kubernetes client configuration, typically ~/.kube/config. This file can hold multiple contexts that are managed manually or with kubectl config.
When you provision a second Kubernetes cluster on AWS using kops, brand new resources are created that have no frame of reference about the other cluster. The EBS volumes that were created for PVs in your original cluster cannot simply be transferred between clusters by way of a context entry in your configuration file. That's not how it is designed to work.
Aside from the design problem, there is also a serious technical hurdle involved. EBS volumes are ReadWriteOnce, meaning they can only be attached to a single node at a time. The reason this constraint exists is that an EBS volume is block storage, treated like a physical block device connected to the underlying worker node running your pod. That physical block device does not exist on the worker nodes in your other cluster, so it's impossible to simply move the pointer over.
The best way to accomplish this would be to back up and copy over the disk. How you handle this is up to your team. One way you could do it is by mounting both EBS volumes and copying the data over manually. You could also take a snapshot and restore the data to the other volume.
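If you go the snapshot route, one way to surface the restored copy in the new cluster is a statically defined PersistentVolume pointing at the new EBS volume ID. This is only a sketch: the volume ID, size, and filesystem type are placeholders, and on K8s 1.10 with kops the in-tree awsElasticBlockStore source shown here is what would be used.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: restored-data
    spec:
      capacity:
        storage: 20Gi                      # match the restored volume's size
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      awsElasticBlockStore:
        volumeID: vol-0123456789abcdef0    # placeholder: the volume restored from the snapshot
        fsType: ext4

The new cluster's nodes must be in the same availability zone as the restored volume for the attach to succeed.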

Kubernetes shared persistent volume between multiple nodes

I am currently looking into setting up Kubernetes pods for a project on GCP.
The problem: I need to set up a persistent shared volume which will be used by multiple nodes. I need all nodes to be able to read from the volume, and only one node must be able to write to the volume. So I need some advice on the best way to achieve that.
I have checked the Kubernetes documentation and know that GCEPersistentDisk does not support ReadWriteMany, but I think that access mode would be overkill anyway. Regarding ReadOnlyMany, I get that nodes can read from the PV, but I don't understand how, or what, can actually modify the PV in this case. Currently my best bet is setting up NFS with a GCE persistent disk.
Also, the solution should be able to run in the cloud or on premises. Any advice will be appreciated :)
According to official documentation:
A PVC to PV binding is a one-to-one mapping.
A volume can only be mounted using one access mode at a time, even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.
So I am afraid it would be impossible to do it the way you described.
However, you may want to try a task queue.
Please let me know if that helped.
Assuming there is some NFS space available, it could be possible to create two persistent volume claims (PVCs) for it: one read-only, one read-write.
Then you could have two persistent volumes, each binding one-to-one to one of the PVCs.
Now create two deployments. One of them describes the pod with the writing application, and you ensure it has exactly one replica. The other deployment describes the reading application, and you can scale it to whatever replica count you like.
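A rough sketch of that layout, assuming an NFS server is already reachable (the server address, export path, and sizes are placeholders):

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: shared-rw
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteOnce              # claimed by the single writer
      nfs:
        server: 10.0.0.10            # placeholder NFS server
        path: /exports/shared
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: shared-ro
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadOnlyMany               # claimed by the readers
      nfs:
        server: 10.0.0.10
        path: /exports/shared
        readOnly: true

Each PV then gets its own matching PVC (one ReadWriteOnce, one ReadOnlyMany); the writer Deployment mounts the first claim with a single replica, and the reader Deployment mounts the second and scales freely.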