Google Kubernetes storage in EC2 - deployment

I started to use Docker and I'm trying out Google's Kubernetes project for my container orchestration. It looks really good!
The only thing I'm curious about is how I would handle the volume storage.
I'm using EC2 instances, and the containers mount volumes from the EC2 filesystem.
The only thing left is figuring out how to deploy my application code onto all those EC2 instances, right? How can I handle this?

It's somewhat unclear what you're asking, but a good place to start would be reading about your options for volumes in Kubernetes.
The options include local EC2 disk with a lifetime tied to the lifetime of your pod (emptyDir), local EC2 disk with a lifetime tied to the lifetime of the node VM (hostDir, nowadays called hostPath), and an Elastic Block Store volume (awsElasticBlockStore).
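As a hedged sketch of the first and last options (the pod name, image, mount paths, and EBS volume ID are placeholders, not from the question), a pod declares these volumes directly in its spec:

```yaml
# Sketch: a pod using an emptyDir volume (scratch space tied to the pod's
# lifetime) and an awsElasticBlockStore volume (a pre-created EBS volume in
# the same AZ as the node; the volume ID below is a placeholder).
apiVersion: v1
kind: Pod
metadata:
  name: storage-demo
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: scratch
          mountPath: /tmp/scratch
        - name: data
          mountPath: /data
  volumes:
    - name: scratch
      emptyDir: {}
    - name: data
      awsElasticBlockStore:
        volumeID: vol-0123456789abcdef0   # placeholder EBS volume ID
        fsType: ext4
```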

The Kubernetes Container Storage Interface (CSI) ecosystem has matured, and the AWS EBS CSI driver lets you attach EBS volumes to your containers.
The setup is relatively advanced, but does work smoothly once implemented. The advantage of using EBS rather than local storage is that the EBS storage is persistent and independent of the lifetime of the EC2 instance.
In addition, the CSI plugin takes care of the disk creation -> mounting -> unmounting -> deletion lifecycle for you.
The EBS CSI driver has a simple example that can get you started quickly.
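For a rough idea of what that looks like in practice (the class name, volume type, claim name, and size below are arbitrary choices, not something the driver mandates), dynamic provisioning is usually a StorageClass plus a PVC:

```yaml
# StorageClass backed by the AWS EBS CSI driver; "ebs-gp3" and the gp3 type
# are just example choices.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
---
# A claim against that class; a matching EBS volume is created on demand
# when a pod using this claim is scheduled.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-gp3
  resources:
    requests:
      storage: 20Gi
```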

Related

MinIO for MariaDB in Kubernetes

I'm running a k3s single-node cluster and have the k3s local-path-provisioner as storage. As I want to be able to add nodes in the future, I looked at MinIO to use on top of the local-path storage. But I'm not sure if it's the right choice, because my workloads primarily use MariaDB for data, and I've read that an S3-compatible bucket isn't the best fit for database applications.
I hope you can help me figure this out.
If you don't want to use object storage, then here are your options for running a local storage provisioner:
GlusterFS StorageClass
Doesn't have a lot of documentation on how to set it up, but if you know your way around GlusterFS it'll be a good option.
local-path-provisioner
It provides a way for Kubernetes users to utilize the local storage on each node.
OpenEBS
Has a local volume storage engine, but I think this is not designed to work on a shared volume mount, and it ends up tying a pod to a specific node since the data "doesn't exist" on the other nodes.
Longhorn [recommended]
It creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes (see the PVC sketch after this list).
Rook
Rook is a storage operator for Kubernetes that supports multiple storage backends. Don't use the NFS one though, because we hit a wall when using it with our DBs.
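If you go the Longhorn route, the MariaDB claim itself stays ordinary. A minimal sketch, assuming Longhorn is installed and its default StorageClass keeps its usual name "longhorn" (the claim name and size are made up):

```yaml
# PVC for the MariaDB data directory; relies on Longhorn's default
# StorageClass being installed and named "longhorn" (assumption).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
```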

Are Pods forced to run on nodes where their persistent volumes exist?

I'm teaching myself Kubernetes with a 5-node Raspberry Pi cluster, and I'm a bit confused by the way Kubernetes treats Persistent Volumes with respect to Pod scheduling.
I have 4 worker nodes using ext4-formatted 64GB micro SD cards. It's not going to give GCP or AWS a run for their money, but it's a side project.
Let's say I create a Persistent Volume Claim requesting 10GB of storage on worker1, and I deploy a service which relies on this PVC. Is that service then forced to be scheduled on worker1?
Should I be looking into distributed file systems like Ceph or Hdfs so that Pods aren't restricted to being scheduled on a particular node?
Sorry if this seems like a stupid question, I'm self taught and still trying to figure this stuff out! (Feel free to improve my tl;dr doc for kubernetes with a pull req)
Just some examples; as already mentioned, it depends on your storage system. As I see it, you use the local storage option.
Local storage:
Yes, the pod needs to run on the same machine where the PV is located (your case).
iSCSI/Trident SAN:
No, the node where the pod gets scheduled will mount the iSCSI block device.
(As mentioned already, volume binding mode is an important keyword; it's possible you need to set this to 'WaitForFirstConsumer'. See the StorageClass sketch after this list.)
NFS/Trident NAS:
No, it's NFS, mountable from everywhere as long as you can reach it and authenticate against it.
VMware VMDKs:
No, same as iSCSI; the node that gets the pod scheduled mounts the VMDK from the datastore.
Ceph/Rook.io:
No, you get three options for storage: file, block, and object storage. Every type is distributed, so you can schedule a pod on every node.
Also, Ceph is an ideal system for running distributed, software-defined storage on commodity hardware. What I can recommend is https://rook.io/, basically an open-source Ceph on 'container steroids'.
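Regarding the WaitForFirstConsumer note above, here is a minimal sketch of a StorageClass for statically created local volumes (the class name is arbitrary; kubernetes.io/no-provisioner is the stock provisioner used when the volumes are created by hand rather than dynamically):

```yaml
# StorageClass for statically provisioned local volumes. WaitForFirstConsumer
# delays PVC binding until a pod is scheduled, so the scheduler can pick a
# node that can actually reach the volume.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage          # example name
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```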
Let's say I create a Persistent Volume Claim requesting 10GB of storage on worker1, and I deploy a service which relies on this PVC. Is that service then forced to be scheduled on worker1?
This is a good question. How this works depends on your storage system. The StorageClass defined for your Persistent Volume Claim contains information about the volume binding mode. It is common to use dynamically provisioned volumes, so that the volume is first allocated when a user/consumer/Pod is scheduled. Typically this volume does not exist on the local node but remotely in the same data center. Kubernetes also has support for Local Persistent Volumes that are physical volumes located on the same node; they are typically more expensive and used when you need high disk performance.
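To make the trade-off concrete, here is a hedged sketch of a Local Persistent Volume pinned to worker1 (the path, capacity, and names are placeholders); any pod whose claim binds to it can only be scheduled on that node:

```yaml
# A statically created local PV; nodeAffinity ties it (and any pod that
# mounts it) to worker1. The path and capacity are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: worker1-local-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage      # matches the class sketched above
  local:
    path: /mnt/sdcard/data             # directory on worker1's SD card (assumption)
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker1
```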

Move resources/volumes across contexts in Kubernetes clusters

I have a Kubernetes cluster which I started with the context "dev1.k8s.local", and it has a StatefulSet with EBS persistent volumes (PVs).
Now we are planning to start another context, "dev2.k8s.local".
Is there a way I can move the dev1 context's EBS volumes to the "dev2.k8s.local" context?
I am using Kubernetes 1.10 and kops 1.10.
A context is simply a named entry in your Kubernetes client configuration, typically ~/.kube/config. This file can hold multiple clusters, users, and contexts that are managed manually or with kubectl config.
When you provision a second Kubernetes cluster on AWS using Kops, brand new resources are recreated that have no frame of reference about the other cluster. Your EBS volumes that were created for PVs in your original cluster cannot simply be transferred between clusters using a context entry in your configuration file. That's not how it is designed to work.
Aside from the design problem, there is also a serious technical hurdle involved. EBS volumes are ReadWriteOnce, meaning they can only be attached to a single node at a time. This constraint exists because EBS is block storage that is treated like a physical block device connected to the underlying worker node running your pod. That physical block device does not exist on the worker nodes in your other cluster, so it's impossible to simply move the pointer over.
The best way to accomplish this would be to back up and copy over the disk. How you handle this is up to your team. One way you could do it is by mounting both EBS volumes and copying the data over manually. You could also take a snapshot and restore the data to the other volume.
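Once the restored volume exists in an Availability Zone the dev2 cluster can reach, you can hand it to that cluster with a statically defined PV and a claim that binds to it by name. A rough sketch (the volume ID, size, and names are placeholders):

```yaml
# PV that wraps an existing EBS volume (e.g. one restored from a snapshot);
# the volume ID and capacity below are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: restored-data
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  awsElasticBlockStore:
    volumeID: vol-0abc1234def567890   # the restored volume's ID
    fsType: ext4
---
# Claim in the dev2 cluster that binds to the PV above by name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""                # skip dynamic provisioning
  volumeName: restored-data
  resources:
    requests:
      storage: 50Gi
```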

GCE volume mounts as compared to Kubernetes volume mounts

Kubernetes has pretty extensive volume and volume mounting support (many different volume types, subpaths, mounting single files).
Can the same be achieved with GCE VMs?
Update:
I have some Kubernetes workflow that uses NFS and GCE PD volumes.
Suppose I want to run the same workflow without Kubernetes (by just starting GCE VMs).
What volume-related features will I lose/keep?
Some examples of features:
Having the same volume shared between multiple producer Pods/VMs.
Mounting single files into container/VM (as opposed to mounting directories only).
The PVs that GKE provisions as GCE PD volumes are backed by Google Persistent Disks and are thus bound by the same limitations. This also means that there isn't much you can do on Kubernetes that you can't do on GCE; the major difference is that the resources won't be as fluid.
You can attach a disk to a GCE VM and mount it at the OS level, either the whole disk or just a subdirectory of it. You can also attach a single disk in read-only mode to multiple VMs in the same zone, which is the same ReadOnlyMany restriction you have in GKE. If you need scalability, you can use a Managed Instance Group that uses a snapshot of your disk so that replication won't skew the data.
You can also mount NFS in GCE as in GKE.
Migrating from GKE to GCE generally does not have too many restrictions. The major difference is that you are moving from a managed orchestration system to unmanaged VMs, so you may need to do some more legwork to make sure there is scalability (if need be) and resiliency.
Aside from the benefits that k8s offers all around, I can't think of any major benefits you lose concerning the volumes specifically.
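For reference, here is a hedged sketch of the two Kubernetes-side features being compared: a PD attached read-only so several pods/nodes can share it, and a single file mounted out of a volume via subPath (the disk name, image, and paths are invented):

```yaml
# Sketch: one pod mounting a pre-created GCE PD. readOnly lets many pods and
# nodes attach the same disk; subPath mounts a single file from the volume.
apiVersion: v1
kind: Pod
metadata:
  name: pd-reader
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: shared-pd
          mountPath: /data               # whole disk, read-only
        - name: shared-pd
          mountPath: /etc/app/app.conf
          subPath: conf/app.conf         # single file out of the same volume
  volumes:
    - name: shared-pd
      gcePersistentDisk:
        pdName: shared-data-disk         # pre-created GCE PD (assumption)
        fsType: ext4
        readOnly: true
```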

Mount back an EBS snapshot in Auto Scaling

I am using Auto Scaling with a Load Balancer and have attached 2 EBS volumes.
Now, whenever an instance is terminated, a snapshot of the EBS volumes is stored.
I have gone through several links but cannot find how to retrieve/mount the EBS volume when a Launch Configuration launches a new instance.
Can I get a reference or PowerShell script to identify a volume by tag name from the volume list and mount it when the instance is initializing?
There is no automatic facility to mount an existing EBS snapshot or volume when Auto Scaling launches an instance.
Best practice for Auto Scaling is to store data off-instance, such as in Amazon S3 or Amazon EFS. This way, the data is accessible to all instances simultaneously and can be used by new instances that are launched.
There is also no automatic facility to create an EBS snapshot when an Auto Scaling instance is terminated. Rather, there is the option to Delete on Termination, which controls whether the EBS volume should be deleted when the instance is terminated. If this option is off, then the EBS volumes will remain after an instance is terminated. You could write some code (e.g. in a User Data script) that re-attaches an EBS volume to a new instance launched by Auto Scaling, but this can get messy. (For example: which instance should it attach to? What happens if more instances are launched?)
Bottom line: Yes, you could write a script to do this, but it is a poor architectural design.
Yes, you can attach (mount) an EBS volume to an EC2 instance using the AWS CLI command line tool. You run this command in the EC2 User Data at instance launch.
Running Commands on Your Linux Instance at Launch
AWS CLI attach-volume
Note: There is a problem with this strategy. The ASG Launch Configuration is for creating new EC2 instances that are identical. This means you would be attempting to attach the same EBS volume to each instance, which will fail. You may want to consider using EFS instead.
Amazon Elastic File System
Mount EFS on EC2 using the AWS CLI
Note: Use IAM roles to provide your instances with credentials instead of storing credentials on the EC2 instance.
Once you have configured your "master" EC2 instance, create a new AMI for your ASG launch configuration.
When mounted on Amazon EC2 instances, an Amazon EFS file system provides a standard file system interface and file system access semantics, allowing you to seamlessly integrate Amazon EFS with your existing applications and tools. Multiple Amazon EC2 instances can access an Amazon EFS file system at the same time, allowing Amazon EFS to provide a common data source for workloads and applications running on more than one Amazon EC2 instance.
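If you go the EFS route, the mount can live in the launch configuration's User Data so every instance the ASG brings up attaches the same shared file system. A minimal CloudFormation-style sketch, where the AMI ID, instance profile, file system ID, and region are all placeholders:

```yaml
# Sketch of a launch configuration whose User Data mounts a shared EFS
# file system at boot. All identifiers below are placeholders.
Resources:
  AppLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: ami-0123456789abcdef0            # AMI baked from your "master" instance
      InstanceType: t3.micro
      IamInstanceProfile: app-instance-profile  # IAM role instead of stored credentials
      UserData:
        Fn::Base64: |
          #!/bin/bash
          # Mount the shared EFS file system; fs-12345678 and us-east-1 are
          # placeholders for your file system ID and region.
          yum install -y nfs-utils
          mkdir -p /mnt/efs
          mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
```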