Move resources/volumes across contexts in Kubernetes clusters

I have a Kubernetes cluster that I started with the context "dev1.k8s.local", and it has a StatefulSet with EBS-backed PVs (Persistent Volumes).
Now we are planning to start another context, "dev2.k8s.local".
Is there a way I can move the dev1 context's EBS volumes to the "dev2.k8s.local" context?
I am using Kubernetes 1.10 and kops 1.10.

A context is simply an entry in a Kubernetes client configuration file, typically ~/.kube/config. This file can hold multiple configurations that are managed manually or with kubectl config.
When you provision a second Kubernetes cluster on AWS using kops, brand-new resources are created that have no frame of reference to the other cluster. The EBS volumes that were created for PVs in your original cluster cannot simply be transferred between clusters through a context entry in your configuration file; that's not how it is designed to work.
Aside from the design problem, there is also a serious technical hurdle. EBS volumes are ReadWriteOnce, meaning they can only be attached to a single node at a time. This constraint exists because an EBS volume is block storage that is treated like a physical block device connected to the underlying worker node running your pod. That block device does not exist on the worker nodes in your other cluster, so it's impossible to simply move the pointer over.
The best way to accomplish this would be to back up and copy over the disk. How you handle this is up to your team. One way you could do it is by mounting both EBS volumes and copying the data over manually. You could also take a snapshot and restore the data to the other volume.
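If you go the snapshot route, a sketch of the last step: once the snapshot has been restored to a new EBS volume in the dev2 cluster's availability zone, you can hand that volume to the new cluster with a statically defined PV. The volume ID, sizes, and names below are placeholders, and the in-tree awsElasticBlockStore plugin is assumed (appropriate for Kubernetes 1.10):

```yaml
# Static PV referencing the volume restored from the snapshot (volume ID is a placeholder).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: restored-data
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0   # the restored volume
    fsType: ext4
---
# PVC the workload in dev2 can bind to.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-data-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""    # disable dynamic provisioning so this binds the PV above
  volumeName: restored-data
  resources:
    requests:
      storage: 20Gi
```

Note that the restored volume must be in the same region as the new cluster, and in an availability zone where it has worker nodes.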

Related

Is Kubernetes local/csi PV content synced into a new node?

According to the documentation:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned ... It is a resource in the cluster just like a node is a cluster resource...
So I was reading about all currently available plugins for PVs, and I understand that for 3rd-party / out-of-cluster storage this doesn't matter (e.g. storing data on EBS, Azure, or GCE disks), because there are few or no implications when adding or removing nodes from a cluster. However, there are other types, such as (ignoring hostPath, as that works only for single-node clusters):
csi
local
which (at least from what I've read in the docs) don't require 3rd-party vendors/software.
But also:
... local volumes are subject to the availability of the underlying node and are not suitable for all applications. If a node becomes unhealthy, then the local volume becomes inaccessible by the pod. The pod using this volume is unable to run. Applications using local volumes must be able to tolerate this reduced availability, as well as potential data loss, depending on the durability characteristics of the underlying disk.
The local PersistentVolume requires manual cleanup and deletion by the user if the external static provisioner is not used to manage the volume lifecycle.
Use-case
Let's say I have a single-node cluster with a single local PV, and I want to add a new node to the cluster so that I have a 2-node cluster (small numbers for simplicity).
Will the data from an already existing local PV be 1:1 replicated into the new node as in having one PV with 2 nodes of redundancy or is it strictly bound to the existing node only?
If the already existing PV can't be adjusted from 1 to 2 nodes, can a new PV (created from scratch) be created so it's 1:1 replicated between 2+ nodes on the cluster?
Alternatively if not, what would be the correct approach without using a 3rd-party out-of-cluster solution? Will using csi cause any change to the overall approach or is it the same with redundancy, just different "engine" under the hood?
Can a new PV be created so it's 1:1 replicated between 2+ nodes on the cluster?
None of the standard volume types are replicated at all. If you can use a volume type that supports ReadWriteMany access (most readily NFS) then multiple pods can use it simultaneously, but you would have to run the matching NFS server.
Of the volume types you reference:
hostPath is a directory on the node the pod happens to be running on. It's not a directory on any specific node, so if the pod gets recreated on a different node, it will refer to the same directory but on the new node, presumably with different content. Aside from basic test scenarios I'm not sure when a hostPath PersistentVolume would be useful.
local is a directory on a specific node, or at least following a node-affinity constraint. Kubernetes knows that not all storage can be mounted on every node, so this automatically constrains the pod to run on the node that has the directory (assuming the node still exists).
csi is an extremely generic extension mechanism, so that you can run storage drivers that aren't on the list you link to. There are some features that might be better supported by the CSI version of a storage backend than the in-tree version. (I'm familiar with AWS: the EBS CSI driver supports snapshots and resizing; the EFS CSI driver can dynamically provision NFS directories.)
In the specific case of a local test cluster (say, using kind) using a local volume will constrain pods to run on the node that has the data, which is more robust than using a hostPath volume. It won't replicate the data, though, so if the node with the data is deleted, the data goes away with it.
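To make the node binding of a local volume concrete: the constraint is expressed explicitly in the PV spec itself via nodeAffinity. This is a sketch with placeholder node and path names:

```yaml
# A local PV is pinned to one node; pods using it can only schedule there.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1        # directory that must already exist on the node
  nodeAffinity:                  # required for local volumes; no replication happens
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1  # placeholder node name
```

Adding a second node to the cluster does nothing to this PV; the data stays only on worker-node-1.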

Why would anyone want to use GCE Persistent Disk or EBS with Kubernetes?

These disks are accessible by only a single node.
Each node would have different data.
Also, any node can be terminated at any time, so you would have to find a way to reattach the volume to a new node that replaces the old one. How would you do that?
And after scaleup, a new node might not have one of these disks available to attach to, so you would need a new disk.
And why might anyone want to do all this? Just for temporary space? For that, they could use an EC2 instance store or a GCE boot disk (though I guess that might be enough.)
I'm specifically familiar with EBS; I assume GCE persistent disks work the same way. The important detail is that an EBS volume is not tied to a specific node; while it can only be attached to one node at a time, it can be moved to another node, and Kubernetes knows how to do this.
An EBS volume can be dynamically attached to an EC2 instance. In Kubernetes, generally there is a dynamic volume provisioner that's able to create PersistentVolume objects that are backed by EBS volumes in response to PersistentVolumeClaim objects. Critically, if a Pod uses a PVC that references an EBS-volume PV, the storage driver knows that, wherever the Pod is scheduled, it can dynamically attach the EBS volume to that EC2 instance.
That means that an EBS-volume PersistentVolume isn't actually "locked" to a single node. If the Pod is deleted and a new one uses the PersistentVolumeClaim, the volume can "move" to the node that runs the new Pod. If the Node is removed, all of its Pods can be rescheduled elsewhere, and the EBS volumes can go somewhere else too.
An EBS volume can only be attached to one instance at a time; in Kubernetes volume terminology, it can only have a ReadWriteOnce access mode. If it could be attached to many instances (as, for instance, an EFS NFS-based filesystem could be) it could be ReadOnlyMany or ReadWriteMany.
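As a sketch of the dynamic-provisioning flow described above (all names are illustrative): a PVC like the one below causes the provisioner to create an EBS volume on demand, and the resulting PV then follows the Pod around the cluster.

```yaml
# StorageClass using the in-tree EBS provisioner; CSI clusters would use ebs.csi.aws.com.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-ebs
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce    # EBS can only attach to one node at a time
  storageClassName: gp2-ebs
  resources:
    requests:
      storage: 20Gi
```

Any Pod that mounts app-data gets the same EBS volume, attached to whichever node the Pod lands on.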
This makes EBS a reasonably good default choice for persistent data storage, if your application actually needs it. It's not host-specific, and it can move around the cluster as needed. It won't work if two Pods need to share files, but that is generally a complex and fragile setup, and it's better to design your application not to need it.
The best setup is if your application doesn't need persistent local storage at all. This makes it easy to scale Deployments, because the data is "somewhere else". The data could be in a database; the data could be in a managed database, such as RDS; or it could be in an object-storage system like S3. Again, this requires changes in your application to not use local files for data storage.

How do I mount data into persisted storage on Kubernetes and share the storage amongst multiple pods?

I am new to Kubernetes and am trying to understand the most efficient and secure way to handle sensitive persisted data that interacts with a k8s pod. I have the following requirements when I start a pod in a k8s cluster:
The pod should have persisted storage.
Data inside the pod should be persistent even if the pod crashes or restarts.
I should be able to easily add or remove data from hostPath into the pod. (Not sure if this is feasible since I do not know how the data will behave if the pod starts on a new node in a multi node environment. Do all nodes have access to the data on the same hostPath?)
Currently, I have been using StatefulSets with a persistent volume claim on GKE. The image that I am using has a couple of constraints as follows:
I have to mount a configuration file into the pod before it starts. (I am currently using configmaps to pass the configuration file)
The pod that comes up, creates its own TLS certificates which I need to pass to other pods. (Currently I do not have a process in place to do this and thus have been manually copy pasting these certificates into other pods)
So, how do I maintain a common persisted storage that handles sensitive data between multiple pods and how do I add pre-configured data to this storage? Any guidance or suggestions are appreciated.
I believe this documentation on creating a persistent disk with multiple readers [1] is what you are looking for. Note, however, that the pods will only be able to read from the disk, since GCE persistent disks do not support ReadWriteMany [2].
Regarding hostPath: the mount point is inside the pod, but the volume is a directory on the node, so a hostPath volume is confined to an individual node.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
[2] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
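Following the approach in [1], a sketch of sharing a pre-populated disk with many pods read-only (disk name and sizes are placeholders; the data must be written to the disk before it is attached read-only):

```yaml
# PV exposing a pre-populated GCE persistent disk to many pods, read-only.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadOnlyMany
  gcePersistentDisk:
    pdName: my-prepopulated-disk   # placeholder; disk must already contain the data
    fsType: ext4
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-claim
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""   # bind the static PV above rather than provisioning
  volumeName: shared-data
  resources:
    requests:
      storage: 10Gi
```

Multiple pods across different nodes can then mount shared-data-claim simultaneously, as long as every mount is read-only.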

Kubernetes - EBS for storage

I have a question regarding the best approach with K8s on AWS.
The way I see it, either I use EBS directly for the PV and PVC, or I mount the EBS volume as a regular folder on my EC2 instance and then use that mounted folder for my PV and PVC.
Which approach is better in your opinion?
It is important to note that I want my K8s setup to be cloud agnostic, so maybe forcing an EBS configuration is less desirable than using a folder, so that the EC2 instance does not care about the origin of the folder.
Many thanks
Which approach is better in your opinion?
Without question: using the PV and PVC. Half the reason will go here, and the other half below. By declaring those as managed resources, kubernetes will cheerfully take care of attaching the volumes to the Node it is scheduling the Pod upon, and detaching it from the Node when the Pod is unscheduled. That will matter in a huge way if a Node reboots, for example, because the attach-detach cycle will happen transparently, no Pager Duty involved. That will not be true if you are having to coordinate amongst your own instances who is alive and should have the volume attached at this moment.
It is important to note that I want my K8s setup to be cloud agnostic, so maybe forcing an EBS configuration is less desirable than using a folder, so that the EC2 instance does not care about the origin of the folder.
It still will be cloud agnostic, because what you have told kubernetes -- declaratively, I'll point out, using just text in a yaml file -- is that you wish for some persistent storage to be volume mounted into your container(s) before they are launched. Only drilling down into the nitty gritty will surface the fact that it's provided by an AWS EBS volume. I would almost guarantee you could move those descriptors over to GKE (or Azure's thing) with about 90% of the text exactly the same.
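To illustrate that point: a claim like the following contains nothing AWS-specific, and only the cluster's default StorageClass decides whether an EBS volume, a GCE persistent disk, or an Azure disk ends up backing it (names and sizes are illustrative).

```yaml
# Portable PVC: no provider details appear anywhere in the manifest.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  # No storageClassName: the cluster's default class picks the backend,
  # so the same manifest works unchanged on AWS, GKE, or AKS.
```

The cloud-specific part lives entirely in the StorageClass the cluster administrator defines, not in your workload manifests.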

Kubernetes - Persistent storage for PostgreSQL

We currently have a 2-node Kubernetes environment running on bare-metal machines (no GCE) and now we wish to set up a PostgreSQL instance on top of this.
Our plan was to map a data volume for the PostgreSQL data directory to the node using the volumeMounts option in Kubernetes. However, this would be a problem: if the Pod ever gets stopped, Kubernetes will re-launch it at random on one of the other nodes, so we have no guarantee that it will use the correct data directory on re-launch...
So what is the best approach for maintaining a consistent and persistent PostgreSQL Data Directory across a Kubernetes cluster?
One solution is to deploy HA PostgreSQL, for example https://github.com/sorintlab/stolon
Another is to have some network storage attached to all nodes (NFS, GlusterFS) and use volumeMounts in the pods.
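The second option can be sketched as a statically defined NFS PV (the server address and export path are placeholders); since every node can mount it, the PostgreSQL Pod can be rescheduled anywhere without losing its data directory:

```yaml
# NFS-backed PV: reachable from all nodes, so the pod can move freely.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pg-data
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce    # a single Postgres pod writes; NFS would also allow ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.5          # placeholder NFS server address
    path: /exports/postgres   # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""   # bind the static PV above
  volumeName: pg-data
  resources:
    requests:
      storage: 50Gi
```

One caveat worth checking for your setup: PostgreSQL over NFS needs careful mount options to be safe, which is part of why an HA solution like Stolon is often preferred.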