k8s - what happens to persistent storage when cluster is deleted? - kubernetes

Do they get permanently deleted as well?
I imagine they do, since they are a part of a cluster, but I'm new to k8s and I can't find this info online.
If they do get deleted, what would be the preferred solution to keep the data for a cluster that sometimes gets completely deleted and re-deployed?
Thanks

According to the documentation, you can avoid complete deletion of a PersistentVolume's data by using the Retain reclaim policy.
In this case, even after the PersistentVolume is deleted, the underlying storage still exists in the external infrastructure, such as AWS EBS, so it is possible to recover or reuse the existing data.
You can find more details here and here
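As a minimal sketch of the Retain policy mentioned above, assuming volumes are dynamically provisioned through the AWS EBS CSI driver. The names `ebs-retain` and `app-data` and the requested size are hypothetical placeholders, not taken from the question.

```yaml
# StorageClass whose dynamically provisioned PVs keep their backing
# volume when the claim is deleted.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-retain             # hypothetical name
provisioner: ebs.csi.aws.com   # assumes the AWS EBS CSI driver is installed
reclaimPolicy: Retain          # dynamically provisioned PVs default to Delete
volumeBindingMode: WaitForFirstConsumer
---
# Claim that requests storage from the class above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data               # hypothetical name
spec:
  storageClassName: ebs-retain
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi            # placeholder size
```

With Retain, deleting the claim leaves the PV in the Released state and the backing volume untouched. A re-deployed cluster would most likely need a manually created PV that points at the existing volume before the data can be claimed again.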

Related

Is there an intermediate layer/cache between a Kubernetes pod and a Persistent Volume, or does a pod access the PV directly?

Recently I ran into a strange problem. We have two pods running in an OpenShift cluster that share a persistent volume (GlusterFS).
For the sake of this explanation, let's call one of the pods PodA and the other PodB. PodB had been running for three months. Automation in PodA creates and updates files in the shared persistent volume, and PodB reads them and performs some operations based on the input.
Now to the problem: whenever PodA created a new file in the shared PV, it was visible and accessible from PodA. However, there were a few files that PodA was updating periodically, and those changes were not reflected in PodB, where we could only see the old versions of the files. To solve the problem, we forcefully deleted PodB; OpenShift recreated it, and the problem was gone.
I thought that with the PV mechanism Kubernetes mounts the external storage/folder into the pod (container), with no intermediate layer or cache in between. From what we have experienced so far, it seems that every container (or pod) creates a local copy of those files, or that there is a cache between the PV and the pod.
I have searched for this on Google and could not find a detailed explanation of how the PV mount works in Kubernetes. I would love to know the actual reason behind this problem.
There is no caching mechanism for PVs provided by Kubernetes, so the problem you are observing must be located in either the GlusterFS CSI driver or GlusterFS itself.

Kubernetes cluster Mysql Nodes Storage

We have started setting up a Kubernetes cluster. In production, we have 4 MySQL nodes (2 active masters, 2 active slaves). All servers are on-premise; no cloud provider is used.
How do I configure storage? Should I use PV / PVC, and how would that work? Should I use a local PV? Can someone explain this to me?
You need to use PersistentVolumes and PersistentVolumeClaims in order to achieve that.
A PersistentVolume (PV) is a piece of storage in the cluster that has
been provisioned by an administrator or dynamically provisioned using
Storage Classes.
A PersistentVolumeClaim (PVC) is a request for storage by a user.
Claims can request specific size and access modes (e.g., they can be
mounted once read/write or many times read-only).
Containers are ephemeral. When a container is restarted, all changes made before the restart are lost. Databases, however, expect their data to be persistent, so you need persistent volumes. You have to create a storage claim, and the pod must be configured to mount the claimed storage.
Here you will find a simple guide showing how to deploy MySQL with a PersistentVolume. However, I strongly recommend getting familiar with the official docs that I have linked in order to fully understand the concept and adjust the access mode, class, size, etc. according to your needs.
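The claim-and-mount pattern described above could look roughly like the following sketch. It assumes that the cluster has either a default StorageClass or a manually created PV that can satisfy the claim (on-premise, the latter is likely). The names `mysql-data` and `mysql-secret`, the size, and the image tag are illustrative assumptions.

```yaml
# Claim for the database storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data                 # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi                # placeholder size
---
# Single MySQL instance that mounts the claimed storage.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  strategy:
    type: Recreate                 # avoids two pods using the RWO volume during a rollout
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret   # hypothetical Secret, created separately
                  key: password
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql   # MySQL data directory
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: mysql-data
```

This covers a single instance only. A master/slave topology like the one in the question would usually give each MySQL node its own claim, for example through a StatefulSet with volume claim templates.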
Please let me know if that helped.

Why are some Kubernetes' resources immutable after creation?

From the Kubernetes' validation source code, at least those resources are immutable after creation:
Persistent Volumes
Storage Classes
Why is that?
This is a core concept in Kubernetes. Some spec fields are immutable because changing them would affect the basic structure of the resources that depend on them.
For example, changing a PersistentVolume may impact the pods that are using it. Suppose you have a MySQL pod running on a PV and you change the PV in a way that makes all the data disappear.
In Kubernetes 1.18, the option to mark Secrets and ConfigMaps as immutable was also added as an Alpha feature, so it is likely to be enabled by default in later releases. Check the GitHub Issue here.
What is it good for?
The most popular and the most convenient way of consuming Secrets and
ConfigMaps by Pods is consuming it as a file. However, any update to a
Secret or ConfigMap object is quickly (roughly within a minute)
reflected in updates of the file mounted for all Pods consuming them.
That means that a bad update (push) of Secret and/or ConfigMap can
very quickly break the entire application.
Here you can read more about the motivation behind this decision.
In this KEP, we are proposing to introduce an ability to specify that
contents of a particular Secret/ConfigMap should be immutable for its
whole lifetime. For those Secrets/ConfigMap, Kubelets will not be
trying to watch/poll for changes to updated mounts for their Pods.
Given there are a lot of users not really taking advantage of
automatic updates of Secrets/ConfigMaps due to consequences described
above, this will allow them to:
protect themselves better for accidental bad updates that could cause outages of their applications
achieve better performance of their cluster thanks to significant reduction of load on apiserver
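As a small sketch of what the KEP describes, a ConfigMap is marked immutable with a single top-level field. The name and the data below are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-v1      # hypothetical name; a version suffix helps with rollouts
data:
  app.properties: |
    log.level=info
immutable: true            # data can no longer be changed after this is set
```

Once `immutable` is set to `true`, the only way to change the contents is to delete and recreate the object, which is why versioned names are a common companion to this setting.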

How do I mount data into persisted storage on Kubernetes and share the storage amongst multiple pods?

I am new to Kubernetes and am trying to understand the most efficient and secure way to handle sensitive persisted data that interacts with a k8s pod. I have the following requirements when I start a pod in a k8s cluster:
The pod should have persisted storage.
Data inside the pod should be persistent even if the pod crashes or restarts.
I should be able to easily add or remove data from hostPath into the pod. (Not sure if this is feasible since I do not know how the data will behave if the pod starts on a new node in a multi node environment. Do all nodes have access to the data on the same hostPath?)
Currently, I have been using StatefulSets with a persistent volume claim on GKE. The image that I am using has a couple of constraints as follows:
I have to mount a configuration file into the pod before it starts. (I am currently using configmaps to pass the configuration file)
The pod that comes up creates its own TLS certificates, which I need to pass to other pods. (Currently I do not have a process in place to do this and have been manually copying and pasting these certificates into other pods.)
So, how do I maintain a common persisted storage that handles sensitive data between multiple pods and how do I add pre-configured data to this storage? Any guidance or suggestions are appreciated.
I believe this documentation on creating a persistent disk with multiple readers [1] is what you are looking for. However, the pods will only be able to read from the disk, since GCE persistent disks do not support the ReadWriteMany access mode [2].
Regarding hostPath: the mount point is in the pod, while the volume itself is a directory on the node. I believe a hostPath volume is confined to the individual node, so pods on other nodes will not see the same data.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
[2] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
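A rough sketch of the multiple-readers setup, assuming a PV with pre-populated data already exists for the claim to bind to (for example, one created by following [1]). The names, size, and image are hypothetical.

```yaml
# Claim that many pods can mount at the same time, read-only.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-ro           # hypothetical name
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 10Gi              # placeholder size
---
# Several replicas reading from the same claim.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reader
spec:
  replicas: 3
  selector:
    matchLabels:
      app: reader
  template:
    metadata:
      labels:
        app: reader
    spec:
      containers:
        - name: reader
          image: busybox:1.36    # placeholder image
          command: ["sleep", "3600"]
          volumeMounts:
            - name: shared
              mountPath: /data
              readOnly: true
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-data-ro
            readOnly: true       # mount the claim read-only in every pod
```

Because every consumer mounts the volume read-only, the data has to be written to the disk before the readers start. This setup does not cover the case where pods need to write certificates for each other.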

access the same file with several mongo db

Problem
I have a project that consists of the following: I need to have a containerized database. When the load on the database goes up and the database gets overloaded, I need to bring up another container (pod) with the database. The problem is that the new pod needs to have some data preloaded (for the users to read), and when the load goes down, the new pod will be terminated and the data stored in it needs to be saved to a central database (to avoid losing it).
I'm using a Kubernetes cluster (Google Kubernetes Engine) for the pods, and MongoDB. It kind of looks like this
DB Diagram
I know that the problem described above is probably not the recommended approach, but that's what they are asking us to do.
Now, the problem is that MongoDB does not allow that (merging content from several databases into one database). A script that controls the pods (which need to be handled dynamically), pulls the data from them, and pushes it to the central database is complicated, and things like keeping track of the data that was already pulled need to be taken care of.
My Idea of Solution
So, my idea was that all the containers point to the same volume. That means that the files in the directory '/data/db' (where MongoDB stores its files) are the same for all the pods, because the same volume is mounted in every pod. It kind of looks like this
Same Volume Mounted for each Pod
Kubernetes allows you to use several volume types. The ones that allow ReadWriteMany are NFS and CephFS, among others. I tried the example from this NFS link, but it did not work. The first pod started successfully, but the others got stuck in "Starting Container". I assume that the volume could not be mounted for the other pods because ReadWriteMany was not allowed.
I tried creating a CephFS cluster manually, but I am having some trouble using it from the Kubernetes cluster. My question is: will CephFS do the job? Can CephFS handle several writers to the same file? If it can, will the MongoDB pods start successfully? NFS was supposed to work, but it didn't.