Is there an intermediate layer/cache between a Kubernetes pod and a Persistent Volume, or does a pod access the PV directly? - kubernetes

Recently I ran into a strange problem. We have two pods running in an OpenShift cluster that share a persistent volume (GlusterFS).
For the sake of this explanation, let's call one of the pods PodA and the other PodB. PodB had been running for three months. Automation in PodA creates and updates files in the shared persistent volume, and PodB reads them and performs some operations based on that input.
Now to the problem: whenever PodA created a new file in the shared PV, it was visible and accessible from PodB. However, there were a few files that PodA was updating periodically, and those changes were not reflected in PodB, so in PodB we could only see the old versions of those files. To solve the problem, we forcefully deleted PodB; OpenShift recreated it, and the problem was gone.
I thought that with the PV mechanism Kubernetes mounts external storage (a folder) into the pod (container), and that there is no intermediate layer or cache. From what we have experienced so far, it seems that every container (or pod) creates a local copy of those files, or that there is a cache between the PV and the pod.
I searched for this on Google and could not find a detailed explanation of how a PV mount works in Kubernetes. I would love to know the actual reason behind this problem.

There is no caching mechanism for PVs provided by Kubernetes, so the problem you are observing must be located in either the GlusterFS CSI driver or GlusterFS itself.
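One way to narrow this down is to confirm that the two pods really see different content at the same path. The following is a minimal Python sketch, assuming Python is available in both containers; the function name `fingerprint` is made up for illustration and is not part of Kubernetes or GlusterFS. Run it in PodA and PodB against the same shared file and compare the results. A mismatch that persists after PodA has finished writing would point at client-side caching in the storage layer rather than at Kubernetes.

```python
import hashlib
import os


def fingerprint(path):
    """Return (size, mtime_ns, sha256 hex digest) for the file at `path`."""
    st = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return st.st_size, st.st_mtime_ns, digest.hexdigest()
```

For example, calling `fingerprint()` on the same shared file from each pod and printing the result gives three values that should be identical in both pods once the writer is done.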

Related

StatefulSet and Local Persistent Volume when the kube node is gone

This question is about StatefulSet and Local Persistent Volume.
If we deploy a StatefulSet whose pods use local persistent volumes and the Kube node hosting a persistent volume is gone, the corresponding pod becomes unschedulable. My question is: how can an operator reliably detect this problem?
I can't find any documentation talking about this. Can an operator receive a notification or something similar?
What I observed is that when a node hosting a PV is deleted, the corresponding pod gets stuck in the Pending state. One way I can think of to detect this problem is to find the PVC for the pod, then the PV for the PVC, then the node the PV is on, and then query whether that node still exists.
But the problem is that inspecting PVs and nodes requires cluster-level privileges, which ideally should not be given to an operator that is only supposed to manage namespace-level resources.
Plus, I am not sure that following the Pod->PVC->PV->Node sequence captures all possible situations in which physical storage becomes inaccessible.
What is the proper way to detect the situation? Once the situation is detected, it is pretty easy to fix.
Thank you very much!
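The Pod->PVC->PV->Node lookup described in the question can be sketched as follows. This is only an illustration in Python over plain dicts shaped like the JSON that `kubectl get ... -o json` returns; the function names are hypothetical, and it assumes the local PV is pinned to its node through a `kubernetes.io/hostname` node-affinity expression and that this label value equals the node name. It does not remove the need for read access to PVs and nodes, and it does not cover other ways storage can become inaccessible.

```python
def pvc_names(pod):
    """Names of the PVCs referenced by a Pod manifest (dict)."""
    volumes = pod.get("spec", {}).get("volumes", [])
    return [v["persistentVolumeClaim"]["claimName"]
            for v in volumes if "persistentVolumeClaim" in v]


def pv_hostnames(pv):
    """Hostnames a PV is pinned to via spec.nodeAffinity (empty set if none)."""
    terms = (pv.get("spec", {}).get("nodeAffinity", {})
               .get("required", {}).get("nodeSelectorTerms", []))
    names = set()
    for term in terms:
        for expr in term.get("matchExpressions", []):
            if (expr.get("key") == "kubernetes.io/hostname"
                    and expr.get("operator") == "In"):
                names.update(expr.get("values", []))
    return names


def orphaned_claims(pod, pvcs, pvs, node_names):
    """PVC names of `pod` whose bound PV is pinned only to missing nodes.

    `pvcs` and `pvs` map object names to manifests; `node_names` lists the
    nodes that currently exist in the cluster.
    """
    existing = set(node_names)
    orphaned = []
    for claim in pvc_names(pod):
        pv_name = pvcs.get(claim, {}).get("spec", {}).get("volumeName")
        if not pv_name or pv_name not in pvs:
            continue  # claim is unbound or the PV is unknown; nothing to check
        hosts = pv_hostnames(pvs[pv_name])
        if hosts and not (hosts & existing):
            orphaned.append(claim)
    return orphaned
```

A non-empty result for a Pending pod would indicate that the node backing one of its volumes no longer exists.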

New PVC for an active pod

Is it possible to plug and play storage to an active pod without restarting the pod? I want to attach new storage to a running pod without restarting it. Does Kubernetes support this?
Most things in a Pod are immutable. In particular, if you look at the API definition of a PodSpec, it says in part (emphasis mine):
containers: List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
Typically you don't directly work with Pods; you work with a higher-level controller like a Deployment. There you can edit these things, and it reacts by creating new Pods with the new pod spec and then deleting the old Pods.
Also remember that sometimes the cluster itself will delete or restart a Pod (if its Node is over capacity or fails, for example) and you don't have any control over this. It's better to plan for your Pods to be periodically restarted than to try to prevent it.

Kubernetes Persistent Volume for Pod in Production

I ran a scenario where I deployed a Microsoft SQL database on my K8s cluster using a PV and a PVC. It works well, but I see some strange behaviour: I created a PV, but it is only visible on one node and not on the other worker nodes. What am I missing here? Any input, please?
Background:
Server 1 - Master
Server 2 - Worker
Server 3 - Worker
Server 4 - Worker
Pod: "MyDb" is running on Server (Node) 4 without any ReplicaSet. I am guessing that because my Pod is running on server-4, the PV got created on server-4 when I created the Pod and referenced the PVC (claim) in it.
Please let me know your thoughts on this issue, or share your input about mounting a shared disk in a production cluster.
Those who want to deploy a SQL DB on a K8s cluster can refer to the blog posts by Philips, linked below:
https://www.phillipsj.net/posts/sql-server-on-linux-on-kubernetes-part-1/ (without PV)
https://www.phillipsj.net/posts/sql-server-on-linux-on-kubernetes-part-2/ (with PV and Claim)
Regards,
Farooq
Please see below my findings on my original problem statement.
Problem: The Pod for SQL Server was created. At runtime K8s created this Pod on server-4 and hence created the PV on server-4. However, the PV path (/tmp/sqldata) wasn't created on the other nodes.
I shut down the server-4 node and ran the command to delete the SQL Pod (no replica was used initially).
The status of the Pod changed to "Terminating".
Nothing happened for a while.
I restarted server-4 and noticed the Pod got deleted immediately.
Next Step:
- I stopped server-4 again and created the same Pod.
- The Pod was created on the server-3 node at runtime, and I see the PV (/tmp/sqldata) was created on server-3 as well. However, all my data (for example, the sample tables) was lost. It is a fresh new PV on server-3 now.
I am assuming that a PV would be a mounted volume of external storage and not storage/disk from any node in the cluster.
I am guessing that because my Pod is running on server-4, the PV got created on server-4 when I created the Pod and referenced the PVC (claim) in it.
This is more or less correct and you should be able to verify this by simply deleting the Pod and recreating it (since you say you do not have a ReplicaSet doing that for you). The PersistentVolume will then be visible on the node where the Pod is scheduled to.
Edit: The above assumes that you are using an external storage provider such as NFS or AWS EBS (see possible storage providers for Kubernetes). With HostPath the above does NOT apply and a PV will be created locally on a node (and will not be mounted to another node).
There is no reason to mount the PersistentVolume also to the other nodes. Imagine having hundreds of nodes, would you want to mount your PersistentVolume to all of them, while your Pod is just running on one?
You are also asking about "shared" disks. The PersistentVolume created in the blog post you linked is using ReadWriteMany, so you actually can start multiple Pods accessing the same volume (given that your storage supports that as well). But your software (a database in your case) needs to support having multiple processes accessing the same data.
Especially when considering databases, you should also look into StatefulSets, as this basically allows you to define Pods that always use the same storage, which can be very interesting for databases. Whether or not you should run databases on Kubernetes is a whole different topic...
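To make the HostPath caveat from the edit above concrete, here is a small Python sketch that classifies a PersistentVolume manifest (as a plain dict) by whether its backing storage lives on a single node. The helper name `is_node_local`, the PV names, and the NFS server address are invented for illustration; only the `hostPath`, `local`, and `nfs` volume source keys come from the Kubernetes PersistentVolume spec.

```python
# Volume source keys that refer to a directory or disk on one specific node.
NODE_LOCAL_SOURCES = ("hostPath", "local")


def is_node_local(pv):
    """True if the PV manifest (a dict) is backed by storage on a single node."""
    spec = pv.get("spec", {})
    return any(key in spec for key in NODE_LOCAL_SOURCES)


# A PV like the one in the question: data lives in /tmp/sqldata on whichever
# node the Pod happens to run on, so it does not follow the Pod elsewhere.
hostpath_pv = {
    "kind": "PersistentVolume",
    "metadata": {"name": "sqldata-hostpath"},  # hypothetical name
    "spec": {
        "capacity": {"storage": "1Gi"},
        "accessModes": ["ReadWriteOnce"],
        "hostPath": {"path": "/tmp/sqldata"},
    },
}

# A network-backed PV: any node can mount it, so the data survives a move.
nfs_pv = {
    "kind": "PersistentVolume",
    "metadata": {"name": "sqldata-nfs"},  # hypothetical name
    "spec": {
        "capacity": {"storage": "1Gi"},
        "accessModes": ["ReadWriteMany"],
        "nfs": {"server": "nfs.example.internal", "path": "/exports/sqldata"},
    },
}

print(is_node_local(hostpath_pv))  # True
print(is_node_local(nfs_pv))       # False
```

This matches what was observed in the question: with a hostPath volume, recreating the Pod on server-3 yields an empty /tmp/sqldata there, while the old data stays on server-4.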

How do I mount data into persisted storage on Kubernetes and share the storage amongst multiple pods?

I am new to Kubernetes and am trying to understand the most efficient and secure way to handle sensitive persisted data that interacts with a k8s pod. I have the following requirements when I start a pod in a k8s cluster:
The pod should have persisted storage.
Data inside the pod should be persistent even if the pod crashes or restarts.
I should be able to easily add or remove data from hostPath into the pod. (Not sure if this is feasible since I do not know how the data will behave if the pod starts on a new node in a multi node environment. Do all nodes have access to the data on the same hostPath?)
Currently, I have been using StatefulSets with a persistent volume claim on GKE. The image that I am using has a couple of constraints as follows:
I have to mount a configuration file into the pod before it starts. (I am currently using configmaps to pass the configuration file)
The pod that comes up, creates its own TLS certificates which I need to pass to other pods. (Currently I do not have a process in place to do this and thus have been manually copy pasting these certificates into other pods)
So, how do I maintain a common persisted storage that handles sensitive data between multiple pods and how do I add pre-configured data to this storage? Any guidance or suggestions are appreciated.
I believe this documentation on creating a persistent disk with multiple readers [1] is what you are looking for. You will, however, only be able to have the pods read from the disk, since GCE persistent disks do not support the ReadWriteMany access mode [2].
Regarding hostPath: the mount point is in the pod, while the volume is a directory on the node. I believe a hostPath is confined to an individual node.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
[2] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

access the same file with several mongo db

Problem
I have a project that consists of the following: I need a containerized database. When the load on the database goes up and it gets overloaded, I need to bring up another container (pod) with the database. The problem is that the new pod needs to have some data preloaded (for users to read), and when the load goes down, the new pod will be terminated and the data stored in it needs to be saved to a central database (to avoid losing it).
I'm using a Kubernetes cluster (Google Kubernetes Engine) for the pods, and MongoDB. It looks roughly like this:
[Image: DB Diagram]
I know that the approach described above is probably not the recommended one, but that's what they are asking us to do.
Now, the problem is that MongoDB does not allow that (merging content from several databases into one database). A script that controls the pods (which need to be handled dynamically), pulls the data from them, and pushes it to the central database is complicated, and things like keeping track of the data that was already pulled need to be taken care of.
My Idea of Solution
So, my idea was to have all the containers point to the same volume. That means that the files in the directory '/data/db' (where mongo stores its files) are the same for all the pods, because the same volume is mounted in every pod. It looks roughly like this:
[Image: Same Volume Mounted for each Pod]
Kubernetes supports several volume types. The ones that allow ReadWriteMany include NFS and CephFS, among others. I tried the example in this NFS link, but it did not work. The first pod started successfully, but the others got stuck in "Starting Container". I assume that the volume could not be mounted for the other pods because ReadWriteMany was not allowed.
I tried creating a CephFS cluster manually, but I am having some trouble using it from the Kubernetes cluster. My question is: will CephFS do the job? Can CephFS handle several writers to the same file? If it can, will the Mongo pods come up successfully? NFS was supposed to work, but it didn't.
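One thing worth checking before blaming the filesystem: a mongod instance takes an exclusive lock on a mongod.lock file inside its data directory, so a second mongod pointed at the same '/data/db' is expected to refuse to start even when the volume itself mounts fine in every pod. The sketch below is not MongoDB's code. It only illustrates the general pattern of an exclusive, non-blocking advisory lock, using Python's `fcntl.flock` on Linux; the helper name `try_lock` is made up, and whether such locks are honoured across nodes depends on the shared filesystem.

```python
import fcntl


def try_lock(lock_path):
    """Try to take an exclusive, non-blocking lock on `lock_path`.

    Returns the open file object on success (keep it open to hold the lock),
    or None if some other holder already has the lock.
    """
    fh = open(lock_path, "a+")
    try:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # The lock is held elsewhere; behave like a second instance would
        # and give up instead of waiting.
        fh.close()
        return None
    return fh
```

The first caller gets a file object back and holds the lock until it closes the file; a second caller on the same path gets `None` in the meantime. Under this pattern, a shared ReadWriteMany volume lets several pods mount the same directory, but it does not let several database processes safely own the same data files.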