Problem
I have a project that consists of the following: I need a containerized database. When the load on the database goes up and it becomes overloaded, I need to bring up another container (pod) with the database. The problem is that the new pod needs to have some data preloaded (for users to read), and when the load goes down the new pod will be terminated, so the data stored in it needs to be saved to a central database (to avoid losing it).
I'm using a Kubernetes cluster (Google Kubernetes Engine) for the pods, and MongoDB. It looks roughly like this:
DB Diagram
I know that the approach described above is probably not the recommended one, but that's what we are being asked to do.
Now, the problem is that MongoDB does not allow that (merging content from several databases into one database). Writing a script that controls the pods (which need to be handled dynamically), pulls the data from them, and pushes it to the central database is complicated, and things like keeping track of the data that was already pulled need to be taken care of.
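To make the "keeping track of what was already pulled" part concrete, here is a minimal Python sketch of one possible bookkeeping scheme: a per-pod watermark on an update timestamp. It uses in-memory dictionaries as stand-ins for the databases and does not talk to MongoDB at all; the class name `CentralStore`, the method `sync_from`, and the `updated_at` field are all hypothetical and assume every document carries a monotonically increasing timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class CentralStore:
    """In-memory stand-in for the central database (hypothetical)."""
    docs: dict = field(default_factory=dict)        # _id -> document
    watermarks: dict = field(default_factory=dict)  # pod name -> last synced timestamp

    def sync_from(self, pod_name, pod_docs):
        """Copy documents newer than the pod's watermark; return how many were copied."""
        last = self.watermarks.get(pod_name, 0)
        fresh = [d for d in pod_docs if d["updated_at"] > last]
        for doc in fresh:
            # Upsert keyed on _id, so pulling the same document twice is harmless.
            self.docs[doc["_id"]] = dict(doc)
        if fresh:
            self.watermarks[pod_name] = max(d["updated_at"] for d in fresh)
        return len(fresh)


central = CentralStore()
pod_a = [
    {"_id": "a1", "updated_at": 1, "value": "x"},
    {"_id": "a2", "updated_at": 2, "value": "y"},
]
print(central.sync_from("pod-a", pod_a))  # first pull copies both documents
print(central.sync_from("pod-a", pod_a))  # second pull finds nothing new
```

A real script would also have to run a final sync before a pod is terminated and decide what to do when two pods modify the same `_id`; the sketch ignores both.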
My Idea of Solution
So, my idea was to have all the containers point to the same volume. That means the files in the directory '/data/db' (where Mongo stores its files) are the same for every pod, because the same volume is mounted in all of them. It looks roughly like this:
Same Volume Mounted for each Pod
Kubernetes supports several volume types. The ones that allow ReadWriteMany include NFS and CephFS, among others. I tried the example from this NFS link, but it did not work. The first pod started successfully, but the others got stuck in "Starting Container". I assume that the volume could not be mounted for the other pods because ReadWriteMany was not allowed.
I tried creating a CephFS cluster manually, but I am having trouble using it from the Kubernetes cluster. My question is: will CephFS do the job? Can CephFS handle several writers to the same file? If it can, will the Mongo pods come up successfully? NFS was supposed to work, but it didn't.
Related
We are running a cluster of x nodes.
Every node in the cluster pulls some files from remote storage. Unfortunately, the remote server is getting overloaded. So we are exploring a solution in which only a subset of the nodes pulls the files and then serves them to the remaining nodes (read-only; the other nodes do not need to write). Some of the nodes in that subset may undergo maintenance often and can be taken offline.
I was experimenting with running NFS as a pod in a replica set, with a service (fixed IP) for each of the NFS pods. If a node with an NFS pod goes down, k8s will take care of bringing up an NFS pod on another node with the same sticky IP.
But this new NFS server would still need to be remounted on the other nodes.
Any better solution for this storage problem?
Note that we would ideally not like to use remote storage since this adds extra latency.
Try expanding Persistent Volume Claims. That is overhead for you to maintain, so I recommend going with something locally managed instead; after that, the choice is yours.
Two other options are also recommended: hostPath and a GlusterFS volume. Please refer to this SO question for more information.
What @scenox suggested is also a good option.
Recently I ran into a strange problem. We have two pods running in an OpenShift cluster that share a persistent volume (GlusterFS).
For the sake of this explanation, let's call one of the pods PodA and the other PodB. PodB had been running for three months. There is automation in PodA that creates/updates files in the shared persistent volume, and PodB reads them and performs some operations based on that input.
Now to the problem: whenever PodA created a new file in the shared PV, it was visible and accessible from PodA. However, there were a few files that PodA was updating periodically, and those changes were not reflected in PodB. So in PodB we could only see the old versions of those files. To solve the problem, we forcefully deleted PodB; OpenShift then recreated it, and the problem was gone.
I thought that with the PV mechanism Kubernetes mounts external storage (a folder) into the pod (container), and that there is no intermediate layer, cache, or anything like that. From what we have experienced so far, it seems that every container (or pod) creates a local copy of those files, or maybe there is a cache between the PV and the pod.
I have searched for this on Google and could not find a detailed explanation of how this PV mount works in Kubernetes. I would love to know the actual reason behind this problem.
There is no caching mechanism for PVs provided by Kubernetes, so the problem you are observing must be located in either the GlusterFS CSI driver or GlusterFS itself.
If I use the https://github.com/tiangolo/full-stack-fastapi-postgresql project generator, how would one be able to persist data across multiple nodes (either with docker swarm or kubernetes)?
As I understand it, any PostgreSQL data in a volumes directory would be different for every node (e.g. every DigitalOcean droplet). In this case, a user may ask for their data, get directed by Traefik to a node with a different volumes directory, and receive different information than if they had been directed to another node. Is this correct?
If so, what would be the best approach to have multiple servers running a database work together and have the same data in the database?
On kubernetes, persistent volumes are used to associate storage that is mounted onto pods wherever they are loaded in the cluster and they are managed by providing the cluster with storage classes that map to drivers that map to some kind of SAN storage.
Docker / Docker swarm has similar support for docker volume plugins, but with the ascendancy of K8s there are virtually no active open source projects, and most of the prior commercial SAN driver vendors have migrated to K8s instead.
Nonetheless, depending on your tolerance, you can use a mix of direct NFS / FUSE mounts, and there are some not entirely abandoned Docker volume drivers available in the NFS / GlusterFS space.
Issue moby/moby #39624 tracks CSI support, which we will hopefully see land in 2021 and which would bring Swarm back in line with k8s.
I am new to Kubernetes and am trying to understand the most efficient and secure way to handle sensitive persisted data that interacts with a k8s pod. I have the following requirements when I start a pod in a k8s cluster:
The pod should have persisted storage.
Data inside the pod should be persistent even if the pod crashes or restarts.
I should be able to easily add or remove data from hostPath into the pod. (Not sure if this is feasible since I do not know how the data will behave if the pod starts on a new node in a multi node environment. Do all nodes have access to the data on the same hostPath?)
Currently, I have been using StatefulSets with a persistent volume claim on GKE. The image that I am using has a couple of constraints as follows:
I have to mount a configuration file into the pod before it starts. (I am currently using configmaps to pass the configuration file)
The pod that comes up, creates its own TLS certificates which I need to pass to other pods. (Currently I do not have a process in place to do this and thus have been manually copy pasting these certificates into other pods)
So, how do I maintain a common persisted storage that handles sensitive data between multiple pods and how do I add pre-configured data to this storage? Any guidance or suggestions are appreciated.
I believe this documentation on creating a persistent disk with multiple readers [1] is what you are looking for. However, the pods will only be able to read from the disk, since GCE persistent disks do not support the ReadWriteMany access mode [2].
Regarding hostPath: the mount point is in the pod, but the volume is a directory on the node. I believe a hostPath volume is confined to an individual node.
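As a rough illustration of the read-only approach, the Python sketch below builds a PersistentVolumeClaim that requests the ReadOnlyMany access mode, plus the volume entry a pod spec would use to mount it. The helper names (`readonly_claim`, `readonly_volume`), the claim name, and the size are made up for the example; the manifest fields themselves are the standard Kubernetes ones. The output is JSON, which `kubectl apply -f` accepts as well as YAML.

```python
import json


def readonly_claim(name, size="10Gi"):
    """Build a PVC manifest that asks for ReadOnlyMany (names and size are placeholders)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadOnlyMany"],
            "resources": {"requests": {"storage": size}},
        },
    }


def readonly_volume(claim_name, volume_name="shared-data"):
    """Build the pod-level volume entry that mounts the claim read-only."""
    return {
        "name": volume_name,
        "persistentVolumeClaim": {"claimName": claim_name, "readOnly": True},
    }


claim = readonly_claim("preloaded-data")
print(json.dumps(claim, indent=2))
print(json.dumps(readonly_volume(claim["metadata"]["name"]), indent=2))
```

This only covers the claim side. The disk still has to be populated beforehand, as described in [1].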
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
[2] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
We currently have a 2-node Kubernetes environment running on bare-metal machines (no GCE) and now we wish to set up a PostgreSQL instance on top of this.
Our plan was to map a data volume for the PostgreSQL Data Directory to the node using the volumeMounts option in Kubernetes. However this would be a problem because if the Pod ever gets stopped, Kubernetes will re-launch it at random on one of the other nodes. Thus we have no guarantee that it will use the correct data directory on re-launch...
So what is the best approach for maintaining a consistent and persistent PostgreSQL Data Directory across a Kubernetes cluster?
One solution is to deploy HA PostgreSQL, for example https://github.com/sorintlab/stolon
Another is to have some network storage attached to all nodes (NFS, GlusterFS) and use volumeMounts in the pods.
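For the network-storage option, a pod manifest could look roughly like the one generated by this Python sketch. It is only an illustration: the function name `postgres_pod_with_nfs`, the NFS server address, the export path, and the image tag are assumptions, and the manifest is trimmed (for example, the official image also needs a password configured through its environment before it will start).

```python
import json


def postgres_pod_with_nfs(nfs_server, export_path):
    """Build a Pod manifest whose data directory lives on an NFS export (values are placeholders)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "postgres"},
        "spec": {
            "containers": [{
                "name": "postgres",
                "image": "postgres:16",  # assumed tag
                "volumeMounts": [{
                    "name": "pgdata",
                    # Assumed data directory of the official image.
                    "mountPath": "/var/lib/postgresql/data",
                }],
            }],
            # Because the volume is on the network, the pod sees the same
            # directory no matter which node it is rescheduled onto.
            "volumes": [{
                "name": "pgdata",
                "nfs": {"server": nfs_server, "path": export_path},
            }],
        },
    }


print(json.dumps(postgres_pod_with_nfs("10.0.0.5", "/exports/pgdata"), indent=2))
```

Note that this keeps the data directory reachable from every node, but only one PostgreSQL instance should use that directory at a time.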