Kubernetes - Persistent storage for PostgreSQL

We currently have a 2-node Kubernetes environment running on bare-metal machines (no GCE) and now we wish to set up a PostgreSQL instance on top of this.
Our plan was to map a data volume for the PostgreSQL data directory onto the node using the volumeMounts option in Kubernetes. However, this is a problem: if the Pod is ever stopped, Kubernetes may re-launch it on any of the other nodes, so we have no guarantee that it will find the correct data directory after the restart...
So what is the best approach for maintaining a consistent and persistent PostgreSQL Data Directory across a Kubernetes cluster?

One solution is to deploy HA PostgreSQL, for example https://github.com/sorintlab/stolon.
Another is to attach network storage to all nodes (NFS, GlusterFS) and use volumeMounts in the pods.
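As a sketch of the second option: with an NFS export reachable from every node, the data directory follows the pod wherever it is scheduled. The server address and export path below are placeholders.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pg-data-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: 10.0.0.10        # hypothetical NFS server reachable from all nodes
    path: /exports/pgdata    # hypothetical export
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:13
      env:
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
      volumeMounts:
        - name: pg-data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: pg-data
      persistentVolumeClaim:
        claimName: pg-data-pvc
```

Because the volume is backed by NFS rather than a node-local path, the pod sees the same data directory no matter which node it lands on after a restart.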

Related

How to have data in a database with FastAPI persist across multiple nodes?

If I use the https://github.com/tiangolo/full-stack-fastapi-postgresql project generator, how would one be able to persist data across multiple nodes (either with docker swarm or kubernetes)?
As I understand it, any PostgreSQL data in a volumes directory would be different on every node (e.g. every DigitalOcean droplet). In that case, a user might request their data, get routed by Traefik to a node with a different volumes directory, and receive different information than if they had been routed to another node. Is this correct?
If so, what would be the best approach to have multiple servers running a database work together and have the same data in the database?
On Kubernetes, persistent volumes are used to attach storage to pods wherever they are scheduled in the cluster. They are managed by providing the cluster with storage classes, which map to drivers, which in turn map to some kind of SAN storage.
Docker / Docker Swarm has similar support via Docker volume plugins, but with the ascendancy of K8s there are virtually no active open-source projects left, and most of the former commercial SAN driver vendors have migrated to K8s instead.
Nonetheless, depending on your tolerance, you can use a mix of direct NFS / FUSE mounts; there are some not-entirely-abandoned Docker volume drivers available in the NFS / GlusterFS space.
The issue moby/moby#39624 tracks CSI support, which we will hopefully see land in 2021 and which would bring Swarm back in line with K8s.
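On the Kubernetes side, the storage-class mechanism described above usually reduces to a single claim that the cluster provisions dynamically. A sketch, assuming DigitalOcean's CSI storage class (check `kubectl get storageclass` for what your cluster actually offers):

```yaml
# PersistentVolumeClaim that asks the cluster to dynamically provision
# block storage through a storage class. The class name below is the
# one DigitalOcean's CSI driver typically registers; substitute your own.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce          # block storage: attachable to one node at a time
  storageClassName: do-block-storage
  resources:
    requests:
      storage: 20Gi
```

Any pod that mounts this claim gets the same disk regardless of which node it is scheduled on, because the CSI driver attaches the block device to whichever node runs the pod.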

HA postgresql on kubernetes

I wanted to deploy postgresql as database in my kubernetes cluster. As of now I've followed this tutorial.
By reading the whole thing I understood that we claim static storage before initiating PostgreSQL, so that we keep the data in case the pod fails. Also, we can do replication by pointing to the same storage space to get our data back.
What happens if we use two worker nodes and the pods containing the database migrate to another node? I don't think local storage will work then.
A hostPath volume is not recommended for production use because of its node-local nature: if the pod is rescheduled to another node, the storage does not migrate with it, and if the node reboots, the data is lost.
For durable storage, use external block or file storage systems mounted onto the nodes through a supported CSI driver.
For HA Postgres I suggest you explore the Postgres Operator, which delivers easy-to-run, highly available PostgreSQL clusters on Kubernetes (K8s), powered by Patroni. It is configured only through Postgres manifests (CRDs), which eases integration into automated CI/CD pipelines without direct access to the Kubernetes API and promotes infrastructure-as-code over manual operations.
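With the Zalando Postgres Operator, such a cluster is declared as a single custom resource. Roughly like this (the names, sizes, and version are illustrative placeholders; consult the operator's documentation for the exact schema):

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster   # hypothetical cluster name
spec:
  teamId: "acid"
  numberOfInstances: 2         # one primary plus one Patroni-managed replica
  volume:
    size: 10Gi                 # persistent volume per instance
  users:
    app_user: []               # hypothetical application role
  databases:
    app_db: app_user           # database owned by that role
  postgresql:
    version: "13"
```

Applying this manifest is all that is needed; the operator creates the StatefulSet, persistent volume claims, and failover configuration for you.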

Move resources/volumes across contexts in Kubernetes clusters

I have a Kubernetes cluster which I started with the context "dev1.k8s.local", and it has a StatefulSet with EBS persistent volumes (PVs).
Now we are planning to start another context, "dev2.k8s.local".
Is there a way I can move the dev1 context's EBS volumes to the context "dev2.k8s.local"?
I am using K8s 1.10 and kops 1.10.
A context is simply a named entry in a Kubernetes client configuration, typically ~/.kube/config. This file can hold multiple configurations, managed manually or with kubectl config.
When you provision a second Kubernetes cluster on AWS using Kops, brand new resources are recreated that have no frame of reference about the other cluster. Your EBS volumes that were created for PVs in your original cluster cannot simply be transferred between clusters using a context entry in your configuration file. That's not how it is designed to work.
Aside from the design problem, there is also a serious technical hurdle. EBS volumes are ReadWriteOnce, meaning they can only be attached to a single node at a time. This constraint exists because EBS is block storage, treated like a physical block device connected to the underlying worker node running your pod. That block device does not exist on the worker nodes in your other cluster, so it is impossible to simply move a pointer over.
The best way to accomplish this would be to back up and copy over the disk. How you handle this is up to your team. One way you could do it is by mounting both EBS volumes and copying the data over manually. You could also take a snapshot and restore the data to the other volume.
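Once the data has been copied (for instance, by restoring an EBS snapshot into a fresh volume in the dev2 cluster's availability zone), the new volume can be handed to the second cluster as a statically provisioned PV. A sketch using the in-tree EBS driver available in K8s 1.10, with a placeholder volume ID:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: restored-pg-data
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # don't delete the disk if the claim goes away
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0       # hypothetical volume restored from the dev1 snapshot
    fsType: ext4
```

A matching PVC in the dev2 cluster can then bind to this PV, and the StatefulSet there picks up the copied data.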

How do I mount data into persisted storage on Kubernetes and share the storage amongst multiple pods?

I am new to Kubernetes and am trying to understand the most efficient and secure way to handle sensitive persisted data that a K8s pod interacts with. I have the following requirements when I start a pod in a K8s cluster:
The pod should have persisted storage.
Data inside the pod should be persistent even if the pod crashes or restarts.
I should be able to easily add or remove data from a hostPath into the pod. (Not sure if this is feasible, since I do not know how the data will behave if the pod starts on a new node in a multi-node environment. Do all nodes have access to the data at the same hostPath?)
Currently, I have been using StatefulSets with a persistent volume claim on GKE. The image that I am using has a couple of constraints, as follows:
I have to mount a configuration file into the pod before it starts. (I am currently using ConfigMaps to pass the configuration file.)
The pod that comes up creates its own TLS certificates, which I need to pass to other pods. (Currently I do not have a process in place to do this, and have been manually copy-pasting these certificates into other pods.)
So, how do I maintain a common persisted storage that handles sensitive data between multiple pods and how do I add pre-configured data to this storage? Any guidance or suggestions are appreciated.
I believe this documentation on creating a persistent disk with multiple readers [1] is what you are looking for. Note, however, that the pods will only be able to read from the disk, since GCP persistent disks do not support the ReadWriteMany access mode [2].
Regarding hostPath: the mount point is in the pod, but the volume is a directory on the node, so a hostPath is confined to the individual node the pod happens to run on.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
[2] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
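The multi-reader setup from [1] boils down to a pre-populated disk exposed with the ReadOnlyMany access mode. A sketch, with a hypothetical disk name:

```yaml
# A pre-populated GCE persistent disk shared read-only by many pods.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-config
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadOnlyMany
  gcePersistentDisk:
    pdName: shared-config-disk    # hypothetical disk, populated before use
    fsType: ext4
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-config
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""            # bind to the pre-created PV, skip dynamic provisioning
  resources:
    requests:
      storage: 10Gi
```

Every pod that mounts this claim read-only sees the same data. For sensitive material such as TLS certificates, though, a Kubernetes Secret mounted into each pod is usually the more idiomatic distribution mechanism than a shared disk.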

kubernetes statefulsets : does node sees same persisted volume after it restarts on same node

I am reading local storage design of kubernetes. It has a section for distributed database where db replicates data by itself.
My question is that if any process of db goes down, will it be restarted on the same node/machine ? I think it's a yes.
If it's yes, will it have access to its local storage that it had before it crashed ?
I read an older article about stateful sets when it was in beta. The article didn't encourage use of local storage at that time.
I am new to Kubernetes, so please answer this question with enough background information for a newcomer to follow.
In the local storage design, as you can read here, local volumes are used with StatefulSets. So, for example, if you want three MongoDB instances from a StatefulSet named mongodb, then K8s will create three pods for you (StatefulSet ordinals start at 0):
mongodb-0
mongodb-1
mongodb-2
If mongodb-1 fails, then K8s will restart it with the same local storage / persistent volume.
If you increase the number of replicas, then K8s will create new persistent volumes through your volumeClaimTemplates. If you then shrink back down to two replicas, those newly created volumes won't be deleted and will be reused if you go back to your previous number of replicas.
If your persistent volume is bound to a specific node, then K8s knows to schedule your pod on that node.
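The per-replica volume behaviour described above comes from the volumeClaimTemplates section of the StatefulSet spec. A minimal sketch (the image and storage-class name are placeholders):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: mongodb
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
        - name: mongodb
          image: mongo:4.4
          volumeMounts:
            - name: data
              mountPath: /data/db
  volumeClaimTemplates:        # one PVC per pod: data-mongodb-0, data-mongodb-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: local-storage   # hypothetical local-volume storage class
        resources:
          requests:
            storage: 10Gi
```

Each pod keeps its own claim across restarts, and scaling down leaves the claims (and their data) in place, which is exactly the behaviour the answer describes.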
You can read about a mongodb cluster statefulset example here : https://kubernetes.io/blog/2017/01/running-mongodb-on-kubernetes-with-statefulsets
Or you can check a great talk here (with demos):
https://www.youtube.com/watch?v=m8anzXcP-J8&feature=youtu.be
The use of stateful sets and local storage is really well explained there.