setting up kubernetes cluster and running a database - postgresql

This is my proposed kubernetes cluster, I want to be able to run Postgresql database, with my nodes accessing storage machine for storing the data, is this using NFS a good option? How best can I run a database instance here?

I recommend you to use helm chart for deploying any kind of data base, it is very handy and easy to deploy, visit the link:
https://github.com/bitnami/charts/tree/master/bitnami/postgresql
Any way if you want to deploy Postgresql, first of all you need to create persistent volume(pv) and persistent volume claim(pvc), because you choosed NFS for your cluster storage solution. You have to manually create your pv and pvc.
But kubernetes has storage class solution too, it is better to use some kubernetes volume plugin with internal provisioner like glusterfs or cephfs.
https://kubernetes.io/docs/concepts/storage/storage-classes/

Related

minio for mariadb in kubernetes

I'm running a k3s single node cluster and have the k3s local-path-provisioner as storage. As I want to be able to add nodes in the future, I looked at minio to use on top of the local-path as storage. But I'm not sure if it's the right choice, cause I my workloads primarily use mariadb for data and I read, that an s3 compatible bucket isn't the best for database applications.
I hope you can help me figure this out.
If you don't want to use object storage then here are your options for running a local storage provisioner:
GlusterFS StorageClass
Doesn't have lot of documentation on how to set it up. But if you know your way around GlusterFS It'll be a good option.
local-path-provisioner
I
t provides a way for the Kubernetes users to utilize the local storage in each node
OpenEBS -> has a local volume storage engine but I think this is not designed to work on a shared volume mount and it end up tying a pod to a specific node since the data "doesn't exist" on the other nodes.
longhorn [recommened]
It creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes.
rook
Rook is a storage operators for Kubernetes, It supports multiple storage backends. Don't use the NFS one tho cause we hit a wall when using it with our DBs.

How to claim & configure Bitnami Docker Discourse data in Kubernetes Persistent volume?

Currently I have bitnami discourse in docker and the data storing in pod, we scale up the pods from 1 to many. Now I am facing error with media uploads, the issue is the pods data are not in sync so I have to mount single volume and shared it between pods. But I want to do that persistent volume claim in kubernetes with the help of azure-storage-class not in docker volume.
I presume you are using your own kubernetes manifest as bitnami aren't supporting discourse chart currently. Mainly, what I understand that you need is a volume that could be accessed by many PODs.
I think you would need read-write-many volume, this is not usually supported by cloud providers but let's give a try in azure.
I hope it helps.

Is it Appropriate to Store Database in a Kubernetes Persistent Volume (And how to back up?)

I have a web application running on a Google Kubernetes cluster. My web app also uses persistent volumes for multiple MongoDB databases to store user and application data.
(1) Thus I am wondering if it is practical to store all data inside those persistent volumes in the long-run?
(2) Are there any methods for safely backing up the persistent volumes e.g. on a weekly basis (automatically)?
(3) I am also planning to integrate some kind of file upload into the application. Are persistent volumes capable of storing many GB/TB of data, or should I opt for something like Google cloud storage in this case?
Deploying statefull apps on K8s is bit painfull which is well known in K8s community. Usually, if we need HA for DBs supposed to deploy as cluster mode. But in K8s, if you want to deploy in cluster mode, you need to check StatefulSets concept. Anyways, I'm pasting links for your questions, so that you can start from there.
(1) Thus I am wondering if it is practical to store all data inside
those persistent volumes in the long-run?
Running MongoDB on Kubernetes with StatefulSets
(2) Are there any methods for safely backing up the persistent volumes
e.g. on a weekly basis (automatically)?
Persistent Volume Snapshots
Volume Snapshot (Beta from K8s docs)
You can google even more docs.
(3) I am also planning to integrate some kind of file upload into the
application. Are persistent volumes capable of storing many GB/TB of
data, or should I opt for something like Google cloud storage in this
case?
Not sure, it can hold TBs!?? but definitely, if you have cloud, consider to use it
Yes you can use the PVC in Kubernetes to store the data. However it's depends on your application usecase and size.
In kubernetes you can deploy Mongo DB as cluster and run it which is storing data inside PVC.MongoDB helm chart available for HA you can also look for that.
Helm chart : https://github.com/helm/charts/tree/master/stable/mongodb
It's suggested to single pod or statefulset of MongoDB on Kubernetes.
Backup:
For backup of MongoDB database, you can choose taking a snapshot of disk storage (PVC) weekly however along with that you can alos use Mongo snapshot.
Most people choose to manage service but still, it depends on your organization also.
Backup method
MongoDB snapshot
Disk storage snapshot
Filesystem :
Yes it can handle TB of data as it's ultimately disk volume or file
system.
Yes you can use PVC as file system but later in future you may get issue for scaling as PVC is ReadWriteOnce if you want to scale application along with PVC you have to implement ReadWriteMany.
There is sevral method also to achive this you can also directly mount file system to pod like AWS EFS but you may find it slow for file operations.
For file system there are various options available in Kubernetes like csi driver, gluster FS, minio, EFS.

Recover a Kubernetes Cluster

At the moment I have a Kubernetes cluster distributed on AWS via kops. I have a doubt: is it possible to make a sort of snapshot of the Kubernetes cluster and recreate the same environment (master and pod nodes), for example to be resilient or to migrate the cluster in an easy way? I know that the Heptio Ark exists, it is very beautiful. But I'm curious to know if there is an easy way to do it. For example, is it enough to back up Etcd (or in my case the snapshot of EBS volumes)?
Thanks a lot. All suggestions are welcome
kops stores its state in an S3 bucket identified by the KOPS_STATE_STORE. So yes, if your cluster has been removed you can restore it by running kops create cluster.
Keep in mind that it doesn't restore your etcd state so for that you are going to set up etcd backups. You could also make use of Heptio Ark.
Similar answers to this topic:
Recover kops Kubernetes cluster
How to restore kubernetes cluster using kops?
As mentioned by Rico in the earlier post, you can use Velero to back up your etcd using cli client. Another option to consider for the scenario you described is CAPE: CAPE provides an easy to use control plane for Kubernetes Multi-cluster App & Data Management via a friendly user interface.
See below for resources:
How to create an on-demand K8s Backup:
https://www.youtube.com/watch?v=MOPtRTeG8sw&list=PLByzHLEsOQEB01EIybmgfcrBMO6WNFYZL&index=7
How to Restore/Migrate K8s Backup to Another Cluster:
https://www.youtube.com/watch?v=dhBnUgfTsh4&list=PLByzHLEsOQEB01EIybmgfcrBMO6WNFYZL&index=10

Kubernetes - Persistent storage for PostgreSQL

We currently have a 2-node Kubernetes environment running on bare-metal machines (no GCE) and now we wish to set up a PostgreSQL instance on top of this.
Our plan was to map a data volume for the PostgreSQL Data Directory to the node using the volumeMounts option in Kubernetes. However this would be a problem because if the Pod ever gets stopped, Kubernetes will re-launch it at random on one of the other nodes. Thus we have no guarantee that it will use the correct data directory on re-launch...
So what is the best approach for maintaining a consistent and persistent PostgreSQL Data Directory across a Kubernetes cluster?
one solution is to deploy HA postgresql, for example https://github.com/sorintlab/stolon
another is to have some network storage attached to all nodes(NFS, glusterFS) and use volumeMounts in the pods