How to back up a large MongoDB instance hosted on Kubernetes? - mongodb

We have a MongoDB instance hosted on Kubernetes as a StatefulSet.
It is around 3.5 TB in size, with persistent volumes attached.
We are looking for a way to reduce backup time. What is the best way to back up and restore a MongoDB instance to and from AWS S3?
I've looked at the physical and logical backup options in PBM, but I'm not sure they are suitable for instances deployed as StatefulSets in k8s, given that the data is terabytes in size.
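For a rough sense of scale, here is a minimal sketch that estimates how long moving 3.5 TB takes at a given sustained throughput. The throughput figures are hypothetical placeholders, not measurements; the real rate depends on disk, network, compression, and the backup tool.

```python
def transfer_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to move size_tb terabytes at throughput_mb_s megabytes/second."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / throughput_mb_s / 3600


# Hypothetical sustained rates; replace with rates measured in your cluster.
for rate in (100, 250, 500):
    print(f"{rate} MB/s -> {transfer_hours(3.5, rate):.1f} h")
```

The same arithmetic applies to a restore, so it is worth measuring both directions.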

Related

Standalone MongoDB installation for Production

I want to deploy MongoDB to a Kubernetes cluster with 2 nodes; there is no chance of adding another node in the future.
I want to deploy MongoDB as a standalone because both nodes will be able to access the same disk space via NFS, and I have no requirements for replication or high availability. However, the MongoDB docs clearly state that a standalone deployment is not suitable for a production environment.
MongoDB Deploy Standalone
You can deploy a standalone MongoDB instance for Cloud Manager to manage. Use standalone instances for testing and development. Do not use these deployments for production systems as they lack replication and high availability.
What kind of drawbacks can I face? Should I deploy as a replica set with an arbiter instance? If yes, why?
Of course you can deploy a Standalone MongoDB for production. But if this node fails, then your application is not available anymore. If you don't have any requirement for availability then go for a Standalone MongoDB.
However, running 2 MongoDB services that access the same physical disk (i.e. dbPath) will not work. Each MongoDB instance needs to have a dedicated data folder.
In your case, I would suggest a Replica Set. All data from one node will be replicated to the other one. If one node fails then the application goes into "read/only" mode.
You can deploy an arbiter instance on the primary node. If the secondary node goes down, then the application is still fully available.
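The availability behavior described here follows from replica set elections needing a majority of voting members. A minimal sketch of that arithmetic (the function names are made up for illustration):

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: more than half of all voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """True if the reachable voting members still form a majority."""
    return reachable >= majority(voting_members)


# Two data-bearing members, no arbiter: losing either one leaves 1 of 2 votes.
print(can_elect_primary(voting_members=2, reachable=1))
# Two data-bearing members plus an arbiter: losing the secondary leaves 2 of 3.
print(can_elect_primary(voting_members=3, reachable=2))
# Arbiter co-located with the primary: losing that node leaves 1 of 3.
print(can_elect_primary(voting_members=3, reachable=1))
```

The last case shows the limit of placing the arbiter on the primary's node: it covers a failure of the secondary node only.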
Deploying as a replica set is always recommended for production. However, if you deploy as a standalone and have two Kubernetes nodes, Kubernetes can ensure that one instance is always running, attached to the NFS storage, on whichever node is available. The risk is that if the data on the storage becomes corrupted, you will have nothing to replicate from, unless you take frequent backups and can accept losing some recently inserted data.

HA postgresql on kubernetes

I want to deploy PostgreSQL as the database in my Kubernetes cluster. So far I've followed this tutorial.
From reading the whole thing, I understood that we claim static storage before starting PostgreSQL so that we still have the data if the pod fails. We can also do replication by pointing to the same storage space to get our data back.
What happens if we use two workers nodes and the pods containing the database migrate to another node? I don’t think local storage will work.
A hostPath volume is not recommended for production use because the data is tied to a single node: if the pod is rescheduled to another node, the storage does not move with it, and if that node is lost, so is the data.
For durable storage, use an external block or file storage system mounted on the nodes through a supported CSI driver.
For HA Postgres, I suggest you explore Postgres Operator, which delivers easy-to-run, highly available PostgreSQL clusters on Kubernetes (K8s), powered by Patroni. It is configured only through Postgres manifests (CRDs), which eases integration into automated CI/CD pipelines without direct access to the Kubernetes API and promotes infrastructure as code over manual operations.

Is it Appropriate to Store Database in a Kubernetes Persistent Volume (And how to back up?)

I have a web application running on a Google Kubernetes cluster. My web app also uses persistent volumes for multiple MongoDB databases to store user and application data.
(1) Thus I am wondering if it is practical to store all data inside those persistent volumes in the long-run?
(2) Are there any methods for safely backing up the persistent volumes e.g. on a weekly basis (automatically)?
(3) I am also planning to integrate some kind of file upload into the application. Are persistent volumes capable of storing many GB/TB of data, or should I opt for something like Google cloud storage in this case?
Deploying stateful apps on K8s is a bit painful, which is well known in the K8s community. Usually, if we need HA for a database, it should be deployed in cluster mode, and to do that on K8s you need to look at the StatefulSet concept. Anyway, I'm pasting links for each of your questions so that you can start from there.
(1) Thus I am wondering if it is practical to store all data inside
those persistent volumes in the long-run?
Running MongoDB on Kubernetes with StatefulSets
(2) Are there any methods for safely backing up the persistent volumes
e.g. on a weekly basis (automatically)?
Persistent Volume Snapshots
Volume Snapshot (Beta from K8s docs)
You can google even more docs.
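As a minimal sketch of what a scheduled snapshot job could submit, the following builds one VolumeSnapshot manifest per PVC of a StatefulSet. The names `data`, `mongodb`, and `csi-snapclass` are hypothetical, the API version `snapshot.storage.k8s.io/v1` is an assumption (older clusters expose a beta version), and the cluster needs a CSI driver with snapshot support. Whether a volume-level snapshot alone gives a consistent MongoDB backup depends on the deployment.

```python
import json
from datetime import date


def statefulset_pvc_names(claim_template: str, statefulset: str, replicas: int) -> list[str]:
    """PVCs created from a volumeClaimTemplate are named <template>-<statefulset>-<ordinal>."""
    return [f"{claim_template}-{statefulset}-{i}" for i in range(replicas)]


def snapshot_manifest(pvc_name: str, snapshot_class: str, day: date) -> dict:
    """Build a VolumeSnapshot manifest for one PVC, with the date in its name."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": f"{pvc_name}-{day:%Y%m%d}"},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


# Hypothetical names; adjust to your StatefulSet and VolumeSnapshotClass.
for pvc in statefulset_pvc_names("data", "mongodb", replicas=3):
    print(json.dumps(snapshot_manifest(pvc, "csi-snapclass", date.today()), indent=2))
```

Each printed document can be saved to a file and applied with `kubectl apply -f`, for example from a weekly CronJob.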
(3) I am also planning to integrate some kind of file upload into the
application. Are persistent volumes capable of storing many GB/TB of
data, or should I opt for something like Google cloud storage in this
case?
I'm not sure whether it can hold TBs, but if you are on a cloud provider, definitely consider using its storage service.
Yes, you can use a PVC in Kubernetes to store the data. However, it depends on your application's use case and size.
In Kubernetes you can deploy MongoDB as a cluster that stores its data in PVCs. A MongoDB Helm chart for HA is available; you can also look at that.
Helm chart : https://github.com/helm/charts/tree/master/stable/mongodb
It's suggested to run MongoDB on Kubernetes as a single pod or as a StatefulSet.
Backup:
To back up a MongoDB database, you can take a weekly snapshot of the disk storage (PVC); alongside that, you can also use a MongoDB snapshot.
Most people choose a managed service, but it also depends on your organization.
Backup method
MongoDB snapshot
Disk storage snapshot
Filesystem :
Yes, it can handle TBs of data, as it is ultimately a disk volume or file system.
Yes, you can use a PVC as a file system, but you may run into scaling issues later: a PVC with the ReadWriteOnce access mode can be mounted by only one node at a time, so to scale the application along with the PVC you need storage that supports ReadWriteMany.
There are several other ways to achieve this. You can also mount a file system such as AWS EFS directly into the pod, but you may find it slow for file operations.
For file storage, there are various options available in Kubernetes, such as CSI drivers, GlusterFS, MinIO, and EFS.
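To make the ReadWriteOnce versus ReadWriteMany distinction concrete, here is a minimal sketch that builds a PersistentVolumeClaim manifest with an explicit access mode. The claim names, sizes, and the `efs-sc` storage class are hypothetical, and whether ReadWriteMany is honored depends on the storage backend.

```python
import json
from typing import Optional

ACCESS_MODES = {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany", "ReadWriteOncePod"}


def pvc_manifest(name: str, size: str, access_mode: str,
                 storage_class: Optional[str] = None) -> dict:
    """Build a PersistentVolumeClaim manifest requesting `size` with one access mode."""
    if access_mode not in ACCESS_MODES:
        raise ValueError(f"unknown access mode: {access_mode}")
    spec = {
        "accessModes": [access_mode],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class is not None:
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


# Block storage for a single database pod (hypothetical name and size).
print(json.dumps(pvc_manifest("mongo-data", "100Gi", "ReadWriteOnce"), indent=2))
# Shared file storage for several application pods (hypothetical storage class).
print(json.dumps(pvc_manifest("uploads", "500Gi", "ReadWriteMany", "efs-sc"), indent=2))
```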

How do I run an HA MongoDB in my Kubernetes cluster without Portworx?

I want to run a MongoDB deployment as a service for my microservice architecture, which follows the database-per-service model.
Right now I am using Helm packages to deploy MongoDB, defining a persistent volume and persistent volume claims.
But I want to deploy MongoDB as HA, storing its data on EBS or similar.
When I searched online for a solution, everything suggested doing it with Portworx. Is there a way to do it without using Portworx?
Any help appreciated.

Kubernetes - Persistent storage for PostgreSQL

We currently have a 2-node Kubernetes environment running on bare-metal machines (no GCE) and now we wish to set up a PostgreSQL instance on top of this.
Our plan was to map a data volume for the PostgreSQL Data Directory to the node using the volumeMounts option in Kubernetes. However this would be a problem because if the Pod ever gets stopped, Kubernetes will re-launch it at random on one of the other nodes. Thus we have no guarantee that it will use the correct data directory on re-launch...
So what is the best approach for maintaining a consistent and persistent PostgreSQL Data Directory across a Kubernetes cluster?
One solution is to deploy HA PostgreSQL, for example https://github.com/sorintlab/stolon
Another is to attach network storage (NFS, GlusterFS) to all nodes and use volumeMounts in the pods.