Kubernetes Mongo Deployment Loses Data After Pod Restart - mongodb

Recently, the managed pod in my mongo deployment on GKE was automatically deleted and a new one was created in its place. As a result, all my db data was lost.
I specified a PV for the deployment, the PVC was bound, and I used the standard storage class (Google persistent disk). The PersistentVolumeClaim had not been deleted either.
Here's an image of the result from kubectl get pv:
[image: output of kubectl get pv, showing the PV bound to the mongo PVC]
My mongo deployment, the persistent volume claim, and the service were all created using Kubernetes' kompose tool from a docker-compose.yml for a Prisma 1 + MongoDB deployment.
Here are my yamls:
mongo-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose -f docker-compose.yml convert
    kompose.version: 1.21.0 (992df58d8)
  creationTimestamp: null
  labels:
    io.kompose.service: mongo
  name: mongo
  namespace: dbmode
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: mongo
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        kompose.cmd: kompose -f docker-compose.yml convert
        kompose.version: 1.21.0 (992df58d8)
      creationTimestamp: null
      labels:
        io.kompose.service: mongo
    spec:
      containers:
      - env:
        - name: MONGO_INITDB_ROOT_PASSWORD
          value: prisma
        - name: MONGO_INITDB_ROOT_USERNAME
          value: prisma
        image: mongo:3.6
        imagePullPolicy: ""
        name: mongo
        ports:
        - containerPort: 27017
        resources: {}
        volumeMounts:
        - mountPath: /var/lib/mongo
          name: mongo
      restartPolicy: Always
      serviceAccountName: ""
      volumes:
      - name: mongo
        persistentVolumeClaim:
          claimName: mongo
status: {}
mongo-persistentvolumeclaim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: null
  labels:
    io.kompose.service: mongo
  name: mongo
  namespace: dbmode
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
status: {}
mongo-service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose -f docker-compose.yml convert
    kompose.version: 1.21.0 (992df58d8)
  creationTimestamp: null
  labels:
    io.kompose.service: mongo
  name: mongo
  namespace: dbmode
spec:
  ports:
  - name: "27017"
    port: 27017
    targetPort: 27017
  selector:
    io.kompose.service: mongo
status:
  loadBalancer: {}
I've tried checking the contents mounted in /var/lib/mongo, and all I got was an empty lost+found/ folder. I've also tried searching the Google persistent disks, but there was nothing in the root directory and I didn't know where else to look.
I guess that for some reason the mongo deployment is not reading the old data from the persistent volume when it starts a new pod, which is extremely perplexing.
I also have another Kubernetes project where the same thing happened, except that the old pod was still shown but had an Evicted status.

I've tried checking the contents mounted in /var/lib/mongo and all I got was an empty lost+found/ folder,
OK, but have you checked that it was actually saving data there before the Pod restart and data loss? I guess it was never saving any data in that directory.
I checked the image you used by running a simple Pod:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-pod
    image: mongo:3.6
When you connect to it by running:
kubectl exec -ti my-pod -- /bin/bash
and check the default mongo configuration file:
root@my-pod:/var/lib# cat /etc/mongod.conf.orig
# mongod.conf

# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/

# Where and how to store data.
storage:
  dbPath: /var/lib/mongodb # 👈
  journal:
    enabled: true
#  engine:
#  mmapv1:
#  wiredTiger:
you can see among other things that dbPath is actually set to /var/lib/mongodb and NOT to /var/lib/mongo.
So chances are that your mongo wasn't actually saving any data to your PV, i.e. to the /var/lib/mongo directory where it was mounted, but to /var/lib/mongodb as stated in its configuration file.
You should be able to check it easily by kubectl exec to your running mongo pod:
kubectl exec -ti <mongo-pod-name> -- /bin/bash
and verify where the data is saved.
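For example, listing the candidate directories should show which one actually contains the database files (a minimal check; only the two paths discussed in this answer are inspected):
kubectl exec -ti <mongo-pod-name> -- ls -la /var/lib/mongo /var/lib/mongodb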
If you didn't overwrite the original config file in any way (e.g. by providing a ConfigMap), mongo should save its data to /var/lib/mongodb, and this directory, not being a mount point for your volume, is part of the Pod filesystem and is ephemeral.
Update:
The above-mentioned /etc/mongod.conf.orig is only a template, so it doesn't reflect the actual configuration that has been applied.
If you run:
kubectl logs your-mongo-pod
it will show where the data directory is located:
$ kubectl logs my-pod
2020-12-16T22:20:47.472+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=my-pod
2020-12-16T22:20:47.473+0000 I CONTROL [initandlisten] db version v3.6.21
...
As we can see, data is saved in /data/db:
dbpath=/data/db
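So to make the data survive Pod restarts, the volume needs to be mounted at that path. A minimal sketch of the fix in the question's mongo-deployment.yaml (same volume and claim names; only the mountPath changes):
        volumeMounts:
        - mountPath: /data/db # the actual dbpath, instead of /var/lib/mongo
          name: mongo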

Related

What is the correct and secure way to run a single instance mongo in kubernetes with Persistent volumes?

I am trying to deploy a single-instance mongodb inside of a Kubernetes cluster (RKE2 specifically) on an AWS EC2 instance running Red Hat 8.5. I am just trying to use the local file system, i.e. no EBS. I am having trouble getting my application to work with persistent volumes, so I have a few questions. Below is my pv.yaml
kind: Namespace
apiVersion: v1
metadata:
  name: mongo
  labels:
    name: mongo
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongodb-pv
  namespace: mongo
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/home/ec2-user/database"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-pvc
  namespace: mongo
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
And here is my mongo deployment (I know having the user/password in plain text is not secure, but this is for the sake of the example):
apiVersion: v1
kind: Pod
metadata:
  name: mongodb-pod
  namespace: mongo
  labels:
    app.kubernetes.io/name: mongodb-pod
spec:
  containers:
  - name: mongo
    image: mongo:latest
    imagePullPolicy: Always
    ports:
    - containerPort: 27017
      name: mongodb-cp
    env:
    - name: MONGO_INITDB_ROOT_USERNAME
      value: "user"
    - name: MONGO_INITDB_ROOT_PASSWORD
      value: "password"
    volumeMounts:
    - mountPath: /data/db
      name: mongodb-storage
  volumes:
  - name: mongodb-storage
    persistentVolumeClaim:
      claimName: mongodb-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: mongodb
  namespace: mongo
spec:
  selector:
    app.kubernetes.io/name: mongodb-pod
  ports:
  - name: mongodb-cp
    port: 27017
    targetPort: mongodb-cp
When I run the above configuration files, I get the following errors from the mongo pod:
find: '/data/db': Permission denied
chown: changing ownership of '/data/db': Permission denied
I tried creating a mongodb user on the host with a uid and gid of 1001, since that is the process owner inside the mongo container, and chowning the hostPath mentioned above, but the error persists.
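For reference, the host-side steps described above amount to roughly this (the 1001 uid/gid is what I observed inside the container, not a documented constant of the mongo image):
sudo groupadd -g 1001 mongodb                       # group with the observed gid
sudo useradd -u 1001 -g 1001 mongodb                # user with the observed uid
sudo chown -R 1001:1001 /home/ec2-user/database     # chown the hostPath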
I have tried adding a securityContext block at both the pod and container level like so:
securityContext:
  runAsUser: 1001
  runAsGroup: 1001
  fsGroup: 1001
which does get me further, but I now get the following error:
{"t":{"$date":"2022-06-02T20:32:13.015+00:00"},"s":"E", "c":"CONTROL", "id":20557, "ctx":"initandlisten","msg":"DBException in initAndListen, terminating","attr":{"error":"IllegalOperation: Attempted to create a lock file on a read-only directory: /data/db"}}
and then the pod dies. If I set the container securityContext to privileged
securityContext:
  privileged: true
everything runs fine. So the two questions are: is it secure to run a pod as privileged? If not (which is my assumption), what is the correct and secure way to use persistent volumes with the above configurations/example?

My Kubernetes MongoDB service isn't persisting the data

This is the mongodb yaml file.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-mongo-depl
spec:
  replicas: 1
  selector:
    matchLabels:
      app: auth-mongo
  template:
    metadata:
      labels:
        app: auth-mongo
    spec:
      containers:
      - name: auth-mongo
        image: mongo
        volumeMounts:
        - mountPath: "/data/db/auth"
          name: auth-db-storage
      volumes:
      - name: auth-db-storage
        persistentVolumeClaim:
          claimName: mongo-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: auth-mongo-srv
spec:
  selector:
    app: auth-mongo
  ports:
  - name: db
    protocol: TCP
    port: 27017
    targetPort: 27017
And this is the persistent volume file.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-pv
  labels:
    type: local
spec:
  storageClassName: mongo
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/data/db"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-pvc
spec:
  storageClassName: mongo
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
I'm running this on Ubuntu using kubectl and minikube v1.25.1.
When I run kubectl describe pod, I see this for the mongodb pod:
Volumes:
  auth-db-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mongo-pvc
    ReadOnly:   false
I have a similar setup for other pods to store files, and it's working fine. But with mongodb, every time I restart the pods, the data is lost. Can someone help me?
EDIT: I noticed that if I change the mongodb mountPath to /data/db, it works fine. But if I have multiple mongodb pods running on /data/db, they don't work. So do I need one persistent volume claim for EACH mongodb pod?
When using these yaml files, you are mounting the /data/db dir on the minikube node to /data/db/auth in auth-mongo pods.
First, you should change /data/db/auth to /data/db in your k8s deployment so that your mongodb can read the database from the default db location.
Even if you delete the deployment, the db will stay in the /data/db dir on the minikube node, and after running a new pod from this deployment, mongodb will open this existing db (all data saved).
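A minimal sketch of that change in your deployment (same volume name and claim as in your yaml; only the mountPath differs):
        volumeMounts:
        - mountPath: "/data/db" # default db location, instead of /data/db/auth
          name: auth-db-storage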
Second, you can't use multiple mongodb pods like this by just scaling replicas in the deployment, because the second mongodb pod can't use a db that is already in use by the first pod. Mongodb will throw this error:
Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable). Another mongod instance is already running on the /data/db directory
So, the solution is either to use only 1 replica in your deployment or, for example, to use the MongoDB packaged by Bitnami Helm chart:
https://github.com/bitnami/charts/tree/master/bitnami/mongodb
This chart bootstraps a MongoDB(®) deployment on a Kubernetes cluster using the Helm package manager.
$ helm install my-release bitnami/mongodb --set architecture=replicaset --set replicaCount=2
Understand MongoDB Architecture Options.
Also, check this link MongoDB Community Kubernetes Operator.
This is a Kubernetes Operator which deploys MongoDB Community into Kubernetes clusters.

Accessing Postgresql data of Kubernetes cluster

I have a Kubernetes cluster with two replicas of a PostgreSQL database in it, and I wanted to see the values stored in the database.
When I exec into one of the two postgres pods (kubectl exec --stdin --tty [postgres_pod] -- /bin/bash) and check the database from within, I see only part of the DB. The rest of the DB data is on the other postgres pod, and I don't see any directory created by the persistent volumes with the whole database stored.
So in short: I create 4 tables; in one postgres pod I have 4 tables but 2 are empty, while in the other postgres pod there are 3 tables, and the tables that were empty in the first pod are filled with data here.
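For context, the tables can be checked from inside each pod roughly like this (a sketch; the pod name and credentials assume the StatefulSet and ConfigMap below):
kubectl exec -it postgres-0 -- psql -U postgres -d database-pg -c '\dt'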
Why don't the pods have the same data in them?
How can I access and download the entire database?
P.S. I deploy the cluster using Helm in minikube.
Here are the YAML files:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  labels:
    app: postgres
data:
  POSTGRES_DB: database-pg
  POSTGRES_USER: postgres
  POSTGRES_PASSWORD: postgres
  PGDATA: /data/pgdata
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: postgres-pv-volume
  labels:
    type: local
    app: postgres
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteMany
  hostPath:
    path: "/mnt/data"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: postgres-pv-claim
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  ports:
  - name: postgres
    port: 5432
    nodePort: 30432
  type: NodePort
  selector:
    app: postgres
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-service
  selector:
    matchLabels:
      app: postgres
  replicas: 2
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:13.2
        volumeMounts:
        - name: postgres-disk
          mountPath: /data
        # Config from ConfigMap
        envFrom:
        - configMapRef:
            name: postgres-config
  volumeClaimTemplates:
  - metadata:
      name: postgres-disk
    spec:
      accessModes: ["ReadWriteOnce"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  selector:
    matchLabels:
      app: postgres
  replicas: 2
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:13.2
        imagePullPolicy: IfNotPresent
        envFrom:
        - configMapRef:
            name: postgres-config
        volumeMounts:
        - mountPath: /var/lib/postgresql/data
          name: postgredb
      volumes:
      - name: postgredb
        persistentVolumeClaim:
          claimName: postgres-pv-claim
---
I found a solution to my problem of downloading the volume directory; however, when I run multiple replicas of postgres, the tables of the DB are still scattered between the pods.
Here's what I did to download the postgres volume:
First of all, minikube only persists files stored under some specific directories:
minikube is configured to persist files stored under the following directories, which are made in the Minikube VM (or on your localhost if running on bare metal). You may lose data from other directories on reboots.
/data
/var/lib/minikube
/var/lib/docker
/tmp/hostpath_pv
/tmp/hostpath-provisioner
So I've changed the mount path to be under the /data directory. This made the database volume visible.
After this I ssh'ed into minikube and copied the database volume to a new directory (I used /home/docker as the user of minikube is docker).
sudo cp -R /data/pgdata /home/docker
The volume pgdata was still owned by root (access denied error) so I changed it to be owned by docker. For this I also set a new password which I knew:
sudo passwd docker # change password for docker user
sudo chown -R docker: /home/docker/pgdata # change owner from root to docker
Then you can exit and copy the directory to your local machine:
exit
scp -r -i $(minikube ssh-key) docker@$(minikube ip):/home/docker/pgdata [your_local_path]
NOTE
Mario's advice, which is to use pg_dump, is probably a better way to copy a database. I still wanted to download the volume directory to see if it had the full database when each pod had only a part of all the tables. In the end it turned out it doesn't.
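A dump along those lines could look like this (a sketch; the pod name and credentials assume the StatefulSet and ConfigMap from the question):
kubectl exec postgres-0 -- pg_dump -U postgres database-pg > database-pg.sql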

How to have data persist in GKE kubernetes StatefulSet with postgres?

So I'm just trying to get a web app running on GKE experimentally to familiarize myself with Kubernetes and GKE.
I have a StatefulSet (Postgres) with a persistent volume / persistent volume claim which is mounted to the Postgres pod as expected. The problem I'm having is getting the Postgres data to endure. If I mount the PV at /var/lib/postgresql, the data gets overridden with each pod update. If I mount it at /var/lib/postgresql/data I get the warning:
initdb: directory "/var/lib/postgresql/data" exists but is not empty
It contains a lost+found directory, perhaps due to it being a mount point.
Using a mount point directly as the data directory is not recommended.
Create a subdirectory under the mount point.
Using Docker alone, having the volume mount point at /var/lib/postgresql/data works as expected and data endures, but I don't know what to do now in GKE. How does one set this up properly?
Setup file:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sm-pd-volume-claim
spec:
  storageClassName: "standard"
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
apiVersion: "apps/v1"
kind: "StatefulSet"
metadata:
  name: "postgis-db"
  namespace: "default"
  labels:
    app: "postgis-db"
spec:
  serviceName: "postgis-db"
  replicas: 1
  selector:
    matchLabels:
      app: "postgis-db"
  template:
    metadata:
      labels:
        app: "postgis-db"
    spec:
      terminationGracePeriodSeconds: 25
      containers:
      - name: "postgis"
        image: "mdillon/postgis"
        ports:
        - containerPort: 5432
          name: postgis-port
        volumeMounts:
        - name: sm-pd-volume
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: sm-pd-volume
        persistentVolumeClaim:
          claimName: sm-pd-volume-claim
You are getting this error because the postgres pod is using the root of the mounted volume as its data directory, and that directory is never empty on a fresh filesystem (it contains lost+found). Using a mount point directly as the data directory is not recommended.
You can resolve this by pointing the mount at a subdirectory of the volume with subPath in the StatefulSet manifest, so that initdb starts from an empty directory:
volumeMounts:
- name: sm-pd-volume
  mountPath: /var/lib/postgresql/data
  subPath: data

Kubernetes PVC data persistent

I have a simple Kubernetes cluster set up on GKE. To persist the data for my express web app, I have a mongodb deployment, a cluster-ip service for the mongodb deployment, and a persistent volume claim running in the cluster.
User data was being stored and everything worked fine until I deleted the mongodb deployment in the GKE console. When I try to bring the mongodb deployment back with the command:
kubectl apply -f mongodb-deployment.yaml
The mongodb deployment and PVC are running again but all the previous data was lost.
My persistent volume claim yaml file:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-persistent-volume-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
My mongodb deployment yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      component: mongo
  template:
    metadata:
      labels:
        component: mongo
    spec:
      volumes:
      - name: mongo-storage
        persistentVolumeClaim:
          claimName: database-persistent-volume-claim
      containers:
      - name: mongo
        image: mongo
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongo-storage
          mountPath: /var/lib/mongo/data
Since the data is stored in a persistent volume, which lives outside the deployment's lifecycle, shouldn't the previous data persist and become available when the database deployment is up and running again?
I think I might be missing something here.
Yes, it is possible with the reclaim policy setting; please refer to this documentation.
If you want to preserve the data even when the PVC is deleted, change the reclaim policy to Retain. Then even if the PVC is deleted, your PV will be marked as Released.
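For example, you can switch an existing PV to the Retain policy with kubectl patch (replace <your-pv-name> with the PV bound to your claim, as shown by kubectl get pv):
kubectl patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'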