Clear MongoDB cache in a sharded cluster

I have a self-hosted MongoDB deployment on an AWS EKS cluster, version 1.24.
Every time I put some workload on the cluster, the MongoDB shards eat up most of the node's RAM. I'm running on t3.medium instances, and every shard uses ~2GB. Since there are multiple shards on each node, this fills up the memory and the node becomes unavailable.
I've tried limiting the WiredTiger cache size to 0.25GB, but it doesn't seem to work.
I've also tried manually clearing the cache with db.collection.getPlanCache().clear(), but it's doing nothing.
db.collection.getPlanCache().list() returns an empty array.
I've also tried checking the storage engine, but both db.serverStatus().wiredTiger and db.serverStatus().storageEngine are undefined in mongosh.
I'm using the Bitnami mongodb-sharded chart, with the following values:
mongodb-sharded:
  shards: 8
  shardsvr:
    persistence:
      resourcePolicy: "keep"
      enabled: true
      size: 100Gi
  configsvr:
    replicaCount: 2
  mongos:
    replicaCount: 2
    configCM: mongos-configmap
The mongos ConfigMap is this one:
apiVersion: v1
kind: ConfigMap
metadata:
  name: mongos-configmap
data:
  mongo.conf: |
    storage:
      wiredTiger:
        engineConfig:
          cacheSizeGB: 0.25
      inMemory:
        engineConfig:
          inMemorySizeGB: 0.25

Solved the various issues:
I had a typo in the ConfigMap -> mongo.conf instead of mongos.conf. This meant it was creating a different, unused config file.
The mongos instances are not the ones with the storage engine: that's on the mongod processes (the shards). So the config should go in shardsvr.dataNode.configCM.
Setting a custom config means overwriting the default one deployed by Bitnami -> you would need to copy all of it and then modify what you need. A much better option is to just add flags at shardsvr.dataNode.mongodbExtraFlags.
In my case, this is how I set up the values.yaml:
shardsvr:
  dataNode:
    mongodbExtraFlags:
      - "--wiredTigerCacheSizeGB .3"
Another note: the reason db.serverStatus().storageEngine and db.serverStatus().wiredTiger were undefined was that I was running mongosh from MongoDB Compass, which actually connects to the mongos (which does not have a storage engine).
If instead you shell into one of the shards and run mongosh (in my case it's at /opt/bitnami/mongodb/bin/), the commands work properly.
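For completeness, here is a quick way to double-check the setting from inside a shard pod (a minimal sketch; the exact number depends on the value you passed with the flag):

// run mongosh inside a shard (mongod) pod, not through mongos
db.serverStatus().storageEngine.name
// -> "wiredTiger"
db.serverStatus().wiredTiger.cache["maximum bytes configured"]
// -> should roughly match the value set via --wiredTigerCacheSizeGB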

Related

Redis blocks GKE scaledown with reason: no.scale.down.node.pod.has.local.storage

I have a GKE cluster currently scaled up to multiple nodes. The scale-up happened during high load caused by a DDoS attack on our services, but now the cluster is unable to scale down because of redis-master and redis-slave, and the overhead cost has become an issue for us.
The scale-down shows the error: no.scale.down.node.pod.has.local.storage. I have seen in multiple answers that setting the option cluster-autoscaler.kubernetes.io/safe-to-evict to true should solve the issue (GCP suggests the same), but before I do that I want to know whether doing so and scaling down the redis-slave instances can cause any data loss. Any suggestion would be ideal, as we are currently paying over 2x of what is needed.
I also checked the config, and I saw that redis-master and redis-slave have volumes defined; the config YAML has this:
volumeClaimTemplates:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    creationTimestamp: null
    labels:
      app: redis
      component: master
      heritage: Tiller
      release: prod-redis
    name: redis-data
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
    volumeMode: Filesystem
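(For reference, the safe-to-evict option mentioned above is an annotation on the pod template metadata; a rough sketch, assuming the chart exposes pod annotations:)

spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"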
Redis master and slave mostly keep the data in memory; for backup and restore purposes Redis takes a snapshot every minute or every few seconds, based on the config.
You can check the deployment config or ConfigMap to see whether any snapshot option is enabled. In Redis terms these are known as the AOF and RDB backups.
Whenever a Redis pod crashes in K8s, it starts up and restores the data from those files if they are present.
So make sure you exec into the pod and check whether the files are present.
image: redislabs/redis
args: ["--requirepass", "admin", "--appendonly", "yes", "--save", "900", "1", "--save", "30", "2"]
Check the config or YAML of the deployment; it might have options like the above.
Ref Document : https://redis.io/topics/persistence
If the files are not there, you can manually take a backup so that if the pod crashes it starts with the old data.
Command to generate the backup in the background: BGSAVE
If you have a lot of data in the pod, things might go wrong: BGSAVE forks a process to save the data to the filesystem in the PVC, which can lead to higher resource requirements, and the pod may get killed if resource limits are set.
Once the data is saved using the background command, you can start removing the slaves with their respective PVCs.
So if AOF and RDB are both disabled, there will be data loss.
RDB alone is also not a good option, as it only takes periodic backups, while AOF is effectively instant.
If only RDB is there, chances are it took a snapshot overnight and you remove the pod in the morning; the latest data written after the snapshot is only in memory, so you might not get it from the snapshot.
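As a rough illustration (a sketch, assuming you can exec into the Redis master pod and that requirepass is set as in the args above), you could verify persistence and trigger a backup before scaling down:

# pod name is an assumption based on the prod-redis release
kubectl exec -it prod-redis-master-0 -- redis-cli -a admin
# check whether RDB snapshots and AOF are enabled
CONFIG GET save
CONFIG GET appendonly
# timestamp of the last successful RDB save
LASTSAVE
# trigger a background RDB save
BGSAVE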

Strimzi Kafka Using local Node Storage

I am running Kafka on Kubernetes (deployed on Azure) using Strimzi for a development environment and would prefer to use internal Kubernetes node storage. If I use persistent-claim or jbod, it creates standard disks on Azure storage. However, I prefer to use internal node storage, as I have 16 GB available there. I do not want to use ephemeral storage, as I want the data to be persisted at least on the Kubernetes nodes.
Following is my deployment.yml:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster
spec:
  kafka:
    version: 3.1.0
    replicas: 2
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
      - name: external
        type: loadbalancer
        tls: false
        port: 9094
    config:
      offsets.topic.replication.factor: 2
      transaction.state.log.replication.factor: 2
      transaction.state.log.min.isr: 2
      default.replication.factor: 2
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.1"
    storage:
      type: persistent-claim
      size: 2Gi
      deleteClaim: false
  zookeeper:
    replicas: 2
    storage:
      type: persistent-claim
      size: 2Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
The persistent-claim storage as you use it will provision the storage using the default storage class, which in your case I guess creates standard storage.
You have two options for using the local disk space of the worker node:
You can use the ephemeral type storage. But keep in mind that this is like a temporary directory, and it will be lost in every rolling update. Also, if you for example delete all the pods at the same time, you will lose all data. As such it is recommended only for some short-lived clusters in CI, maybe some short development etc., but for sure not for anything where you need reliability.
You can use Local Persistent Volumes, which are persistent volumes bound to a particular node. These are persistent, so the pods will re-use the volume between restarts and rolling updates. However, it binds the pod to the particular worker node the storage is on -> so you cannot easily reschedule it to another worker node. But apart from these limitations, it is something that can (unlike the ephemeral storage) be used with reliability and availability when done right. The local persistent volumes are normally provisioned through a StorageClass as well -> so in the Kafka custom resource in Strimzi it will still use the persistent-claim type storage, just with a different storage class.
You should really think about what exactly you want to use and why. From my experience, the local persistent volumes are a great option when:
You run on bare metal / on-premise clusters where often good shared block storage is not available
When you require maximum performance (local storage does not depend on network, so it can be often faster)
But in public clouds with good support for high-quality networked block storage, such as Amazon EBS volumes and their Azure or Google counterparts, local storage often brings more problems than advantages because of how it binds your Kafka brokers to a particular worker node.
Some more details about the local persistent volumes can be found here: https://kubernetes.io/docs/concepts/storage/volumes/#local ... there are also different provisioners which can help you use it. I'm not sure if Azure supports anything out of the box.
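To illustrate the local persistent volume route, here is a minimal sketch (the class name and size are assumptions): a no-provisioner StorageClass for statically created local volumes, referenced from the Kafka resource through the class field of the persistent-claim storage.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage          # name is an assumption
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# in the Kafka custom resource, the storage section stays persistent-claim,
# it just points at the local StorageClass
    storage:
      type: persistent-claim
      size: 10Gi
      class: local-storage
      deleteClaim: false

You still have to create the local PersistentVolumes themselves (or use a local-volume provisioner), since the no-provisioner class does not create them for you.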
Sidenote: 2Gi of space is very small for Kafka. Not sure how much you will be able to do before running out of disk space. Even 16Gi would be quite small. If you know what you are doing, then fine. But if not, you should be careful.

how to migrate VM data from a disk to kubernetes cluster

How do I migrate VM data from a disk to a Kubernetes cluster?
I have a VM with three disks attached and mounted, each holding data that needs to be migrated to a Kubernetes cluster and attached to database services in a StatefulSet.
This link https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/preexisting-pd does show a way, but I don't know how to use or implement it with StatefulSets so that one particular database resource (like Postgres) can use the same PV (created from one of those GCE persistent disks) and create multiple PVCs for new replicas.
Is the scenario I'm describing achievable?
If yes how to implement it?
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 6.8.12
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
      xpack.security.enabled: false
      xpack.security.authc:
        anonymous:
          username: anonymous
          roles: superuser
          authz_exception: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: managed-premium
Using a StatefulSet is preferable to manually creating a PersistentVolumeClaim. For database workloads that can't be replicated, you can't set replicas: greater than 1 in either case, but the PVC management is valuable. You usually can't have multiple databases pointing at the same physical storage, containers or otherwise, and most types of Volumes can't be shared across Pods.
Postgres doesn't support multiple instances sharing the underlying volume without massive corruption, so if you did set things up that way, it's definitely a mistake. More common would be to use the volumeClaimTemplate system so each pod gets its own distinct storage. Then you set up Postgres streaming replication yourself.
Refer to the document on Run a replicated stateful application and the related Stack Overflow post for more information.
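As a rough sketch of the pre-existing-disk approach from the GKE link above (the disk, PV, and PVC names are assumptions, and newer GKE versions would use the PD CSI driver rather than the in-tree gcePersistentDisk field):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-existing-pd        # name is an assumption
spec:
  capacity:
    storage: 100Gi                  # should match the size of the existing disk
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  gcePersistentDisk:
    pdName: my-existing-disk        # the pre-existing GCE disk holding the VM data
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data               # referenced from the StatefulSet pod spec
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""
  volumeName: postgres-existing-pd  # bind explicitly to the PV above
  resources:
    requests:
      storage: 100Gi

Note that, as explained above, only a single Postgres replica can mount this volume; additional replicas would need their own disks plus streaming replication.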

Mongodb container fails if have a volume setup and more than one container in workload

I have set up a MongoDB workload in Rancher (2.5.8).
I have set up a volume:
The workload starts fine if I have the containers set to scale to 1, so one container will start and all is fine.
However, if I set the workload to have 2 or more containers, one container will start fine, but the others fail to start.
Here is what my workload looks like if I set it to scale to 2: one container starts and runs fine, but the second (and third, if I set its scale to 3) fail.
If I remove the volume, then 2+ containers will all start up fine, but then data is only stored within each container (and gets lost whenever I redeploy).
But if I have the volume set, then the data does get stored in the volume (host), but only one container can start.
Thank you in advance for any suggestions
Jason
Posting this community wiki answer to set a baseline and hopefully show one possible reason why the MongoDB containers are failing.
Feel free to edit/expand.
As a lot of information is missing from this question, like how the workload was created and how MongoDB was provisioned, and there are also no logs from the container, the actual issue could be hard to pinpoint.
Assuming that the Deployment was created with the following manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  replicas: 1 # THEN SCALE TO 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - name: mongo
        image: mongo
        imagePullPolicy: "Always"
        ports:
        - containerPort: 27017
        volumeMounts:
        - mountPath: /data/db
          name: mongodb
      volumes:
      - name: mongodb
        persistentVolumeClaim:
          claimName: mongo-pvc
The part of the setup where the Volume is referenced could be different (for example, hostPath could be used), but the premise of it is:
If the Pods are physically referencing the same /data/db directory, they will go into a CrashLoopBackOff state.
Following on this topic:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mongo-5d849bfd8f-8s26t 1/1 Running 0 45m
mongo-5d849bfd8f-l6dzb 0/1 CrashLoopBackOff 13 44m
mongo-5d849bfd8f-wgh6m 0/1 CrashLoopBackOff 13 44m
$ kubectl logs mongo-5d849bfd8f-l6dzb
<-- REDACTED -->
{"t":{"$date":"2021-06-05T12:43:58.025+00:00"},"s":"E", "c":"STORAGE", "id":20557, "ctx":"initandlisten","msg":"DBException in initAndListen, terminating","attr":{"error":"DBPathInUse: Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable). Another mongod instance is already running on the /data/db directory"}}
<-- REDACTED -->
Citing the O'Reilly site on the MongoDB production setup:
Specify an alternate directory to use as the data directory; the default is /data/db/ (or, on Windows, \data\db\ on the MongoDB binary’s volume). Each mongod process on a machine needs its own data directory, so if you are running three instances of mongod on one machine, you’ll need three separate data directories. When mongod starts up, it creates a mongod.lock file in its data directory, which prevents any other mongod process from using that directory. If you attempt to start another MongoDB server using the same data directory, it will give an error:
exception in initAndListen: DBPathInUse: Unable to lock the
lock file: /data/db/mongod.lock (Resource temporarily unavailable).
Another mongod instance is already running on the
/data/db directory, terminating
-- Oreilly.com: Library: View: Mongodb the definitive: Chapter 21
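One common way around this (a sketch, assuming you want each pod to have its own independent data directory rather than a shared one) is to run MongoDB as a StatefulSet with volumeClaimTemplates, so every replica gets its own PVC instead of sharing mongo-pvc:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo              # assumes a matching headless Service exists
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - name: mongo
        image: mongo
        ports:
        - containerPort: 27017
        volumeMounts:
        - mountPath: /data/db
          name: mongodb
  volumeClaimTemplates:           # one PVC per pod: mongodb-mongo-0, mongodb-mongo-1, ...
  - metadata:
      name: mongodb
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

This only gives each pod its own data directory; it does not by itself configure the pods as a MongoDB replica set, which is what the options below handle for you.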
As an alternative approach, you can use other means to provision MongoDB, for example:
Docs.mongodb.com: Kubernetes operator: Master: Tutorial: Deploy replica set (I would check the configuration of StorageClasses here)
Bitnami.com: Stack: Mongodb: Helm

Percona XtraDB Cluster Operator - mount additional storage (INFILE)

We have the Percona XtraDB Cluster Operator set up on Kubernetes. In the main configuration of the cluster we have set the persistentVolumeClaim option for pxc and proxysql.
This is the query we would like to execute on our Percona cluster:
LOAD DATA LOCAL INFILE '/cloud/percona-data/test.csv'
INTO TABLE testTable
FIELDS TERMINATED BY ';'
ENCLOSED BY '\"'
IGNORE 1 LINES
(id, testcolumn);
The file '/cloud/percona-data/test.csv' has to be available via local storage.
We tried the hostPath option, but it does not seem to be active because persistentVolumeClaim is configured (is that true, or is my configuration invalid?).
This is the relevant part of the cluster configuration:
volumeSpec:
  hostPath:
    path: /cloud/percona-data
    type: Directory
  persistentVolumeClaim:
    resources:
      requests:
        storage: 6Gi
Is there any way to mount additional storage to all pxc and proxysql pods?
Install guide
Percona XtraDB Cluster
Configuration files:
cr.yaml
operator.yaml
Thanks all.