MongoDB terminated with exit code 14 (error) in Kubernetes many times

I created a 3-member MongoDB replica set with the Bitnami Helm chart and mounted the data onto my NAS in k8s.
However, after several restarts (made to adjust some config), my MongoDB became unhealthy and ended up terminating many times.
The last state shows: 'Terminated at Feb 25, 2022 11:20:04 AM with exit code 14 (error)'
I think maybe I should clean the data on the NAS and restart MongoDB again.
But I really want to know what causes this and how to solve the error.
Has anyone ever had the same issue?
Thanks.
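Exit code 14 from mongod generally signals an unrecoverable internal error, often data files left in a bad state by an unclean shutdown. Before wiping anything, it is worth pulling the actual error out of the crashed container. A rough diagnostic sketch (the pod, StatefulSet, and data-path names below are examples for a typical Bitnami release and may differ in your cluster):

```shell
# Inspect the previous (crashed) container's log for the real error;
# --previous shows output from the terminated instance, not the restart
kubectl logs mongodb-0 --previous | tail -n 50

# Confirm the termination reason and exit code Kubernetes recorded
kubectl describe pod mongodb-0 | grep -A 5 'Last State'

# If the log points at damaged data files, stop the member first so
# nothing else holds the files, then run a repair against the same
# volume from a one-off pod, e.g.:
#   mongod --repair --dbpath /bitnami/mongodb/data/db
kubectl scale statefulset mongodb --replicas=0
```

Since this is a replica set, it is often simpler to wipe only the broken member's volume and let it perform an initial sync from the healthy members, rather than repairing in place.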

Related

Ceph Mgr not responding to certain commands

We have a ceph cluster built with rook (2 mgrs, 3 mons, 2 mds per cephfs, 24 osds; rook: 1.9.3, ceph: 16.2.7, kubelet: 1.24.1). Our operation requires constantly creating and deleting cephfilesystems. Over time we experienced issues with rook-ceph-mgr. A week or two after the cluster was built, rook-ceph-mgr failed to respond to certain ceph commands, like ceph osd pool autoscale-status and ceph fs subvolumegroup ls, while other commands, like ceph -s, worked fine. We have to restart rook-ceph-mgr to get it going again. Now we have around 30 cephfilesystems and the issue happens more frequently.
We tried disabling the mgr modules dashboard, prometheus and iostat, set ceph progress off, and increased mgr_stats_period & mon_mgr_digest_period. That didn't help much. The issue happened again after one or two create/delete cycles.
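Until the root cause is found, failing the active mgr over to its standby is usually less disruptive than deleting the pod; both workarounds are sketched below (the namespace and label are the Rook defaults and may differ in your deployment):

```shell
# Fail the active mgr over to the standby (with no argument,
# `ceph mgr fail` fails whichever mgr is currently active)
ceph mgr fail

# Or delete the mgr pods and let Rook recreate them
kubectl -n rook-ceph delete pod -l app=rook-ceph-mgr
```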

How to start a POD in Kubernetes when another blocks an important resource?

I'm getting stuck in the configuration of a deployment. The problem is the following.
The application in the deployment uses a database stored in a file. While this database is open it is locked, so concurrent read/write access by multiple processes is impossible.
If I delete the running POD, the new one can't reach the ready state because the database is still locked. I read about the preStop hook and tried to use it, without success.
I could delete the lock file, which seems to be pretty harsh. What's the right way to solve this in Kubernetes?
This really isn't different from running this process outside of Kubernetes. When the pod is killed, it will be given a chance to shut down cleanly, so the lock should be cleaned up. If the lock isn't cleaned up, there are not many ways to determine whether the lock remains because of an unclean shutdown, an unhealthy node, or a network partition. So deleting the lock at pod startup does seem unwise.
I think the first step for you should be trying to determine why this lock file isn't getting cleaned up correctly. (Rather than trying to address the symptom.)
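If it turns out the application cannot remove its own lock on SIGTERM, one option is a thin entrypoint wrapper that forwards the signal and drops the lock only after the app has actually exited, which is quite different from blindly deleting it at startup. A sketch (the lock path and wrapped command are placeholders):

```shell
# run_with_lock_cleanup LOCKFILE CMD...
# Starts CMD, forwards SIGTERM to it, and removes LOCKFILE only after
# CMD has exited -- the lock is dropped on shutdown, never at startup.
run_with_lock_cleanup() {
    lockfile=$1; shift
    "$@" &                      # launch the real application
    app_pid=$!
    # On SIGTERM/SIGINT: pass the signal on to the application
    trap 'kill -TERM "$app_pid" 2>/dev/null' TERM INT
    wait "$app_pid"             # interrupted when the trap fires ...
    wait "$app_pid" 2>/dev/null # ... so wait again for the real exit
    rm -f "$lockfile"           # app is gone; lock is safe to remove
}
```

In a pod spec this would wrap the container command, so Kubernetes' normal SIGTERM-then-grace-period shutdown leaves no stale lock behind.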

Liquibase causing PostgreSQL DB Lock - Microservices running as a pod in AKS

We have multiple microservices (with Liquibase) running as pods on an Azure AKS cluster.
We have frequently noticed DB locks, and the pods crash because they fail their health checks.
Is there a way to overcome this scenario? It is having a big impact: we have to manually unlock the DB table so that a pod can start.
In one of the logs, I've noticed the below error.
I believe it needs to be handled from the application (Spring Boot).
You can write a piece of code that executes at application startup and releases the lock if one is found. Then the database connection won't fail.
We are currently using this approach in our environment.
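The manual unlock mentioned above is a single UPDATE against Liquibase's standard lock table, and the startup code does the same thing programmatically. A sketch of the manual version (the connection string is a placeholder):

```shell
# Clear a stale Liquibase lock left behind by a crashed pod.
# DATABASECHANGELOGLOCK is Liquibase's standard lock table.
psql "$DATABASE_URL" -c "UPDATE databasechangeloglock
  SET locked = FALSE, lockgranted = NULL, lockedby = NULL
  WHERE id = 1;"
```

Note that releasing the lock while another pod is genuinely mid-migration can corrupt the changelog state, so the startup code should only clear locks it can identify as stale (e.g. held by a pod that no longer exists).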

MongoDB NonDocker and Docker Nodes

I have a 5-node MongoDB cluster installed non-Dockerized. I want to start adding nodes to this cluster, but I want the new nodes to run Dockerized MongoDB (i.e., the end result is to migrate the Dockerized nodes into the replica set and decommission the non-Dockerized ones).
When I do this, the added nodes get stuck in STARTUP status, so my understanding is that the config files are not able to sync up.
Is there something I need to do to prepare the cluster for the new nodes, or are there some logs I can delve into to find out why they are not moving to STARTUP2?
The data directory was not large enough, thus the config files were unable to sync. As soon as I grew the data directory, all was well.
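For anyone debugging the same symptom: the member states, and any message explaining a stalled sync, are visible from any healthy member (the host name below is a placeholder). A new member normally moves from STARTUP to STARTUP2 once it has fetched the replica set config and begun its initial sync:

```shell
# Print each member's state plus any info message explaining a stall
mongo --host healthy-node:27017 --eval '
  rs.status().members.forEach(function (m) {
    print(m.name + "  " + m.stateStr + "  " + (m.infoMessage || ""));
  });'
```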

MongoDB writeback exception

Our MongoDB cluster in production is a sharded cluster with 3 replica sets of 3 servers each and, of course, another 3 config servers.
We also have 14 webservers that connect directly to MongoDB through the mongos processes running on each of these webservers (clients).
The entire cluster receives 5000 inserts per minute.
Sometimes we start getting exceptions from our Java applications when they perform operations against MongoDB.
This is the stack trace:
caused by com.mongodb.MongoException: writeback
com.mongodb.CommandResult.getException(CommandResult.java:100)
com.mongodb.CommandResult.throwOnError(CommandResult.java:134)
com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:142)
com.mongodb.DBTCPConnector.say(DBTCPConnector.java:183)
com.mongodb.DBTCPConnector.say(DBTCPConnector.java:155)
com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:270)
com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:226)
com.mongodb.DBCollection.insert(DBCollection.java:147)
com.mongodb.DBCollection.insert(DBCollection.java:90)
com.mongodb.DBCollection$insert$0.call(Unknown Source)
If I check the mongos process through the REST _status command that it provides, it returns 200 OK. We can fix the problem by restarting the Tomcat we are using and restarting the mongos process, but I would like to find a definitive solution. Having to restart everything in the middle of the night is not a happy solution.
When this error happens, maybe 2 or 3 other webservers get the same error at the same time, so I imagine there is a problem in the entire MongoDB cluster, not a problem in a single isolated webserver.
Does anyone know why mongo returns a writeback error, and how to fix it?
I'm using MongoDB 2.2.0.
Thanks in advance.
Fer
I believe you are seeing the writeback error "leak" into the getLastError output and then continue to be reported even when the operation in question had not errored. This was an issue in earlier versions of MongoDB 2.2 and has since been fixed; see:
https://jira.mongodb.org/browse/SERVER-7958
https://jira.mongodb.org/browse/SERVER-7369
https://jira.mongodb.org/browse/SERVER-4532
As of writing this answer I would recommend 2.2.4, or basically whatever the latest release on the 2.2 branch is, to resolve your problem.
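Since writebacks are routed through mongos, the fix only takes effect once the whole cluster is on the patched release. It may help to confirm what each component is actually running (the host list below is illustrative):

```shell
# Verify the binary version on each shard, config server and mongos
for h in shard1:27017 cfg1:27019 app1:27017; do
  mongo --host "$h" --quiet --eval 'print(db.version())'
done
```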