Mongo slow startup after upgrade to 4.2.5 - mongodb

I upgraded a three-member MongoDB replica set from 4.0.11 to 4.2.5. Since the upgrade, startup takes about 5 minutes; before the upgrade it was instant. It seems related to oplog size: when I dropped the oplog on the new 4.2 install as a test, startup was instant again.
The maximum oplog size was 25GB. I decreased it to 5GB and startup is still slow. MongoDB runs on AWS with EBS standard disks, but it worked well until this upgrade.
Do you have any idea what can cause the slow startup?
I tried changing the following 3 WiredTiger default eviction parameters:
Now mongo starts immediately. Is it safe to set eviction_dirty_target and eviction_dirty_trigger like this? The defaults are 5% for eviction_dirty_target and 20% for eviction_dirty_trigger. Thanks.

Examine the server logs; they will say what the database is doing during startup.


Increase in Dirty Cache in Mongodb when we added a new replica set member

Mongo Version: 3.6
Database Engine: Wiredtiger
Old Cluster:
1 Primary
1 Secondary
1 Arbiter
We wanted to replace the secondary with a new machine, so we added a new replica set member. After that, the percentage of dirty cache started increasing.
We did not find anything in the logs. Can anyone help us figure out what we did wrong?

mongorestore failing on shard cluster v4.2.x | Error: "panic: close of closed channel"

I had a standalone MongoDB server v3.4.x with several DBs and their collections. As the plan was to upgrade to the latest 4.2.x, I created a mongodump of all DBs.
I then created a sharded cluster with a config server (replica cluster), a shard-1 server (replica cluster) and a shard-2 server (cluster) [MongoDB v4.2.x].
Now when I try to restore the dump, it restores only partially every time. If I try to restore a single DB it fails with the same error. But whenever I restore a specific DB and a specific collection it always works fine. The problem is that there are so many collections across many DBs that I cannot do it for each one individually, and every time it fails at a different progress percentage/collection/DB.
2020-02-07T19:07:03.822+0000 [#####################...] myproduct_new.chats 68.1MB/74.8MB (91.0%)
2020-02-07T19:07:03.851+0000 [########## ] myproduct_new.metaCrashes 216MB/502MB (42.9%)
2020-02-07T19:07:03.876+0000 [################## ] myproduct_new.feeds 152MB/196MB (77.4%)
panic: close of closed channel
goroutine 25 [running]:*MongoRestore).RestoreCollectionToDB(0xc0001a0000, 0xc000234540, 0xc, 0xc00023454d, 900, 0x7fa5503e21f0, 0xc00020b890, 0x1f66e326, 0x0, ...)
/data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/*MongoRestore).RestoreIntent(0xc0001a0000, 0xc00022f9e0, 0x0, 0x0, 0x0, 0x0)
/data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/*MongoRestore).RestoreIntents.func1(0xc0001a0000, 0xc000146420, 0x3)
/data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/ created by*MongoRestore).RestoreIntents
/data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/ ubuntu@ip-00-xxx-xxx-00:/usr/local/backups/Dev_backup_07-02-2020$ 0x10, 0xc00000f go:503 +0x49b go:311 +0xbe9 go:126 +0x1cb go:109
I am connecting to mongos to run the restore. Sharding is not yet enabled for any DB. Can anyone shed some light on what is going wrong or how to restore the dump?
I had the same problem, and found out that my MongoDB replica set was causing this error.
Check the rs.status() of your database.
If you get the message
Our replica set config is invalid or we are not a member of it
try this answer
We faced the exact same issue on the same spec while trying to restore from a mongodump. There is no definite cause, but it is worth checking the factors below.
Check your dump size (BSON) against the free disk space allocated on the cluster. The dump can be 2x to 3x the size of the MongoDB data folder, because the data folder is compressed while the BSON dump is not.
Check the oplog size configured during cluster creation. For a first-time migration, provide 10-15% of free disk space as oplog size; you can change this after the migration. This lets secondaries lag a bit longer and still catch up from the oplog. For example: 3.5 GB allocated for the oplog out of a 75 GB disk, with 45 GB of total (compressed) data. In real-world usage after the migration, size the oplog to hold 1-2 hours of write volume.
Your total disk space should then be dump folder size + oplog + 6GB (default mongo installation + system extras).
Note: if you cannot afford to allocate the dump folder size, run the restore in batches (by DB or collection, using the nsInclude option), giving Mongo time to compress after importing the BSON. This should take minutes.
After your restore, Mongo will shrink the data and disk usage will end up close to your standalone data folder size.
If disk space is under-provisioned during the migration, Mongo will try to increase it while the migration is running. It cannot do that on the primary while it is in use, so it resizes a secondary and then demotes the current primary to secondary, which could cause the error above. You can check the hardware/status vitals to see whether the servers changed state from primary to secondary.
Also try NOT to enable server auto-upsizing while creating the cluster. I don't have a definite rationale, but you don't want background actions upgrading hardware, say from M30 to M40, because the CPU is busy in the middle of the migration (it happened to me).
Finally, as good practice, restore large databases separately, especially those with a single large (non-sharded) collection over 4 GB. I had 40+ DBs, 20% of them over 15GB in dump BSON size, with 1 or 2 big collections of over 4GB and millions of docs. Once I separated them, Mongo had breathing space to bulk insert and compress them, with a few minutes between runs. Mongorestore works at the collection level.
So instead of 40-50 minutes, the restore took 90-120 minutes, after a few mock runs to settle the order.
If you have time to plan, try this out. Please share anything else you learn.
Key takeaways: check your dump folder size and the size of your large collections.
Rate of disk writes, oplog lag, RAM, CPU, and IO are good KPIs to watch during the dump and restore.
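The batched restore described above could be sketched roughly like this. It is only a sketch: the dump path, the database names, the pause length, and the MONGORESTORE and PAUSE_SECONDS variables are hypothetical names introduced for illustration. The only mongorestore feature it relies on is --nsInclude.

```shell
#!/bin/sh
# Sketch only: restore a dump one database at a time, pausing between batches.
# MONGORESTORE and PAUSE_SECONDS are hypothetical knobs, not mongorestore settings.

# Restore a single database from the dump directory.
restore_db() {
  dump_dir=$1
  db_name=$2
  # --nsInclude limits the restore to namespaces matching "<db>.*".
  ${MONGORESTORE:-mongorestore} "--nsInclude=${db_name}.*" "$dump_dir"
}

# Restore each named database in turn; stop at the first failure.
restore_in_batches() {
  dump_dir=$1
  shift
  for db_name in "$@"; do
    restore_db "$dump_dir" "$db_name" || return 1
    # Give the server some time to settle before the next batch.
    sleep "${PAUSE_SECONDS:-120}"
  done
}
```

For example, `restore_in_batches /path/to/dump db_one db_two` would restore two databases with a two-minute gap. Setting `MONGORESTORE=echo` prints the arguments instead of running anything, which is handy for checking the order first.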

Mongorestore seems to run out of memory and kills the mongo process

In the current setup there are two Mongo Docker containers, running on hosts A and B, with Mongo version 3.4 and running in a replica set. I would like to upgrade them to 3.6 and add a member so the containers would run on hosts A, B and C. The containers have an 8GB memory limit and no swap allocated (currently), and are administered in Rancher. So my plan was to boot up the three new containers, initialize a replica set for those, take a dump from the 3.4 container, and restore it to the new replica set primary.
Taking the dump went fine, and its size was about 16GB. When I tried to restore it to the new 3.6 master, restoring starts fine, but after it has restored roughly 5GB of the data, mongo process seems to be killed by OS/Rancher, and while the container itself doesn't restart, MongoDB process just crashes and reloads itself back up again. If I run mongorestore to the same database again, it says unique key error for all the already inserted entries and then continue where it left off, only to do the same again after 5GB or so. So it seems that mongorestore loads all the entries it restores to memory.
So I need some solution to this. The options I see:
Every time it crashes, just run the mongorestore command so it continues where it left off. It probably should work, but I feel a bit uneasy doing it.
Restore the database one collection at a time, but the largest collection is bigger than 5GB so it wouldn't work properly either.
Add swap or physical memory (temporarily) to the container so the process doesn't get killed after the process runs out of physical memory.
Something else, hopefully a better solution?
Increasing the swap size, as the other answer pointed out, worked for me. Also, the --numParallelCollections option controls the number of collections mongodump/mongorestore should dump/restore in parallel. The default is 4, which may consume a lot of memory.
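As a rough illustration of the option mentioned above, a restore limited to one collection at a time might look like this. The dump path is a placeholder, and MONGORESTORE is a hypothetical override added only so the command can be dry-run.

```shell
#!/bin/sh
# Sketch only: restore one collection at a time to reduce memory pressure.
# MONGORESTORE is a hypothetical override (e.g. MONGORESTORE=echo for a dry run).
restore_low_memory() {
  dump_dir=$1
  # The default of 4 parallel collections is lowered to 1 here.
  ${MONGORESTORE:-mongorestore} --numParallelCollections=1 "$dump_dir"
}
```

Calling `restore_low_memory /path/to/dump` trades restore speed for a smaller memory footprint. Whether that alone is enough to avoid the OOM kill depends on the data.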
Since it sounds like you're not running out of disk space due to mongorestore continuing where it left off successfully, focusing on memory issues is the correct response. You're definitely running out of memory during the mongorestore process.
I would highly recommend going with the swap space, as this is the simplest, most reliable, least hacky, and arguably the most officially supported way to handle this problem.
Alternatively, if you're for some reason completely opposed to using swap space, you could temporarily use a node with a larger amount of memory, perform the mongorestore on this node, allow it to replicate, then take the node down and replace it with a node that has fewer resources allocated to it. This option should work, but could become quite difficult with larger data sets and is pretty overkill for something like this.
I solved the OOM problem by using the --wiredTigerCacheSizeGB parameter of mongod. Excerpt from my docker-compose.yaml below:
version: '3.6'
services:
  db:
    container_name: db
    image: mongo:3.2
    volumes:
      - ./vol/db/:/data/db
    restart: always
    # use 1.5GB for cache instead of the default (Total RAM - 1GB)/2:
    command: mongod --wiredTigerCacheSizeGB 1.5
Just documenting my experience here, in 2020 using MongoDB 4.4:
I ran into this problem restoring a 5GB collection on a machine with 4GB of memory. I added 4GB of swap, which seemed to work: I was no longer seeing the KILLED message.
However, a while later I noticed I was missing a lot of data! It turns out that if mongorestore runs out of memory during the final step (at 100%) it will not show killed, BUT IT HASN'T IMPORTED YOUR DATA.
You want to make sure you see this final line:
[########################] cranlike.files.chunks 5.00GB/5.00GB (100.0%)
[########################] cranlike.files.chunks 5.00GB/5.00GB (100.0%)
[########################] cranlike.files.chunks 5.00GB/5.00GB (100.0%)
[########################] cranlike.files.chunks 5.00GB/5.00GB (100.0%)
[########################] cranlike.files.chunks 5.00GB/5.00GB (100.0%)
restoring indexes for collection cranlike.files.chunks from metadata
finished restoring cranlike.files.chunks (23674 documents, 0 failures)
34632 document(s) restored successfully. 0 document(s) failed to restore.
In my case I needed 4GB mem + 8GB swap, to import 5GB GridFS collection.
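One way to make that check less manual is to save the mongorestore output to a file and look for the summary line. The sketch below assumes the output was captured with something like `mongorestore ... 2>&1 | tee restore.log`. The function name and log file name are made up for illustration, and the pattern is based only on the summary line shown above.

```shell
#!/bin/sh
# Sketch only: succeed when the captured log contains the final summary line
# and that line reports zero failed documents.
restore_completed() {
  log_file=$1
  grep -Eq '[0-9]+ document\(s\) restored successfully\. 0 document\(s\) failed to restore\.' "$log_file"
}
```

Usage could be as simple as `restore_completed restore.log || echo "restore did not finish cleanly"`.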
Rather than starting up a new replica set, it's possible to do the entire expansion and upgrade without even going offline.
Start MongoDB 3.6 on host C
On the primary (currently A or B), add node C into the replica set
Node C will do an initial sync of the data; this may take some time
Once that is finished, take down node B; your replica set has two working nodes still (A and C) so will continue uninterrupted
Replace v3.4 on node B with v3.6 and start back up again
When node B is ready, take down node A
Replace v3.4 on node A with v3.6 and start back up again
You'll be left with the same replica set running as before, but now with three nodes all running v3.6.
PS Be sure to check out the documentation on Upgrade a Replica Set to 3.6 before you start.
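The replica set commands behind those steps could look roughly like the sketch below, using the legacy mongo shell with --eval. The host names in the usage note are placeholders, and MONGO is a hypothetical override added for dry runs. Treat it as an outline and follow the upgrade documentation for the authoritative order.

```shell
#!/bin/sh
# Sketch only: helpers for the rolling upgrade steps.
# MONGO is a hypothetical override (e.g. MONGO=echo prints instead of running).

# Run a JavaScript snippet against one member.
mongo_eval() {
  host=$1
  js=$2
  ${MONGO:-mongo} --host "$host" --quiet --eval "$js"
}

# Add a new member (node C) from the current primary.
add_member() {
  mongo_eval "$1" "rs.add(\"$2\")"
}

# Print each member and its state; wait for the new node to reach SECONDARY.
member_states() {
  mongo_eval "$1" 'rs.status().members.forEach(function (m) { print(m.name + " " + m.stateStr); })'
}

# Ask the current primary to step down before it is taken offline.
step_down() {
  mongo_eval "$1" 'rs.stepDown()'
}

# Once every member runs 3.6, enable the 3.6 feature set.
set_fcv_36() {
  mongo_eval "$1" 'db.adminCommand({ setFeatureCompatibilityVersion: "3.6" })'
}
```

For instance, `add_member hostA:27017 hostC:27017` followed by repeated `member_states hostA:27017` calls covers the "add node C and wait for initial sync" part. The binary swaps on B and A still happen outside this script.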
I ran into a similar issue running 3 nodes on a single machine (8GB RAM total) as part of testing a replicaset. The default storage cache size is .5 * (Total RAM - 1GB). The mongorestore caused each node to use the full cache size on restore and consume all available RAM.
I am using ansible to template this part of mongod.conf, but you can set your cacheSizeGB to any reasonable amount so multiple instances do not consume the RAM.
cacheSizeGB: {{ ansible_memtotal_mb / 1024 * 0.2 }}
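To see why three nodes on one 8GB machine run out of memory, the default can be worked out by hand. The sketch below applies the formula quoted above, plus the 0.25GB floor that MongoDB 3.4 and later use. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: default WiredTiger cache size in GB for a given amount of RAM (GB),
# i.e. the larger of 0.5 * (RAM - 1) and 0.25.
wt_default_cache_gb() {
  awk -v ram="$1" 'BEGIN {
    cache = 0.5 * (ram - 1)
    if (cache < 0.25) cache = 0.25
    printf "%.2f\n", cache
  }'
}

# Combined default cache when several mongod processes share one host.
wt_total_cache_gb() {
  awk -v ram="$1" -v n="$2" 'BEGIN {
    cache = 0.5 * (ram - 1)
    if (cache < 0.25) cache = 0.25
    printf "%.2f\n", cache * n
  }'
}
```

`wt_default_cache_gb 8` prints 3.50, and `wt_total_cache_gb 8 3` prints 10.50, which is already more than the 8GB available. With the 0.2 factor from the template, each node gets about 1.6GB instead.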
My scenario is similar to @qwertz's, but to be able to upload every collection to my database I wrote the following script to handle partial uploads. Uploading each collection one by one, instead of trying to send the whole database at once, was the only way to populate it properly.
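#!/bin/bash
# Restore every gzipped BSON file under ./backup/$DB_NAME, one collection at a time.
# Add connection options (for example --uri) to the mongorestore call if needed.
for file in ./backup/"${DB_NAME}"/*.bson.gz
do
  # The collection name is the file name without the .bson.gz suffix.
  collection=$(basename "$file" .bson.gz)
  mongorestore \
    --gzip \
    --db="$DB_NAME" \
    --collection="$collection" \
    --drop \
    "$file"
done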
FROM alpine:3.12.4
RUN [ "apk", "add", "--no-cache", "bash", "mongodb-tools" ]
build:
  context: .
  dockerfile: Dockerfile.mongorestore
restart: on-failure
volumes:
  - ${PWD}/backup:/backup

mongodb operation very slow when using replica set

I'm running MongoDB 2.4.5 and recently started looking into replica sets to get some kind of redundancy.
I started the same mongo instance with the --replSet parameter and added an arbiter to the running replica set. Writes to mongo then slowed down significantly (from 15ms to 30-60ms, sometimes even around 300ms). As soon as I restarted it without replica set mode, performance went back to normal.

Is mongod --repair still a blocking task in MongoDB 1.8?

I have a 5GB database I want to compact and repair. Unfortunately, I have an active application running on that database.
I'm wondering if running a mongod --repair task with MongoDB 1.8 will block all the other write operations on the database.
I don't want to shut down the entire application for hours...
Take a look at the --journal option. It keeps a binary log of recent operations, and recovery from it may take much less time than a repair.
Yes, repairDatabase is a blocking operation, which means you'll need to do it during a scheduled maintenance window.
Alternately, if you are using a replica set, it's possible to repair with no down time by taking one member out of the replica set, repairing it, re-adding it to the replica set, and repeating until all are repaired. See the note in yellow at the end of this section for more info and caveats.