DB consuming large space

DB consuming large space - mongodb

i am newbie to mongoDB ,as i start working with test application (ASP.Net) found that the db consuming large disk space.I was wondering that collections have only small piece of data like a word.So does anybody can shed some light on this ?
Please correct me if am wrong.Thanks in advance

Mongo doesn't shrink previously allocated structures because that would slow database down. When you need it, run the repair procedure to rebuild DB and reclaim unused space. On live project you should schedule it to off-peak hours.
From the command line:
mongod --repair
From the shell (you have to do for all dbs including local if you go this route):
db.repairDatabase();

Related

One database at MongoDB is not responding

I have several databases under the mongodb instance (v 4.0.26) in a Ubuntu 18.04 server.
One of the database has started behaving inconsistently in terms of connectivity all of a sudden.
I have checked the resource consumption on the server. 7GB of RAM left and 34GB of storage also available.
In Mongo Shell (at the server), when I connect to the particular database and perform db.getCollectionNames() it just hangs. This behavior is also not consistent. But every other database in that instance works without any problem.
I am suspecting there could be a corrupt document in any of the collection which is resulting this. Looking for guidance to debug this issue.
P.S.: Losing the data in db might cost my job.

mongorestore failing on shard cluster v4.2.x | Error: "panic: close of closed channel"

Scenario:
I had a standalone MongoDB Server v3.4.x where I had several DBs and collections respectively. As the plan was to upgrade to lastest 4.2.x, I have created a mongo dump of all DBs.
Created a shard cluster of config server (replica cluster), shard-1 server (replica cluster) & shard-2 server (cluster) [MongoDB v4.2.x]
Issue:
Now when I try to restore the dump, it's partially restoring every time I try to restore DBs. If I try to restore single DB it fails with same error. But whenever I try to restore specific DB & specific collection it always works fine. But the problem is so many collections across many DBs. Cannot do it for all indicvidually & every time it fails at different progress percentage/collection/DBs.
Error:
2020-02-07719:07:03.822+0000 [#####################...] myproduct_new.chats 68.1MB/74.8MB (91.0%)
2020-02-07719:07:03.851+0000 [########## ] myproduct_new.metaCrashes 216MB/502MB (42.9%)
2020-02-07719:07:03.876+0000 [################## ] myproduct_new.feeds 152MB/196MB (77.4%)
panic: close of closed channel
goroutine 25 [running]: github.com/mongodb/mongo-tools/mongorestore.(*MongoRestore).RestoreCollectionToDB(Oxc0001a0000, 0xc000234540, Oxc, 0xc00023454d, 900, Ox7fa5503e21f0, 0xc00020b890, 0x1f66e326, Ox0, ...)
/data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/github.com/mongodb/mongo-tools/mongorestore/restore. github.com/mongodb/mongo-tools/mongorestore.(*MongoRestore).RestoreIntent(Oxc0001a0000, Oxc00022f9e0, Ox0, Ox0, Ox0, Ox0)
/data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/github.com/mongodb/mongo-tools/mongorestore/restore. github.com/mongodb/mongo-tools/mongorestore.(*MongoRestore).RestoreIntents.funcl(Oxc0001a0000, 0xc000146420, 0x3)
/data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/github.com/mongodb/mongo-tools/mongorestore/restore. created by github.com/mongodb/mongo-tools/mongorestore.(*MongoRestore).RestoreIntents
/data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/github.com/mongodb/mongo-tools/mongorestore/restore. ubuntu#ip-00-xxx-xxx-00:/usr/local/backups/Dev_backup_07-02-2020$ Ox10, Oxc00000f go:503 +0x49b go:311 +Oxbe9 go:126 +Oxlcb go:109
+0x12d
Question:
I am connecting to mongos and trying to restore. Currently, sharding is not yet enabled for any DB. Can anyone put some light on whats going wrong or how to restore the dump?

I have got the same problem, then I found out that it is the problem of my mongodb replica set caused this error.
check the rs.status() of your database.
if you got the message
Our replica set config is invalid or we are not a member of it
try this answer

We faced exact same issue for same spec trying to restore from Mongodump. There is no definite reason but could be best to check below factors
Check your Dump size(bjson) vs Allocated Disk free space on Cluster. Dump size could be 2x to 3x of our core Mongo data folder size(Which is compressed on top of BJSON)
Check your Oplog size configured during cluster creation, for first time migration provide 10-15% of free diskspace size as Oplog size, you can change this after migration. This will help secondaries to lag bit longer and catch up on sync faster from WAL. eg: Allocated 3.5 GB for oplog out of Total 75 GB Hardisk size, with 45GB of Total data(compressed). In real world usage scenario(post migration), keep it 1-2 hour write data volumes as oplog size.
Now your total disk space would be Dump folder size + Oplog + 6GB(Default mongo installation + system extras).
Note: If you cannot afford to allocate Dump folder size, you have to run the restore in batches(DBs or Collections nsInclude option), giving time for Mongo to compress after importing bjson. This should be done in minutes
After your restore, Mongo will shrink the data size and diskspace will more or less match close to your Standalone data folder size.
If this is not set and your diskspace is under provisioned during Migration, while your migration is on, Mongo will try to increase diskspace and it cannot do when Primary is used, tries to increase in secondary and make current primary to secondary which could potentially cause above error. You can also check the Hardware/Status Vitals to see whether the servers changed state from Primary to Secondary
Also try NOT to enable Server Auto upsizing while creating cluster, I don't have definite rationale, but we don't want any actions to happen in background to upgrade hardware say M30 to M40 because CPU is busy in middle of Migration(it happened to me)
Finally as good practice, try to run Large Databases mainly with Large single Collection > 4 GB(Non-Shard) separately. I had 40+ dbs, with 20% > 15GB in dump BJSON size and having 1 or 2 big collections with >4GB multi-million docs. Once I separated them, it gave breathing space for Mongo to bulk insert and compress them with some elapsed time of few mins. Mongo restore happens at collection level
So instead of taking 40-50 mins to restore it took 90-120 mins after some mock practices and order.
If you have time to plan, try this out. Any other learning please share
Key takeaways - Check your Dump Folder size and large collection size.
RATE OF DISK WRITES, OPLOG Lag, RAM, CPU, IO are good KPIs to keep watch on during Dumprestore

MongoDB reducing db filesize

I have a replica-set.
And I run out of disk space on my secondary instances.
There is no space on disk to run db.repairDatabase()
Is there any other way to free some disk space?
I was thinking:
bring secondary down
Delete all data
run db.repairDatabase() if deleting data will allow it
Bring it back up.
WIll this work?
UPDATE
Worth to mention that I can't currently SSH to servers. Only using mongo client now.

No that won't work - there has to be a database there to run db.repairDatabase() on. However, what works just as well is to bring the secondary down, delete the database files and then bring it back it up. This will force a re-sync with the primary which will in effect do the same thing as a db.repairDatabase() as it will recreate the data files from scratch.
However, in order to delete the datafiles you'll need to ssh in to the instance. If you cannot ssh in you have fairly significant issues that will interfere with any attempt to recover the secondary.

Why and what case to issue mongodb repair command

I am using Mongodb 2.4.8 on a 64 bit machiene with 3 servers as replicaSet, for which i have currently disbaled journaling on my development box .
Durabilty is not so important for our Application , so the reason i have disabled Journaling Option .
I see that there is only one advantage of journaling , that is in case of an unclean shutdown we dont have to issue a repair command as journaling will take care of it .
To produce this unclean shutdown i killed mongo replica process using kill -9 Mongo process Id , i just removed mongo locks and restarted the mongo primary , secondary and the arbitery servers , everything started fine .
My question is that , when i should we issue the repair command actually (as removing locks and restart works )
Please excuse if the question is too dumb , as i wanted to know the risk of disbaling journaling under production .

The repairDatabase command checks your whole database for corrupted data and discards that data so the rest becomes usable again.
This can become necessary after an unclear shutdown. In your case the shutdown didn't appear to corrupt any data (or maybe it did, but it didn't become apparent yet because the data in question wasn't accessed yet). But that doesn't mean that this will always be the case. Was your database actually doing anything at that moment? When the database is idle or only performing read-operations, there is usually not much to worry about. But when it is currently in the middle of a large write-operation, a sudden shutdown without journaling can be much more troublesome.
Another scenario where a database could be corrupted and repairDatabase could help is a physical malfunction of the storage medium or a corruption of the underlying filesystem.
Important note regarding replica-sets: When you have a replica-set, and only one node is corrupted, then you should rather remove that node and rebuild it from the other members of the replica-set. RepairDatabase will destroy any corrupted data. Restoring from a replica-set will not.

Mongodb EC2 EBS Backups

I have confusion on what I need to do here. I am new to Mongo. I have set up a small Mongo server on Amazon EC2, with EBS volumes, one for data, one for logs. I need to do a backup. It's okay to take the DB down in the middle of the night, at least currently.
Using the boto library, EBS snapshots and python to do the backup, I built a simple script that does the following:
sudo service mongodb stop
run backup of data
run backup of logs
sudo service mongodb start
The script ran through and restarted, but I noted in the AWS console that the snapshots are still being created, even through boto has come back, but Mongo has restarted. Certainly not ideal.
I checked the Mongo docs, and found this explanation on what to do for backups:
http://docs.mongodb.org/ecosystem/tutorial/backup-and-restore-mongodb-on-amazon-ec2/#ec2-backup-database-files
This is good info, but a bit unclear. If you are using journaling, which we are, it says:
If the dbpath is mapped to a single EBS volume then proceed to Backup the Database Files.
We have a single volume for data. So, I'm assuming that means to bypass the steps on flushing and locking. But at the end of Backup the Database Files, it discusses removing the locks.
So, I'm a bit confused. As I read it originally, then I don't actually need to do anything - I can just run the backup, and not worry about flushing/locking period. I probably don't need to take the DB down. But the paranoid part of me says no, that sounds suspicious.
Any thoughts from anyone on this, or experience, or good old fashioned knowledge?

Since you are using journaling, you can just run the snapshot without taking the DB down. This will be fine as long as the journal files are on the same EBS volume, which they would be unless you symlink them elsewhere.
We run a lot of mongodb servers on Amazon and this is how we do it too.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse