Incredibly low GridFS performance using MongoDB 3.0.0 and Mongofiles

Incredibly low GridFS performance using MongoDB 3.0.0 and Mongofiles - mongodb

I have a MongoDB database with a GridFS collection containing hundreds of thousands of files (345,073, to be precise -- and about 100GBs in volume).
On MongoDB 2.6.8 it takes a fraction of a second to list the files using the native mongofiles and connecting to mongod. This is the command I use:
mongofiles --db files list
I just brewed and linked MongoDB 3.0.0 and suddenly the same operation takes more than five minutes to complete, if ever it does. I have to kill the query most of the time, as it drives two of my CPU cores to 100%. The log file does not show anything irregular. I rebuilt the indexes to no avail. I also tried the same with my other GridFS collections in other databases, each with millions of files and I encounter the same issue.
Then I uninstalled 3.0.0 and relinked 2.6.8 and everything is back to normal (using the exact same data files).
I am running MongoDB on Yosemite, and I reckon the problem might be platform specific. But is there anything that I have ommited and I should take into consideration? Or have I really discovered a bug that I must report?

Having the same problem here, for me running a mongofiles 2.6 from a docker image fixed the problem, seems they broke something with the rewrite.

Related

Mongodb 3.0.15 on Ubuntu 14.04 - data disappeared - possible causes?

I have been logging bitcoin order books and trades for the past 20 days in mongodb. 1 database [bitmicro], 1 collection for trades, 1 collection for books.
now suddently all the data is gone and the logger has started from scratch since yesterday.
The mongo log doesnt show any entries since October. And a new database appeared called Warning exactly when the data loss happened
> show dbs
Warning 0.078GB
bitmicro 0.453GB
local 0.078GB
After some reading it might be the case, that the filesize of the collection became too big and ubuntu deleted it?
Since the log /var/log/mongodb/mongod.log doesnt show any entries since starting the server, how can I find out what happened?

the existence of the Warning database implies that your database was accessed by a malicious 3rd party, and your database is exposed to the internet without any authentication.
https://docs.mongodb.com/manual/administration/security-checklist/

mongorestore for a collection results in "Killed" output and collection isn't fully restored

I type the following below:
root#:/home/deploy# mongorestore --db=dbname --collection=collectionname pathtobackupfolder/collectionname.bson
Here's the output:
2016-07-16T00:08:03.513-0400 checking for collection data in pathtobackupfolder/collectionname.bson
2016-07-16T00:08:03.525-0400 reading metadata file from pathtobackupfolder/collectionname.bson
2016-07-16T00:08:03.526-0400 restoring collectionname from file pathtobackupfolder/collectionname.bson
Killed
What's going on? I can't find anything on Google or Stackoverflow about a mongorestore resulting in "Killed". The backup folder that I'm restoring from is a collection of 12875 documents, and yet everytime I run the mongorestore, it always says "Killed", and always restores a different number that is less than the total number: 4793, 2000, 4000, etc.
The machine that I'm performing this call on is "Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-71-generic x86_64)" from Digital Ocean
Any help is appreciated. Thanks.

After trying the mongorestore command for the 5th and 6th time after posting this question, this time more explicit output came out that indicated it was a memory issue specific to Digital Ocean. I followed https://www.digitalocean.com/community/tutorials/how-to-add-swap-on-ubuntu-14-04 and the restore finished completely without errors.

If you are trying to solve it in docker, just increase swap memory in settings.json file

MongoDB 2.2: why didn't replication catch up a collection following a dump/restore?

We have a three-server replicaset running MongoDB 2.2 on Ubuntu 10.04, and recently had to upgrade the hard drive for each server where one particular database resides. This database contains log information for web service requests, where they write to collections in hourly buckets using the current timestamp to determine the name, e.g. log_yyyymmddhh.
I performed this process:
backup the database on the primary server with mongodump --db log_db
take a secondary server offline, replace the disk
bring the secondary server up in standalone mode (i.e. comment out the replSet entry
in /etc/mongodb.conf before starting the service)
restore the database on the secondary server with mongorestore --drop --db log_db
add the secondary server back into the replicaset and bring it online,
letting replication catch up the hourly buckets that were updated/created
while it had been offline
Everything seemed to go as expected, except that the collection which was the current bucket at the time of the backup was not brought up to date by replication. I had to manually copy that collection over by hand to get it up to date. Note that collections which were created after the backup were synched just fine.
What did I miss in this process that caused MongoDB not to get things back in synch for that one collection? I assume something got out of whack with regard to the oplog?
Edit 1:
The oplog on the primary showed that its earliest timestamp went back a couple of days, so there should have been plenty of space to maintain transactions for a few hours (which was the time the secondary was offline).
Edit 2:
Our MongoDB installation uses two disk partitions: /dev/sda1 and /dev/sdb1. The primary MongoDB directory /var/lib/mongodb/ is on /dev/sda1, and holds several databases, while the log database resides by itself on /dev/sdb1. There's a sym link /var/lib/mongodb/log_db which points to a directory on /dev/sdb1. Since the log db was getting full, we needed to upgrade the disk for /dev/sdb1.

You should be using mongodump with the --oplog option. Running a full database backup with mongodump on a replicaset that is updating collections at the same time may not leave you with a consistent backup. This becomes worse with larger databases, more collections and more frequent updates/inserts/deletes.
From the documentation for your version (2.2) of MongoDB (it's the same for 2.6 but just to be as accurate as possible):
--oplog
Use this option to ensure that mongodump creates a dump of the
database that includes an oplog, to create a point-in-time snapshot of
the state of a mongod instance. To restore to a specific point-in-time
backup, use the output created with this option in conjunction with
mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump
operation, the dump will not reflect a single moment in time. Changes
made to the database during the update process can affect the output
of the backup.
http://docs.mongodb.org/v2.2/reference/mongodump/
This is not covered well in most MongoDB tutorials around backups and restores. Generally you are better off if you can perform a live snapshot of the storage volume your database resides on (assuming your storage solution has a live snapshot ability compatible with MongoDB). Failing that, your next best bet is taking a secondary offline and then performing a snapshot or backup of the database files. Mongodump on a live database is increasingly a less optimal solution for larger databases due to performance issues.
I'd definitely take a look at the MongoDB overview of backup options: http://docs.mongodb.org/manual/core/backups/

I would guess this has to do with the oplog not being long enough, although it seems like you checked that and it looked reasonably big.
Still, when adding new members to a replica set you shouldn't be snapshotting and restoring them. It's better to simply add a new member and let replication happen by itself. This is described in the Mongo docs and is the process I've always followed.

MongoDB 2Gb limit - can't compact database

I have been adding files to GridFS in my 32bit Mongo database. It eventually failed when the size of all Mongo files hit 2Gb. So, I then deleted the files in GridFS. I've tried running the repairDatabase() command, but it fails, saying "mongo requires 64bit for larger datasets". I get the same error trying to run the compact command against GridFS.
So, I've hit the 2Gb limit, but it won't let me compact or repair because it doesn't have space. Talk about Catch22!!
What do I do?
Edit
This is an immediate problem I have - how do I compact the database right now?

I think the only recourse is to upgrade to a 64-bit OS.

I had the same problem on my database and I solved it such way. At first I created Amazon EC2 64-bit instance and moved database files from 32-bit instance via plain copy. Then I made all needed cleanups in database on 64-bit instance and made dump with mongodump. This dump I moved back to 32-bit instance and restored database from it.
If you need to restore database with same name, that you had before, you can just rename your old db-files in dbpath (files have database name in their name)
And of course, you should move to upgrade to 64-bit later. MongoDB on 32-bit OS is very bad in support.

shot in the dark here... you could try opening a slave off the master (in 64 bit) and see if you can force a replication over to the slave, essentially backing up your data. I have no idea if this would actually work, as it's pretty clear that 32bit has a 2gig limit (all their docs warn about this :( ), but thought I'd at least post a somewhat potentially creative solution..

Problem while exporting mongoDB Collection

I am using MongoDB version 1.6.5
One of my collection has 973525 records.
when I try to export this collection mongodb exports only 101 records.
I can't figure out the problem .
any one knows the solution for it.

This sounds like corruption. If your server has not shutdown cleanly that could be the cause. Have you had system crashes where you didn't do a repair?
You can try to do a dump with mongodump --dbpath if you shut down the server first.
Note: MongoExport/Import will not be able to restore all the data since json can't represent all possible data types.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse