Exceded maximum insert size of 16,000,000 bytes + MongoDB + Ruby

I have an application where I'm using MongoDB as the database for storing records; the Ruby wrapper for MongoDB I'm using is Mongoid.
Everything was working fine until I hit the error above:
Exceded maximum insert size of 16,000,000 bytes
Can anyone pinpoint how to get rid of this error?
I'm running a mongod server that does not have a configuration file (no configuration file was provided with the MongoDB source distribution).
Can anyone help?

You have hit the maximum size limit of a single document in MongoDB.
If you save large data files in MongoDB, use GridFS instead.
If your document has too many subdocuments, consider splitting it up and using references instead of nesting.

The limit of 16 MB of data per document is a well-known limitation.
Use GridFS for storing binary data of arbitrary size, plus metadata.
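To illustrate what GridFS does under the hood, here is a minimal pure-Ruby sketch that splits a payload into fixed-size chunks the way GridFS does (255 KB per chunk by default). The method name `split_into_chunks` is invented for the example; it is not part of any driver API.

```ruby
# GridFS-style chunking sketch: split a binary payload into fixed-size
# chunks, each tagged with its sequence number `n` (GridFS stores chunks
# this way in the fs.chunks collection).
CHUNK_SIZE = 255 * 1024 # GridFS default chunk size (255 KB)

def split_into_chunks(data, chunk_size = CHUNK_SIZE)
  chunks = []
  offset = 0
  n = 0
  while offset < data.bytesize
    chunks << { n: n, data: data.byteslice(offset, chunk_size) }
    offset += chunk_size
    n += 1
  end
  chunks
end

payload = "x" * (1024 * 1024)      # a 1 MB payload
chunks  = split_into_chunks(payload)
puts chunks.length                 # 1 MB / 255 KB rounds up to 5 chunks
```

Because each chunk is a small document well under the 16 MB limit, the file as a whole can be arbitrarily large.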

Related

How to store lookup values in MongoDB?

I have a collection in db which represents mediafiles.
And among other info I should store the format name. I wonder if there are best practices for storing info like that. Is it better to create a new collection for file formats and link to that collection, or to store the format name right in the file documents as plain text? What about performance and compression? There are supposed to be more than a billion documents in the db. What would Mongo experts suggest in this situation?
Embedded documents are the preferred approach.
In your case, that means it is better to store the file format in the same collection.
Putting the file format into a separate collection means creating new data files on disk and an extra query to resolve the reference.
That is the slower option and should be used only if your documents (any of them) exceed 16 MB in size.
See these links for more information:
6 Rules of Thumb for MongoDB Schema Design
and
How to Program with MongoDB Using the .NET Driver
I've done some benchmarks and found that, in my case, storing "lookup values" as plain text is more efficient in terms of disk space than either an embedded document or a reference to a separate collection. Sorry for the poor terminology.
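To make the trade-off concrete, here is a sketch comparing the per-document byte cost of the three layouts, using JSON serialization as a rough proxy for BSON. The field names and values are invented for the example.

```ruby
require "json"

# Three ways to store a media file's format, serialized to JSON as a
# rough stand-in for BSON document size.
plain     = { name: "movie.mkv", format: "mkv" }                              # plain text
embedded  = { name: "movie.mkv", format: { name: "mkv", mime: "video/x-matroska" } } # embedded doc
reference = { name: "movie.mkv", format_id: "507f1f77bcf86cd799439011" }      # reference (ObjectId string)

[plain, embedded, reference].each do |doc|
  puts "#{doc.to_json.bytesize} bytes"
end
```

At a billion documents, even a few bytes of per-document difference adds up to gigabytes, which matches the questioner's benchmark result favoring plain text.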

Mongodb to Mongodb GridFS

I'm new to MongoDB. I wanted to know: if I initially code my app using regular MongoDB documents and later want to switch to GridFS, will switching (with a large, already-filled database) be possible?
So, if I am using regular documents initially and, after the app has run for some time, documents exceed the 16 MB size limit, I guess I will have to switch to GridFS. I want to know how easy or difficult that switch will be, and whether it is possible at all.
Thanks.
GridFS is used to store large files. It internally divides the data into chunks (255 KB by default). Let me give you an example of saving a PDF file in MongoDB both ways. I am assuming a PDF of 10 MB so that we can see both the normal way and the GridFS way.
Normal Way:
Say you want to store it in the normal_book collection in the testDB database. The whole PDF is stored in this collection, and when you fetch it using db.normal_book.find(), the whole PDF is loaded into memory.
GridFS way:
In GridFS, we have two collections: one stores the data and the other stores its metadata. The data goes into the fs.chunks collection and the metadata into the fs.files collection. Now, the beauty of GridFS is that you can fetch the whole file at once or fetch chunks individually.
Now, coming to your question: there is no direct way or property to tell MongoDB that you now want to switch to GridFS. You need to reinsert the data into GridFS using the mongofiles command-line tool or one of MongoDB's drivers.
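As a back-of-the-envelope check (a sketch, not driver code), the 10 MB PDF from the example above comes out to 41 chunk documents at the default 255 KB chunk size:

```ruby
# Number of fs.chunks documents GridFS would create for a 10 MB file
# at the default 255 KB chunk size.
file_size   = 10 * 1024 * 1024   # 10 MB in bytes
chunk_size  = 255 * 1024         # GridFS default: 255 KB

chunk_count = (file_size.to_f / chunk_size).ceil
puts chunk_count                 # => 41
```

For the reinsert itself, with the mongofiles tool it would look something like `mongofiles -d testDB put book.pdf` (the database and file names here are assumed from the example).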

How should I use MongoDB GridFS to store my big-size data?

After reading the official MongoDB GridFS documentation, I know that GridFS is used by MongoDB to store large files (size > 16 MB); the file can be a video, a movie, or anything else. But what I have is large structured data, not a simple physical file, and the size of the data exceeds the limit. To be more specific, I am dealing with thousands of gene sequences, and many of them exceed the BSON document size limit. You can think of each gene sequence as a simple string, and some of these strings exceed the MongoDB BSON size limit. So, what can I do to solve such a problem? Is GridFS still suitable here?
GridFS will split up the data in chunks of smaller size, that's how it overcomes the size limit. It's particularly useful for streaming data, because you can quickly access data at any given offset since the chunks are indexed.
Storing 'structured' data in the tens of megabytes sounds a bit odd: either you need to access parts of the data based on some criteria, in which case you need a different data structure that allows access to smaller parts of the data.
Or you really need to process the entire data set based on some criteria. In that case, you'll want an efficiently indexed collection that you can query based on your criteria and that contains the id of the file that must then be processed.
Without a concrete example of the problem, i.e. what the query and the data structure look like, it will be hard to give you a more detailed answer.
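If the sequences really are just long strings and parts of them must be queryable by position, one alternative to GridFS is splitting each sequence into fixed-width segment documents keyed by offset, so a range of bases can be fetched with an ordinary indexed query. This is a sketch; the field names (`seq_id`, `offset`, `bases`) and the segment width are invented for the example.

```ruby
SEGMENT = 1_000  # bases per sub-document (arbitrary choice for the sketch)

# Turn one long sequence string into segment documents that could be
# stored in a regular collection and queried by {seq_id, offset}.
def segment_docs(seq_id, sequence, width = SEGMENT)
  docs = []
  offset = 0
  while offset < sequence.length
    docs << { seq_id: seq_id, offset: offset,
              bases: sequence[offset, width] }
    offset += width
  end
  docs
end

docs = segment_docs("gene-42", "ACGT" * 2_500)  # a 10,000-base sequence
puts docs.length                                 # 10 docs of 1,000 bases each
```

Each segment stays far below the 16 MB limit, and a compound index on `{seq_id, offset}` would allow fetching an arbitrary slice of one sequence without loading the whole thing.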

Storing Images: MongoDb vs File System

I need to store a large number of images (around 10,000 images per day), with an average size of around 1 to 10 MB each.
I can store these images in MongoDB using GridFS, or by just storing the Base64 conversion as a string.
My question is: is MongoDB suitable for such a payload, or is it better to use a file system?
Many Thanks,
MongoDB GridFS has a lot of advantages over a normal file system, and it is definitely able to cope with the amount of data you are describing, since you can scale out with a sharded MongoDB cluster. I have not saved that much binary data in it myself, but I do not think there is a real difference between binary data and text. So: yes, it is suitable for the payload.
Implementing queries and operations on the stored file objects is easier with GridFS. In addition, GridFS takes care of backup/replication/scaling. However, serving the files is faster with filesystem + Nginx than with GridFS + Nginx (see: https://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/). GridFS's slower serving speed can, however, be offset when sharding is used.
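One point worth quantifying before choosing the Base64 option from the question: Base64 inflates binary data by roughly a third, which is a reason to prefer GridFS (which stores raw bytes) or the filesystem. A quick sketch:

```ruby
require "base64"

# Base64-encode 300 KB of random binary data and measure the overhead.
raw     = Random.bytes(3 * 100_000)      # 300,000 bytes of binary data
encoded = Base64.strict_encode64(raw)

ratio = encoded.bytesize.to_f / raw.bytesize
puts ratio    # ~1.33: Base64 adds about 33% overhead
```

At 10,000 images of 1-10 MB per day, that 33% translates into gigabytes of extra storage and transfer every day.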

Is it normal size for MongoDB?

I just imported my MySQL database (2.8 MiB in size) into my new Mongo database with a very simple PHP script I built. The import went through without errors, but when I look at my Mongo database (with RockMongo) I see this: Data Size 8.01m, Storage Size 13.7m.
MongoDB is bigger than MySQL for the same amount of data; is this normal?
Thanks for your help, and sorry for my English.
Yes, it's normal that the "same" data will take up more space in MongoDB. There are a few things you need to take into account:
1) the document _id that is stored for each document (unless you specify your own value) is 12 bytes per doc
2) you're storing the key for each key-value pair in each document, whereas in MySQL the column name is not stored for every single row, so you have that extra overhead in your MongoDB documents too. One way to reduce this is to use shortened key names ("column names") in your docs
3) MongoDB automatically adds padding to allow documents to grow
In similar tests, loading data from SQL Server into MongoDB with shortened 2-character document key names instead of the full names from SQL Server, I saw about 25-30% extra space being used in MongoDB.
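The key-name overhead from point 2 is easy to see by serializing the same row with descriptive and shortened keys (JSON byte counts as a rough proxy for BSON; the field names are invented for the example):

```ruby
require "json"

# The same row stored with descriptive vs shortened key names.
long_keys  = { customer_name:  "Ada",
               customer_email: "ada@example.com",
               created_at:     "2015-01-01" }
short_keys = { n: "Ada",
               e: "ada@example.com",
               c: "2015-01-01" }

saved = long_keys.to_json.bytesize - short_keys.to_json.bytesize
puts "#{saved} bytes saved per document"
```

A few dozen bytes saved per document is negligible for small collections, but multiplied across millions of rows it accounts for a noticeable share of the size difference described above.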