As the GridFS documentation says, BSON objects are limited in size. So if I want to store something extremely large, I need to split it into chunks, and it ends up as a document in the fs.files collection. My question is: is there a way to have huge fields in a regular document, so that the data can be found without looking in the fs.files collection?
Thank you in advance!
No. BSON documents have a hard 16 MB limit, so individual fields can never exceed that size. It is exactly this limitation that GridFS works around by transparently chunking a larger file across multiple smaller documents.
There is a document with an array whose size is more than 16 MB. How can I store this document so that I can still query some of the data in this array?
When you have documents which exceed the 16 MB limit, you are very likely taking MongoDB's denormalization approach too far and should consider creating another collection with one document for each array entry (or one document for each sensible grouping of array entries).
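A minimal sketch of that flattening, assuming the legacy MongoDB C++ driver; the collection name "mydb.entries" and the field names "parentId", "index" and "value" are placeholders invented for this example:

    // Minimal sketch, assuming the legacy MongoDB C++ driver.
    // The collection name "mydb.entries" and the field names "parentId",
    // "index" and "value" are placeholders invented for this example.
    #include <memory>
    #include <vector>
    #include "mongo/client/dbclient.h"

    void flattenArray(mongo::DBClientConnection& conn,
                      const mongo::OID& parentId,
                      const std::vector<double>& hugeArray) {
        // One small document per array entry instead of one giant embedded array.
        for (size_t i = 0; i < hugeArray.size(); ++i) {
            conn.insert("mydb.entries",
                        BSON("parentId" << parentId
                             << "index" << static_cast<long long>(i)
                             << "value" << hugeArray[i]));
        }

        // Query a slice of the former array: entries 100 (inclusive) to 200 (exclusive).
        std::auto_ptr<mongo::DBClientCursor> cur =
            conn.query("mydb.entries",
                       QUERY("parentId" << parentId
                             << "index" << mongo::GTE << 100 << mongo::LT << 200)
                           .sort("index"));
        while (cur->more()) {
            mongo::BSONObj entry = cur->next();
            // ... use entry["value"] ...
        }
    }

Each entry document stays far below the 16 MB limit, and a range query on "index" gives you back any slice of the original array.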
Another option is to treat the content as binary data and store it as a file in GridFS, but then you won't be able to do any meaningful queries on its content (only on the metadata you write for it separately).
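If you do go the GridFS route, the metadata lives in the fs.files collection, and you can add to it and query it yourself. A rough sketch, again assuming the legacy C++ driver (the database name "mydb" and the metadata fields "owner" and "year" are made up for the example):

    // Rough sketch, assuming the legacy MongoDB C++ driver.
    // The database name "mydb" and the metadata fields are made-up examples.
    #include "mongo/client/dbclient.h"
    #include "mongo/client/gridfs.h"

    void storeWithMetadata(mongo::DBClientConnection& conn) {
        mongo::GridFS gfs(conn, "mydb");  // writes to mydb.fs.files / mydb.fs.chunks

        // Store a local file; the returned object is the fs.files document.
        mongo::BSONObj fileDoc = gfs.storeFile("/tmp/report.bin", "report.bin");

        // Attach your own metadata to the fs.files document afterwards.
        conn.update("mydb.fs.files",
                    QUERY("filename" << "report.bin"),
                    BSON("$set" << BSON("metadata.owner" << "alice"
                                        << "metadata.year" << 2014)));

        // Queries can target this metadata, but not the file content itself.
        mongo::BSONObj found =
            conn.findOne("mydb.fs.files", QUERY("metadata.owner" << "alice"));
    }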
The 16 MB limit is hard-coded; you cannot change it through configuration. There was a bug tracker ticket for that, and it was closed as "Won't fix". But considering that MongoDB is open source, you could always change it in the source code. Just keep the license conditions in mind if you do that.
I just started building an application on MongoDB for saving and retrieving files and found that it has a standard specification for this purpose called GridFS. Unfortunately, I am unable to find any getting-started example for this in C/C++. If anyone knows anything about it, please point me in the right direction.
Edit:
I read that GridFS is used for storing files larger than 16 MB, so what about files smaller than 16 MB? I cannot find any information about that. For smaller files, do I need to use some other mechanism, or the same GridFS?
Thanks
GridFS can be accessed through the class mongo::GridFS. The API is pretty self-explanatory.
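For example, a minimal store-and-retrieve round trip with the legacy C++ driver could look like the sketch below. The hostname, database name and file paths are placeholders, and depending on the driver version you may also need to call mongo::client::initialize() before connecting:

    // Minimal GridFS round trip, assuming the legacy MongoDB C++ driver.
    // Hostname, database name and file paths are placeholders.
    #include <fstream>
    #include <iostream>
    #include "mongo/client/dbclient.h"
    #include "mongo/client/gridfs.h"

    int main() {
        mongo::DBClientConnection conn;
        try {
            conn.connect("localhost");
        } catch (const mongo::DBException& e) {
            std::cerr << "connect failed: " << e.what() << std::endl;
            return 1;
        }

        mongo::GridFS gfs(conn, "mydb");  // uses mydb.fs.files and mydb.fs.chunks

        // Store a local file under the name "photo.jpg".
        gfs.storeFile("/tmp/photo.jpg", "photo.jpg", "image/jpeg");

        // Read it back and write it to another local file.
        mongo::GridFile gf = gfs.findFileByName("photo.jpg");
        if (gf.exists()) {
            std::ofstream out("/tmp/photo_copy.jpg", std::ios::binary);
            gf.write(out);
        }

        // Remove it again when it is no longer needed.
        gfs.removeFile("photo.jpg");
        return 0;
    }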
Alternatively, you can embed the binary data of your files in normal documents as the BSON BinData type. mongo::BSONObjBuilder has the method appendBinData to add a field with binary content to a document.
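For files that are guaranteed to stay small, a sketch of the embedding approach (same driver; the collection name "mydb.images" and the field names are only illustrative):

    // Embedding binary data in a normal document, assuming the legacy C++ driver.
    // The collection name "mydb.images" and the field names are illustrative only.
    #include <vector>
    #include "mongo/client/dbclient.h"

    void embedBinary(mongo::DBClientConnection& conn, const std::vector<char>& bytes) {
        mongo::BSONObjBuilder b;
        b.append("filename", "avatar.png");
        b.appendBinData("data", static_cast<int>(bytes.size()),
                        mongo::BinDataGeneral, bytes.data());
        conn.insert("mydb.images", b.obj());

        // Reading it back: the element yields a pointer and a length.
        mongo::BSONObj doc =
            conn.findOne("mydb.images", QUERY("filename" << "avatar.png"));
        int len = 0;
        const char* data = doc["data"].binData(len);
        // ... use data[0 .. len) ...
    }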
The reason GridFS exists is that there is an upper limit of 16MB per document. When you want to store data larger than 16MB, you need to split it into multiple documents. GridFS is an abstraction to handle this automatically, but it can also be used for smaller files.
In general, you shouldn't mix both techniques for the same content, as it just makes things more complicated with little benefit. When you can guarantee that your data doesn't get close to 16MB, use embedding. When you occasionally have content > 16MB, you should use GridFS even for files smaller than that.
The GridFS FAQ says that one should use GridFS for files larger than 16 MB. I have a lot of files of roughly 500 KB.
The question is: which approach is more efficient - storing the files' content inside a document, or storing the file itself in GridFS? Should I consider other approaches?
As for efficiency, both approaches are about the same. GridFS is implemented at the driver level by paging your >16MB data across multiple documents. MongoDB is unaware that you're storing a "file"; it just knows how to store documents and doesn't ask questions.
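You can see this for yourself by looking at the two collections GridFS writes to; a quick sketch with the legacy C++ driver (the database name "mydb" is a placeholder):

    // Peeking at what GridFS actually stores, assuming the legacy MongoDB C++ driver.
    // The database name "mydb" is a placeholder.
    #include <iostream>
    #include <memory>
    #include "mongo/client/dbclient.h"

    void inspectGridFS(mongo::DBClientConnection& conn) {
        // fs.files holds one ordinary metadata document per file ...
        std::auto_ptr<mongo::DBClientCursor> files =
            conn.query("mydb.fs.files", mongo::Query());
        while (files->more()) {
            std::cout << files->next().jsonString() << std::endl;
        }

        // ... and fs.chunks holds the content, split into ordinary documents of
        // at most the chunk size, linked back to fs.files via the "files_id" field.
        std::cout << "chunks: " << conn.count("mydb.fs.chunks") << std::endl;
    }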
So, depending on your driver (PHP/NodeJS/Ruby), you may find some metadata features nice and opt to use GridFS because of that. Otherwise, if you are absolutely sure a document will not be larger than 16MB, storing the raw content in the document should be fairly simple and just as fast (or faster).
Generally, I'd recommend against storing files in the database. It can have a negative impact on your working set and overall speed.
I am using MongoDB as a convenient way of storing a dataset as a series of columns: there is a document that stores the values for a given column, and another document that stores the details of the dataset along with a mapping to the other documents holding the associated column values. The issue I'm now facing as things get bigger is that I can no longer store an entire column in a single document.
I'm aware that there is also the GridFS option. The only downside is that I believe it stores files as blobs, meaning I would lose random access to a chunk of the column, or to the value at a specified index, something that was incredibly useful with the document store; however, I may not have any other option.
So my question is: does GridFS also impose an upper limit on the size of documents, and if so, does anyone know what it is? I've looked in the docs and haven't found anything, but it may be that I'm not looking in the correct place, or that there is a limit but it's not well documented.
Thanks,
Vackar
GridFS
Per the GridFS documentation:
Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, and stores each of those chunks as a separate document. By default GridFS limits chunk size to 256k. GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.
GridFS will allow you to store arbitrarily large files; however, this really won't help your use case. A file in GridFS will effectively be a large binary blob, and you will not get any of the benefits of structured documents and indexing.
Schema Design
The fundamental challenge you have is your approach to schema design. If you are creating documents that are likely to grow beyond the 16 MB document limit, they will also have a significant impact on your database storage and fragmentation as they grow in size.
The appropriate solution would be to rethink your schema approach so that you do not have unbounded document growth. This probably means flattening the array of "columns" that you are growing so it is represented by a collection of documents rather than an array.
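As a rough illustration of what such a flattened schema could look like (all collection and field names below are hypothetical), each value becomes its own small document keyed by dataset, column and row index, which preserves the random access you would lose with GridFS:

    // Hypothetical flattened "column" schema, assuming the legacy MongoDB C++ driver.
    // The collection "mydb.column_values" and all field names are placeholders.
    #include <string>
    #include <vector>
    #include "mongo/client/dbclient.h"

    void storeColumn(mongo::DBClientConnection& conn,
                     const mongo::OID& datasetId,
                     const std::string& columnName,
                     const std::vector<double>& values) {
        for (size_t i = 0; i < values.size(); ++i) {
            conn.insert("mydb.column_values",
                        BSON("datasetId" << datasetId
                             << "column" << columnName
                             << "row" << static_cast<long long>(i)
                             << "value" << values[i]));
        }
        // A compound index keeps point lookups and range scans cheap.
        conn.ensureIndex("mydb.column_values",
                         BSON("datasetId" << 1 << "column" << 1 << "row" << 1));
    }

    // Random access to the value at a given row index.
    double valueAt(mongo::DBClientConnection& conn,
                   const mongo::OID& datasetId,
                   const std::string& columnName,
                   long long row) {
        mongo::BSONObj doc = conn.findOne(
            "mydb.column_values",
            QUERY("datasetId" << datasetId << "column" << columnName << "row" << row));
        return doc["value"].numberDouble();
    }

Whether you store one value or a small block of values per document, the key point is that no single document grows without bound.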
A better (and separate) question to ask would be how to refactor your schema given the expected data growth patterns.
I want to store BSON documents in GridFS because they quickly grow beyond 16 MB. But I also have to do some map-reduce analytics on them. Is that possible, or do I have to split each document into multiple documents to do that? Tutorials and other material always talk about binary data like pictures, videos and so on, but not about BSON documents.
Thanks.
GridFS is only meant to store binary files. It is not meant to split normal documents (which you call BSON documents). If your BSON documents are too large, you need to rethink your data schema. If you provide that schema, I can update my answer with hints and tips.