C/C++ Example for GridFS implementation in MongoDB

I just started building an application on MongoDB for saving and retrieving files, and found that it has a standard specification for this purpose, named GridFS. Unfortunately, I am unable to find any getting-started example for it in C/C++. If anyone knows anything related to this, please point me in the right direction.
Edit:
I read that GridFS is used for storing files larger than 16MB, so what about files smaller than 16MB? I cannot find any information about this. For smaller files, do I need to use some other mechanism, or the same GridFS?
Thanks

GridFS can be accessed through the class mongo::GridFS. The API is fairly self-explanatory.
Alternatively, you can embed the binary data of your files in normal documents as the BSON BinData type. mongo::BSONObjBuilder has the method appendBinData to add a field with binary content to a document.
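A minimal getting-started sketch, using the legacy C++ driver; the host, database name, file paths, and field names below are assumptions for illustration, not anything from the original post:

```cpp
#include <iostream>
#include "mongo/client/dbclient.h"  // legacy C++ driver headers
#include "mongo/client/gridfs.h"

int main() {
    mongo::DBClientConnection conn;
    conn.connect("localhost");                      // assumed host

    // Store a local file under a remote name, then read it back.
    mongo::GridFS gfs(conn, "mydb");                // "mydb" is an assumption
    gfs.storeFile("/tmp/local.png", "avatar.png", "image/png");

    mongo::GridFile gf = gfs.findFile("avatar.png");
    if (gf.exists())
        gf.write(std::cout);                        // stream file contents to stdout

    // Alternative for small payloads: embed the bytes in a normal document.
    const char bytes[] = { 0x01, 0x02, 0x03 };
    mongo::BSONObjBuilder b;
    b.append("name", "tiny-blob");
    b.appendBinData("data", sizeof(bytes), mongo::BinDataGeneral, bytes);
    conn.insert("mydb.blobs", b.obj());
    return 0;
}
```

This needs a running mongod and the driver library to actually execute; treat it as a starting point rather than a complete program (no error handling shown).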
The reason GridFS exists is that there is an upper limit of 16MB per document. When you want to store data larger than 16MB, you need to split it into multiple documents. GridFS is an abstraction to handle this automatically, but it can also be used for smaller files.
In general, you shouldn't mix both techniques for the same content, as it just makes things more complicated with little benefit. When you can guarantee that your data doesn't get close to 16MB, use embedding. When you occasionally have content > 16MB, you should use GridFS even for files smaller than that.
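The decision rule above can be sketched as a trivial helper; the 1MB headroom for the rest of the document is an arbitrary assumption, and `chooseStorage` is a hypothetical name:

```cpp
#include <cstddef>
#include <string>

// 16MB is the BSON document limit; leave headroom for the document's
// other fields (the 1MB figure here is an assumption, tune to your schema).
const std::size_t kBsonMax  = 16 * 1024 * 1024;
const std::size_t kHeadroom = 1024 * 1024;

// Decide between embedding the payload in a document and using GridFS.
std::string chooseStorage(std::size_t payloadBytes) {
    return payloadBytes + kHeadroom < kBsonMax ? "embed" : "gridfs";
}
```
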

Related

Using MongoDB for storing files of size est. 500KB

The GridFS FAQ says that one should store files larger than 16MB in GridFS. I have a lot of files of roughly 500KB.
The question is: which approach is more efficient, storing a file's content inside a document, or storing the file itself in GridFS? Should I consider other approaches?
As for efficiency, either approach is the same. GridFS is implemented at the driver level by paging your >16MB data across multiple documents. MongoDB is unaware that you're storing a "file", it just knows how to store documents and doesn't ask questions.
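What "paging across multiple documents" means can be sketched in plain C++, with no driver involved; the 255KiB figure matches the modern default chunk size, while older documentation says 256KiB:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of what the driver does under the hood: split a payload into
// fixed-size chunks, each destined for its own document in fs.chunks,
// numbered by the chunk-sequence field "n".
typedef std::pair<int, std::vector<unsigned char> > Chunk;  // (n, data)

std::vector<Chunk> makeChunks(const std::vector<unsigned char>& payload,
                              std::size_t chunkSize = 255 * 1024) {
    std::vector<Chunk> chunks;
    for (std::size_t off = 0, n = 0; off < payload.size(); off += chunkSize, ++n) {
        std::size_t len = std::min(chunkSize, payload.size() - off);
        chunks.push_back(Chunk(static_cast<int>(n),
            std::vector<unsigned char>(payload.begin() + off,
                                       payload.begin() + off + len)));
    }
    return chunks;
}
```

A 500KB file therefore becomes just two chunk documents plus one metadata document, which is why the per-file overhead is small either way.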
So, depending on your driver (PHP/NodeJS/Ruby), you may find some metadata features nice and opt to use GridFS because of that. Otherwise, if you are absolutely sure a document will not be larger than 16MB, storing the raw content in the document should be fairly simple and just as fast (or faster).
Generally, I'd recommend against storing files in the database. It can have a negative impact on your working set and overall speed.

MongoDB GridFS Size Limit

I am using MongoDB as a convenient way of storing a dataset as a series of columns: there is one document that stores the values for a given column, and another document that stores the details of the dataset and a mapping to the documents holding the associated column values. The issue I'm now facing as things get bigger is that I can no longer store an entire column in a single document.
I'm aware that there is also the GridFS option. The only downside is that I believe it stores files as blobs, meaning I would lose random access to a chunk of the column, or to the value at a specified index, something that was incredibly useful with the document store. However, I may not have any other option.
So my question is: does GridFS also impose an upper limit on the size of documents, and if so, does anyone know what it is? I've looked in the docs and haven't found anything, but it may be that I'm not looking in the correct place, or that there is a limit but it's not well documented.
Thanks,
Vackar
GridFS
Per the GridFS documentation:
Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, and stores each of those chunks as a separate document. By default GridFS limits chunk size to 256k. GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.
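Concretely, the two collections hold documents shaped roughly like this (field names as in the GridFS spec; the values are illustrative):

```js
// fs.files: one document per stored file (metadata)
{ _id: ObjectId("..."), length: 1048576, chunkSize: 261120,
  uploadDate: ISODate("..."), md5: "...", filename: "column-42.bin" }

// fs.chunks: one document per chunk, linked by files_id, ordered by n
{ _id: ObjectId("..."), files_id: ObjectId("..."), n: 0, data: BinData(0, "...") }
```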
GridFS will allow you to store arbitrarily large files; however, this really won't help your use case. A file in GridFS is effectively a large binary blob, and you will not get any of the benefits of structured documents and indexing.
Schema Design
The fundamental challenge here is your approach to schema design. If you are creating documents that are likely to grow beyond the 16MB document limit, they will also have a significant impact on your database storage and fragmentation as they grow in size.
The appropriate solution would be to rethink your schema approach so that you do not have unbounded document growth. This probably means flattening the array of "columns" that you are growing so it is represented by a collection of documents rather than an array.
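One hedged sketch of such a flattened schema, with collection and field names made up purely for illustration: store one small document per column value and index the triple, so single values and ranges stay directly addressable.

```js
// one small document per value instead of one huge array per column
{ dataset: "sales-2012", column: "price", index: 1827, value: 19.99 }

// supporting compound index so lookups by position stay fast
db.values.ensureIndex({ dataset: 1, column: 1, index: 1 })
```

This keeps every document tiny and fixed-size, so there is no unbounded growth, at the cost of more documents overall.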
A better (and separate) question to ask would be how to refactor your schema given the expected data growth patterns.

Should I use GridFS or binary data to store & retrieve images from MongoDB?

I was wondering which is better/faster:
Having a separate collection of documents that just contain the image saved as binary data, and possibly some metadata.
Or using GridFS to store the images.
If your images are small you can store them as binary data in the documents in your collection. Just consider that you will be retrieving them every time you query your document (unless you exclude the 'image' field from your queries).
However, if your images are larger I would use GridFS. GridFS has some features that make it very good at handling images that you should consider:
Larger images stored in GridFS are split into chunks, so you can store very large files. If you try to store images in your documents instead, you are constrained by the 16MB maximum size of a document, and you are consuming space that is needed for your actual document data.
You can add metadata to the image itself and run queries against these attributes, as if you were doing it from a regular document in a collection. So GridFS is as good as a document for metadata about the image.
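For example, since fs.files is an ordinary collection, a metadata query is just a normal find; the `metadata.user` field shown here is an assumption, as you choose what to store in the metadata document:

```js
// find all images uploaded by a given user, returning just the filenames
db.fs.files.find({ "metadata.user": "alice" }, { filename: 1 })
```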
I really like that I get an MD5 hash calculated on the images. (It is very useful in some of my cases.)
By storing images in GridFS you save yourself the preprocessing of the image into binary format (not a big deal, but a convenience of GridFS).
In terms of performance, reading/writing against a regular document should be no different than doing it against GridFS. I would not consider performance to be a differentiator in choosing either one.
My personal recommendation is to go with GridFS, but you need to analyze for your particular use case.
Hope this helps.
I use GridFS to store photos and documents. It's easy to use, and retrieving a file from the collection to display or save locally is simple. You can store metadata along with the binary data in the same collection, so you don't need to create an additional collection for it.
For example, in one of my projects I store user profile photos along with usernames, file types, and upload dates.
GridFS is designed to handle files efficiently.
http://www.mongodb.org/display/DOCS/When+to+use+GridFS
Do not forget that you may have to translate the data to a file and back.
But to be sure, do a performance test that takes your usage pattern into account.

Question on GridFS

As one can see in the GridFS docs, BSON objects are limited in size. So if I want to store something extremely big, I need to split it into chunks, and it will be represented by a document in the fs.files collection. My question is: is there a way to have huge fields in a regular document, so that the data can be found without looking in the fs.files collection?
Thank you in advance!
No. BSON documents have a hard 16MB limit, so individual fields can never exceed that size. It is exactly this limitation that GridFS works around by transparently chunking a larger file across multiple smaller documents.

Can anyone explain the MongoDB GridFS feature?

I am a MongoDB user. I have heard about the MongoDB GridFS feature, but I am not sure when and where to use it, or whether it is necessary at all. If you have used this feature, I hope you can explain it precisely.
Thanks,
You'd look into using GridFS if you needed to store large files in MongoDB.
Quote from the MongoDB documentation which I think sums it up well:
The database supports native storage of binary data within BSON objects. However, BSON objects in MongoDB are limited to 4MB in size. The GridFS spec provides a mechanism for transparently dividing a large file among multiple documents. This allows us to efficiently store large objects, and in the case of especially large files, such as videos, permits range operations (e.g., fetching only the first N bytes of a file).
(Note that this quote is from older documentation; the BSON document size limit has since been raised to 16MB.)
There's also the GridFS specification itself.