Can anyone explain the MongoDB GridFS feature?

I am a MongoDB user. I have heard about the MongoDB GridFS feature, but I am not sure when and where to use it, or whether it is necessary at all. If you have used this feature, I hope you can explain it precisely.
Thanks,

You'd look into using GridFS if you needed to store large files in MongoDB.
Quote from the MongoDB documentation which I think sums it up well:
The database supports native storage of binary data within BSON objects. However, BSON objects in MongoDB are limited to 4MB in size. The GridFS spec provides a mechanism for transparently dividing a large file among multiple documents. This allows us to efficiently store large objects, and in the case of especially large files, such as videos, permits range operations (e.g., fetching only the first N bytes of a file).
There's also the GridFS specification here. (Note that the 4MB limit in that quote comes from older documentation; current MongoDB versions limit BSON documents to 16MB.)
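To make the "dividing a large file among multiple documents" part concrete, here is a minimal sketch with the legacy mongo C++ driver (the database name and file path are placeholders): after storing a file, the chunks are just ordinary documents in fs.chunks.

    // Minimal sketch: store a file via GridFS and count the chunk documents it was split into.
    // Assumes the legacy mongo C++ driver; "test" and the file path are placeholders.
    // Depending on the driver version, mongo::client::initialize() may be required first.
    #include "mongo/client/dbclient.h"
    #include "mongo/client/gridfs.h"

    #include <iostream>

    int main() {
        mongo::DBClientConnection conn;
        conn.connect("localhost");                 // throws on failure

        mongo::GridFS gfs(conn, "test");           // uses test.fs.files and test.fs.chunks
        mongo::BSONObj fileInfo = gfs.storeFile("/tmp/bigfile.bin", "bigfile.bin");

        // GridFS wrote one metadata document to test.fs.files and one document per
        // chunk to test.fs.chunks; the chunks reference the file's _id.
        mongo::BSONObjBuilder query;
        query.appendAs(fileInfo["_id"], "files_id");
        unsigned long long chunks = conn.count("test.fs.chunks", query.obj());
        std::cout << "stored as " << chunks << " chunk document(s)" << std::endl;
        return 0;
    }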

How to store lookup values in MongoDB?

I have a collection in my database which represents media files.
Among other info, I should store the format name. I wonder if there are best practices for storing info like that. Is it better to create a new collection for file formats and link to that collection, or to store the format name right in the file documents as plain text? What about performance and compression? There are supposed to be more than a billion documents in the db. What would Mongo experts suggest in this situation?
Embedded documents are the preferred approach.
In your case, this means it is better to store the file format in the same collection, as a field on each media document.
Putting the file format into a separate collection means extra storage files on disk and an extra query to resolve the reference.
It is the slower option and should only be used if any of your documents would exceed 16 MB in size.
See these links for more information: 6 Rules of Thumb for MongoDB Schema Design and How to Program with MongoDB Using the .NET Driver.
I've done some benchmarks and found that in my case storing "lookup values" as plain text is more efficient in terms of disk space than either an embedded document or a reference to a separate collection. Sorry for the poor terminology.
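For illustration, a minimal sketch of the two layouts discussed above, using the legacy mongo C++ driver's BSONObjBuilder (the field names are hypothetical):

    // Sketch only: hypothetical field names, using the legacy mongo C++ driver's BSONObjBuilder.
    #include "mongo/client/dbclient.h"

    #include <iostream>

    int main() {
        // Option 1: store the format name directly on each media document (plain field).
        mongo::BSONObjBuilder media;
        media.append("title", "clip-0001");
        media.append("format", "mp4");           // short string repeated in every document
        mongo::BSONObj embedded = media.obj();

        // Option 2: keep formats in their own collection and reference them by _id.
        mongo::BSONObjBuilder fmt;
        fmt.append("_id", 1);
        fmt.append("name", "mp4");
        mongo::BSONObj formatDoc = fmt.obj();    // would live in a separate "formats" collection

        mongo::BSONObjBuilder ref;
        ref.append("title", "clip-0001");
        ref.append("formatId", 1);               // resolving this costs a second query
        mongo::BSONObj referenced = ref.obj();

        std::cout << embedded.jsonString() << std::endl;
        std::cout << formatDoc.jsonString() << std::endl;
        std::cout << referenced.jsonString() << std::endl;
        return 0;
    }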

C/C++ Example for GridFS implementation in MongoDB

I have just started building an application on MongoDB for saving and retrieving files, and found that it has a standard specification for this purpose called GridFS. Unfortunately, I am unable to find any starter example for this in C/C++. If anyone knows anything related to it, please point me in the right direction.
Edit:
I read that GridFS is used for storing files larger than 16MB, so what about files smaller than 16MB? I cannot find any information about this. For smaller files, do I need to use some other mechanism, or the same GridFS?
Thanks
GridFS can be accessed through the class mongo::GridFS. The API is pretty self-explanatory.
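Something like the following minimal sketch should get you started (legacy mongo C++ driver; the database name and file paths are placeholders, and depending on the driver version you may need to call mongo::client::initialize() first):

    // Minimal sketch: store a local file in GridFS and read it back.
    // Legacy mongo C++ driver; database name and file paths are placeholders.
    #include "mongo/client/dbclient.h"
    #include "mongo/client/gridfs.h"

    #include <fstream>
    #include <iostream>

    int main() {
        try {
            mongo::DBClientConnection conn;
            conn.connect("localhost");                          // throws on failure

            mongo::GridFS gfs(conn, "test");                    // uses test.fs.files / test.fs.chunks

            // Store a local file under a name of our choosing.
            mongo::BSONObj fileInfo = gfs.storeFile("/tmp/input.bin", "input.bin");
            std::cout << "stored: " << fileInfo.jsonString() << std::endl;

            // Read it back and write the contents to a local file.
            mongo::GridFile gf = gfs.findFile("input.bin");
            if (gf.exists()) {
                std::ofstream out("/tmp/output.bin", std::ios::binary);
                gf.write(out);
            }
        } catch (const mongo::DBException& e) {
            std::cerr << "error: " << e.what() << std::endl;
            return 1;
        }
        return 0;
    }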
Alternatively, you can embed the binary data of your files in normal documents as the BSON BinData type. mongo::BSONObjBuilder has the method appendBinData to add a field with binary content to a document.
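For example, a small sketch of the BinData approach (the collection namespace and field names are placeholders):

    // Sketch: embed small binary content directly in a normal document via appendBinData.
    // Legacy mongo C++ driver; the namespace and field names are placeholders.
    #include "mongo/client/dbclient.h"

    #include <vector>

    int main() {
        mongo::DBClientConnection conn;
        conn.connect("localhost");

        std::vector<char> data(1024, 0x42);             // pretend this is the file content (well under 16MB)

        mongo::BSONObjBuilder b;
        b.append("filename", "small.bin");
        b.appendBinData("content", static_cast<int>(data.size()),
                        mongo::BinDataGeneral, &data[0]);

        conn.insert("test.files_inline", b.obj());      // placeholder namespace
        return 0;
    }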
The reason GridFS exists is that there is an upper limit of 16MB per document. When you want to store data larger than 16MB, you need to split it into multiple documents. GridFS is an abstraction to handle this automatically, but it can also be used for smaller files.
In general, you shouldn't mix both techniques for the same content, as it just makes things more complicated with little benefit. When you can guarantee that your data doesn't get close to 16MB, use embedding. When you occasionally have content > 16MB, you should use GridFS even for files smaller than that.

Using MongoDB for storing files of size est. 500KB

The GridFS FAQ says that files larger than 16MB should be stored in GridFS. I have a lot of files of around 500KB.
The question is: which approach is more efficient, storing the files' content inside documents or storing the files themselves in GridFS? Should I consider other approaches?
As for efficiency, either approach is about the same. GridFS is implemented at the driver level by splitting data that exceeds the 16MB document limit across multiple chunk documents. MongoDB itself is unaware that you're storing a "file"; it just knows how to store documents and doesn't ask questions.
So, depending on your driver (PHP/NodeJS/Ruby), you may find some metadata features nice and opt to use GridFS because of that. Otherwise, if you are absolutely sure a document will not be larger than 16MB, storing the raw content in the document should be fairly simple and just as fast (or faster).
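As a rough sketch of that decision (the 15MB threshold below is just headroom under the 16MB BSON limit; the helper and namespace names are hypothetical):

    // Hypothetical helper: choose between embedding and GridFS based on payload size.
    // Legacy mongo C++ driver; names and the namespace are placeholders.
    #include "mongo/client/dbclient.h"
    #include "mongo/client/gridfs.h"

    #include <string>
    #include <vector>

    // Leave some headroom below the 16MB BSON limit for the rest of the document's fields.
    static const size_t kEmbedLimit = 15 * 1024 * 1024;

    void storeFileData(mongo::DBClientConnection& conn, mongo::GridFS& gfs,
                       const std::string& name, const std::vector<char>& data) {
        if (data.size() <= kEmbedLimit) {
            // Small enough (e.g. the ~500KB files above): embed directly in a normal document.
            mongo::BSONObjBuilder b;
            b.append("filename", name);
            b.appendBinData("content", static_cast<int>(data.size()),
                            mongo::BinDataGeneral, &data[0]);
            conn.insert("test.files_inline", b.obj());  // placeholder namespace
        } else {
            // Too large for a single document: let GridFS split it into chunks.
            gfs.storeFile(&data[0], data.size(), name);
        }
    }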
Generally, I'd recommend against storing files in the database. It can have a negative impact on your working set and overall speed.

Storing Images: MongoDb vs File System

I need to store a large number of images (around 10,000 images per day), with an average size of around 1 to 10 MB per image.
I can store these images in MongoDB using the GridFS library, or by just storing the Base64 conversion as a string.
My question is: is MongoDB suitable for such a payload, or is it better to use a file system?
Many Thanks,
MongoDB GridFS has a lot of advantages over a normal file system, and it is definitely able to cope with the amount of data you are describing, since you can scale out with a sharded Mongo cluster. I have not stored that much binary data in it myself, but I do not think there is a real difference between binary data and text. So: yes, it is suitable for this payload.
Implementing queries and operations on the file objects being saved is easier with GridFS. In addition, GridFS caters for backup/replication/scaling. However, serving the files is faster with Filesystem + Nginx than with GridFS + Nginx (see here: https://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/). GridFS's slower serving speed can, however, be offset when sharding is used.
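If it helps, a small sketch of storing an image with a content type via the legacy C++ driver's GridFS class (database name, path, and file names are placeholders); the same idea applies in the other drivers:

    // Sketch: store an image in GridFS with a content type, then read back its metadata.
    // Legacy mongo C++ driver; database name, path, and file names are placeholders.
    #include "mongo/client/dbclient.h"
    #include "mongo/client/gridfs.h"

    #include <iostream>

    int main() {
        mongo::DBClientConnection conn;
        conn.connect("localhost");

        mongo::GridFS gfs(conn, "images");

        // The third argument sets the contentType field on the fs.files metadata document.
        gfs.storeFile("/tmp/photo-0001.jpg", "photo-0001.jpg", "image/jpeg");

        // The stored metadata can be read back per file.
        mongo::GridFile gf = gfs.findFile("photo-0001.jpg");
        if (gf.exists()) {
            std::cout << gf.getFilename() << ": " << gf.getContentType()
                      << ", " << gf.getContentLength() << " bytes, "
                      << gf.getNumChunks() << " chunk(s)" << std::endl;
        }
        return 0;
    }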

NoSql database suitable for long value

I am looking to use a NoSQL database for my applications. I have searched the internet and found Berkeley DB, MongoDB, Redis, Tokyo Cabinet, etc. There are some suggestions and use cases on which database to use when. Some useful links I found are:
http://perfectmarket.com/blog/not_only_nosql_review_solution_evaluation_guide_chart
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
But I didn't find which database performs well when the value (in a key-value pair) is very big, like 1 MB or so.
MongoDB looks good to me because of its query features. How does it perform when you store very big documents?
RavenDB has the notion of Attachments. In a document, instead of having a property 1MB in size (usually a byte array), you'd store a minimal document with the data you want to Map/Reduce on, and save the large data blob as an attachment. That speeds things up very well.