How to store lookup values in MongoDB? - mongodb

I have a collection in my database that represents media files.
Among other information, I should store the format name. I wonder whether there are best practices for storing information like that. Is it better to create a new collection for file formats and link to it, or to store the format name directly in the file documents as plain text? What about performance and compression? There are supposed to be more than a billion documents in the database. What would MongoDB experts suggest in this situation?

Embedded documents are the preferred approach.
In your case, it means it is better to store file format in the same collection.
Putting the file format into a separate collection means creating a new file on disk, and every read then needs an extra lookup to resolve the reference.
It is a slower option and should be used only if a document (any of them) would otherwise exceed 16 MB in size.
See these links for more information: 6 Rules of Thumb for MongoDB Schema Design and How to Program with MongoDB Using the .NET Driver.

I've done some benchmarks and found that, in my case, storing "lookup values" as plain text is more efficient in terms of disk space than both an embedded document and a reference to a separate collection. Sorry for the poor terminology.
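The benchmark result above is plausible from the BSON encoding alone. The following is a rough sketch, not a measurement: it estimates the uncompressed per-document size of the three layouts using the element layout from the BSON specification. The field names `format`, `format_id` and `name` and the value `mp4` are hypothetical, and on-disk size after compression will differ.

```python
# Rough, uncompressed BSON size of one field (per the BSON spec):
# element = 1 type byte + field name as a C string (name + NUL) + value.

def cstring_size(name: str) -> int:
    """Size of a BSON field name: UTF-8 bytes plus the trailing NUL."""
    return len(name.encode("utf-8")) + 1

def string_field_size(name: str, value: str) -> int:
    """String value = int32 length prefix + UTF-8 bytes + trailing NUL."""
    return 1 + cstring_size(name) + 4 + len(value.encode("utf-8")) + 1

def objectid_field_size(name: str) -> int:
    """An ObjectId value is always 12 bytes."""
    return 1 + cstring_size(name) + 12

def embedded_string_field_size(name: str, inner_name: str, value: str) -> int:
    """Embedded document = int32 size + its elements + trailing NUL."""
    inner = string_field_size(inner_name, value)
    return 1 + cstring_size(name) + 4 + inner + 1

# Hypothetical field names and value, chosen only for illustration.
plain = string_field_size("format", "mp4")
reference = objectid_field_size("format_id")
embedded = embedded_string_field_size("format", "name", "mp4")

print(f"plain text: {plain} B, reference: {reference} B, embedded: {embedded} B")
```

For short format names the plain string comes out smallest under these assumptions; a reference additionally costs a second collection and a second read per lookup.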

Related

Explain Like I'm Five: Form w/ Text and Image Field > Routes > Controller > Write to MongoDB Document - GridFS goes where?

I have been trying to read the documentation for GridFS and MongoDB for a while. It just keeps repeating the same thing and I can't make sense of it.
Desired Output: The user submits a form that contains many fields, one of which is an image. The request needs to store the data in a collection as a new document, which can be retrieved later. My main question is how do I use GridFS in this situation to store an image in that document.
It says GridFS makes two collections in my database, files and chunks. So how do those collections relate to my other collection, which has the other form data?
I assume there is a reference to these files and chunks collections; however, I can't make any sense of this. It's been a few days and I feel like it's time to reach out to my StackOverflow community.
Can someone please explain to me the program flow and key points for how I can achieve my goal of storing an image in a document using gridfs?
GridFS is mentioned everywhere and seems to be popular but I can't make sense of it. These moments of utter confusion usually result in a breakthrough, so I'm eager to learn from veterans and experts.
GridFS collections are an internal implementation detail of GridFS. There is no linking between them and your other data.
To write to and read from GridFS, you would use GridFS APIs provided by your driver. Generally this means that, if you are saving for example some fields and a binary blob like an image, you would perform the save in two steps (one insert/update operation for the fields and a separate GridFS operation for the binary blob).
"Can someone please explain to me the program flow and key points for how I can achieve my goal of storing an image in a document using gridfs?"
You wouldn't store the image in your document. You would store the image in GridFS, and in your document you could include a reference to the GridFS file (those have their own ids).
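The two-step flow described above can be sketched without a database. The following is a minimal in-memory model, not driver code: plain Python lists stand in for the `fs.files` and `fs.chunks` collections and for the form collection, and `gridfs_put`, `gridfs_get`, `submissions` and `image_id` are hypothetical names. With a real driver you would call its GridFS API instead (in PyMongo, for example, `gridfs.GridFS(db).put(...)` returns the new file's id).

```python
import uuid
from datetime import datetime, timezone

CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes

def gridfs_put(files, chunks, data, filename):
    """Model of a GridFS upload: N chunk documents plus one files document."""
    file_id = uuid.uuid4().hex  # stand-in for an ObjectId
    for n, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        chunks.append({"files_id": file_id, "n": n,
                       "data": data[start:start + CHUNK_SIZE]})
    files.append({"_id": file_id, "length": len(data),
                  "chunkSize": CHUNK_SIZE, "filename": filename,
                  "uploadDate": datetime.now(timezone.utc)})
    return file_id

def gridfs_get(chunks, file_id):
    """Reassemble a file by reading its chunks in order of n."""
    parts = sorted((c for c in chunks if c["files_id"] == file_id),
                   key=lambda c: c["n"])
    return b"".join(c["data"] for c in parts)

# In-memory stand-ins for the three collections.
fs_files, fs_chunks, submissions = [], [], []

# Step 1: store the binary blob in GridFS and keep the returned id.
image = b"\x89PNG" + bytes(300 * 1024)  # fake image payload
image_id = gridfs_put(fs_files, fs_chunks, image, "avatar.png")

# Step 2: store the form fields, including a reference to the file.
submissions.append({"name": "Alice", "email": "alice@example.com",
                    "image_id": image_id})

# Later: load the document, then fetch the image through its reference.
doc = submissions[0]
restored = gridfs_get(fs_chunks, doc["image_id"])
```

The only link between the form document and GridFS is the `image_id` field that the application writes itself; nothing in `fs.files` or `fs.chunks` points back at the form document.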

How to handle MongoDB documents with array larger than 16MB

I have a document containing an array whose size is more than 16 MB. How can I store this document and still be able to query data from the array?
When you have documents which exceed the 16 MB limit, you are very likely taking MongoDB's denormalization approach too far and should consider creating another collection with one document for each array entry (or one document for each sensible grouping of array entries).
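As a minimal sketch of that restructuring, the function below turns one oversized document into a small parent document plus one child document per array entry. It works on plain dicts; the names `split_array`, `parent_id`, `position` and the sample `readings` data are hypothetical.

```python
def split_array(doc, array_field, parent_field="parent_id"):
    """Split one document into a parent and one child per array entry."""
    # The parent keeps every field except the oversized array.
    parent = {k: v for k, v in doc.items() if k != array_field}
    # Each child points back at the parent and remembers its position.
    children = [
        {parent_field: doc["_id"], "position": i, "value": value}
        for i, value in enumerate(doc[array_field])
    ]
    return parent, children

# Hypothetical sample document, tiny here but unbounded in practice.
big = {"_id": "sensor-1", "unit": "C", "readings": [20.5, 21.0, 19.8]}
parent, children = split_array(big, "readings")
```

The children would go into their own collection, where an index on `parent_id` (and on `position`, if order matters) lets you query individual entries instead of loading the whole array.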
Another option is to treat the content as binary data and store it as a file in GridFS, but then you won't be able to do any meaningful queries on its content (only on the metadata you write for it separately).
The 16 MB limit is hardcoded. You cannot change it through configuration. There was a bug tracker ticket for that and it was closed as "Won't fix". But considering that MongoDB is open source, you could always change it in the source code. Just keep the license conditions in mind when you do that.

Mongodb to Mongodb GridFS

I'm new to MongoDB. I wanted to know whether, if I initially code my app using MongoDB and later want to switch to MongoDB GridFS, the switch (of a large, filled database) will be possible.
So, if I am using MongoDB initially and after some time of running the app the documents exceed 16 MB in size, I guess I will have to switch to GridFS. I want to know how easy or difficult it will be to switch to GridFS, and whether it will be possible at all.
Thanks.
GridFS is used to store large files. It internally divides the data into chunks (255 KB by default). Let me give you an example of saving a PDF file in MongoDB both ways. I am assuming the PDF is 10 MB in size so that we can look at both the normal way and the GridFS way.
Normal Way:
Say you want to store it in the normal_book collection in the testDB database. The whole PDF is stored in this collection, and when you fetch it using db.normal_book.find(), the whole PDF is loaded into memory.
GridFS way:
In GridFS, we have two collections: one for storing the data and the other for storing its metadata. It will store the data in the fs.chunks collection and the metadata in the fs.files collection. Now, the beauty of GridFS is that you can fetch the whole file at once or fetch chunks individually.
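To make the chunking concrete for the 10 MB PDF, here is a small arithmetic sketch. It assumes "10 MB" means 10 × 1024 × 1024 bytes and uses the 255 KB default chunk size; `chunk_layout` is a hypothetical helper, not part of any driver.

```python
import math

CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes

def chunk_layout(file_size, chunk_size=CHUNK_SIZE):
    """Return (number of chunks, size of the last chunk) for a file."""
    if file_size == 0:
        return 0, 0
    count = math.ceil(file_size / chunk_size)
    # Every chunk is full except possibly the last one.
    last = file_size - (count - 1) * chunk_size
    return count, last

pdf_size = 10 * 1024 * 1024  # assumed size of the example PDF
count, last = chunk_layout(pdf_size)
print(f"{count} chunk documents, last chunk holds {last} bytes")
```

Under these assumptions the PDF becomes 41 documents in fs.chunks (40 full chunks and one of 40,960 bytes) plus a single metadata document in fs.files.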
Now, coming to your question: there is no direct way or property to tell MongoDB that you now want to switch to GridFS. You need to reinsert the data into GridFS using the mongofiles command-line tool or one of MongoDB's drivers.

Using MongoDB for storing files of approximately 500 KB

The GridFS FAQ says that GridFS should be used to store files larger than 16 MB. I have a lot of files of about 500 KB.
The question is: which approach is more efficient, storing a file's content inside a document or storing the file itself in GridFS? Should I consider other approaches?
As for efficiency, either approach is the same. GridFS is implemented at the driver level by paging your >16MB data across multiple documents. MongoDB is unaware that you're storing a "file", it just knows how to store documents and doesn't ask questions.
So, depending on your driver (PHP/NodeJS/Ruby), you may find some metadata features nice and opt to use GridFS because of that. Otherwise, if you are absolutely sure a document will not be larger than 16MB, storing the raw content in the document should be fairly simple and just as fast (or faster).
Generally, I'd recommend against storing files in the database. It can have a negative impact on your working set and overall speed.

MongoDB GridFS Size Limit

I am using MongoDB as a convenient way of storing a dataset as a series of columns: one document stores the values for a given column, and another document stores the details of the dataset along with a mapping to the documents holding the associated column values. The issue I'm now facing as things get bigger is that I can no longer store an entire column in a single document.
I'm aware that there is also the GridFS option. The only downside is that I believe it stores files as blobs, meaning I would lose random access to a chunk of the column, or to the value at a specified index, something that was incredibly useful with the document store. However, I may not have any other option.
So my question is: does GridFS also impose an upper limit on the size of documents, and if so, does anyone know what it is? I've looked in the docs and haven't found anything, but it may be that I'm not looking in the correct place or that there is a limit but it's not well documented.
Thanks,
Vackar
GridFS
Per the GridFS documentation:
Instead of storing a file in an single document, GridFS divides a file
into parts, or chunks, and stores each of those chunks as a separate
document. By default GridFS limits chunk size to 256k. GridFS uses
two collections to store files. One collection stores the file chunks,
and the other stores file metadata.
GridFS will allow you to store arbitrarily large files; however, this really won't help your use case. A file in GridFS will effectively be a large binary blob, and you will not get any of the benefits of structured documents and indexing.
Schema Design
The fundamental challenge you have is your approach to schema design. If you are creating documents that are likely to grow beyond the 16 MB document limit, these will also have a significant impact on your database storage and fragmentation as the documents grow in size.
The appropriate solution would be to rethink your schema approach so that you do not have unbounded document growth. This probably means flattening the array of "columns" that you are growing so it is represented by a collection of documents rather than an array.
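One possible way to do this while keeping access by index is to split each column into fixed-size buckets, one document per bucket. This is only a sketch of one such layout, not the answer's prescribed schema: the bucketing itself, the bucket size of 1000, and the names `to_buckets`, `value_at`, `column`, `bucket` and `values` are all assumptions, and plain lists stand in for a collection.

```python
BUCKET_SIZE = 1000  # hypothetical; pick so a bucket stays far below 16 MB

def to_buckets(column, values, bucket_size=BUCKET_SIZE):
    """Split one column into bounded documents, one per bucket."""
    return [
        {"column": column, "bucket": b, "values": values[start:start + bucket_size]}
        for b, start in enumerate(range(0, len(values), bucket_size))
    ]

def value_at(docs, column, index, bucket_size=BUCKET_SIZE):
    """Random access: find the bucket holding `index`, then the offset in it."""
    bucket, offset = divmod(index, bucket_size)
    for doc in docs:  # a real collection would use an index on (column, bucket)
        if doc["column"] == column and doc["bucket"] == bucket:
            return doc["values"][offset]
    raise IndexError(f"no value at index {index} for column {column!r}")

# Hypothetical column of 2500 values, stored as three bucket documents.
docs = to_buckets("temperature", list(range(2500)))
sample = value_at(docs, "temperature", 2345)
```

Because each document holds at most `BUCKET_SIZE` values, no document grows without bound, and reading the value at a given index touches exactly one bucket document.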
A better (and separate) question to ask would be how to refactor your schema given the expected data growth patterns.