MongoDB - embedded vs indexes

My question is pretty simple. I am building my first application with MongoDB. Up until now, I have always used SQL. I have read a lot of information about embedding documents versus linking documents.
My question to the MongoDB veterans is: is there a huge difference in speed/performance if I use indexed links/queries as opposed to embedded docs? If there is a huge difference, can you please explain why? Thank you.
Again, I am new to MongoDB and just don't want to get off on the wrong foot. Thank you.

Yes, there is an enormous difference between references and embedded docs.
An embedded document is stored inside its parent document, in the same disk location as the rest of the document's fields, so there are no additional network round trips or disk seeks to retrieve it when you query the document as a whole.
DBRefs, on the other hand, are simply pointers: the _id of a document in another collection, together with that collection's name. It takes an additional round trip and additional disk seeks to get the "linked" document. See the spec for DBRefs here:
http://www.mongodb.org/display/DOCS/Database+References#DatabaseReferences-DBRef
You should try to optimize your most common query by including in a single document all the info needed to satisfy that query.
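To make the round-trip difference concrete, here is a minimal sketch in plain Python. It does not talk to a real server: CountingCollection is a hypothetical dict-backed stand-in for a collection, and the post/author documents are made-up examples. Each find_one call stands for one round trip.

```python
class CountingCollection:
    """Dict-backed stand-in for a collection; counts lookups (round trips)."""

    def __init__(self, docs):
        self.docs = {d["_id"]: d for d in docs}
        self.lookups = 0

    def find_one(self, _id):
        self.lookups += 1
        return self.docs.get(_id)


def load_embedded(posts, post_id):
    # One lookup: the author sub-document travels with the post.
    post = posts.find_one(post_id)
    return post["title"], post["author"]["name"]


def load_linked(posts, authors, post_id):
    # Two lookups: first the post, then the author it points to.
    post = posts.find_one(post_id)
    author = authors.find_one(post["author_id"])
    return post["title"], author["name"]


# Hypothetical sample data for both designs.
embedded_posts = CountingCollection(
    [{"_id": 1, "title": "Hello", "author": {"name": "Ann"}}]
)
linked_posts = CountingCollection([{"_id": 1, "title": "Hello", "author_id": 10}])
authors = CountingCollection([{"_id": 10, "name": "Ann"}])
```

Calling load_embedded costs one lookup, while load_linked costs one lookup per collection involved. Against a real server each of those would be a separate network round trip.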

Related

Mongodb Performance of getting a document in the case of non existing document

We are storing lots of data in MongoDB, let's say 30M docs, and these documents do not get modified very often. There is a high number of read queries (~15k qps), and many of these queries (by _id field) will return an empty result because of the nature of our use case.
I want to understand whether MongoDB does some sort of optimisation for detecting that a doc is not available in the db, indexed or not. Is there any plugin to enable this? The other option I see is an application-level Bloom filter, but that would be another piece to maintain. AFAIK HBase supports Bloom filters for checking whether a document is present or not.
Finding a non-existent document is the worst case of finding a document. Same as in real life, if what you're looking for doesn't exist you'll need more time to check all the places than if the thing existed at some point.
All of the find optimizations apply equally to finding documents that end up not existing (indexes, shard keys, etc.).
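For reference, the application-level Bloom filter mentioned in the question could look roughly like the sketch below. This is not a MongoDB feature; the class name, default sizes and the double-hashing scheme are assumptions chosen for illustration. The filter can return false positives but never false negatives, so a "no" answer lets you skip the query entirely.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; sizes here are illustrative, not tuned."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest
        # (double hashing: h1 + i * h2).
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(key))
```

You would add every _id on insert and call might_contain before querying. Note that a plain Bloom filter cannot remove keys, so deletes would leave stale "possibly present" answers behind until the filter is rebuilt.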

Explain Like I'm Five: Form w/ Text and Image Field > Routes > Controller > Write to MongoDB Document - GridFS goes where?

I have been trying to read the documentation for GridFS and MongoDB for a while. They just keep repeating the same thing and I can't make sense of it.
Desired Output: The user submits a form that contains many fields, one of which is an image. The request needs to store the data in a collection as a new document, which can be retrieved later. My main question is how to use GridFS in this situation to store an image in that document.
It says GridFS makes two collections in my database: files and chunks. So how do those collections relate to my other collection, which has the other form data?
I assume there is a reference to these files and chunks collections; however, I can't make any sense of this. It's been a few days and I feel like it's time to reach out to my StackOverflow community.
Can someone please explain to me the program flow and key points for how I can achieve my goal of storing an image in a document using gridfs?
GridFS is mentioned everywhere and seems to be popular but I can't make sense of it. These moments of utter confusion usually result in a breakthrough, so I'm eager to learn from veterans and experts.
GridFS collections are an internal implementation detail of GridFS. There is no linking between them and your other data.
To write to and read from GridFS, you would use GridFS APIs provided by your driver. Generally this means that, if you are saving for example some fields and a binary blob like an image, you would perform the save in two steps (one insert/update operation for the fields and a separate GridFS operation for the binary blob).
Can someone please explain to me the program flow and key points for how I can achieve my goal of storing an image in a document using gridfs?
You wouldn't store the image in your document. You would store the image in GridFS, and in your document you could include a reference to the GridFS file (those have their own ids).
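As a rough picture of that two-step flow, here is a plain-Python model of what happens. It is not the driver API: gridfs_put, gridfs_get, save_submission and the submissions collection are hypothetical names, and the dicts and lists only imitate the fs.files and fs.chunks collections. With a real driver you would call its GridFS API instead (in PyMongo, for example, gridfs.GridFS(db).put(data) returns the new file's id).

```python
import itertools

CHUNK_SIZE = 255 * 1024  # chunk size assumed for this sketch

fs_files = {}     # imitates fs.files: one metadata document per stored file
fs_chunks = []    # imitates fs.chunks: the file's bytes, split into pieces
submissions = {}  # hypothetical collection holding the other form fields
_next_id = itertools.count(1)


def gridfs_put(data, filename):
    """Store a blob as one metadata document plus numbered chunks."""
    file_id = next(_next_id)
    fs_files[file_id] = {"_id": file_id, "filename": filename,
                         "length": len(data), "chunkSize": CHUNK_SIZE}
    for n, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        fs_chunks.append({"files_id": file_id, "n": n,
                          "data": data[start:start + CHUNK_SIZE]})
    return file_id


def gridfs_get(file_id):
    """Reassemble a blob from its chunks, in chunk order."""
    parts = sorted((c for c in fs_chunks if c["files_id"] == file_id),
                   key=lambda c: c["n"])
    return b"".join(c["data"] for c in parts)


def save_submission(fields, image_bytes, image_name):
    image_id = gridfs_put(image_bytes, image_name)  # step 1: blob into GridFS
    doc_id = next(_next_id)
    # Step 2: ordinary document with the form fields plus the file's id.
    submissions[doc_id] = {"_id": doc_id, **fields, "image_id": image_id}
    return doc_id
```

To show the image later, you load the form document, read its image_id, and ask GridFS for that file. The only link between your collection and GridFS is that id, which you store yourself.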

MongoDB - Many-to-Many-Relationship (Special Case)

Should I store an embedded document multiple times in MongoDB, or should I store it only once and link to it using its ID?
I want to model a "many-to-many relationship", and I only have to update these embedded documents once a year.
Which of the two options fits better?
Thanks for your help!
In your case, you only have to update the embedded documents once a year, which means that reads will be far more frequent than writes.
So, to optimize read operations, "references" should be avoided.
The only remaining concern is whether the embedded documents are large and whether they are duplicated frequently. If not, feel free to use embedded documents, because that is the natural strength of MongoDB.

How to use collections in Mongo

I have one collection called Visit, in which I save documents with information about each visit: referrer, page, keyword, dates, and so on.
I think Keyword can be considered a collection on its own, and the same goes for Page.
This will force me to create different collections but I'm not sure if this is the right way to go.
In a traditional DB model, they would clearly be stored in separate tables connected by foreign keys.
But what about Mongo?
Is it good practice for keys to hold the same value over and over again in different documents, or should I just create a collection in this case?
One of the benefits of MongoDB is its ability to embed documents.
It is perfectly reasonable for the documents in your Visits collection to contain Keyword and Page Sub-Documents.
The rule of thumb is embed documents for speed, normalize documents for consistency.
If you embed the Keyword and Page documents in your Visit document, your application will only have to make one query to retrieve all of the relevant information. (speed)
However, the drawback is that if the Keyword and/or Page information is updated, it will have to be updated in every other Visit document where it appears. If many different Visit documents will rely on the same Keyword and Page documents, it may be better to keep them in a separate collection, especially if they will be changed frequently. (consistency)
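The trade-off can be sketched in plain Python, with lists and dicts standing in for collections. The Visit and Page shapes, field names and helper functions below are hypothetical, chosen only to show how many documents a single change touches in each design.

```python
# Embedded design: every Visit carries its own copy of the Page data.
visits_embedded = [
    {"_id": 1, "page": {"url": "/home", "title": "Home"}},
    {"_id": 2, "page": {"url": "/home", "title": "Home"}},
    {"_id": 3, "page": {"url": "/docs", "title": "Docs"}},
]

# Normalized design: Visits point at one shared Page document.
pages = {
    "p1": {"_id": "p1", "url": "/home", "title": "Home"},
    "p2": {"_id": "p2", "url": "/docs", "title": "Docs"},
}
visits_linked = [
    {"_id": 1, "page_id": "p1"},
    {"_id": 2, "page_id": "p1"},
    {"_id": 3, "page_id": "p2"},
]


def rename_page_embedded(visits, url, new_title):
    """Update every embedded copy; returns how many documents were touched."""
    touched = 0
    for visit in visits:
        if visit["page"]["url"] == url:
            visit["page"]["title"] = new_title
            touched += 1
    return touched


def rename_page_linked(pages_by_id, page_id, new_title):
    """Update the single shared Page document."""
    pages_by_id[page_id]["title"] = new_title
    return 1
```

Renaming "/home" touches two Visit documents in the embedded design and exactly one Page document in the normalized one. Reading a visit is the mirror image: one fetch when embedded, two when linked.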
This is of course a generalization, and ultimately it is up to you, the application developer, to decide which works best for your unique situation. There is additional information on embedding versus linking in the MongoDB documentation page titled "Schema Design":
http://www.mongodb.org/display/DOCS/Schema+Design
You may also find the article "MongoDB Data Modeling and Rails" to be beneficial:
http://www.mongodb.org/display/DOCS/MongoDB+Data+Modeling+and+Rails
The example is given in Rails, but the theory on Document design applies to any language.
Good luck!

"Pointers" in MongoDB?

In the project I am currently working on, it seems to make more sense efficiency-wise to create a nested document that contains a list of "pointers" to information stored in other collections. That way, this nested document can easily be used to retrieve a list of relevant information. The question is, how do I do this? Is there a way to store the locations of other information in a field in MongoDB? If not, could anyone suggest a scheme that is equally or more efficient? Thanks very much!
There is no GOOD way to do this. If this is what you're looking for, you should be using a relational database.
But if you HAVE to go this route, why not store IDs in a document, and then link those IDs to documents in the other collection?
Unfortunately, this requires two separate queries, as Mongo does not support joins in ordinary find queries (newer releases, 3.2 and later, add a $lookup aggregation stage that can pull in documents from another collection).
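A minimal sketch of the "store IDs, then fetch" approach, using plain Python dicts as stand-ins for collections. The playlist/tracks data and the resolve_pointers helper are hypothetical. The second step plays the role of a query such as {"_id": {"$in": ids}} on the other collection.

```python
# Hypothetical target collection, keyed by _id.
tracks = {
    "t1": {"_id": "t1", "name": "Intro"},
    "t2": {"_id": "t2", "name": "Theme"},
    "t3": {"_id": "t3", "name": "Outro"},
}

# The "pointer" document: it stores only the ids of the documents it refers to.
playlist = {"_id": "pl1", "track_ids": ["t3", "t1", "t9"]}


def resolve_pointers(doc, field, collection):
    """Fetch all referenced documents in one batch, keeping the stored order.

    Ids that no longer exist in the target collection are skipped.
    """
    wanted = set(doc[field])
    # One batched lookup for all ids, instead of one lookup per id.
    found = {_id: d for _id, d in collection.items() if _id in wanted}
    return [found[_id] for _id in doc[field] if _id in found]
```

Here resolve_pointers(playlist, "track_ids", tracks) returns the "Outro" and "Intro" documents in that order and silently drops the dangling id "t9". Nothing enforces such references, so your application has to decide how to treat missing targets.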