Difference between Embedded Array of Ids and Normalized style in MongoDB

So, I have been watching this video in order to learn MongoDB data modeling. In the one to many relationship, the speaker talks about three different kinds:
Embedded array / array keys: In a particular document you would have a field that would be an array that references other documents (for example, blog_posts attribute in the user document would store all the ids of the blog posts that the user has created)
Embedded tree: Rather than having an array with references to other things, we have documents in documents, completely embedded.
Normalized: You have two collections with references between them.
So, what would be the difference between the embedded array keys and the normalized kind? Isn't the embedded array also holding references to another collection?

The difference is simple (and unfortunately a bit confusingly presented in that video).
Imagine modeling a blog post (Post) and comments (Comment).
Embedded array: the Post document contains an array of all of the IDs of all of the Comment documents. The Comment is stored in a separate document (and/or collection).
Tree: The Post document contains embedded Comments. They aren't stored in distinct documents or in their own collection. While this performs very well, the size limit of BSON documents being 16MB makes this potentially more difficult to work with.
Normalized: The Post document and the Comments are stored separately. In this case, however, the Comment document has a foreign-key-like reference back to the Post. It might have a field called postId, for example, which references the Post the Comment belongs to. This pattern differs from the embedded array pattern in that the Post document does not contain a list of Comments. So, while this option leaves the number of Comments essentially unbounded, it can make retrieving comments inefficient unless specific indexes are built (a compound index on postId and commentDate, for example, might be useful).
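To make the three shapes concrete, here is a minimal sketch that uses plain Python dictionaries in place of documents and lists in place of collections. The field names (commentIds, postId) and the sample values are assumptions made up for illustration, and nothing here calls a real MongoDB driver.

```python
# Plain dicts stand in for MongoDB documents; lists stand in for collections.
# All field names and values here are hypothetical.

# 1. Embedded array of ids: the Post lists the ids of its Comments.
post_with_ids = {"_id": "p1", "title": "Hello", "commentIds": ["c1", "c2"]}
comments = [
    {"_id": "c1", "text": "First!"},
    {"_id": "c2", "text": "Nice post"},
]

# 2. Tree: the Comments live inside the Post document itself.
post_with_tree = {
    "_id": "p1",
    "title": "Hello",
    "comments": [{"text": "First!"}, {"text": "Nice post"}],
}

# 3. Normalized: the Post knows nothing about its Comments;
#    each Comment points back at its Post.
post_plain = {"_id": "p1", "title": "Hello"}
comments_with_fk = [
    {"_id": "c1", "postId": "p1", "text": "First!"},
    {"_id": "c2", "postId": "p1", "text": "Nice post"},
    {"_id": "c3", "postId": "p2", "text": "Unrelated"},
]


def comments_by_id_array(post, comment_collection):
    """Pattern 1: look up each id stored on the post."""
    by_id = {c["_id"]: c for c in comment_collection}
    return [by_id[cid] for cid in post["commentIds"]]


def comments_by_back_reference(post, comment_collection):
    """Pattern 3: filter the comment collection on the back-reference."""
    return [c for c in comment_collection if c["postId"] == post["_id"]]
```

In pattern 3, the filter on postId is the step that an index on the real collection would serve. Pattern 2 needs no lookup at all, since the comments arrive with the post.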

Related

MongoDB/Mongoose: many to many relationship

I have two Mongoose schemas Post and Tag and I want to design a many to many relationship between them. I'm wondering which one is the best solution for performances:
In both the Tag and Post models, keep an array of references to documents of the other schema (each Tag has many posts referenced in an array of ids, and vice versa)
Keep the array of Tag ids only in the Post schema
The second solution seems easier to implement, because when I edit the list of tags related to one post only one array has to be modified. At the same time, it might be less performant when getting all the posts that belong to one tag.
Keep the array of Tag ids only in the Post schema
I would definitely use this second solution which is more straightforward.
Unless you have exotic requirements, you shouldn't need each Tag to explicitly track its Post references. An array of Post references in Tag documents would effectively be unbounded in size. This is a usage pattern that tends to create storage fragmentation and/or performance issues for popular documents, which frequently outgrow the record padding for their allocated record space.
On the other hand, the number of Tags used in a single Post is unlikely to change much over time and you can make this query performant by adding an index on the Tag array in your Post collection.
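As a rough illustration of why the tag array on the Post side is enough, the sketch below keeps tag ids only on hypothetical Post documents and builds a tag-to-posts lookup from them. The lookup only imitates the idea of an index on an array field (one entry per array element); it is not how MongoDB implements indexes, and all names and data are made up.

```python
from collections import defaultdict

# Hypothetical Post documents; only the Post stores tag ids.
posts = [
    {"_id": "p1", "title": "Intro to indexes", "tags": ["t_mongo", "t_perf"]},
    {"_id": "p2", "title": "Schema design", "tags": ["t_mongo"]},
    {"_id": "p3", "title": "Gardening", "tags": ["t_misc"]},
]


def build_tag_index(post_collection):
    """Map each tag id to the ids of the posts carrying it.

    Very roughly what an index on an array field provides:
    one entry per array element, pointing back at the document.
    """
    index = defaultdict(list)
    for post in post_collection:
        for tag_id in post["tags"]:
            index[tag_id].append(post["_id"])
    return index


def retag(post, new_tags):
    """Editing a post's tags touches exactly one document."""
    post["tags"] = list(new_tags)


tag_index = build_tag_index(posts)
```

Looking up tag_index["t_mongo"] then answers "all posts for this tag" without any Tag document having to store post ids.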

Grouping Related Types of Documents in MongoDB

OK, I am totally new to MongoDB and have started reading the book MongoDB: The Definitive Guide.
I am on page 7 and it says: "Group Related types of Documents together"
I don't get it :( a document is something like
{"greetings" , "Hello World"}
So what do they mean by that sentence? Could you please give me a more detailed example so I can picture it in my head? Thanks
A document is actually more like: {"greetings": "Hello World"}
In this case, it's a very simple document consisting of only a key, "greetings", and its value, "Hello World". Some languages might call this an associative array, a hash, or a dictionary. The point of this description is that MongoDB doesn't care what the structure is. It's schema-free and does not require every document in a collection to have the exact same structure. But for efficiency and organizational purposes, you would be inclined to store similar documents in a single collection. Thus, purely from a design perspective, you will end up with very similar documents in each collection.
A document in MongoDB is more or less like a JSON structure (BSON, to be specific). It can be of arbitrary depth, and you create indexes on selected fields at any level to make searching faster.
For the most part, just think of MongoDB as a fancy object store, with objects that look like those of whatever language you are familiar with. The driver specific to your language will handle the bridge between your native object types and the BSON representation in MongoDB. You create objects and store them. It's really not all that different from working with MySQL, except that you don't have to define your table schema. Just start storing whatever you want.
Actually, a document would be more like
{ "Greeting":"Hello World" }
meaning the document contains a greeting (key) and the greeting is "Hello World" (value).
A document can be a very complex set of key value pairs, including values which are arrays, embedded documents, etc.
If you include a bunch of arbitrary documents in the same collection, it will be rather difficult to find them later. So the recommendation would be to group in a collection documents which have something in common.
A relatable example might be to have a collection which has each document representing a user of the system. The fields in each document will vary depending on how much information you have on each user, but at the very least you may have an email address, name, etc. You could then query for all users that satisfy some condition.
Another collection might have companies. You might have an array of users or user ids as one of the fields in a company document to represent every user who works at a company, or something like that.
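A small sketch of that users/companies example, with lists of Python dicts standing in for the two collections. The field names (user_ids, age) and the sample records are invented for illustration only.

```python
# Hypothetical collections, modeled as lists of dicts.
users = [
    {"_id": 1, "email": "ann@example.com", "name": "Ann", "age": 34},
    {"_id": 2, "email": "bob@example.com", "name": "Bob"},  # no age recorded
    {"_id": 3, "email": "cy@example.com", "name": "Cy", "age": 19},
]
companies = [
    {"_id": "acme", "name": "Acme", "user_ids": [1, 3]},
]


def users_older_than(user_collection, min_age):
    """Query one collection; documents lacking 'age' simply don't match."""
    return [u for u in user_collection if "age" in u and u["age"] > min_age]


def employees_of(company, user_collection):
    """Resolve the user ids stored on a company into user documents."""
    wanted = set(company["user_ids"])
    return [u for u in user_collection if u["_id"] in wanted]
```

Because every document in users describes a user, a condition like "older than 30" has an obvious meaning across the whole collection, even though individual documents carry different fields.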
I'm not familiar with the particular book you mention, but maybe if you keep plowing ahead it will start to make more sense. If not I recommend reading some of the pages here for more examples.

MongoDB - Why should you embed a document inside a document

I'm looking to use MongoDB for my database implementation. Why would you want to embed a document inside a document?
It is one way to do what you would do with a JOIN in a relational database (something that MongoDB does not do in ordinary queries; newer versions offer a $lookup stage in the aggregation pipeline).
For example, you could have a MongoDB document as a blog post, and embed the list of comments right in there.
Then you can (for example):
load post and comments in a single query
search for posts which have replies
search for posts by user A which have replies by user B
atomically update both post and comments in a single transaction
All that would be impossible (or at least difficult) if the comments were stored in their own collection as separate documents.
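Here is a minimal sketch of the queries listed above, run against hypothetical post documents with embedded comments. Plain Python lists and dicts stand in for the collection, and the author names and fields are made up.

```python
# Hypothetical posts with their comments embedded in the same document.
posts = [
    {"_id": 1, "author": "A", "title": "On joins",
     "comments": [{"author": "B", "text": "Agreed"}]},
    {"_id": 2, "author": "A", "title": "Draft", "comments": []},
    {"_id": 3, "author": "C", "title": "Other",
     "comments": [{"author": "B", "text": "Hi"}]},
]


def posts_with_replies(collection):
    """Posts whose embedded comment list is non-empty."""
    return [p for p in collection if p["comments"]]


def posts_by_author_with_reply_from(collection, post_author, reply_author):
    """Posts by one user that carry at least one comment by another user."""
    return [
        p for p in collection
        if p["author"] == post_author
        and any(c["author"] == reply_author for c in p["comments"])
    ]
```

Each function only ever inspects one document at a time, which is what makes these questions easy to answer when the comments are embedded.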
In simple terms, embed if it is NOT a top-level object, if it does NOT have complex relationships, if there would be a lot of duplicate data if you did NOT embed, and if your documents will NOT become bigger than a few megabytes.
Taken from the MongoDB site: http://www.mongodb.org/display/DOCS/Schema+Design
Summary of Best Practices
"First class" objects, that are at top level, typically have their own collection.
Line item detail objects typically are embedded.
Objects which follow an object modelling "contains" relationship should generally be embedded.
Many to many relationships are generally done by linking.
Collections with only a few objects may safely exist as separate collections, as the whole collection is quickly cached in application server memory.
Embedded objects are a bit harder to link to than "top level" objects in collections.
It is more difficult to get a system-level view for embedded objects. When needed an operation of this sort is performed by using MongoDB's map/reduce facility.
If the amount of data to embed is huge (many megabytes), you may reach the limit on size of a single object. See also GridFS.
If performance is an issue, embed.

MongoDB embedded documents vs. referencing by unique ObjectIds for a system user profile

I'd like to code a web app where most of the sections are dependent on the user profile (for example, different to-do lists per person, etc.) and I'd love to use MongoDB. I was thinking of creating about 10 embeddable documents for the main profile document and keeping everything related to one user inside his own document.
I don't see a clear way of using foreign keys for mongodb, the only way would be to create a field to_do_id with the type of ObjectId for example, but they would be totally unrelated internally, just happen to have the same Ids I'd have to query for.
Is there a limit on the number of embedded document types inside a top level document that could degrade performance?
How do you guys solve the issue of having a central profile document that most of the documents have to relate to in presenting a view per person?
Do you use semi foreign keys inside MongoDB and have fields with ObjectId types that would have some other document's unique Id instead of embedding them?
I can't get a feel for which approach should be taken when. Thank you very much!
There is no special limit with respect to performance. The larger the document, though, the longer it takes to transmit over the wire. By default the whole document is retrieved, unless the query asks for only some of the fields.
I do it with references. You can choose between simple manual references and the database DBRef as per this page: http://www.mongodb.org/display/DOCS/Database+References
The link above documents how to have references in a document in a semi-foreign key way. The DBRef might be good for what you are trying to do, but the simple manual reference is very efficient.
I am not sure a general rule of thumb exists for which reference approach is best. Since I use Java or Groovy mostly, I like the fact that I get a DBRef object returned. I can check for this datatype and use that to decide how to handle the reference in a generic way.
So I tend to use a simple manual reference for references to different documents in the same collection, and a DBRef for references across collections.
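The two reference styles can be sketched with plain Python data. A manual reference is just the other document's _id stored in a field; a DBRef is a small sub-document that names the collection as well, using the $ref and $id fields. The toy db dictionary, the helper functions, and the field names profile_id and owner are assumptions for illustration, not driver APIs.

```python
# A toy "database": collection name -> list of documents. Hypothetical data.
db = {
    "profiles": [{"_id": 1, "name": "Ann"}],
    "todos": [
        # Manual reference: only the other document's _id; the application
        # has to know that profile_id points into "profiles".
        {"_id": 10, "title": "Buy milk", "profile_id": 1},
        # DBRef-style reference: names the collection as well as the id.
        {"_id": 11, "title": "Call Bob",
         "owner": {"$ref": "profiles", "$id": 1}},
    ],
}


def find_by_id(collection_name, doc_id):
    """Return the first document with a matching _id, or None."""
    for doc in db[collection_name]:
        if doc["_id"] == doc_id:
            return doc
    return None


def is_dbref(value):
    """A value counts as a DBRef here if it carries $ref and $id."""
    return isinstance(value, dict) and "$ref" in value and "$id" in value


def resolve(value, default_collection=None):
    """Follow either kind of reference.

    A DBRef says where to look; a manual reference needs the caller
    to supply the collection name.
    """
    if is_dbref(value):
        return find_by_id(value["$ref"], value["$id"])
    return find_by_id(default_collection, value)
```

Either way, following the reference is a second lookup. The DBRef form only saves the application from hard-coding which collection the id belongs to, which is what makes the generic handling described above possible.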
I hope that helps.

How would you architect a blog using a document store (such as CouchDB, Redis, MongoDB, Riak, etc)

I'm slightly embarrassed to admit it, but I'm having trouble conceptualizing how to architect data in a non-relational world. Especially given that most document/KV stores have slightly different features.
I'd like to learn from a concrete example, but I haven't been able to find anyone discussing how you would architect, for example, a blog using CouchDB/Redis/MongoDB/Riak/etc.
There are a number of questions which I think are important:
Which bits of data should be denormalised (e.g. tags probably live with the document, but what about users)
How do you link between documents?
What's the best way to create aggregate views, especially ones which require sorting (such as a blog index)
First of all, I think you would want to remove Redis from the list, as it is a key-value store rather than a document store. Riak is also a key-value store, but it can act as a document store with a library like Ripple.
In brief, modeling an application with a document store comes down to figuring out:
What data you would store in its own document and have other documents relate to. If a document is going to be used by many other documents, it makes sense to model it on its own. You must also consider how the documents will be queried. If you are going to query something often, it might be a good idea to store it in its own document, as you would find it harder to query over embedded documents.
For example, assuming you have multiple Blog instances, Blog and Article should each be in their own document, even though an Article could be embedded inside a Blog document.
Another example is User and Role. It makes sense to have separate documents for these. In my case I often query over users, and it is easier if each is separated into its own document.
What data you would want to store (embed) inside another document. If a document belongs solely to one other document, then it 'might' be a good option to store it inside that document.
Comments sometimes make more sense embedded inside another document:
{ article : { comments : [{ content: 'yada yada', timestamp: '20/11/2010' }] } }
Another caveat to consider is how big the embedded data will get, because in MongoDB the maximum size of a single document, including everything embedded in it, is 16MB.
What data should be a plain array. For example:
Tags would make sense to be stored as an array. { article: { tags: ['news','bar'] } }
Or if you want to store multiple ids, e.g. a User with multiple roles: { user: { role_ids: [1,2,3] } }
This is a brief overview of modelling with a document store. Good luck.
Deciding which objects should be independent and which should be embedded as part of other objects is mostly a matter of balancing read and write performance and effort. If a child object is independent, updating it means changing only one document, but when reading the parent object you have only ids and need additional queries to get the data. If the child object is embedded, all the data is right there when you read the parent document, but making a change requires finding all the documents that use that object.
Linking between documents isn't much different from SQL - you store an ID which is used to find the appropriate record. The key difference is that instead of filtering the child table to find records by parent id, you have a list of child ids in the parent document. For many-many relationships you would have a list of ids on both sides rather than a table in the middle.
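A short sketch of the many-to-many case with id lists on both sides, using dictionaries keyed by id in place of two collections. The names tag_ids and article_ids and the sample records are hypothetical.

```python
# Two hypothetical collections, keyed by _id for brevity.
articles = {
    "a1": {"title": "Intro", "tag_ids": ["t1", "t2"]},
    "a2": {"title": "Advanced", "tag_ids": ["t2"]},
}
tags = {
    "t1": {"name": "news", "article_ids": ["a1"]},
    "t2": {"name": "bar", "article_ids": ["a1", "a2"]},
}


def link(article_id, tag_id):
    """Both sides must be updated to keep the two id lists consistent."""
    if tag_id not in articles[article_id]["tag_ids"]:
        articles[article_id]["tag_ids"].append(tag_id)
    if article_id not in tags[tag_id]["article_ids"]:
        tags[tag_id]["article_ids"].append(article_id)


def articles_for_tag(tag_id):
    """Follow the ids stored on the tag; no join table is involved."""
    return [articles[a]["title"] for a in tags[tag_id]["article_ids"]]
```

The price of skipping the middle table is visible in link: every change to the relationship is two writes, one per side.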
Query capabilities vary a lot between platforms so there isn't a clear answer for how to approach this. However as a general rule you will usually be setting up views/indexes when the document is written rather than just storing the document and running ad-hoc queries later as you would with SQL.
Ryan Bates made a screencast about Mongoid a couple of weeks ago, using a blog application as the example, which might be a good place for you to get started: http://railscasts.com/episodes/238-mongoid