MongoDB Referenced Relationships vs DBRef - mongodb

I started learning MongoDB like a week ago and I am stuck on Relationships. More like I am confused.
I get when to use Embedded Relationships and when to use Referenced Relationships. I know Embedded Relationships have some drawbacks, which is why we sometimes prefer Referenced Relationships over them.
Now, I am learning DBRefs.
The thing is, I don't find them helpful in any way. That's what I think. I hope I am wrong.
With DBRefs, we can reference documents from different collections in one document that's in a different collection.
With Referenced Relationships, we can also reference documents from different collections in one document that's in a different collection.
I mean, anything we can do with DBRefs we can also do with Referenced Relationships.
In Referenced Relationships, we create a field in a document and store the ObjectIds of documents from other collections, like so:
> db.Employee.insert({"Emp_Name":"Emp_1", "Emp_Address":[ObjectId("some_id_from_Address_collection"), ObjectId("some_id_from_Address_collection"), ObjectId("some_id_from_Address_collection")], "Emp_Phone":[ObjectId("some_id_from_Phone_collection"), ObjectId("some_id_from_Phone_collection"), ObjectId("some_id_from_Phone_collection")]})
With DBRefs, we create a field in a document and store ObjectIds just as we do in Referenced Relationships, but in a different format. Consider the following example:
> db.Employee.insert({"address": {"$ref": "address_home", "$id": ObjectId("534009e4d852427820000002"), "$db": "tutorialspoint"}, "name": "Tom Benzamin"})
We are still using ObjectId to store values in the Employee collection but the syntax is different because in this example, we are mentioning which DB and Collection to look in.
Why not just use Referenced Relationships and save time, instead of wasting it on this confusing and lengthy syntax?
Am I missing something?
Should I even consider learning DBRefs?

The point of DBRef is that it allows referencing a database and a collection from a single field. Without it you would need two fields, one for each[*].
[*] Technically you could use a single field which is, for example, itself a hash that contains the database and collection reference, but then this is essentially the same thing as a DBRef without the label.
If you want to reference documents in other databases, DBRef could be useful. Its usefulness is limited by the fact that you generally still need to handle cross-database operations in your application, as drivers and the server generally won't do this seamlessly for you. For example, DBRef is not directly usable with the aggregation framework.
DBRef really just provides a more convenient way to store the database+collection name pair.
If all of your collections are in the same database, you don't need DBRef at all (and in fact it would only get in the way).
Should I even consider learning DBRefs?
They are very much a niche feature. Probably not.
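To make the difference concrete, here is a minimal sketch in plain JavaScript that uses in-memory objects as a stand-in for two databases. The database names, collection names, IDs and helper functions are all made up for illustration; only the `$ref`, `$id` and `$db` field names come from the DBRef convention. It shows that a manual reference stores just the ID, while a DBRef also says where to look, and that in both cases the application does the resolving.

```javascript
// In-memory stand-in for two databases, each holding named collections.
// All names and IDs here are hypothetical.
const databases = {
  hr: {
    addresses: [{ _id: "addr-1", city: "Pune" }],
  },
  archive: {
    addresses: [{ _id: "addr-9", city: "Oslo" }],
  },
};

// Manual reference: only the _id is stored. The application must already
// know which database and collection the _id belongs to.
const employeeManual = { name: "Emp_1", address_id: "addr-1" };

// DBRef-style reference: collection ($ref), _id ($id) and optional
// database ($db) travel together in one field.
const employeeDbRef = {
  name: "Emp_2",
  address: { $ref: "addresses", $id: "addr-9", $db: "archive" },
};

function findById(dbName, collectionName, id) {
  const collection = (databases[dbName] || {})[collectionName] || [];
  return collection.find((doc) => doc._id === id) || null;
}

// Resolving a manual reference: the location is hard-coded by the caller.
function resolveManual(id) {
  return findById("hr", "addresses", id);
}

// Resolving a DBRef: the location is read from the reference itself.
// If $db is absent, fall back to the current database.
function resolveDbRef(ref, currentDb) {
  return findById(ref.$db || currentDb, ref.$ref, ref.$id);
}

const manualAddress = resolveManual(employeeManual.address_id);
const dbRefAddress = resolveDbRef(employeeDbRef.address, "hr");
console.log(manualAddress.city, dbRefAddress.city);
```

When everything lives in one database and the target collection is known from the field's meaning, the manual form carries no less information in practice, which is the point made above.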

Related

MongoDB and one-to-many relation

I am trying to come up with a rough design for an application we're working on. What I'd like to know is, if there is a way to directly map a one to many relation in mongo.
My schema is like this:
There are a bunch of Devices.
Each device is uniquely known by its name/ID.
Each device can have multiple interfaces.
These interfaces can be added by a user in the front end at any given time.
An interface is uniquely known by its ID, and can be associated with only one Device.
A device can contain on the order of 100 interfaces or more.
I was going through the MongoDB documentation, where they discuss embedded documents vs. multiple collections. I am by no means clear on this yet, as I've just started with Mongo and Meteor.
The question is, which would be the better approach: having multiple small collections or having one big embedded collection? I know this question is somewhat subjective; I just need some clarity from folks who have more expertise in this field.
Another question: suppose I go with the embedded model, is there a way to update only a part of the document (specific to the interface alone), so that whenever an interface is added, it can be inserted into the same device document?
It depends on the purpose of the application.
Big document
A good example of where you'd want a big embedded document is when you are not normally going to modify the data but are going to query it a lot. In my application I use this for storing pre-processed trips with all their information, so when someone wants to consult a trip, everything is located in a single document. However, if your query is based on a value that is embedded inside a list within a trip, it can be very slow. If that's the case I'd recommend creating another collection with a relation between the two collections. Updating part of a large document can also be slow, although you do not have to fetch the whole document first: update operators such as $set and $push modify individual fields in place.
Small documents with relations
If you plan on modifying the data a lot, I'd recommend sticking to references to another collection. With small documents, you can update any of them more quickly. If you want to model a unique relation, consider using a unique index in MongoDB. This can be done using: db.members.createIndex( { "user_id": 1 }, { unique: true } ).
Therefore:
Big document: great for reading everything at once, but slow for queries on values nested inside embedded lists.
Small related collections: great for updating, but reading requires several queries on distinct collections.
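On the question of updating only part of an embedded document: MongoDB's update operators can target a single field or array element, so the application does not have to fetch and rewrite the whole device document. The sketch below only builds the filter and update documents as plain JavaScript objects; the document shape, field names and helper function names are assumptions for illustration. `$push` and the positional `$` operator are standard update operators.

```javascript
// Hypothetical document shape: one device with its interfaces embedded.
const device = {
  _id: "device-1",
  name: "core-switch",
  interfaces: [{ id: "eth0", status: "up" }],
};

// Filter + update pair that appends one interface to the embedded array.
// $push adds an element without touching the other fields.
function addInterfaceUpdate(deviceId, iface) {
  return {
    filter: { _id: deviceId },
    update: { $push: { interfaces: iface } },
  };
}

// Filter + update pair that changes one field of one embedded interface.
// The positional "$" refers to the array element matched by the filter.
function setInterfaceStatusUpdate(deviceId, ifaceId, status) {
  return {
    filter: { _id: deviceId, "interfaces.id": ifaceId },
    update: { $set: { "interfaces.$.status": status } },
  };
}

const addEth1 = addInterfaceUpdate(device._id, { id: "eth1", status: "down" });
const markEth0Down = setInterfaceStatusUpdate(device._id, "eth0", "down");
console.log(JSON.stringify(addEth1));
console.log(JSON.stringify(markEth0Down));
```

With a driver or the shell, each pair would typically be passed to `updateOne(filter, update)` on the devices collection.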

MongoDB - lookup/taxonomies

In my MongoDB database I have a number of document types that require look-up/taxonomy information. Typically I'm either holding an Id of the look-up or an Id and denormalising the look up info into the main document.
e.g.
task = {
TaskDetail : "Some task",
TaskPriority : { Id : xxxxx, Code : 'U', Description:'Urgent' }
}
Moving from traditional relational databases (RDBMS) where I would just have a TaskPriority table, I was wondering what the best practice is when using documents within mongo?
My initial thought was to have a taxonomy/look-up collection. Typically, look-up and enums are short, so each could be a separate document in the collection? Or I could mirror what you'd typically do in a RDBMS database and create a collection to look-up?
Can anybody point me in the right direction?
Thanks in advance,
Sam
Referencing has advantages such as:
Better management of master data
De-duplication, hence fewer updates.
Disadvantages:
The look-up is a separate API call, since MongoDB has no relational-style joins.
DBRef or $lookup can still be used, but they are cumbersome. A manual reference is easier.
Atomicity is at the document level, hence look-ups across collections can get out of sync for some time if the reference look-up collection is updated.
Embedding -
On the other hand, embedding does not make sense for reference data, since whenever a look-up value is updated, you need to update all your documents. Keeping track of all the documents and keys that need reference data updates is a huge exercise.
Secondly, embedding reference data cannot provide SCD (slowly changing dimension) history keeping. If referencing is used instead, you can achieve this with a version date in the reference collection.
Considering these points, I am inclined towards referencing. My only doubt is: if I do not have the referenced value in my parent documents, how will search work? When a user searches with descriptive referenced values, how will Mongo refer to the reference data collection?
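One common way to handle that search is to do it in two steps in the application: first find the matching look-up entries, then find the parent documents that reference them. Below is a minimal in-memory sketch in plain JavaScript; the arrays stand in for collections, and the field names (`priorityId`, `description`) and the function name are assumptions made for this example, not anything prescribed by MongoDB.

```javascript
// In-memory stand-ins for a look-up collection and a task collection.
const priorities = [
  { _id: 1, code: "U", description: "Urgent" },
  { _id: 2, code: "L", description: "Low" },
];
const tasks = [
  { _id: 10, detail: "Fix login", priorityId: 1 },
  { _id: 11, detail: "Tidy docs", priorityId: 2 },
  { _id: 12, detail: "Patch server", priorityId: 1 },
];

// Step 1: find the look-up entries whose description matches the search text.
// Step 2: find the tasks that reference any of those entries.
function findTasksByPriorityDescription(text) {
  const needle = text.toLowerCase();
  const matchingIds = priorities
    .filter((p) => p.description.toLowerCase().includes(needle))
    .map((p) => p._id);
  return tasks.filter((t) => matchingIds.includes(t.priorityId));
}

const urgentTasks = findTasksByPriorityDescription("urgent");
console.log(urgentTasks.map((t) => t.detail));
```

Against a real database, step 2 would usually be a single query using `$in` with the IDs collected in step 1. Because look-up collections are small, step 1 can often be cached in the application.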

MongoDB embedded documents vs. referencing by unique ObjectIds for a system user profile

I'd like to code a web app where most of the sections are dependent on the user profile (for example different to-do lists per person etc) and I'd love to use MongoDB. I was thinking of creating about 10 embeddable documents for the main profile document and keeping everything related to one user inside their own document.
I don't see a clear way of using foreign keys for mongodb, the only way would be to create a field to_do_id with the type of ObjectId for example, but they would be totally unrelated internally, just happen to have the same Ids I'd have to query for.
Is there a limit on the number of embedded document types inside a top level document that could degrade performance?
How do you guys solve the issue of having a central profile document that most of the documents have to relate to in presenting a view per person?
Do you use semi foreign keys inside MongoDb and have fields with ObjectId types that would have some other document's unique Id instead of embedding them?
I cannot tell which approach should be taken when. Thank you very much!
There is no special limit with respect to performance. The larger the document, though, the longer it takes to transmit over the wire. Unless you use a projection to select specific fields, the whole document is retrieved.
I do it with references. You can choose between simple manual references and the database DBRef as per this page: http://www.mongodb.org/display/DOCS/Database+References
The link above documents how to have references in a document in a semi-foreign key way. The DBRef might be good for what you are trying to do, but the simple manual reference is very efficient.
I am not sure a general rule of thumb exists for which reference approach is best. Since I use Java or Groovy mostly, I like the fact that I get a DBRef object returned. I can check for this datatype and use that to decide how to handle the reference in a generic way.
So I tend to use a simple manual reference for references to different documents in the same collection, and a DBRef for references across collections.
I hope that helps.

MongoDB normalization, foreign key and joining

Before I dive really deep into MongoDB for days, I thought I'd ask a pretty basic question as to whether I should dive into it at all or not. I have basically no experience with nosql.
I did read a little about some of the benefits of document databases, and I think for this new application, they will be really great. It is always a hassle to do favourites, comments, etc. for many types of objects (lots of m-to-m relationships) and subclasses - it's kind of a pain to deal with.
I also have a structure that will be a pain to define in SQL because it's extremely nested and translates to a document a lot better than 15 different tables.
But I am confused about a few things.
Is it desirable to keep your database normalized still? I really don't want to be updating multiple records. Is that still how people approach the design of the database in MongoDB?
What happens when a user favourites a book and this selection is still stored in a user document, but then the book is deleted? How does the relationship get detached without foreign keys? Am I manually responsible for deleting all of the links myself?
What happens if a user favourited a book that no longer exists and I query it (some kind of join)? Do I have to do any fault-tolerance here?
MongoDB doesn't support server-side foreign key relationships, and normalization is also discouraged. You should embed your child objects within parent objects if possible; this will increase performance and make foreign keys totally unnecessary. That said, it is not always possible, so there is a special construct called DBRef which allows you to reference objects in a different collection. This may not be as fast, because additional queries are needed to read the referenced objects, but it allows for a kind of foreign key reference.
Still you will have to handle your references manually. Only while looking up your DBRef you will see if it exists, the DB will not go through all the documents to look for the references and remove them if the target of the reference doesn't exist any more. But I think removing all the references after deleting the book would require a single query per collection, no more, so not that difficult really.
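As a rough sketch of that clean-up step, the snippet below builds the filter and update documents that would remove a deleted book's ID from every user's favourites, and emulates the effect on an in-memory array. The collection shape, the `favourites` field and both function names are assumptions for illustration; `$pull` is a standard update operator that removes matching array elements.

```javascript
// In-memory stand-in for a users collection that stores favourite book IDs.
const users = [
  { _id: "u1", favourites: ["b1", "b2"] },
  { _id: "u2", favourites: ["b2"] },
  { _id: "u3", favourites: [] },
];

// Filter + update pair that removes one book ID from every user's list.
// $pull removes all array elements equal to the given value.
function removeFavouriteUpdate(bookId) {
  return {
    filter: { favourites: bookId },
    update: { $pull: { favourites: bookId } },
  };
}

// In-memory emulation of applying that update to many documents,
// returning how many documents were changed.
function applyRemoveFavourite(collection, bookId) {
  let modified = 0;
  for (const user of collection) {
    if (user.favourites.includes(bookId)) {
      user.favourites = user.favourites.filter((id) => id !== bookId);
      modified += 1;
    }
  }
  return modified;
}

const spec = removeFavouriteUpdate("b2");
const modifiedCount = applyRemoveFavourite(users, "b2");
console.log(JSON.stringify(spec), modifiedCount);
```

Against a real database the pair would typically go to `updateMany(filter, update)`, once per collection that holds references to the deleted document.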
If your schema is more complex then probably you should choose a relational database and not nosql.
There is also a book about designing MongoDB databases: Document Design for MongoDB
UPDATE The book above is not available anymore, yet because of popularity of MongoDB there are quite a lot of others. I won't link them all, since such links are likely to change, a simple search on Amazon shows multiple pages so it shouldn't be a problem to find some.
See the MongoDB manual page for 'Manual references' and DBRefs for further specifics and examples
Above, @TomaaszStanczak states
MongoDB doesn't support server side foreign key relationships,
normalization is also discouraged. You should embed your child object
within parent objects if possible, this will increase performance and
make foreign keys totally unnecessary. That said it is not always
possible ...
Normalization is not discouraged by Mongo. To be clear, we are talking about two fundamentally different types of relationships two data entities can have. In one, one child entity is owned exclusively by a parent object. In this type of relationship the Mongo way is to embed.
In the other class of relationship, two entities exist independently: they have independent lifetimes and relationships. Mongo wishes that this type of relationship did not exist, and is frustratingly silent on precisely how to deal with it. Embedding is just not a solution. Normalization is not discouraged, or encouraged. Mongo just gives you two mechanisms to deal with it: manual refs (analogous to a foreign key column, though without an enforced constraint binding the two tables), and DBRef (a different, slightly more structured way of doing the same). In this use case SQL databases win.
The answers of both Tomasz and Francis contain good advice: that "normalization" is not discouraged by Mongo, but that you should first consider optimizing your database document design before creating "document references". DBRefs were mentioned by Tomasz, however as he alluded, are not a "magic bullet" and require additional processing to be useful.
What is now possible, as of MongoDB version 3.2, is to produce results equivalent to an SQL JOIN by using the $lookup aggregation pipeline stage operator. In this manner you can have a "normalized" document structure, but still be able to produce consolidated results. In order for this to work you need to create a unique key in the target collection that is hopefully both meaningful and unique. You can enforce uniqueness by creating a unique index on this field.
$lookup usage is pretty straightforward. Have a look at the documentation here: https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#lookup-aggregation. Run the aggregate() method on the source collection (i.e. the "left" table). The from parameter is the target collection (i.e. the "right" table). The localField parameter would be the field in the source collection (i.e. the "foreign key"). The foreignField parameter would be the matching field in the target collection.
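Here is a small sketch of what such a pipeline looks like, followed by an in-memory emulation of the stage so the result shape is visible without a server. The collection names, field names and the `emulateLookup` helper are made up for this example; the emulation only covers simple equality matches on scalar fields and is not how the server implements the stage.

```javascript
// The pipeline one would pass to aggregate() on the source collection.
// Collection and field names are hypothetical.
const pipeline = [
  {
    $lookup: {
      from: "books",                  // target ("right") collection
      localField: "favouriteBookId",  // field in the source documents
      foreignField: "_id",            // matching field in the target
      as: "favouriteBook",            // output array field
    },
  },
];

// In-memory emulation of that stage for equality matches on scalar fields:
// each source document gets an array of matching target documents.
function emulateLookup(source, target, { localField, foreignField, as }) {
  return source.map((doc) => ({
    ...doc,
    [as]: target.filter((t) => t[foreignField] === doc[localField]),
  }));
}

const readers = [
  { _id: "u1", favouriteBookId: "b1" },
  { _id: "u2", favouriteBookId: "b404" }, // points at a deleted book
];
const books = [{ _id: "b1", title: "Dune" }];

const joined = emulateLookup(readers, books, pipeline[0].$lookup);
console.log(JSON.stringify(joined, null, 2));
```

Note that the reader whose book no longer exists still appears in the output, with an empty `favouriteBook` array. `$lookup` behaves like a left outer join, so a dangling reference shows up as an empty array rather than as an error.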
As far as orphaned documents go, from your question I would presume you are thinking about a traditional RDBMS set of constraints, cascading deletes, etc. Again, as of MongoDB version 3.2, there is native support for document validation. Have a look at this Stack Overflow article: How to apply constraints in MongoDB? Look at the second answer, from JohnnyHK.
Packt Publishers have a bunch of good books on MongoDB. (Full Disclosure: I wrote a couple of them.)

MongoDB: Should you still provide IDs linking to other collections or just include collections?

I'm pretty new to MongoDB and NoSQL in general. I have a collection Topics, where each topic can have many comments. Each comment will have metadata and whatnot, making a Comments collection useful.
In MySQL I would use foreign keys to link to the Comments table, but in NoSQL should I just include a Comments collection within the Topics collection or have it be in a separate collection and link by IDs?
Thanks!
Matt
It depends.
It depends on how many of each of these types of objects you expect to have. Can you fit them all into a single MongoDB document for a given Topic? Probably not.
It depends on the relationships - do you have one-to-many or many-to-many relationships? If it's one-to-many and the number of related entities is small you might choose to embed them in an IList on a document. If it's many-to-many you might choose to use a more traditional relationship or you might choose to embed both sides as ILists.
You can still model relationships in MongoDB with separate collections BUT there are no joins in the database so you have to do that in code. Loading a Topic and then loading the Comments for it might be just fine from a performance perspective.
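A join done in code usually amounts to two queries: one for the parent, one for the children that point at it. The sketch below shows the idea with in-memory arrays standing in for the two collections; the `topicId` field and the function name are assumptions for this example.

```javascript
// In-memory stand-ins for the Topics and Comments collections.
const topics = [{ _id: "t1", title: "Schema design" }];
const comments = [
  { _id: "c1", topicId: "t1", text: "Embed or reference?" },
  { _id: "c2", topicId: "t1", text: "It depends." },
  { _id: "c3", topicId: "t2", text: "Unrelated" },
];

// Query 1: load the topic. Query 2: load the comments that point at it.
function loadTopicWithComments(topicId) {
  const topic = topics.find((t) => t._id === topicId);
  if (!topic) return null;
  const topicComments = comments.filter((c) => c.topicId === topicId);
  return { ...topic, comments: topicComments };
}

const page = loadTopicWithComments("t1");
console.log(page.title, page.comments.length);
```

With a real database, an index on the comments' `topicId` field would keep the second query cheap.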
Other tips:
With MongoDB you can index INTO arrays on documents. So don't think of an Index as just being an index on a simple field on a document (like SQL). You can use, say, a Tag collection on a Topic and index into the tags. (See http://www.mongodb.org/display/DOCS/Indexes#Indexes-Arrays)
When you retrieve or write data you can do a partial read and a partial write of any document. (see http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields)
And, finally, when you can't see how to get what you want using collections and indexes, you might be able to achieve it using map reduce. For example, to find all the tags currently in use sorted by their frequency of use you would map each Topic, emitting the tags used in it, and then you would reduce that set to get the result you want. You might then store the result of that map reduce permanently and only update it when you need to.
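The tag-frequency idea can be sketched in plain JavaScript on an in-memory array. This illustrates the map and reduce steps themselves, not MongoDB's mapReduce command; the sample topics and variable names are made up for the example.

```javascript
// Sample topics, each carrying an array of tags.
const topics = [
  { title: "Indexes", tags: ["mongodb", "performance"] },
  { title: "Schema", tags: ["mongodb", "design"] },
  { title: "Joins", tags: ["mongodb", "design", "sql"] },
];

// Map: emit one [tag, 1] pair per tag of each topic.
const emitted = topics.flatMap((topic) => topic.tags.map((tag) => [tag, 1]));

// Reduce: sum the emitted counts per tag.
const counts = emitted.reduce((acc, [tag, n]) => {
  acc[tag] = (acc[tag] || 0) + n;
  return acc;
}, {});

// Sort by frequency, highest first; ties are broken alphabetically.
const tagsByFrequency = Object.entries(counts).sort(
  (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
);
console.log(tagsByFrequency);
```

Storing `tagsByFrequency` and refreshing it occasionally corresponds to the "store the result permanently" suggestion above.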
It's a fairly significant mind-shift from relational thinking but it's worth it if you need the scalability and flexibility a NoSQL approach brings.
Also look at the Schema Design docs (http://www.mongodb.org/display/DOCS/Schema+Design). There are also some videos/slides of several 10Gen presentations on schema design linked on the Mongo site. See http://www.mongodb.org/pages/viewpage.action?pageId=17137769 for an overview.