The MongoDB docs for DBRefs say:
Unless you have a compelling reason to use DBRefs, use manual references instead.
Why? DBRefs seem easier to use, since they encode the database and collection names, which would lead to less hard-coding in the application. Plus, DBRef is a standard format that many drivers understand.
This question is related, but not exactly the same:
MongoDB - is DBREF necessary?
The answer to that question is that embedding/denormalization is preferable to linking, but it doesn't explain why manual linking is preferable to DBRefs.
Here is my conclusion from everything I have read.
Resolving a DBRef is not a join operation. Each DBRef is fetched with an additional query, so the number of extra queries depends on how many DBRef fields the documents in the collection have.
Suppose you have a collection whose model has 10 DBRef fields, you query for a list of 10 documents, and only one of those DBRefs is actually needed. If the driver or mapping layer resolves DBRefs automatically, that one request turns into 101 (1 + 10*10) queries, whether you need the referenced documents or not. If you resolve that one field manually, it takes a little more code but only 11 (1 + 1*10) queries.
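The arithmetic above can be sketched with a toy in-memory store that counts round trips. Everything here (the makeStore helper, the ref0..ref9 field names, the two loaders) is hypothetical; it only models a client that resolves every DBRef eagerly versus one that fetches a single field by hand, and it is not how any particular driver is implemented.

```javascript
// Toy model: count how many queries a client issues. Not a real driver.
function makeStore() {
  let queries = 0;
  return {
    // One call models one round trip to the database.
    find(collection, id) {
      queries += 1;
      return { _id: id, collection };
    },
    count() {
      return queries;
    },
  };
}

// Build 10 parent documents, each holding 10 DBRef-shaped fields.
function makeParents() {
  const parents = [];
  for (let i = 0; i < 10; i++) {
    const doc = { _id: i };
    for (let f = 0; f < 10; f++) {
      doc["ref" + f] = { $ref: "coll" + f, $id: i * 10 + f };
    }
    parents.push(doc);
  }
  return parents;
}

// Eager: resolve every DBRef field of every document.
function eagerQueryCount() {
  const store = makeStore();
  store.find("parents", "list"); // the initial list query
  for (const doc of makeParents()) {
    for (let f = 0; f < 10; f++) {
      const ref = doc["ref" + f];
      store.find(ref.$ref, ref.$id);
    }
  }
  return store.count();
}

// Manual: resolve only the single field that is actually needed.
function manualQueryCount(neededField) {
  const store = makeStore();
  store.find("parents", "list"); // the initial list query
  for (const doc of makeParents()) {
    const ref = doc[neededField];
    store.find(ref.$ref, ref.$id);
  }
  return store.count();
}

console.log(eagerQueryCount());        // 101 = 1 + 10 * 10
console.log(manualQueryCount("ref3")); // 11  = 1 + 1 * 10
```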
So, what do you say?
Related
I started learning MongoDB like a week ago and I am stuck on Relationships. More like I am confused.
I get when to use Embedded Relationships and when to use Referenced Relationships. I know Embedded Relationships have some drawbacks, which is why we prefer Referenced Relationships over Embedded Relationships.
Now, I am learning DBRefs.
The thing is, I don't find them helpful in any way. That's what I think. I hope I am wrong.
With DBRefs, we can reference documents from different collections in one document that's in a different collection.
With Referenced Relationships, we can also reference documents from different collections in one document that's in a different collection.
I mean, we can do the same thing with DBRefs that we can do with Referenced Relationships.
In Referenced Relationships, we create a field in a document and store the ObjectIds of documents from other collections, like so:
> db.Employee.insert({"Emp_Name":"Emp_1", "Emp_Address":[ObjectId("some_id_from_Address_collection"), ObjectId("some_id_from_Address_collection"), ObjectId("some_id_from_Address_collection")], "Emp_Phone":[ObjectId("some_id_from_Phone_collection"), ObjectId("some_id_from_Phone_collection"), ObjectId("some_id_from_Phone_collection")]})
With DBRefs, we create a field in a document and store values using ObjectIds, just as in Referenced Relationships, but in a different way. Consider the following example:
> db.Employee.insert({"address": {"$ref": "address_home", "$id": ObjectId("534009e4d852427820000002"), "$db": "tutorialspoint"}, "name": "Tom Benzamin"})
We are still using an ObjectId to store the value in the Employee collection, but the syntax is different because in this example we also state which database and collection to look in.
Why not just use Referenced Relationships and save time, instead of using this confusing, lengthy syntax and wasting half of the time?
Am I missing something?
Should I even consider learning DBRefs?
The point of DBRef is it allows referencing a database and a collection from a single field. Without it you would need two fields[*].
[*] Technically you could use a single field which is, for example, a Hash itself that contains database and collection reference, but then this is essentially the same thing as a DBRef but without the label.
If you want to reference documents in other databases, DBRef could be useful. Its usefulness is limited by the fact that you generally still need to handle cross-database operations in your application, as drivers and the server generally won't do this seamlessly for you. For example, DBRef is not directly usable with the aggregation framework.
DBRef really just provides a more convenient way to store the database+collection name pair.
If all of your collections are in the same database, you don't need DBRef at all (and in fact it would only get in the way).
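To make the "database+collection name pair" point concrete, here is a minimal sketch that resolves a DBRef-shaped value and a manual reference against an in-memory stand-in for a server. The server object, the hr and geo database names, and both resolver functions are made up for illustration; a real application would issue the equivalent find through its driver.

```javascript
// In-memory stand-in for a server: database name -> collection name -> documents.
// All names below are hypothetical.
const server = {
  hr: {
    employees: [
      { _id: 1, name: "Tom", address: { $ref: "addresses", $id: 7, $db: "geo" } },
    ],
  },
  geo: {
    addresses: [{ _id: 7, city: "Oslo" }],
  },
};

// Resolve a DBRef-shaped value. $db is optional, so fall back to the current database.
function resolveDbRef(ref, currentDb) {
  const dbName = ref.$db || currentDb;
  const collection = (server[dbName] || {})[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

// A manual reference stores only the _id; the application hard-codes where to look.
function resolveManual(id, dbName, collectionName) {
  return server[dbName][collectionName].find((doc) => doc._id === id) || null;
}

const employee = server.hr.employees[0];
const viaDbRef = resolveDbRef(employee.address, "hr");
const viaManual = resolveManual(7, "geo", "addresses");
console.log(viaDbRef.city, viaManual.city);
```

Either way the application performs the second lookup itself; the DBRef only carries the location alongside the id.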
Should I even consider learning DBRefs?
They are very much a niche feature. Probably not.
With the new aggregation pipeline stage $lookup we are now able to perform 'left outer joins'.
At first glance, I want to immediately replace one of our denormalised collections with two separate collections and use the $lookup to join them upon querying. This will solve the problem of having, when necessary, to update a huge number of documents. Now we can update just one document.
But surely this is too good to be true? This is a NoSQL, document database after all!
MongoDB's CTO also highlights his concerns:
We’re still concerned that $lookup can be misused to treat MongoDB
like a relational database. But instead of limiting its availability,
we’re going to help developers know when its use is appropriate, and
when it’s an anti-pattern. In the coming months, we will go beyond the
existing documentation to provide clear, strong guidance in this area.
What are the limitations of $lookup? Can I use them in real-time, operational querying of our data or should they be left for reporting, offline situations?
I share your enthusiasm for $lookup.
I think there are trade-offs. One of the major concerns of SQL databases (and which is one of the reasons for the genesis of NoSQL) is that at large scale, joins can take a lot of time (well, relatively speaking).
It definitely helps in giving you a declarative model for your data, but if you start to model your entire NoSQL database as though it's a database of rows and tables (just using refs, for example), then you are, to a degree, treating it as a SQL database. Even MongoDB mentioned it (as you quoted in your question):
We’re still concerned that $lookup can be misused to treat MongoDB like a relational database.
You mentioned:
This will solve the problem of having, when necessary, to update a huge number of documents. Now we can update just one document.
I'm not sure what your collections look like exactly, but that definitely sounds like it could be a good use for $lookup.
Can I use them in real-time, operational querying
I would say, again, it depends on your use-case. You'll have to compare:
Desired semantics of your queries (declarative vs imperative)
Whether modeling your data as more relational (and thus using $lookup) in certain circumstances is worth the potential trade-off in computational time (that's assuming that querying across collections is even something to be concerned about, computationally speaking)
etc...
I'm sure in the coming months we'll see perf tests of the "left outer joins" and perhaps MongoDB will start writing some posts about when $lookup is an antipattern.
Hope this answer helps add to the discussion.
First of all, MongoDB is a document-based database and always will be. So the $lookup aggregation pipeline stage, new in version 3.2, didn't turn MongoDB into a relational database (RDBMS), as MongoDB's CTO mentioned:
We’re still concerned that $lookup can be misused to treat MongoDB like a relational database.
The first limitation of $lookup as mentioned in the documentation is that it:
Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing.
Which means that you can't use it with a sharded collection.
Also, the $lookup operator doesn't work directly with an array, as mentioned in this post. Therefore you will need a preliminary $unwind stage to denormalize the localField if it is an array.
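A minimal sketch of that $unwind-then-$lookup shape, written as a plain pipeline array. The orders and products collections and the productIds field are assumptions made up for this example, not something from the question.

```javascript
// Hypothetical schema: each order holds an array of product ids in `productIds`,
// and the `products` collection is keyed by `_id`.
const pipeline = [
  // Emit one document per array element so that localField holds a single value.
  { $unwind: "$productIds" },
  // Left outer join against the unsharded `products` collection.
  {
    $lookup: {
      from: "products",
      localField: "productIds",
      foreignField: "_id",
      as: "product",
    },
  },
];

// In the mongo shell this would be run as:
//   db.orders.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```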
Now you said:
This will solve the problem of having, when necessary, to update a huge number of documents.
This is a good idea if your data are updated more often than they are read, especially if you have large hierarchical data sets, as mentioned in 6 Rules of Thumb for MongoDB Schema Design: Part 3:
Denormalizing one or more fields makes sense if those fields are read much more often than they are updated.
I believe that with careful schema design you probably will not need the $lookup operator.
The documentation on documents seems to favor the term "document", and also refers to "database records". Elsewhere, competent MongoDB developers have apparently interchangeably used "attributes" and "records".
What is the correct/official terminology to use in various instances? Is it documented somewhere on mongodb.org?
The confusion is merely because many MongoDB users are not just MongoDB users but also use 100 other techs including SQL.
I have personally mixed up my language as well; it's not uncommon. However, documents and database records are the same thing, and properties, attributes and columns are the same thing as well.
It should be noted that Meteor calls them attributes, most likely because they are attributes of an object in JS.
In MongoDB terminology, a document is a more generic term than a database record.
For example, the documentation page you linked says explicitly that database records, query selectors, update definitions, and index specifications are all documents.
I tried to find working examples of Java/Spring Data MongoDB DBRefs but couldn't find any. I'm new to MongoDB and looking for ways to use SQL join-like functionality to aggregate/merge data from two Mongo collections based on a common id.
Could someone point me in the right direction? Is application-level aggregating/merging the only or best solution with the Mongo/Java/Spring combination?
There is a significant difference between DBRefs and Joins.
If you have two collections that you are trying to join, it might be worth looking at your data model. It could be that you are using a relational modelling approach, which does not work well with MongoDB.
It is usually better to denormalize the dependent collection into the documents of the master collection.
Then you do not need to join at all and make the most of the document model.
Before I dive really deep into MongoDB for days, I thought I'd ask a pretty basic question as to whether I should dive into it at all or not. I have basically no experience with nosql.
I did read a little about some of the benefits of document databases, and I think for this new application, they will be really great. It is always a hassle to do favourites, comments, etc. for many types of objects (lots of m-to-m relationships) and subclasses - it's kind of a pain to deal with.
I also have a structure that will be a pain to define in SQL because it's extremely nested and translates to a document a lot better than 15 different tables.
But I am confused about a few things.
Is it desirable to keep your database normalized still? I really don't want to be updating multiple records. Is that still how people approach the design of the database in MongoDB?
What happens when a user favourites a book and this selection is still stored in a user document, but then the book is deleted? How does the relationship get detached without foreign keys? Am I manually responsible for deleting all of the links myself?
What happens if a user favourited a book that no longer exists and I query it (some kind of join)? Do I have to do any fault-tolerance here?
MongoDB doesn't support server-side foreign key relationships, and normalization is also discouraged. You should embed your child objects within parent objects if possible; this will increase performance and make foreign keys totally unnecessary. That said, it is not always possible, so there is a special construct called DBRef which lets you reference objects in a different collection. This may not be as fast, because additional queries are needed to read the referenced objects, but it allows for a kind of foreign key reference.
You will still have to handle your references manually. Only when you look up a DBRef will you see whether its target exists; the DB will not go through all the documents looking for references and remove them if the target of the reference no longer exists. But I think removing all the references after deleting the book would require a single query per collection, no more, so it is not that difficult really.
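As a rough sketch of the "single query per collection" cleanup, the helper below builds the filter and update documents that strip a deleted book's id from an array field. The pullReference helper, the favourites field, the users collection and the "book-42" id are all hypothetical; only $pull, deleteOne and updateMany are real MongoDB names.

```javascript
// Build the filter and update documents that remove a deleted document's id
// from an array of references. Field and collection names are assumed.
function pullReference(field, deletedId) {
  return {
    filter: { [field]: deletedId },             // only documents that still hold the reference
    update: { $pull: { [field]: deletedId } },  // remove that id from the array
  };
}

const { filter, update } = pullReference("favourites", "book-42");

// In the mongo shell, one call per referencing collection:
//   db.books.deleteOne({ _id: "book-42" })
//   db.users.updateMany(filter, update)
console.log(JSON.stringify({ filter, update }));
```

The two shell calls are not atomic together, so a reader can still briefly see a dangling reference and should tolerate a lookup that returns nothing.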
If your schema is more complex then probably you should choose a relational database and not nosql.
There is also a book about designing MongoDB databases: Document Design for MongoDB
UPDATE The book above is not available anymore, yet because of popularity of MongoDB there are quite a lot of others. I won't link them all, since such links are likely to change, a simple search on Amazon shows multiple pages so it shouldn't be a problem to find some.
See the MongoDB manual page for 'Manual references' and DBRefs for further specifics and examples
Above, @TomaszStanczak states
MongoDB doesn't support server side foreign key relationships,
normalization is also discouraged. You should embed your child object
within parent objects if possible, this will increase performance and
make foreign keys totally unnecessary. That said it is not always
possible ...
Normalization is not discouraged by Mongo. To be clear, we are talking about two fundamentally different types of relationships two data entities can have. In one, one child entity is owned exclusively by a parent object. In this type of relationship the Mongo way is to embed.
In the other class of relationship, two entities exist independently: they have independent lifetimes and relationships. Mongo wishes that this type of relationship did not exist, and is frustratingly silent on precisely how to deal with it. Embedding is just not a solution. Normalization is not discouraged, or encouraged. Mongo just gives you two mechanisms to deal with it: manual refs (analogous to a key with the foreign key constraint binding two tables), and DBRef (a different, slightly more structured way of doing the same). In this use case SQL databases win.
The answers of both Tomasz and Francis contain good advice: that "normalization" is not discouraged by Mongo, but that you should first consider optimizing your database document design before creating "document references". DBRefs were mentioned by Tomasz; however, as he alluded, they are not a "magic bullet" and require additional processing to be useful.
What is now possible, as of MongoDB version 3.2, is to produce results equivalent to an SQL JOIN by using the $lookup aggregation pipeline stage operator. In this manner you can have a "normalized" document structure, but still be able to produce consolidated results. In order for this to work you need to create a unique key in the target collection that is hopefully both meaningful and unique. You can enforce uniqueness by creating a unique index on this field.
$lookup usage is pretty straightforward. Have a look at the documentation here: https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#lookup-aggregation. Run the aggregate() method on the source collection (i.e. the "left" table). The from parameter is the target collection (i.e. the "right" table). The localField parameter would be the field in the source collection (i.e. the "foreign key"). The foreignField parameter would be the matching field in the target collection.
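To show how the four parameters map onto a result, here is a plain-JavaScript emulation of what a $lookup stage produces. The lookup function, the users and books data, and the isbn key are assumptions for illustration; the emulation uses simple equality only and ignores details of the real stage such as how missing or null fields are matched.

```javascript
// Plain-JavaScript emulation of the output of a $lookup stage.
// It mirrors the stage's parameters; on a real deployment the server does this work.
function lookup(source, target, { localField, foreignField, as }) {
  return source.map((doc) => ({
    ...doc,
    // Every source document is kept; unmatched ones get an empty array.
    [as]: target.filter((t) => t[foreignField] === doc[localField]),
  }));
}

// Hypothetical data: users reference books by a unique `isbn` key.
const users = [
  { _id: 1, name: "Ann", favouriteIsbn: "111" },
  { _id: 2, name: "Bob", favouriteIsbn: "999" }, // points at a book that was deleted
];
const books = [{ _id: 10, isbn: "111", title: "Dune" }];

const joined = lookup(users, books, {
  localField: "favouriteIsbn",
  foreignField: "isbn",
  as: "favourite",
});

// Equivalent stage in the mongo shell:
//   db.users.aggregate([{ $lookup: { from: "books", localField: "favouriteIsbn",
//                                    foreignField: "isbn", as: "favourite" } }])
console.log(joined[0].favourite.length, joined[1].favourite.length);
```

Because the join is a left outer join, a user whose favourite no longer exists still comes back, just with an empty favourite array, which is one way to spot orphaned references.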
As far as orphaned documents go, from your question I would presume you are thinking about a traditional RDBMS set of constraints, cascading deletes, etc. Again, as of MongoDB version 3.2, there is native support for document validation. Have a look at this Stack Overflow article: How to apply constraints in MongoDB? Look at the second answer, from JohnnyHK.
Packt Publishers have a bunch of good books on MongoDB. (Full Disclosure: I wrote a couple of them.)