MongoDB - manual references example - mongodb

I was reading the manual references part of the MongoDB Database References documentation, but I don't really understand the part about the "second query to resolve the referenced fields". Could you give me an example of this query, so I can get a better idea of what they are talking about?
"Manual references refers to the practice of including one document’s _id field in another document. The application can then issue a second query to resolve the referenced fields as needed."

The documentation is pretty clear in the manual section you are referring to which is the section on Database References. The most important part in comprehending this is contained in the opening statement on the page:
"MongoDB does not support joins. In MongoDB some data is denormalized, or stored with related data in documents to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases."
The further information covers the topic of how you might choose to deal with accessing data that you store in another collection.
There is the DBRef specification which, without going into too much detail, some drivers may implement so that when a DBRef is found in your documents, the driver automatically retrieves (expands) the referenced document into the current document. This would be implemented "behind the scenes" with another query to that collection for the document with that _id.
In the case of Manual References, this is basically saying that there is merely a field in your document that has as its content the ObjectId of another document. This differs from a DBRef only in that it will never be processed by a base driver implementation, which leaves how you handle any further retrieval of that other document solely up to you.
In the case of:
> db.collection.findOne()
{
_id: <ObjectId>,
name: "This",
something: "Else",
ref: <AnotherObjectId>
}
The ref field in the document is nothing more than a plain ObjectId and does nothing special. What this allows you to do is submit your own query to get the Object details this refers to:
> db.othercollection.findOne({ _id: <AnotherObjectId> })
{
_id: <AnotherObjectId>,
name: "That",
something: "I am a sub-document to This!"
}
Keep in mind that all of this is processed on the client side through the driver API. None of this fetching of other documents happens on the server in any case.
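To make the "second query" concrete, here is a minimal sketch in plain JavaScript. It uses two in-memory Maps as stand-ins for the two collections so that it runs without a server. The id strings and the `findOneById` helper are hypothetical and not part of any driver. With a real driver, each `findOneById` call would be a `findOne({ _id: ... })` round trip.

```javascript
// In-memory stand-ins for two collections, keyed by _id (hypothetical data).
const collection = new Map([
  ["id-this", { _id: "id-this", name: "This", something: "Else", ref: "id-that" }],
]);
const othercollection = new Map([
  ["id-that", { _id: "id-that", name: "That", something: "I am referenced by This!" }],
]);

// Hypothetical helper that mimics findOne({ _id: id }) on one collection.
function findOneById(coll, id) {
  return coll.has(id) ? coll.get(id) : null;
}

// First query: fetch the document that holds the manual reference.
const first = findOneById(collection, "id-this");

// Second query: the application resolves the reference itself.
const referenced = findOneById(othercollection, first.ref);

console.log(referenced.name); // prints "That"
```

The point of the sketch is only the shape of the flow: read the document, take the value of its ref field, and use it as the _id in a second lookup.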

Related

MongoDB Atlas Search index on normalized/indexed model

I'd like to use the new Atlas Search index feature to search through my models.
It seems to me that the data model I use can't be combined with this MongoDB feature.
It seems to work really well on embedded models, but for consistency reasons I can't nest objects; they are referenced by their id.
Example
Collection Product
{
name: "Foo product",
quantity: 3,
tags: [
"id_123"
]
}
Collection Vendor
{
name: "Bar vendor",
address: ...,
tags: [
"id_123"
]
}
Collection Tags
{
id: "id_123",
name: "food"
}
What I want
I want to type food in my search bar and find the products associated with the tag food.
Detailed problem
I have multiple business objects that are labelled with the same tag. I'd like to build a search index to search through my products, but I would want to $lookup first to denormalize my ids, so that I can find all the products that have the tag "food".
According to the documentation, $search must be the first stage of the aggregation pipeline, which prevents me from doing a $lookup before searching. I had the idea of building a view first, resolving each id to the corresponding tag to prepare the field, but it is impossible to build a search index on a view.
Is it completely impossible to make this work? Do I need to give up on the consistency of my tags by flattening them and embedding each of them directly in every model that needs them, just to be able to use this feature? That would mean that if I want to update a tag, I need to find every business object that carries the tag and perform the update?
I got in touch with MongoDB support, and the Atlas Search team proposed three ways to resolve this problem. I want to share the solutions in case anybody runs into the same problem that I had with this model design.
Recommended: Transform the model in the Embedded way
The ideal MongoDB way of doing this would be to denormalize the model and not use references between models. Each tag would be added directly to the Product and Vendor documents, so no $lookup operation is needed anymore. The drawback is painful updates. For my part, it is a no-go: the tags are meant to be updatable and will be shared by almost every business object I plan to model.
Collection Product
{
name: "Foo product",
quantity: 3,
tags: [
"food"
]
}
Collection Vendor
{
name: "Bar vendor",
address: ...,
tags: [
"food"
]
}
Not recommended but possible: Break the request in multiple parts
This would mean keeping the existing model, querying the collections individually, and resolving the sequential requests on the application side.
We could put an Atlas Search index on the Tags collection and use the search feature to find the id of the tag we want. Then we could use this id to query the Product/Vendor collections directly for the documents carrying the "food" tag. By tinkering with the search on the application side, we could obtain satisfying results.
It is not the recommended way of doing it.
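The two-step approach can be sketched as follows. This is an illustration only, using in-memory arrays instead of real collections; the extra tag and product, and the helper names `findTagIds` and `findProductsByTagIds`, are hypothetical. In a real application, step 1 would be a $search query against an index on Tags and step 2 a regular find with `$in`.

```javascript
// Hypothetical in-memory data mirroring the example collections.
const tags = [
  { id: "id_123", name: "food" },
  { id: "id_456", name: "tools" },
];
const products = [
  { name: "Foo product", quantity: 3, tags: ["id_123"] },
  { name: "Baz product", quantity: 1, tags: ["id_456"] },
];

// Step 1: search the Tags collection by name and collect the matching ids.
// A real implementation would run $search against an index on Tags here.
function findTagIds(term) {
  const needle = term.toLowerCase();
  return tags.filter((t) => t.name.toLowerCase().includes(needle)).map((t) => t.id);
}

// Step 2: fetch products whose tags array contains any of those ids.
// Against MongoDB this would be a find with { tags: { $in: tagIds } }.
function findProductsByTagIds(tagIds) {
  const wanted = new Set(tagIds);
  return products.filter((p) => p.tags.some((id) => wanted.has(id)));
}

const matches = findProductsByTagIds(findTagIds("food"));
console.log(matches.map((p) => p.name)); // prints [ 'Foo product' ]
```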
Theoretically my preferred way: Use the Materialized View feature
This is an intermediate solution, and it is the one I will try out. It is not perfect, but from what I can see, it tries to reconcile the capabilities of the Referenced model and the Embedded model.
Atlas Search indexes are not usable on regular views. The workaround is a Materialized view (which is ultimately more a collection than a view). It is built with the $merge stage, which saves the results of an aggregation pipeline into a collection. By re-running the pipeline, we can update the Materialized view. The trick is to perform all the required $lookup operations to denormalize the referenced model, then use $merge as the final stage to create the collection, which supports an Atlas Search index like any other collection.
The only concern is choosing the refresh interval for the Materialized view, since the refresh can be expensive. But on paper, it is a really good solution for people like me who cannot (won't?) pay the price of the painful update strategy of Embedded models.
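A pipeline along these lines might look like the sketch below. The collection names `products`, `tags`, and `products_search`, and the field names `tagDocs` and `tagNames`, are assumptions made for illustration; only the stage operators ($lookup, $set, $unset, $merge) are standard aggregation stages.

```javascript
// Aggregation pipeline sketch; "tags" and "products_search" are assumed collection names.
const materializePipeline = [
  // Join each product with the Tags documents whose `id` appears in its `tags` array.
  { $lookup: { from: "tags", localField: "tags", foreignField: "id", as: "tagDocs" } },
  // Keep only the tag names so that they can be indexed as plain strings.
  { $set: { tagNames: "$tagDocs.name" } },
  { $unset: "tagDocs" },
  // Write the result into a regular collection, replacing documents on re-runs.
  { $merge: { into: "products_search", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } },
];

// In mongosh this would be run (and re-run on a schedule) as:
//   db.products.aggregate(materializePipeline)
console.log(JSON.stringify(materializePipeline, null, 2));
```

The Atlas Search index would then be defined on the output collection rather than on the original one, and the original referenced model stays the source of truth.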

MongoDB: Looking for advice on designing schema for improving query efficiency

I am fairly new to MongoDB and I’m looking for advice on designing the schema before I commit to going down this route. I’m developing a collaborative documentation system, where the user creates a document and invites other users to collaborate, much like Google docs.
There are two collections. The first one stores documents and the second one stores lists of collaborators. When the user creates a new document, they assign a list of collaborators to this document. In its simplest form, the schema would look something like this.
The Document schema contains some data, but it also maintains a reference to a document in the Collaborators collection.
Document model
{
....
collaborators: ObjectId; // e.g. 0x507f1f77bcf86cd799439011
}
Collaborators collection contains documents that contain an array of roles for the collaborators.
Collaborators model
{
_id: 0x507f1f77bcf86cd799439011; // referenced by Document model
collaborators: [
{userId: 1, role: "editor"},
{userId: 2, role: "commenter"}
]
]
}
I will have an API that fetches all those documents where the logged-in user’s userId is in the list of collaborators referenced by the document. Without much experience with writing efficient queries, I think a two-step lookup will work but it won’t be very efficient.
Step 1 → Find all the collaborators lists which contain userId, and obtain their _id field
Step 2 → Find all documents that have collaborators field containing one of the values found in Step 1
Is there a more efficient way to construct this query particularly if the users fetch this list frequently?
If I should redesign the schema in some way so that the lookup can be efficient, I’d like to know.
I'm using mongoose client if that's relevant.
I realized that the MongoDB aggregation framework is what I needed. I was able to use the $lookup and $match stages to achieve what I want. I'm still not sure how expensive this is, given that $lookup performs a left outer join.
Here’s an example if anybody wants to look.
https://mongoplayground.net/p/RPheBZESC0H
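For readers who cannot open the link, one possible shape for such a pipeline is sketched below. It is not necessarily what the linked playground contains. The collection names `collaborators` and `documents`, the output field `docs`, and the function name are assumptions for illustration.

```javascript
// Builds a pipeline to run on the (assumed) "collaborators" collection.
function buildDocumentsForUserPipeline(userId) {
  return [
    // Keep only collaborator lists that mention this user.
    { $match: { "collaborators.userId": userId } },
    // Pull in every document that references one of those lists.
    { $lookup: { from: "documents", localField: "_id", foreignField: "collaborators", as: "docs" } },
    // Emit one result per matching document instead of one per list.
    { $unwind: "$docs" },
    { $replaceRoot: { newRoot: "$docs" } },
  ];
}

console.log(JSON.stringify(buildDocumentsForUserPipeline(1), null, 2));
```

Starting from the collaborators side means the $match stage can narrow the input before the join. Indexes on `collaborators.userId` and on the `collaborators` field of the documents collection would probably help if this query runs frequently.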

why does MongoDB recommend two-way referencing? Isn't it just circular referencing?

Reference material:
https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-2
db.person.findOne()
{
_id: ObjectID("AAF1"),
name: "Kate Monster",
tasks: [ // array of references to Task documents
ObjectID("ADF9"),
ObjectID("AE02"),
ObjectID("AE73")
// etc
]
}
db.tasks.findOne()
{
_id: ObjectID("ADF9"),
description: "Write lesson plan",
due_date: ISODate("2014-04-01"),
owner: ObjectID("AAF1") // Reference to Person document
}
In a tutorial post from MongoDB, two-way referencing is specifically encouraged. As you can see, the Person document references Task documents and vice versa.
I thought this was circular referencing, which should be avoided in most cases. The post didn't really explain why it's not a problem for MongoDB, though. Could someone please help me understand why this is possible in MongoDB when it is such a big no-no in SQL? I know it's more of a theoretical question, but I would like to implement this type of design in the database I'm working on if there is a compelling reason to.
It's only a circular reference if you make one out of it.
Meaning: let's say you want to print your Mongo document as a JSON string in your browser. Instead of printing a bunch of IDs under the tasks section, you want to print the actual name. In this case you have to follow the ID and print the name.
However: if you now go into the object and also resolve the ID behind the owner field, you'll be printing your Person again. This could go on indefinitely, if you program it that way. If you don't, it's just a bunch of IDs either way.
EDIT: depending on your implementation, IDs are not resolved automatically and thus cause no headache.
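As a rough illustration of resolving only one level, here is a sketch in plain JavaScript with in-memory Maps standing in for the two collections. The second task and the `personWithTasks` helper are hypothetical. The task ids are expanded, but each task's owner stays a plain id, so nothing loops.

```javascript
// Hypothetical in-memory data shaped like the person/tasks example.
const people = new Map([
  ["AAF1", { _id: "AAF1", name: "Kate Monster", tasks: ["ADF9", "AE02"] }],
]);
const tasks = new Map([
  ["ADF9", { _id: "ADF9", description: "Write lesson plan", owner: "AAF1" }],
  ["AE02", { _id: "AE02", description: "Grade homework", owner: "AAF1" }],
]);

// Resolve a person's task ids one level deep. The `owner` field of each task
// is left as a plain id, so the person is never expanded a second time.
function personWithTasks(personId) {
  const person = people.get(personId);
  return { ...person, tasks: person.tasks.map((id) => tasks.get(id)) };
}

const kate = personWithTasks("AAF1");
console.log(kate.tasks[0].description); // prints "Write lesson plan"
console.log(kate.tasks[0].owner);       // prints "AAF1"
```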
One thing: depending on your data structure and performance considerations, it's sometimes easier to embed the object directly in its parent document. Referencing the ID on both sides only makes sense in a many-to-many relationship.
HTH

Is a sub-document in MongoDB the same as a sub-object?

In MongoDB an item within a collection is called a document. To put it simply, a document is to a collection what a record is to a table in a relational database. Now I've already read the term sub-document several times.
What exactly is this?
Is it just a sub-object of the document? E.g., if I have the document:
{
foo: 'xyz',
bar: {
baz: 'bla'
}
}
Is bar then the sub-document of the outer document, or is there more to a sub-document? What are its characteristics?
I could not find an explanation of the term in MongoDB's documentation, but maybe it's in there and I have just not found it.
Can anybody explain this to me (or provide a hint where I can look it up)?
This is just different terminology. Yes, sub-objects are the same as sub-documents.
For example, when you're working with an object-document mapper (ODM), it might use the terms "document" and "embedded document" (or "subdocument"). Because you work with such a library in some programming language (Ruby, for example), you need to differentiate MongoDB documents from regular Ruby objects. Otherwise conversations about the program would be ambiguous.
On the other hand, when you're directly querying the database from javascript shell, it's all just objects to you (which can contain other objects).
This is how I understand it.
Yes, bar is a sub-document. In the MongoDB docs, fields whose JSON type is Object are called documents. If such a field is embedded in another document, like bar in your example, it is what they call a sub-document.

Links vs References in Document databases

I am confused by the term 'link' for connecting documents.
On the OrientDB page http://www.orientechnologies.com/orientdb-vs-mongodb/ it states that they use links to connect documents, while in MongoDB documents are embedded.
Since documents can be referenced in MongoDB as well (http://docs.mongodb.org/manual/core/data-modeling-introduction/), I cannot see the difference between linking documents and referencing them.
The goal of document-oriented databases is to reduce "Impedance Mismatch", which is the degree to which data must be split up to match a database schema, compared with the actual objects residing in memory at runtime. By using a document, the entire object is serialized to disk without the need to split things up across multiple tables and join them back together when retrieved.
That being said, a linked document is the same as a referenced document. They are simply two ways of saying the same thing. How those links are resolved at query time varies from one database implementation to another.
An embedded document, by contrast, is simply an object that relates to a parent type and is stored inside the parent. For example, I have a class as follows:
class User
{
string Name
List<Achievement> Achievements
}
Where Achievement is an arbitrary class (its contents don't matter for this example).
If I were to save this using linked documents, I would save User in a Users collection and Achievement in an Achievements collection with the List of Achievements for the user being links to the Achievement objects in the Achievements collection. This requires some sort of joining procedure to happen in the database engine itself. However, if you use embedded documents, you would simply save User in a Users collection where Achievements is inside the User document.
A JSON representation of the data for an embedded document would look (roughly) like this:
{
"name":"John Q Taxpayer",
"achievements":
[
{
"name":"High Score",
"point":10000
},
{
"name":"Low Score",
"point":-10000
}
]
}
Whereas a linked document might look something like this:
{
"name":"John Q Taxpayer",
"achievements":
[
"somelink1", "somelink2"
]
}
Inside an Achievements collection:
{
"somelink1":
{
"name":"High Score",
"point":10000
},
"somelink2":
{
"name":"Low Score",
"point":-10000
}
}
Keep in mind these are just approximate representations.
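The "joining procedure" that turns the linked form into the embedded form can be sketched in a few lines of JavaScript. A Map stands in for the Achievements collection, and the `embedAchievements` helper is hypothetical. A real database engine or driver would do the equivalent lookups for you or leave them to the application.

```javascript
// Hypothetical Achievements collection, keyed by link id.
const achievements = new Map([
  ["somelink1", { name: "High Score", point: 10000 }],
  ["somelink2", { name: "Low Score", point: -10000 }],
]);

// A user stored in the linked (referenced) form.
const linkedUser = { name: "John Q Taxpayer", achievements: ["somelink1", "somelink2"] };

// Resolving the links yields the same shape as the embedded form.
function embedAchievements(user) {
  return { ...user, achievements: user.achievements.map((link) => achievements.get(link)) };
}

const embeddedUser = embedAchievements(linkedUser);
console.log(embeddedUser.achievements[1].name); // prints "Low Score"
```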
So to summarize, linked documents function much like RDBMS PK/FK relationships. This allows multiple documents in one collection to reference a single document in another collection, which can help with deduplication of stored data. However, it adds a layer of complexity, requiring the database engine to make multiple disk I/O calls to form the final document to be returned to user code. An embedded document more closely matches the object in memory, which reduces Impedance Mismatch and (in theory) reduces the number of disk I/O calls.
You can read up on Impedance Mismatch here: http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
UPDATE
I should add that choosing the right database for your needs is very important from the start. If you have a lot of questions about each database, it might make sense to contact each supplier and get some of their training material. MongoDB offers 2 free courses you can take at MongoDB University to learn more about their product and its best uses. OrientDB does offer training, but it is not free. It might be best to contact them directly and get some sort of pre-sales training (if you are looking to license the db); usually they will put you in touch with a pre-sales consultant to help you evaluate their product.
MongoDB works like an RDBMS here: the object id is like a foreign key, which means a "JOIN" that is expensive at run time. OrientDB, instead, has direct links that are created only once and have a very low run-time cost.