In MongoDB, an item within a collection is called a document. Put simply, a document is to a collection what a record is to a table in a relational database. I've now come across the term sub-document more than once.
What exactly is this?
Is it just a sub-object of the document? E.g., if I have the document:
{
  foo: 'xyz',
  bar: {
    baz: 'bla'
  }
}
Is bar then the sub-document of the outer document, or is there more to a sub-document? What are its characteristics?
I could not find an explanation of the term in MongoDB's documentation, but maybe it's in there and I just haven't found it.
Can anybody explain this to me (or provide a hint where I can look it up)?
This is just a difference in terminology. Yes, sub-objects are the same as sub-documents.
For example, when you're working with an object-document mapper (ODM), it might use the terms "document" and "embedded document" (or "subdocument"). Because you work with such a library in some programming language (Ruby, for example), you need to distinguish MongoDB documents from regular Ruby objects. Otherwise, conversations about the program would be ambiguous.
On the other hand, when you're querying the database directly from the JavaScript shell, it's all just objects to you (which can contain other objects).
This is how I understand it.
Yes, bar is a sub-document. In the MongoDB docs, fields whose JSON type is Object are called documents. When one is embedded in another document, like bar in your example, it is what they call a sub-document.
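As a small illustration of MongoDB treating bar as a document in its own right: queries reach into sub-documents with dot notation, for example db.collection.find({ "bar.baz": "bla" }) in the shell. The sketch below mimics that path resolution on the plain object from the question, without a server. getPath and matches are hypothetical helpers written for this example, not MongoDB APIs, and unlike real dot notation they do not traverse arrays.

```javascript
// The document from the question; `bar` is its embedded sub-document.
const doc = { foo: "xyz", bar: { baz: "bla" } };

// Hypothetical helper (not a MongoDB API): walks a dotted path such as
// "bar.baz" one key at a time, the way dot notation reaches into sub-documents.
function getPath(obj, path) {
  return path.split(".").reduce(
    (value, key) => (value !== null && typeof value === "object" ? value[key] : undefined),
    obj
  );
}

// Hypothetical matcher: true if every (possibly dotted) key in `filter`
// resolves to a value strictly equal to the expected one.
function matches(document, filter) {
  return Object.entries(filter).every(([path, expected]) => getPath(document, path) === expected);
}

console.log(getPath(doc, "bar")); // { baz: 'bla' }
console.log(matches(doc, { "bar.baz": "bla" })); // true
```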
Is there a way in Meteor/MongoDB to do a find that tells me which collection a document's _id exists in?
What I am trying to accomplish is to create a generic Comments framework for my app, where comments can be applied to several different document types that are saved in multiple Mongo collections. For instance, comments can be applied to Pages as well as to other Comments. What I need to do is save the comment, then modify the parent document. I can pass in the _id of the parent, but without strong typing I can't figure out whether it is a Page or a Comment (or any other "commentable" type I might come up with).
One solution, I think, would be to store the parent's ID in the comment, but I wanted to try saving an array of comments in the parent instead.
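A minimal sketch of one way around the lookup, under the assumption that the caller knows which collection the parent lives in: address the parent by collection name plus _id (similar in spirit to the $ref/$id pair of MongoDB's DBRef format), record that pair in the comment, and push the new comment's _id onto the parent's array. Plain Maps stand in for the Mongo collections; addComment, collections and the field names are hypothetical, and no Meteor or MongoDB calls are made.

```javascript
// In-memory stand-ins for two Mongo collections (hypothetical data).
const collections = {
  pages: new Map([["p1", { _id: "p1", title: "Home", comments: [] }]]),
  comments: new Map([["c1", { _id: "c1", text: "First!", comments: [] }]]),
};

// Hypothetical helper: saves a comment and appends its _id to the parent's
// `comments` array. The parent is addressed by collection name plus _id,
// so there is no need to search every collection for the _id.
function addComment(parentRef, text, newId) {
  const parentCollection = collections[parentRef.collection];
  if (!parentCollection) throw new Error(`unknown collection: ${parentRef.collection}`);
  const parent = parentCollection.get(parentRef.id);
  if (!parent) throw new Error(`no ${parentRef.id} in ${parentRef.collection}`);

  const comment = { _id: newId, text, parent: parentRef, comments: [] };
  collections.comments.set(newId, comment);
  parent.comments.push(newId);
  return comment;
}

addComment({ collection: "pages", id: "p1" }, "Nice page", "c2");
addComment({ collection: "comments", id: "c1" }, "Reply to first", "c3");
console.log(collections.pages.get("p1").comments); // [ 'c2' ]
console.log(collections.comments.get("c1").comments); // [ 'c3' ]
```

Storing the reference in both directions trades a little duplication for not having to guess the parent's type later.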
I have a users collection in MongoDB, and I am using the Node.js MongoDB driver.
In my Node.js code, if I do:
var users = db.collection("users");
and then
users.findOne({ ... })
Is this idiomatic and efficient?
I want to avoid loading all users into application memory.
In Mongo, collection.find() returns a cursor, which is an abstract representation of the results that match your query. It doesn't return your results over the wire until you iterate over the cursor by calling cursor.next(). Before iterating over it, you can, for instance, call cursor.limit(x) to limit the number of results that it is allowed to return.
collection.findOne() is effectively just a shortcut version of collection.find().limit(1).next(). So the cursor is never allowed to return more than the one result.
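To make the laziness concrete, here is a synchronous, in-memory mock of a cursor whose next() reads documents only on demand and whose limit() caps the results. It is an illustration under assumptions, not the driver's implementation: the real Node.js driver's cursor.next() and findOne() are asynchronous and fetch documents from the server in batches. Cursor, scanUsers and documentsRead are hypothetical names.

```javascript
// Hypothetical stand-in for the users collection on the server. The counter
// records how many documents were actually read, to make the laziness visible.
let documentsRead = 0;
function* scanUsers(filter) {
  const users = [
    { _id: 1, name: "ann", active: true },
    { _id: 2, name: "bob", active: false },
    { _id: 3, name: "cid", active: true },
  ];
  for (const user of users) {
    documentsRead++;
    if (Object.entries(filter).every(([key, value]) => user[key] === value)) yield user;
  }
}

// Minimal lazy cursor: nothing is read until next() is called, and limit()
// caps how many results next() will ever return.
class Cursor {
  constructor(filter) {
    this.filter = filter;
    this.max = Infinity;
    this.returned = 0;
    this.source = null;
  }
  limit(n) {
    this.max = n;
    return this; // chainable, as in find().limit(1)
  }
  next() {
    if (this.returned >= this.max) return null;
    if (this.source === null) this.source = scanUsers(this.filter);
    const { value, done } = this.source.next();
    if (done) return null;
    this.returned++;
    return value;
  }
}

const find = (filter) => new Cursor(filter);
// findOne as a shortcut for find(filter).limit(1).next()
const findOne = (filter) => find(filter).limit(1).next();

const cursor = find({ active: true }).limit(1);
console.log(documentsRead); // 0: building the cursor reads nothing
console.log(cursor.next()); // { _id: 1, name: 'ann', active: true }
console.log(cursor.next()); // null: limit(1) allows only one result
console.log(documentsRead); // 1
```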
As already explained, the collection object itself is a facade that gives access to the collection but doesn't hold any actual documents in memory.
findOne() is very efficient, and it is indeed idiomatic, though in my experience it's used more in development and debugging than in real application code. It's very useful in the CLI, but how often does a real application need to just grab any one document from a collection? The only case I can think of is when a given query can only ever return one document (i.e. it matches on an effective primary key).
Further reading: Cursors, Read Operations
Yes, that should only load one user into memory.
The collection object is exactly that: a handle to the collection. Creating a new collection object doesn't return all users; documents are only fetched when you either use findOne or iterate the cursor returned by find.
I was reading the manual references part of the MongoDB Database References documentation, but I don't really understand the part about the "second query to resolve the referenced fields". Could you give me an example of this query, so I can get a better idea of what they are talking about?
"Manual references refers to the practice of including one document’s _id field in another document. The application can then issue a second query to resolve the referenced fields as needed."
The documentation is pretty clear in the section you are referring to, the section on Database References. The most important part for understanding it is the opening statement on the page:
"MongoDB does not support joins. In MongoDB some data is denormalized, or stored with related data in documents to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases."
The rest of the page covers how you might choose to access data that you store in another collection.
There is the DBRef specification, which, without going into too much detail, some drivers may implement so that when a DBRef is found in one of your documents, the referenced document is automatically retrieved (expanded) into the current document. This would be implemented "behind the scenes" with another query to that collection for the document with that _id.
In the case of manual references, this is basically saying that your document merely has a field whose content is the ObjectId of another document. It differs from a DBRef only in that a base driver implementation will never process it, which leaves how you handle any further retrieval of that other document solely up to you.
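For comparison, a DBRef is stored as an embedded object of the form { $ref: <collection name>, $id: <_id value> }, with an optional $db. The sketch below imitates what a DBRef-aware driver layer might do behind the scenes (one extra lookup per reference) while leaving a manual reference, a bare id, untouched. The in-memory database, isDBRef and expandDBRefs are hypothetical, and whether a particular driver expands DBRefs automatically at all depends on that driver.

```javascript
// In-memory "database" mapping collection names to arrays of documents
// (hypothetical data; string ids stand in for ObjectIds).
const database = {
  people: [
    { _id: "p1", name: "This", friend: { $ref: "people", $id: "p2" }, manualRef: "p2" },
    { _id: "p2", name: "That" },
  ],
};

// A value counts as a DBRef here if it has a string $ref and an $id
// (the optional $db field is ignored in this sketch).
function isDBRef(value) {
  return value !== null && typeof value === "object" && typeof value.$ref === "string" && "$id" in value;
}

// Hypothetical expander: issues one extra lookup per top-level DBRef it finds.
// Plain ids (manual references) are copied through unchanged.
function expandDBRefs(doc, db) {
  const expanded = {};
  for (const [key, value] of Object.entries(doc)) {
    if (isDBRef(value)) {
      const target = (db[value.$ref] || []).find((d) => d._id === value.$id);
      expanded[key] = target ?? null;
    } else {
      expanded[key] = value;
    }
  }
  return expanded;
}

const expanded = expandDBRefs(database.people[0], database);
console.log(expanded.friend); // { _id: 'p2', name: 'That' }
console.log(expanded.manualRef); // p2
```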
In the case of:
> db.collection.findOne()
{
  _id: <ObjectId>,
  name: "This",
  something: "Else",
  ref: <AnotherObjectId>
}
The ref field in the document is nothing more than a plain ObjectId and does nothing special. What it allows you to do is submit your own query to get the details of the document it refers to:
> db.othercollection.findOne({ _id: <AnotherObjectId> })
{
  _id: <AnotherObjectId>,
  name: "That",
  something: "I am referenced by This!"
}
Keep in mind that all of this is processed on the client side via the driver API. With manual references and DBRefs, none of this fetching of other documents happens on the server.
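The same two-step pattern as a self-contained sketch, with arrays standing in for the two collections and string ids instead of ObjectIds (both assumptions so the example runs without a server). The hypothetical findOne helper returns the first document whose fields equal those in the filter; the comment shows roughly what the second query would look like with the Node.js driver.

```javascript
// Arrays standing in for `collection` and `othercollection` (hypothetical data).
const collection = [{ _id: "a1", name: "This", something: "Else", ref: "b7" }];
const othercollection = [{ _id: "b7", name: "That", something: "Elsewhere" }];

// Hypothetical stand-in for findOne: first document matching every field of the filter.
function findOne(docs, filter = {}) {
  return docs.find((d) => Object.entries(filter).every(([k, v]) => d[k] === v)) ?? null;
}

// Query 1: fetch the document that holds the manual reference.
const thisDoc = findOne(collection);
// Query 2: resolve the reference yourself by looking up its _id.
// With the Node.js driver this would be roughly:
//   await db.collection("othercollection").findOne({ _id: thisDoc.ref })
const thatDoc = findOne(othercollection, { _id: thisDoc.ref });

console.log(thatDoc.name); // That
```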
I am confused by the term 'link' for connecting documents.
The OrientDB page http://www.orientechnologies.com/orientdb-vs-mongodb/ states that OrientDB uses links to connect documents, while in MongoDB documents are embedded.
Since documents can be referenced in MongoDB as well (http://docs.mongodb.org/manual/core/data-modeling-introduction/), I don't understand the difference between linking documents and referencing them.
The goal of document-oriented databases is to reduce "impedance mismatch", which is the degree to which data has to be split up, relative to the actual objects residing in memory at runtime, to match some sort of database schema. By using a document, the entire object is serialized to disk without the need to split it up across multiple tables and join it back together when it is retrieved.
With that in mind, a linked document is the same as a referenced document. They are simply two ways of saying the same thing. How those links are resolved at query time varies from one database implementation to another.
An embedded document, on the other hand, is simply an object of some type that relates to a parent type, stored inside the parent. For example, I have a class as follows:
using System.Collections.Generic;

class User
{
    public string Name { get; set; }
    public List<Achievement> Achievements { get; set; }
}
Where Achievement is an arbitrary class (its contents don't matter for this example).
If I were to save this using linked documents, I would save User in a Users collection and Achievement in an Achievements collection with the List of Achievements for the user being links to the Achievement objects in the Achievements collection. This requires some sort of joining procedure to happen in the database engine itself. However, if you use embedded documents, you would simply save User in a Users collection where Achievements is inside the User document.
A JSON representation of the data for an embedded document would look (roughly) like this:
{
  "name": "John Q Taxpayer",
  "achievements": [
    {
      "name": "High Score",
      "point": 10000
    },
    {
      "name": "Low Score",
      "point": -10000
    }
  ]
}
Whereas a linked document might look something like this:
{
  "name": "John Q Taxpayer",
  "achievements": ["somelink1", "somelink2"]
}
Inside an Achievements collection:
{
  "somelink1": {
    "name": "High Score",
    "point": 10000
  },
  "somelink2": {
    "name": "Low Score",
    "point": -10000
  }
}
Keep in mind these are just approximate representations.
So to summarize, linked documents function much like RDBMS PK/FK relationships. This allows multiple documents in one collection to reference a single document in another collection, which can help deduplicate stored data. However, it adds a layer of complexity, requiring the database engine to make multiple disk I/O calls to form the final document that is returned to user code. An embedded document more closely matches the object in memory, which reduces impedance mismatch and (in theory) reduces the number of disk I/O calls.
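The difference can be sketched in a few lines, based on the approximate data above: resolving the linked form takes one extra lookup per link (the "join"), while the embedded form already contains everything. The Map acting as the Achievements collection and the resolveLinks helper are illustrative assumptions, not how MongoDB or OrientDB resolve references internally.

```javascript
// Linked form: the user holds only keys into a separate achievements "collection".
const achievements = new Map([
  ["somelink1", { name: "High Score", point: 10000 }],
  ["somelink2", { name: "Low Score", point: -10000 }],
]);
const linkedUser = { name: "John Q Taxpayer", achievements: ["somelink1", "somelink2"] };

// Resolving the links is the "join": one extra lookup per link.
let lookups = 0;
function resolveLinks(user, store) {
  return {
    ...user,
    achievements: user.achievements.map((link) => {
      lookups++;
      const achievement = store.get(link);
      if (!achievement) throw new Error(`dangling link: ${link}`);
      return achievement;
    }),
  };
}

// Embedded form: the same data already lives inside the user document.
const embeddedUser = {
  name: "John Q Taxpayer",
  achievements: [
    { name: "High Score", point: 10000 },
    { name: "Low Score", point: -10000 },
  ],
};

const resolved = resolveLinks(linkedUser, achievements);
console.log(lookups); // 2
console.log(JSON.stringify(resolved) === JSON.stringify(embeddedUser)); // true
```

The upside of the linked form is that several users can point at the same achievement entry; the cost is the lookups (and the possibility of dangling links) at read time.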
You can read up on Impedance Mismatch here: http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
UPDATE
I should add that choosing the right database for your needs is very important from the start. If you have a lot of questions about each database, it might make sense to contact each supplier and get some of their training material. MongoDB offers 2 free courses at MongoDB University that you can take to learn more about their product and its best uses. OrientDB does offer training, but it is not free. It might be best to contact them directly and ask for some sort of pre-sales training (if you are looking to license the database); they will usually put you in touch with a pre-sales consultant to help you evaluate their product.
MongoDB works like an RDBMS, where the ObjectId acts like a foreign key. Resolving it means a "JOIN", which is expensive at run time. OrientDB, instead, has direct links that are created only once and have a very low run-time cost.
We need some information on the client that is computed from a document, for example the number of entries in an array.
More concretely, we have a Workshop document that holds an array of participants (the users' _ids). Now we want Workshop.numberOfParticipants().
There is no need to transmit the whole array to the client, so where should this value be calculated? Is it possible to add this value to the Workshop document as a field, like the other data?
I would like to avoid creating a Template.workshop.numberOfParticipants() helper.
One option for the future is MongoDB's oddly-named aggregation framework. Queries written against the aggregate API can return documents with calculated fields.
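A sketch of such a query, assuming a workshops collection whose documents have name and participants fields: the $project stage uses the $size aggregation operator to emit a computed numberOfParticipants instead of the array, and $ifNull guards against documents without a participants field. In the mongo shell it would be run as db.workshops.aggregate(pipeline). Since the pipeline needs a server, projectParticipantCount is a hypothetical, roughly equivalent plain-JavaScript function for checking what the stage should return.

```javascript
// Assumed schema: workshops documents look like { _id, name, participants: [userId, ...] }.
// $project keeps _id by default, passes name through, and replaces the array
// with its length; $ifNull treats a missing participants field as [].
const pipeline = [
  {
    $project: {
      name: 1,
      numberOfParticipants: { $size: { $ifNull: ["$participants", []] } },
    },
  },
];
// Shell usage (needs a running server): db.workshops.aggregate(pipeline)

// Hypothetical plain-JS approximation of the stage above, for offline checking.
function projectParticipantCount(workshops) {
  return workshops.map((w) => ({
    _id: w._id,
    name: w.name,
    numberOfParticipants: Array.isArray(w.participants) ? w.participants.length : 0,
  }));
}

const counted = projectParticipantCount([{ _id: 1, name: "Intro", participants: ["u1", "u2"] }]);
console.log(counted[0].numberOfParticipants); // 2
```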
Meteor core doesn't support aggregate queries yet, but it's on the wishlist.
You'll need to publish a set of documents called NumParticipants and then add an observer that updates a count property (or something similar) when documents are added, and decrements it when documents are removed.
An example of how to do this is described in the documentation for publish.
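The observer pattern described above can be sketched without Meteor. Inside a real Meteor.publish function you would call this.added, this.changed and this.ready, and drive the count from the added/removed callbacks passed to cursor.observeChanges; here a fake publication object records what the client would see, so the logic can run on its own. makeFakePublication and publishParticipantCount are hypothetical names, and the workshop id "w1" is made-up data.

```javascript
// Hypothetical fake publication: records the documents the client would see,
// keyed by "collection/id". In Meteor these calls would be this.added/this.changed.
function makeFakePublication() {
  const clientDocs = new Map();
  return {
    clientDocs,
    added(collection, id, fields) {
      clientDocs.set(`${collection}/${id}`, { ...fields });
    },
    changed(collection, id, fields) {
      Object.assign(clientDocs.get(`${collection}/${id}`), fields);
    },
  };
}

// Keeps a running count and publishes it as one NumParticipants document.
// `callbacks` has the shape of the added/removed handlers you would pass to
// cursor.observeChanges; ready() sends the initial count once the first
// batch of "added" events has been delivered.
function publishParticipantCount(pub, workshopId) {
  let count = 0;
  let initializing = true;
  const callbacks = {
    added() {
      count++;
      if (!initializing) pub.changed("NumParticipants", workshopId, { count });
    },
    removed() {
      count--;
      if (!initializing) pub.changed("NumParticipants", workshopId, { count });
    },
  };
  function ready() {
    initializing = false;
    pub.added("NumParticipants", workshopId, { count });
  }
  return { callbacks, ready };
}

const pub = makeFakePublication();
const { callbacks, ready } = publishParticipantCount(pub, "w1");
callbacks.added(); // two participants already present
callbacks.added();
ready();
callbacks.added(); // a third joins later
callbacks.removed(); // one leaves
console.log(pub.clientDocs.get("NumParticipants/w1").count); // 2
```

Sending the initial document only after the first batch avoids flooding the client with one change per pre-existing participant.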