Why does Mongoose have both schemas and models? - mongodb

The two types of objects seem to be so close to one another that having both feels redundant. What is the point of having both schemas and models?

EDIT: Although this has been useful for many people, as mentioned in the comments it answers the "how" rather than the why. Thankfully, the why of the question has been answered elsewhere also, with this answer to another question. This has been linked in the comments for some time but I realise that many may not get that far when reading.
Often the easiest way to answer this type of question is with an example. In this case, someone has already done it for me :)
Take a look here:
http://rawberg.com/blog/nodejs/mongoose-orm-nested-models/
EDIT: The original post (as mentioned in the comments) seems to no longer exist, so I am reproducing it below. Should it ever return, or if it has just moved, please let me know.
It gives a decent description of using schemas within models in Mongoose and why you would want to do it. It also shows how to push tasks via the model, while the schema is all about the structure.
Original Post:
Let’s start with a simple example of embedding a schema inside a model.
var mongoose = require('mongoose');
var Schema = mongoose.Schema;
var TaskSchema = new Schema({
  name: String,
  priority: Number
});
TaskSchema.virtual('nameandpriority')
  .get(function () {
    // e.g. "task one (1)"
    return this.name + ' (' + this.priority + ')';
  });
TaskSchema.method('isHighPriority', function () {
  return this.priority === 1;
});
var ListSchema = new Schema({
  name: String,
  tasks: [TaskSchema]
});
mongoose.model('List', ListSchema);
var List = mongoose.model('List');
var sampleList = new List({name: 'Sample List'});
I created a new TaskSchema object with basic info a task might have. A Mongoose virtual attribute is set up to conveniently combine the name and priority of the Task. I only specified a getter here, but virtual setters are supported as well.
I also defined a simple task method called isHighPriority to demonstrate how methods work with this setup.
In the ListSchema definition you’ll notice how the tasks key is configured to hold an array of TaskSchema objects. The tasks key will become an instance of DocumentArray, which provides special methods for dealing with embedded Mongo documents.
For now I only passed the ListSchema object into mongoose.model and left the TaskSchema out. Technically it's not necessary to turn the TaskSchema into a formal model, since we won’t be saving it in its own collection. Later on I’ll show you that it doesn’t harm anything if you do, and that it can help to organize all your models in the same way, especially when they start spanning multiple files.
With the List model set up, let’s add a couple of tasks to it and save them to Mongo.
var List = mongoose.model('List');
var sampleList = new List({name: 'Sample List'});
sampleList.tasks.push(
  {name: 'task one', priority: 1},
  {name: 'task two', priority: 5}
);
// save() returns a promise; recent Mongoose versions no longer accept a callback here.
sampleList.save()
  .then(function () {
    console.log('new list successfully saved');
  })
  .catch(function (err) {
    console.log('error adding new list');
    console.log(err);
  });
The tasks attribute on the instance of our List model (sampleList) works like a regular JavaScript array and we can add new tasks to it using push. The important thing to notice is the tasks are added as regular JavaScript objects. It’s a subtle distinction that may not be immediately intuitive.
You can verify from the Mongo shell that the new list and tasks were saved to mongo.
db.lists.find()
{ "tasks" : [
    {
      "_id" : ObjectId("4dd1cbeed77909f507000002"),
      "priority" : 1,
      "name" : "task one"
    },
    {
      "_id" : ObjectId("4dd1cbeed77909f507000003"),
      "priority" : 5,
      "name" : "task two"
    }
  ], "_id" : ObjectId("4dd1cbeed77909f507000001"), "name" : "Sample List" }
Now we can use the ObjectId to pull up the Sample List and iterate through its tasks.
List.findById('4dd1cbeed77909f507000001').then(function (list) {
  console.log(list.name + ' retrieved');
  list.tasks.forEach(function (task, index, array) {
    console.log(task.name);
    console.log(task.nameandpriority);
    console.log(task.isHighPriority());
  });
});
If you run that last bit of code you’ll get an error saying the embedded document doesn’t have a method isHighPriority. In the version of Mongoose that was current when this was written, you can’t access methods on embedded schemas directly. There’s an open ticket to fix it, and after I posed the question to the Mongoose Google Group, manimal45 posted a helpful work-around to use for now.
List.findById('4dd1cbeed77909f507000001').then(function (list) {
  console.log(list.name + ' retrieved');
  list.tasks.forEach(function (task, index, array) {
    console.log(task.name);
    console.log(task.nameandpriority);
    // Work-around: look the method up on the embedded schema and call it with the task as `this`.
    console.log(task._schema.methods.isHighPriority.apply(task));
  });
});
If you run that code you should see the following output on the command line.
Sample List retrieved
task one
task one (1)
true
task two
task two (5)
false
With that work-around in mind let’s turn the TaskSchema into a Mongoose model.
mongoose.model('Task', TaskSchema);
var Task = mongoose.model('Task');
var ListSchema = new Schema({
  name: String,
  tasks: [Task.schema]
});
mongoose.model('List', ListSchema);
var List = mongoose.model('List');
The TaskSchema definition is the same as before, so I left it out. Once it’s turned into a model, we can still access its underlying Schema object using dot notation.
Let’s create a new list and embed two Task model instances within it.
var demoList = new List({name: 'Demo List'});
var taskThree = new Task({name: 'task three', priority: 10});
var taskFour = new Task({name: 'task four', priority: 11});
demoList.tasks.push(taskThree.toObject(), taskFour.toObject());
demoList.save()
  .then(function () {
    console.log('new list successfully saved');
  })
  .catch(function (err) {
    console.log('error adding new list');
    console.log(err);
  });
As we’re embedding the Task model instances into the List we’re calling toObject on them to convert their data into plain JavaScript objects that the List.tasks DocumentArray is expecting. When you save model instances this way your embedded documents will contain ObjectIds.
The complete code example is available as a gist. Hopefully these work-arounds help smooth things over as Mongoose continues to develop. I’m still pretty new to Mongoose and MongoDB so please feel free to share better solutions and tips in the comments. Happy data modeling!

Schema is an object that defines the structure of any documents that will be stored in your MongoDB collection; it enables you to define types and validators for all of your data items.
Model is an object that gives you easy access to a named collection, allowing you to query the collection and use the Schema to validate any documents you save to that collection. It is created by combining a Schema, a Connection, and a collection name.
Originally phrased by Valeri Karpov, MongoDB Blog

I don't think the accepted answer actually answers the question that was posed. The answer doesn't explain why Mongoose has decided to require a developer to provide both a Schema and a Model variable. An example of a framework that has eliminated the need for the developer to define the data schema is Django: a developer writes their models in the models.py file and leaves it to the framework to manage the schema. Given my experience with Django, the first reason that comes to mind for why they do this is ease of use. Perhaps more important is the DRY (don't repeat yourself) principle: you don't have to remember to update the schema when you change the model, because Django will do it for you! Rails also manages the schema of the data for you: a developer doesn't edit the schema directly, but changes it by defining migrations that manipulate the schema.
One reason I could understand for Mongoose separating the schema and the model is the case where you would want to build a model from two schemas. Such a scenario might introduce more complexity than is worth managing, though: if you have two schemas that are managed by one model, why aren't they one schema?
Perhaps the original question is more a relic of the traditional relational database system. In the NoSQL/Mongo world, the schema is perhaps a little more flexible than in MySQL/PostgreSQL, and thus changing the schema is more common practice.

To understand why, you first have to understand what Mongoose actually is.
Mongoose is an object data modeling (ODM) library for MongoDB and Node.js that provides a higher level of abstraction. It's a bit like the relationship between Express and Node: Express is a layer of abstraction over regular Node, while Mongoose is a layer of abstraction over the regular MongoDB driver.
An object data modeling library is just a way for us to write JavaScript code that will then interact with a database. We could just use the regular MongoDB driver to access our database, and it would work just fine.
But instead we use Mongoose because it gives us a lot more functionality out of the box, allowing for faster and simpler development of our applications.
Some of the features Mongoose gives us are schemas to model our data and relationships, easy data validation, a simple query API, middleware, and much more.
In Mongoose, a schema is where we model our data: we describe the structure of the data, default values, and validation. We then take that schema and create a model out of it. A model is basically a wrapper around the schema that allows us to actually interface with the database in order to create, read, update, and delete documents.
Let's create a model from a schema.
const mongoose = require('mongoose');
const tourSchema = new mongoose.Schema({
  name: {
    type: String,
    required: [true, 'A tour must have a name'],
    unique: true,
  },
  rating: {
    type: Number,
    default: 4.5,
  },
  price: {
    type: Number,
    required: [true, 'A tour must have a price'],
  },
});
// tour model
const Tour = mongoose.model('Tour', tourSchema);
By convention, the first letter of a model name is capitalized.
Let's create an instance of the model we built from the schema, and use it to interact with our database.
const testTour = new Tour({ // instance of our model
  name: 'The Forest Hiker',
  rating: 4.7,
  price: 497,
});
// saving testTour document into database
testTour
  .save()
  .then((doc) => {
    console.log(doc);
  })
  .catch((err) => {
    console.log(err);
  });
So having both schemas and models in Mongoose makes our life easier.

Think of a model as a wrapper around a schema. The schema defines the structure of your documents: what kind of properties you can expect and what their data types will be (String, Number, etc.). Models provide an interface for performing CRUD operations on the collection, using that schema. See this post on FCC.

Schema basically models your data (where you provide datatypes for your fields) and can do some validations on your data. It mainly deals with the structure of your collection.
Whereas the model is a wrapper around your schema to provide you with CRUD methods on collections. It mainly deals with adding/querying the database.
Having both schema and model could appear redundant when compared to other frameworks like Django (which provides only a Model) or SQL (where we create only Schemas and write SQL queries and there is no concept of model). But, this is just the way Mongoose implements it.
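To make the division of labour concrete, here is a toy illustration in plain JavaScript. It is not how Mongoose is implemented; `defineSchema` and `createModel` are hypothetical helpers, and a `Map` stands in for a collection. The "schema" only knows shape and validation, while the "model" pairs a schema with storage and offers CRUD.

```javascript
// A toy "schema": knows field types and how to validate, nothing about storage.
function defineSchema(fields) {
  return {
    fields,
    validate(doc) {
      const errors = [];
      for (const [key, type] of Object.entries(fields)) {
        if (typeof doc[key] !== type) errors.push(key + " must be a " + type);
      }
      return errors;
    },
  };
}

// A toy "model": binds a schema to a store and exposes CRUD on it.
function createModel(schema, store) {
  let nextId = 1;
  return {
    create(doc) {
      const errors = schema.validate(doc);
      if (errors.length > 0) throw new Error(errors.join("; "));
      const saved = { _id: nextId++, ...doc };
      store.set(saved._id, saved);
      return saved;
    },
    findById(id) {
      return store.get(id) ?? null;
    },
    deleteById(id) {
      return store.delete(id);
    },
  };
}

const taskSchema = defineSchema({ name: "string", priority: "number" });

// One schema can back two independent "collections".
const Task = createModel(taskSchema, new Map());
const ArchivedTask = createModel(taskSchema, new Map());

const t = Task.create({ name: "write docs", priority: 1 });
console.log(Task.findById(t._id)); // { _id: 1, name: 'write docs', priority: 1 }
console.log(ArchivedTask.findById(t._id)); // null
```

Keeping validation in the schema is what lets both models share it without duplicating rules.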

Related

Mongoose: Populate on existing DBRef

I am migrating a Spring project to Next.js & co. for personal enrichment.
I have an existing mongodb database with school related collections such as:
// school (as JSON)
{
  "_id" : ObjectId("5f457f041291df2910dea1ed"),
  "name" : "San Lucas Primary School",
  ...
  "campus" : DBRef("campus", ObjectId("5f457dd9126d210893e14e11"))
}
I've loaded up mongoose, and have tried to wrangle it for the last few days to get it to populate campus.
If I define the schema like so:
import mongoose from 'mongoose'
const SchoolSchema = new mongoose.Schema({
  name: String,
  campus: {type: mongoose.Schema.Types.ObjectId, ref: 'campus'},
});
module.exports = mongoose.model("School", SchoolSchema, 'school') // i define the existing collection name 'school' to avoid the built in pluralization
When I do school.find() in debugger, I get the mongoose model object. The campus field is missing, and there is an error: ValidatorError: Cannot read properties of undefined (reading 'options')\n at _init
When I alter the Schema to not include campus:
import mongoose from 'mongoose'
const SchoolSchema = new mongoose.Schema({
  name: String,
  // campus: {type: mongoose.Schema.Types.ObjectId, ref: 'campus'},
});
module.exports = mongoose.model("School", SchoolSchema, 'school')
The debugger now spits out the whole object, including campus but it looks like this:
campus = DBRef {collection: "campus", oid: ObjectId, db: undefined, fields: Object}
There was another configuration where it was spitting it out as if it were creating the object at runtime, new DBref("campus", new ObjectId("...")) or something like that.
When I json it out, it always ends up {$ref: 'campus', $id: ...}. But if I do not include it in the schema, I can't do all that handy populate and things.
I'm this far from extracting the id as a string and doing findById().
Folks, I am STUMPED.
DBRef is a rather controversial data-format convention dating from early MongoDB versions (docs for v2.2), long before $lookup was added in v3.2. Its main purpose was to allow cross-database references between documents, yet even then it was not recommended except for very niche use cases:
In most cases you should use the manual reference method for connecting two or more related documents. However, if you need to reference documents from multiple collections, consider using DBRefs.
It is not supported by all drivers, and it caused many problems with export/import because regular field names could not start with $ as recently as v4.4: https://www.mongodb.com/docs/v4.4/reference/limits/#mongodb-limit-Restrictions-on-Field-Names
Mongoose, on the other hand, is quite an opinionated ODM that comes with its own conventions. It opted out of DBRefs in favour of its own population logic, which is so incompatible with DBRefs that the docs even stopped calling it "DBRef-like" starting from v3.0.
There was an attempt to add support for native DBRefs to Mongoose, but the project looks abandoned. You may find it useful to read this explanation: DbRef with Mongoose - mongoose-dbref or populate?
Anyway, apart from DBRefs, you will likely face other issues caused by differences between the Mongoose and Spring conventions for document structure. Off the top of my head, there is Mongoose's optimistic locking, which relies on the value of the __v field, a field that is otherwise not exposed at the application level, etc.
If you intend to use the same database in a heterogeneous setup, you can't really use anything but the native drivers, as every ODM comes with its own conventions and it is very likely they won't match.
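If you do resolve the references by hand, as the question suggests, the main chore is reading the target out of the DBRef in whichever shape it arrives. A minimal sketch in plain JavaScript: `dbRefTarget` and `resolveCampus` are hypothetical helpers, the two shapes are the ones shown in the question (`{ $ref, $id }` when serialized, `DBRef { collection, oid, db }` in the debugger), and a `Map` stands in for the campus collection with a made-up campus name.

```javascript
// Hypothetical helper: extract the target of a DBRef in either shape seen in the question.
function dbRefTarget(ref) {
  if (ref === null || typeof ref !== "object") return null;
  // Serialized shape: { $ref, $id }
  if ("$ref" in ref && "$id" in ref) {
    return { collection: ref.$ref, id: ref.$id };
  }
  // Driver object shape: DBRef { collection, oid, db }
  if ("collection" in ref && "oid" in ref) {
    return { collection: ref.collection, id: ref.oid };
  }
  return null;
}

// In-memory stand-in for the campus collection. In a real app this would be
// a query by _id through the native driver (or a model's findById).
const campuses = new Map([
  ["5f457dd9126d210893e14e11", { name: "Example Campus" }],
]);

function resolveCampus(school) {
  const target = dbRefTarget(school.campus);
  if (target === null || target.collection !== "campus") return null;
  // String() lets an ObjectId and its hex string both work as the key.
  return campuses.get(String(target.id)) ?? null;
}

const school = {
  name: "San Lucas Primary School",
  campus: { $ref: "campus", $id: "5f457dd9126d210893e14e11" },
};
console.log(resolveCampus(school)); // { name: 'Example Campus' }
```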

Why store all reference ids?

Let's say I have a one-to-many relation between Post and Comment models.
When embedding is not used, the suggested way to handle the relation is:
const PostSchema = new Schema({
  _id: Types.ObjectId,
  comments: [{type: Types.ObjectId, ref:'comment'}]
})
const CommentSchema = new Schema({
  _id:Types.ObjectId
})
Wouldn't the schema design below be more appropriate (storing postId as a foreign key in Comment)? Also, is there a name for this way of relating documents?
const PostSchema = new Schema({
  _id: Types.ObjectId
})
const CommentSchema = new Schema({
  _id:Types.ObjectId,
  postId: Types.ObjectId
})
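To get a post and all its comments I would simply do:
// findById returns a single post (find would return an array); lean() gives plain objects we can extend.
const post = await Post.findById(postId).lean();
post.comments = await Comment.find({ postId }).lean();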
With the first method, every time a new comment is created, the post document must be updated as well. And if we want to guarantee that the commentId is added to the comments array, we have to use a transaction, which makes this update even more costly.
However, with the second approach, we only have to create the comment.
Now the downside is that I will not be able to use $lookup, but performance-wise, would it make much of a difference? $lookup is not like JOIN in relational databases, where JOINs are done in a single database operation. So using $lookup or simply querying by the comment's (indexed) postId to "JOIN" the data would not make much of a difference in terms of performance.
Your example can be found in the documentation, applied to a different domain model (comments are books and posts are publishers).
When using references, the growth of the relationships determine where to store the reference. If the number of comments per post is small with limited growth, storing the comment reference inside the post document may sometimes be useful. Otherwise, if the number of comments per post is unbounded, this data model would lead to mutable, growing arrays...
To avoid mutable, growing arrays, store the post reference inside the comment document
The document schema should be oriented on your use case requirements, which define your queries. As you identified, it makes sense to store the post reference on the comment, so you don't have to update the post every time you create a new comment. So without more information about growth rates, access frequency and other loading scenarios it sounds like option 2 makes more sense in this context.
The first approach is unusual but it can work in some situations depending on query needs.

Moving from relational db to mongodb

I have a question about best practices, or the ideal way to store the data in the database. As an example, I have a Site that has a Country assigned.
Table Countries: id|name|alpha2
Table Sites: id|countryId|name
Each Site has a reference to the country ID.
I would like to create a new website using Meteor and its MongoDB and was wondering how I should store the objects. Do I create a collection "countries" and a collection "sites" and use the country _id as a reference? Then resolve the references using transform?
Looking at SimpleSchema I came up with the following:
Schemas.Country = new SimpleSchema ({
  name: {
    type: String
  },
  alpha2: {
    type: String,
    max: 2
  }
});
Schemas.Site = new SimpleSchema({
  name: {
    type: String,
    label: "Site Name"
  },
  country: {
    type: Schemas.Country
  }
});
Countries = new Meteor.Collection("countries");
Countries.attachSchema(Schemas.Country);
Sites = new Meteor.Collection("sites");
Sites.attachSchema(Schemas.Site);
I was just wondering how this is then stored in the DB, as I have two collections, but inside the sites collection I also have country objects defined. What if a country changes its alpha2 code (very unlikely)?
This would also continue where I have a collection called "conditions". Each condition will have a Site defined. I could now embed the whole Site object in the condition object. What if the site name changes? Would I need to manually change it in all condition objects?
This confuses me a bit. I am very thankful for all your thoughts.
The challenge with Meteor is that it's tightly bound to Mongo, which is not a good fit for building OLTP apps that require a normalized DB design. Mongo is good for OLAP kinds of apps, which fall into the WORM (Write Once Read Many) category. I would like to see Meteor supporting OrientDB as it does Mongo.
There are two possible approaches:
1. Normalize the DB as we do in an RDBMS and then retrieve the data by hitting the database multiple times. Here is a good article explaining this approach: reactive joins in Meteor. Joins in Meteor are suggested for the future. You can also try the Meteor packages publish composite or publish with relations.
2. Keep the data de-normalized, at least partially (for a 1-N relation you can embed things in the document; for an N-N relation you may have a separate collection). For instance, 'Student' can be embedded in 'Class', as a student will never be in more than one class, but to relate 'Student' and 'Subject', they can be in different collections (N-N relation: a student will have more than one subject and each subject will be taken by more than one student). To fetch an N-N relation, you can again use the approach mentioned in the point above.
I am not able to give you an exact code example, but I hope it helps.
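The first approach can be sketched in plain JavaScript, independent of Meteor: sites keep only a `countryId`, and the country is attached at read time, similar in spirit to a collection transform. The arrays and the `withCountry` helper are hypothetical stand-ins, not Meteor APIs. The point is that changing a country touches exactly one document.

```javascript
// Normalized layout: sites store only a countryId, like the relational tables.
const countries = [
  { _id: "c1", name: "Switzerland", alpha2: "CH" },
  { _id: "c2", name: "Germany", alpha2: "DE" },
];
const sites = [
  { _id: "s1", name: "Main Site", countryId: "c1" },
  { _id: "s2", name: "Branch", countryId: "c2" },
];

// Resolve the reference when reading, instead of copying the country into each site.
function withCountry(site) {
  const country = countries.find((c) => c._id === site.countryId) ?? null;
  return { ...site, country };
}

// Changing a country updates a single document...
countries.find((c) => c._id === "c1").alpha2 = "XX";

// ...and every site sees the change on its next read.
console.log(withCountry(sites[0]).country.alpha2); // XX
```

The trade-off is an extra lookup per read, which is what the reactive-join packages mentioned above automate.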

Traverse the mongo DB scheme in Mongoose JS

I'm trying to find out if Mongoose.JS exposes subdocuments within the .modelSchema. The basic idea is that I want to generate a tree view of my database model.
For example, I have a status schema that allows each status to have an array of questions made from a Question schema. My Mongoose schema looks like this:
var StatusScheme = new Schema ({
  StatusName: {type: String },
  isClosed: {type:Boolean},
  Questions:[QuestionSchema]
});
var QuestionSchema = new Schema ({
  QuestionName: {type: String },
  isRequired: {type:Boolean},
  QuestionType:{type: String }
});
Now in my node.js app I want to iterate the schema to generate a tree of field names:
+StatusName
+isClosed
+Questions
  +QuestionName
  +isRequired
  +QuestionType
I was exploring the .modelSchemas[schema].tree object and I can get all of my field names; the problem is that I can't detect whether the Questions array is really a different schema. Does anyone have any insight into the object that may tell me this? Once I know that a field is really a subdocument, I can recursively iterate the entire schema to build my tree.
I think I may have found the link. I can take a look modelSchemas object and dig into each path looking to see if the path has a caster object. If it does I can then fill it with the sub document data.
isClosed is not a subdocument and Questions is a subdocument. It looks like Mongoose includes the constructor for it in the model schema. Any thoughts on a better way to find the "tree" view or subdocument relation within Mongoose?
Details can be found at https://groups.google.com/forum/#!topic/mongoose-orm/4sBbi388msI
A Child schema must be defined before it is embedded as a sub document.
To find the sub document schema traverse to "CaseSchema.paths["MYRecipients"].options.type[0]"
The tree property also contains the nested relationship between schemas.

Mongoose joining data

I have an object in my MongoDB database that will need to be used EVERYWHERE in my system, so it is in its own collection. However, I can't quite figure out how to get the data to show up automatically on the other objects it is joined to.
Here is an example:
Schema1 = { name: String }
Schema2 = { something: String, other_thing: [{schema1_id: String}] }
Now what I want is to be able to say var name = mySchema2.name; and get the name of the linked Schema1 object.
I am using Mongoose, Express and Node.js and I have tried using a Mongoose 'virtual' for this, but when I say res.send(myobject); I don't see the virtual property anywhere on the object.
What is the best way to do this?
I know this is long after you posted the question, but it might help others.
If you use this reference all over, you may want to consider using an embedded document. The benefit of embedded documents is that you get them when you query the parent document, which saves you an additional query; the drawback is that the parent document may become large (or even very large), so use them, but use them carefully.
Here is an example of a simple embedded document. Instead of referencing 'comments' in the post document, which requires an additional query, we will embed them (the code is a bit simplified):
var postSchema = new Schema({
  author: {type: String},
  title: {type: String, required: true},
  content: {type: String, required: true},
  comment: {
    owner: {type: String},
    subject: {type: String, required: true},
    content: {type: String, required: true}
  }
});
MongoDB gives you a simple and convenient way to query the comment's fields using dot notation. For example, to query only comments whose subject starts with 'car', we do the following:
myPostModel.find({ 'comment.subject': /^car/ }).exec().then(function (result) {
  // Do some stuff with the result...
});
Note that, for simplicity, the comment field in the post is not an array in this example (only one comment per post is allowed). However, even if it were an array, MongoDB lets you refer to the array's elements in the same elegant way.
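The regular expression matters here: a pattern like `/car*/` does not test for "starts with car", since `r*` means zero or more `r` characters and nothing anchors the match, while `/^car/` does. A quick check in plain JavaScript; the subjects are made-up values. As a side benefit, MongoDB can use an index efficiently for a case-sensitive prefix regex such as `/^car/`.

```javascript
// "ca" followed by zero or more "r", anywhere in the string.
const loose = /car*/;
// The string must start with "car".
const anchored = /^car/;

const subjects = ["car insurance", "scarf sale", "cabin trip", "my car"];

console.log(subjects.filter((s) => loose.test(s)));
// [ 'car insurance', 'scarf sale', 'cabin trip', 'my car' ]
console.log(subjects.filter((s) => anchored.test(s)));
// [ 'car insurance' ]
```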
There are a couple of plugins to help with DBRefs in Mongoose.
mongoose-dbref uses the DBRef standards and is probably a good place to start.
mongoose-plugins is one I wrote a while ago but it works in a slightly different way.
MongoDB has no JOIN, because joins hurt scalability. But most drivers support DBRefs; they just make an additional request to load the referenced data.
So you can simply make the additional request yourself to load the object that you use everywhere.
If you use some object everywhere in your app, it sounds like an object that should be cached. But MongoDB works as a kind of cache itself if there is enough memory to keep the object in memory. So, to keep it simple, just make an additional request to load the object.
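The "additional request" approach, with a small cache on top, might look like this. A minimal sketch in plain JavaScript: `loadFromDb` is a hypothetical synchronous stand-in for a real query (a real Mongoose or driver call would be asynchronous), the data is made up, and the field names follow the question's `Schema2` shape.

```javascript
// Hypothetical stand-in for a real query by id; counts how often it is hit.
let dbCalls = 0;
const fakeDb = new Map([["42", { _id: "42", name: "Shared thing" }]]);
function loadFromDb(id) {
  dbCalls += 1;
  return fakeDb.get(id) ?? null;
}

// Memo cache so the widely used object is fetched once per id.
const cache = new Map();
function getSchema1(id) {
  if (!cache.has(id)) cache.set(id, loadFromDb(id));
  return cache.get(id);
}

// Attach the referenced name to each entry of a Schema2-style document.
function withNames(doc) {
  return {
    ...doc,
    other_thing: doc.other_thing.map((ref) => ({
      ...ref,
      name: getSchema1(ref.schema1_id)?.name ?? null,
    })),
  };
}

const mySchema2 = { something: "x", other_thing: [{ schema1_id: "42" }, { schema1_id: "42" }] };
const resolved = withNames(mySchema2);
console.log(resolved.other_thing.map((r) => r.name)); // [ 'Shared thing', 'Shared thing' ]
console.log(dbCalls); // 1
```

A cache like this never sees updates made elsewhere, so it suits data that rarely changes.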