Let's say I have a one-to-many relation between Post and Comment models.
When embedding is not used, the suggested way to handle the relation is:
const { Schema, Types } = require('mongoose');
const PostSchema = new Schema({
_id: Types.ObjectId,
comments: [{type: Types.ObjectId, ref:'comment'}]
})
const CommentSchema = new Schema({
_id:Types.ObjectId
})
Wouldn't the schema design below be more appropriate (storing postId as a foreign key in Comment)? Also, is there a name for this way of modeling the relation?
const PostSchema = new Schema({
_id: Types.ObjectId
})
const CommentSchema = new Schema({
_id:Types.ObjectId,
postId: Types.ObjectId
})
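To get a post and all its comments I would simply do:
const post = await Post.findById(postId).lean(); // findById returns one document, not an array
post.comments = await Comment.find({ postId }).lean(); // lean() gives a plain object, so the extra property sticks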
With the first method, every time a new comment is created, the post document must be updated as well. And if we want to guarantee that the comment's id is added to the comments array, we have to use a transaction, which makes this update even more costly.
With the second approach, however, we only have to create the comment.
Now, the downside is that I will not be able to use $lookup, but performance-wise, would it make much of a difference? $lookup is not like a JOIN in relational databases, where JOINs are done in a single database operation. So using $lookup, or simply querying comments by their (indexed) postId to "join" the data, should not make much of a difference in performance.
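The second schema does not actually rule out $lookup: matching Post._id against Comment.postId is exactly its localField/foreignField form, and the whole join runs server-side in a single aggregation round trip. A minimal sketch, assuming the Comment model's collection is named `comments` (Mongoose's default pluralization) and that `postId` is indexed; the function only builds the pipeline, which you would pass to `Post.aggregate(...)`.

```javascript
// Builds an aggregation pipeline that returns one post with its comments
// embedded under `comments`, assuming each comment document stores `postId`.
function postWithCommentsPipeline(postId) {
  return [
    // Select the single post first so $lookup only runs for it.
    { $match: { _id: postId } },
    {
      $lookup: {
        from: "comments",       // assumed collection name for the Comment model
        localField: "_id",      // Post._id ...
        foreignField: "postId", // ... matched against Comment.postId (index this field)
        as: "comments",         // output array field on the post
      },
    },
  ];
}
```

Usage: `const [post] = await Post.aggregate(postWithCommentsPipeline(postId));`. Unlike `find`, Mongoose does not cast aggregation pipelines, so `postId` must already be an `ObjectId` (for example `new mongoose.Types.ObjectId(id)`) rather than a string.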
Your example can be found in the MongoDB documentation, applied to a different domain model (comments are books and posts are publishers).
When using references, the growth of the relationship determines where to store the reference. If the number of comments per post is small, with limited growth, storing the comment references inside the post document may sometimes be useful. Otherwise, if the number of comments per post is unbounded, this data model would lead to mutable, growing arrays...
To avoid mutable, growing arrays, store the post reference inside the comment document.
The document schema should be driven by your use-case requirements, which define your queries. As you identified, it makes sense to store the post reference on the comment, so you don't have to update the post every time you create a new comment. Without more information about growth rates, access frequency and other loading scenarios, option 2 sounds like the better fit in this context.
The first approach is unusual, but it can work in some situations, depending on query needs.
I am thinking of implementing a followers/following relationship using Mongoose.
In its simplest form, I believe the easiest way to make a user follow another user is to add that user's id to an array, as below:
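const mongoose = require("mongoose");
const UserSchema = new mongoose.Schema({
  _id: mongoose.Schema.Types.ObjectId,
  api_key: String,
  followers: [mongoose.Schema.Types.ObjectId] // ids of the users following this user
});
const User = mongoose.model("User", UserSchema);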
To add a user to the user's followers list I did:
const user = await User.findOne(...);
user.followers.push(user_id_of_follower);
await user.save();
The above works, but I'm a little skeptical. What if a user has 1 billion followers? Is the above array pattern the best way to model a follower/following relationship in MongoDB?
Thank you.
For heavy data and big data, I think it's better to create another collection with documents of this shape:
{
user_id:"123",
follower_id : "222"
}
When a user follows another user, add a new record. You could also add fields on the user document holding the follower and following counts.
When you want to get the list of followers of a specific user, you can use paging (skip/limit).
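A dependency-free sketch of the edge-collection design above, with an in-memory array standing in for the `follows` collection and a Map standing in for the counters on each user document. The class and method names are hypothetical; the comments name the MongoDB operations each step stands for.

```javascript
// In-memory stand-in for a `follows` collection (one document per edge)
// plus denormalized follower/following counters on each user.
class FollowStore {
  constructor() {
    this.follows = [];      // [{ user_id, follower_id }] in insertion order
    this.users = new Map(); // user_id -> { followers, following }
  }

  // Returns (creating if needed) the counter record for a user.
  counters(userId) {
    if (!this.users.has(userId)) this.users.set(userId, { followers: 0, following: 0 });
    return this.users.get(userId);
  }

  // MongoDB analogue: insertOne into `follows` (a unique index on
  // { user_id: 1, follower_id: 1 } would reject duplicates), then $inc both counters.
  follow(userId, followerId) {
    const exists = this.follows.some(
      (e) => e.user_id === userId && e.follower_id === followerId
    );
    if (exists) return false;
    this.follows.push({ user_id: userId, follower_id: followerId });
    this.counters(userId).followers += 1;
    this.counters(followerId).following += 1;
    return true;
  }

  // MongoDB analogue: find({ user_id }).sort({ _id: 1 }).skip(page * size).limit(size)
  followersPage(userId, page, size) {
    return this.follows
      .filter((e) => e.user_id === userId)
      .slice(page * size, page * size + size)
      .map((e) => e.follower_id);
  }
}
```

Because each follow is its own small document, no user document grows with the follower count; the cost is a separate query (backed by an index on `user_id`) whenever the follower list is needed.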
I have a DB with a user collection, a book collection, and a comment collection.
I would like to add a site-wide search capability, and for that I would like to create a new collection of indexed content for search entries.
The new collection I imagine will look like this:
var indexContent = new mongoose.Schema({
  keyword: String,                     // the search keyword
  rank: Number,                        // a ranking I give the matched document
  refType: String,                     // either "User", "Series", or "Episode"
  ref: mongoose.Schema.Types.ObjectId, // the id of the matched document
  date: Date                           // date when this entry was created
});
I feel like this is ideal because whenever a keyword exists in the collection I can quickly retrieve the results, and if the keyword was never searched before, I generate the results for the first time and then save them to serve to other people.
The problem is, considering the site has a very high rate of updates, how often should I dump my cached results?
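The lookup-or-generate flow described above is a cache-aside pattern. The sketch below keeps entries in a Map and treats any entry older than `maxAgeMs` as stale, regenerating it on the next search; `generateResults` and `maxAgeMs` are hypothetical, and choosing the age is exactly the open question. In MongoDB the expiry could instead be delegated to a TTL index on `date` (in Mongoose, `indexContent.index({ date: 1 }, { expireAfterSeconds: 3600 })`, where one hour is an arbitrary example value), which deletes old entries in a background task.

```javascript
// Cache-aside sketch for keyword search entries.
// generateResults: hypothetical expensive search over users/books/comments.
// maxAgeMs: assumed freshness window for a cached entry.
function makeSearchCache(generateResults, maxAgeMs) {
  const entries = new Map(); // keyword -> { results, date }

  return function search(keyword, now = Date.now()) {
    const hit = entries.get(keyword);
    // Serve the cached entry while it is younger than maxAgeMs.
    if (hit && now - hit.date < maxAgeMs) return hit.results;
    // Missing or stale: regenerate and store with a fresh timestamp.
    const results = generateResults(keyword);
    entries.set(keyword, { results, date: now });
    return results;
  };
}
```

A shorter window keeps results closer to a site with a high update rate, at the cost of regenerating popular keywords more often.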
Schema:
articles: [
{
_id: uid,
owner: userId,
title: string,
text: text,
}
],
comments_1: [
{
// single comment
articleId: uid,
text: text,
user: {
name: string,
id: uid
}
}
],
comments_2: [
{
// all comments at once
articleId: uid,
comments: [
{
_id: commentId,
text: text,
user: {
name: string,
id: uid
}
}
],
}
],
I'm a bit confused by the MongoDB recommendations:
Say I need to retrieve the information for an article page. I'll need two requests: the first to find the article by id, and the second to find its comments. If I included the comments (comments_2) as a property of each article, I'd need only one query to get all the data I need, and if I needed to list, say, the titles of 20 articles, I'd run a query that specifies which properties to retrieve, right?
Should I store comments and articles in different collections?
If comments are in a different collection, should I store them the comments_1 way or the comments_2 way?
I'll avoid deep explanations, because I guess the schema explains my point clearly. Briefly, I don't get whether it's better to store everything in one place and specify the properties I want to retrieve when querying, or to split pieces of data into different collections.
In a relational database, this would be achieved with a JOIN. Apparently there is a NoSQL equivalent in MongoDB, starting from version 3.2, called $lookup.
This allows you to keep comments and articles in separate collections, but still retrieve the list of comments for an article with a single query.
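$lookup arrived in 3.2 with the localField/foreignField form; since 3.6 it also accepts a sub-pipeline, which lets an article page cap how many comments come back. A minimal sketch, assuming the comments_1 layout lives in a collection named `comments` and that "newest first by `_id`" is an acceptable order; the function only builds the pipeline for `Article.aggregate(...)`.

```javascript
// Builds a pipeline returning one article with at most `limit` of its newest
// comments. Assumes a `comments` collection shaped like comments_1 (one
// document per comment with an `articleId` field). Needs MongoDB 3.6+.
function articlePagePipeline(articleId, limit = 20) {
  return [
    { $match: { _id: articleId } },
    {
      $lookup: {
        from: "comments",
        let: { articleId: "$_id" }, // expose the article's _id to the sub-pipeline
        pipeline: [
          { $match: { $expr: { $eq: ["$articleId", "$$articleId"] } } },
          { $sort: { _id: -1 } },   // ObjectIds grow over time, so roughly newest first
          { $limit: limit },
        ],
        as: "comments",
      },
    },
  ];
}
```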
It's a typical trade-off you have to make. Both approaches have their own pros and cons, and you have to choose what fits your use case best. A couple of inputs:
Single collection:
fast loading of a single article, since you load all the data in one query
no issues with loading the titles of 20 articles (you can query only a subset of fields using projection)
Multiple collections:
much easier to do cross-cutting queries (e.g. comments made by a specific user, etc.)
I would go with version 1 (comments_1), since it's simpler and version 2 won't give you any advantage.
Well, MongoDB models are usually meant to hold data and relationships together, since MongoDB doesn't provide joins ($lookup is the nearest thing to a join, but it is costly and best avoided).
That's why MongoDB data modeling puts a huge emphasis on denormalization. Storing related data together has two benefits:
You don't have to join collections, and you can get the data in a single query.
Since an update to a single document is atomic in MongoDB, you can update the comments and the article in one go without worrying about transactions and rollback.
So you would almost certainly want to put the comments inside the article collection, something like:
articles: [
{
_id: uid,
owner: userId,
title: string,
text: text,
comments: [
{
_id: commentId,
text: text,
user: {
name: string,
id: uid
}
}
]
}
]
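With comments embedded as above, adding a comment is a single update on the article document, which is what makes it atomic. A minimal sketch that builds the filter and update for `Article.updateOne(filter, update)`; the `commentCount` field is a hypothetical addition, there only to show that several changes to the same document land together.

```javascript
// Builds the filter and update for adding one embedded comment. Both
// operators target the same article document, so MongoDB applies them
// atomically. `commentCount` is a hypothetical counter field.
function addEmbeddedCommentUpdate(articleId, comment) {
  return {
    filter: { _id: articleId },
    update: {
      $push: { comments: comment }, // append to the embedded array
      $inc: { commentCount: 1 },    // keep the counter in step in the same write
    },
  };
}
```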
Before we agree to it, let's look at the drawbacks of the above approach.
There is a 16 MB limit per document. That is huge, but if the article text is large and the article also attracts a large number of comments, the document could exceed 16 MB.
Everywhere you fetch an article for other purposes, you might have to exclude the comments field; otherwise the read would be heavy and slow.
If you need to run aggregations that involve the comments in one way or another, you might run into aggregation memory limits.
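The second drawback is usually handled with projections. A sketch of the projection documents one might pass as the second argument to `find`/`findById` against the embedded layout above; the object and key names are illustrative.

```javascript
// Projection documents for reading embedded articles without pulling the
// whole comments array.
const projections = {
  // Article lists: keep the title only (_id is included by default).
  titlesOnly: { title: 1 },
  // Other reads of the article: drop the comments field entirely.
  withoutComments: { comments: 0 },
  // Article page: only the last `n` elements of the embedded comments array.
  lastComments: (n) => ({ comments: { $slice: -n } }),
};
```

For example, `Article.find({}, projections.titlesOnly).limit(20)` for a list page, or `Article.findById(id, projections.lastComments(10))` for an article page. This reduces what is sent over the wire, but the 16 MB limit still applies to the stored document.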
These are serious problems and we cannot ignore them, so let's consider keeping the comments in a different collection and see what we lose.
First of all, comments and articles, though linked, are different entities, so you might never need to update a field in both at once.
Secondly, you would have to load the comments separately, but that is how most applications proceed anyway, so that is not an issue either.
So, in my opinion, the clear winner is having two separate collections:
articles: [
{
_id: uid,
owner: userId,
title: string,
text: text,
}
],
comments: [
{
// single comment
articleId: uid,
text: text,
user: {
name: string,
id: uid
}
}
]
If you choose the two-collection approach, you wouldn't want to go the comments_2 way, again for the same reason: a single article could have a huge number of comments.
The two types of objects seem to be so close to one another that having both feels redundant. What is the point of having both schemas and models?
EDIT: Although this has been useful for many people, as mentioned in the comments it answers the "how" rather than the why. Thankfully, the why of the question has been answered elsewhere also, with this answer to another question. This has been linked in the comments for some time but I realise that many may not get that far when reading.
Often the easiest way to answer this type of question is with an example. In this case, someone has already done it for me :)
Take a look here:
http://rawberg.com/blog/nodejs/mongoose-orm-nested-models/
EDIT: The original post (as mentioned in the comments) seems to no longer exist, so I am reproducing it below. Should it ever return, or if it has just moved, please let me know.
It gives a decent description of using schemas within models in mongoose and why you would want to do it, and also shows you how to push tasks via the model while the schema is all about the structure etc.
Original Post:
Let’s start with a simple example of embedding a schema inside a model.
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var TaskSchema = new Schema({
  name: String,
  priority: Number
});
// Virtual getter, e.g. "task one (1)"
TaskSchema.virtual('nameandpriority')
  .get(function () {
    return this.name + ' (' + this.priority + ')';
  });
TaskSchema.method('isHighPriority', function () {
  return this.priority === 1;
});
var ListSchema = new Schema({
name: String,
tasks: [TaskSchema]
});
mongoose.model('List', ListSchema);
var List = mongoose.model('List');
var sampleList = new List({name:'Sample List'});
I created a new TaskSchema object with basic info a task might have. A Mongoose virtual attribute is set up to conveniently combine the name and priority of the Task. I only specified a getter here, but virtual setters are supported as well.
I also defined a simple task method called isHighPriority to demonstrate how methods work with this setup.
In the ListSchema definition you’ll notice how the tasks key is configured to hold an array of TaskSchema objects. The tasks key will become an instance of DocumentArray, which provides special methods for dealing with embedded Mongo documents.
For now I only passed the ListSchema object into mongoose.model and left the TaskSchema out. Technically it's not necessary to turn the TaskSchema into a formal model, since we won’t be saving it in its own collection. Later on I’ll show you how it doesn’t harm anything if you do, and it can help to organize all your models in the same way, especially when they start spanning multiple files.
With the List model setup let’s add a couple tasks to it and save them to Mongo.
var List = mongoose.model('List');
var sampleList = new List({name:'Sample List'});
sampleList.tasks.push(
{name:'task one', priority:1},
{name:'task two', priority:5}
);
sampleList.save(function(err) {
if (err) {
console.log('error adding new list');
console.log(err);
} else {
console.log('new list successfully saved');
}
});
The tasks attribute on the instance of our List model (sampleList) works like a regular JavaScript array and we can add new tasks to it using push. The important thing to notice is the tasks are added as regular JavaScript objects. It’s a subtle distinction that may not be immediately intuitive.
You can verify from the Mongo shell that the new list and tasks were saved to mongo.
db.lists.find()
{ "tasks" : [
{
"_id" : ObjectId("4dd1cbeed77909f507000002"),
"priority" : 1,
"name" : "task one"
},
{
"_id" : ObjectId("4dd1cbeed77909f507000003"),
"priority" : 5,
"name" : "task two"
}
], "_id" : ObjectId("4dd1cbeed77909f507000001"), "name" : "Sample List" }
Now we can use the ObjectId to pull up the Sample List and iterate through its tasks.
List.findById('4dd1cbeed77909f507000001', function(err, list) {
console.log(list.name + ' retrieved');
list.tasks.forEach(function(task, index, array) {
console.log(task.name);
console.log(task.nameandpriority);
console.log(task.isHighPriority());
});
});
If you run that last bit of code you’ll get an error saying the embedded document doesn’t have a method isHighPriority. In the current version of Mongoose you can’t access methods on embedded schemas directly. There’s an open ticket to fix it, and after posting the question to the Mongoose Google Group, manimal45 posted a helpful work-around to use for now.
List.findById('4dd1cbeed77909f507000001', function(err, list) {
console.log(list.name + ' retrieved');
list.tasks.forEach(function(task, index, array) {
console.log(task.name);
console.log(task.nameandpriority);
console.log(task._schema.methods.isHighPriority.apply(task));
});
});
If you run that code you should see the following output on the command line.
Sample List retrieved
task one
task one (1)
true
task two
task two (5)
false
With that work-around in mind let’s turn the TaskSchema into a Mongoose model.
mongoose.model('Task', TaskSchema);
var Task = mongoose.model('Task');
var ListSchema = new Schema({
name: String,
tasks: [Task.schema]
});
mongoose.model('List', ListSchema);
var List = mongoose.model('List');
The TaskSchema definition is the same as before, so I left it out. Once it's turned into a model, we can still access its underlying Schema object using dot notation.
Let’s create a new list and embed two Task model instances within it.
var demoList = new List({name:'Demo List'});
var taskThree = new Task({name:'task three', priority:10});
var taskFour = new Task({name:'task four', priority:11});
demoList.tasks.push(taskThree.toObject(), taskFour.toObject());
demoList.save(function(err) {
if (err) {
console.log('error adding new list');
console.log(err);
} else {
console.log('new list successfully saved');
}
});
As we’re embedding the Task model instances into the List we’re calling toObject on them to convert their data into plain JavaScript objects that the List.tasks DocumentArray is expecting. When you save model instances this way your embedded documents will contain ObjectIds.
The complete code example is available as a gist. Hopefully these work-arounds help smooth things over as Mongoose continues to develop. I’m still pretty new to Mongoose and MongoDB so please feel free to share better solutions and tips in the comments. Happy data modeling!
Schema is an object that defines the structure of any documents that will be stored in your MongoDB collection; it enables you to define types and validators for all of your data items.
Model is an object that gives you easy access to a named collection, allowing you to query the collection and use the Schema to validate any documents you save to that collection. It is created by combining a Schema, a Connection, and a collection name.
Originally phrased by Valeri Karpov, MongoDB Blog
I don't think the accepted answer actually answers the question that was posed. The answer doesn't explain why Mongoose has decided to require a developer to provide both a Schema and a Model variable. An example of a framework that has eliminated the need for the developer to define the data schema is Django: a developer writes up their models in the models.py file and leaves it to the framework to manage the schema. The first reason that comes to mind for why they do this, given my experience with Django, is ease of use. Perhaps more important is the DRY (don't repeat yourself) principle: you don't have to remember to update the schema when you change the model, because Django will do it for you. Rails also manages the schema of the data for you: a developer doesn't edit the schema directly, but changes it by defining migrations that manipulate the schema.
One reason I can see for Mongoose separating the schema and the model is cases where you would want to build a model from two schemas. Such a scenario might introduce more complexity than is worth managing: if you have two schemas that are managed by one model, why aren't they one schema?
Perhaps the original question is more a relic of the traditional relational database system. In the NoSQL/Mongo world, the schema is perhaps a little more flexible than in MySQL/PostgreSQL, and thus changing the schema is more common practice.
To understand why, you have to understand what Mongoose actually is.
Mongoose is an object data modeling (ODM) library for MongoDB and Node.js, providing a higher level of abstraction. It's a bit like the relationship between Express and Node: Express is a layer of abstraction over plain Node, while Mongoose is a layer of abstraction over the regular MongoDB driver.
An object data modeling library is just a way for us to write JavaScript code that will then interact with a database. We could just use the regular MongoDB driver to access our database, and it would work just fine.
But instead we use Mongoose because it gives us a lot more functionality out of the box, allowing for faster and simpler development of our applications.
Some of the features Mongoose gives us are schemas to model our data and relationships, easy data validation, a simple query API, middleware, and much more.
In Mongoose, a schema is where we model our data: where we describe the structure of the data, default values, and validation. We then take that schema and create a model out of it. A model is basically a wrapper around the schema, which allows us to actually interface with the database in order to create, read, update, and delete documents.
Let's create a model from a schema.
const mongoose = require('mongoose');

const tourSchema = new mongoose.Schema({
name: {
type: String,
required: [true, 'A tour must have a name'],
unique: true,
},
rating: {
type: Number,
default: 4.5,
},
price: {
type: Number,
required: [true, 'A tour must have a price'],
},
});
//tour model
const Tour = mongoose.model('Tour', tourSchema);
By convention, the first letter of a model name is capitalized.
Let's create an instance of the model we built from the schema, and use it to interact with the database.
const testTour = new Tour({ // instance of our model
name: 'The Forest Hiker',
rating: 4.7,
price: 497,
});
// saving testTour document into database
testTour
.save()
.then((doc) => {
console.log(doc);
})
.catch((err) => {
console.log(err);
});
So having both a schema and a model in Mongoose makes our life easier.
Think of a Model as a wrapper around a schema. Schemas define the structure of your document: what properties you can expect and what their data types will be (String, Number, etc.). Models provide an interface to perform CRUD operations on the collection using that schema. See this post on FCC.
Schema basically models your data (where you provide datatypes for your fields) and can do some validations on your data. It mainly deals with the structure of your collection.
Whereas the model is a wrapper around your schema to provide you with CRUD methods on collections. It mainly deals with adding/querying the database.
Having both schema and model could appear redundant when compared to other frameworks like Django (which provides only a Model) or SQL (where we create only Schemas and write SQL queries and there is no concept of model). But, this is just the way Mongoose implements it.
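To make the division of labour concrete, here is a deliberately tiny, dependency-free analogue. It is not how Mongoose is implemented: `ToySchema` only knows field types and required flags, and `ToyModel` pairs a schema with a named in-memory collection and exposes create/find, validating every write through the schema. All names are hypothetical; in Mongoose the same roles belong to `new mongoose.Schema(...)` and `mongoose.model(name, schema)`.

```javascript
// Toy analogue of the schema/model split (illustrative only, not Mongoose).
class ToySchema {
  constructor(fields) {
    this.fields = fields; // e.g. { name: { type: String, required: true } }
  }

  // Structure and validation live here: returns a list of error messages.
  validate(doc) {
    const errors = [];
    for (const [key, rule] of Object.entries(this.fields)) {
      const value = doc[key];
      if (value === undefined || value === null) {
        if (rule.required) errors.push(`${key} is required`);
      } else if (value.constructor !== rule.type) {
        errors.push(`${key} must be a ${rule.type.name}`);
      }
    }
    return errors;
  }
}

class ToyModel {
  // A model pairs a schema with a named collection (here just an array).
  constructor(name, schema) {
    this.name = name;
    this.schema = schema;
    this.collection = [];
  }

  // CRUD lives here, and every write goes through the schema first.
  create(doc) {
    const errors = this.schema.validate(doc);
    if (errors.length) throw new Error(errors.join("; "));
    this.collection.push({ ...doc });
    return doc;
  }

  find(predicate = () => true) {
    return this.collection.filter(predicate);
  }
}
```

Because the schema is a plain object, the same one could back several toy models, which mirrors one practical reason to keep the two concepts separate.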
I'm quite new to MongoDB and trying to build a nested comment system with it.
Online I found various document structures for this, but I'm looking for proposals that would let me easily do the following with the comments:
Mark comments as spam/approved and retrieve comments by these attributes
Retrieve comments by user
Retrieve comment count for an object/user
Besides, of course, displaying the comments as is normally done. Any suggestions on how to handle this with MongoDB, or advice to look for an alternative, would be much appreciated!
Have you considered storing the comments in all documents that need a reference to them? If you have a document for the user, store all of that user's comments in it. If you have a separate document for objects, store all comments there also. It feels sort of wrong after coming from a relational world where you try to have exactly one copy of a given piece of data, and then reference it by ID, but even with relational databases you have to start duplicating data if you want queries to run quickly.
With this design, each document that you load would be "complete". It would have all the data you need, and indexes on that collection would keep reads fast. The price would be slightly slower writes, and more of a headache when you need to update the comment text, since you need to update more than one document.
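To make the write-side cost concrete, here is a sketch of the writes this duplication implies: adding a comment pushes a copy into both documents, and editing it needs a positional update with `arrayFilters` (MongoDB 3.6+) in each. The collection names `users` and `objects`, the embedded `comments` array, and the function names are hypothetical.

```javascript
// Writes needed when a comment is duplicated into the user's document and
// the commented object's document (hypothetical collections and fields).
function fanOutCommentWrites(comment) {
  // The copy embedded in each document.
  const embedded = { _id: comment._id, text: comment.text, userId: comment.userId };
  return [
    { collection: "users", filter: { _id: comment.userId }, update: { $push: { comments: embedded } } },
    { collection: "objects", filter: { _id: comment.objectId }, update: { $push: { comments: embedded } } },
  ];
}

// Editing the text means touching every copy.
function editCommentWrites(comment, newText) {
  // $[c] updates only the array elements matched by the arrayFilters condition.
  const update = { $set: { "comments.$[c].text": newText } };
  const options = { arrayFilters: [{ "c._id": comment._id }] };
  return [
    { collection: "users", filter: { _id: comment.userId }, update, options },
    { collection: "objects", filter: { _id: comment.objectId }, update, options },
  ];
}
```

Each entry maps to `db.collection(w.collection).updateOne(w.filter, w.update, w.options)` with the Node.js driver; the two writes are independent unless wrapped in a multi-document transaction.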
Because you need to retrieve comments by attributes, by user, etc., you can't embed the comments in each object that users can comment on (embedding is usually faster for reads in document databases). So you need to create a separate collection for the comments. I suggest the following structure:
comment
{
_id : ObjectId,
status: int (spam =1, approved =2),
userId: ObjectId,
commentedObjectId: ObjectId,
commentedObjectType: int(for example question =1, answer =2, user =3),
commentText
}
With the above structure you can easily do the things you want:
//Mark comments as spam/approved and retrieve comments by these attributes
//mark a specific comment as spam ($set changes only the status field, no upsert)
db.comments.updateOne({ _id: someCommentId }, { $set: { status: 1 } });
db.comments.find({ status: 1 }); // get all comments marked as spam
//Retrieve comments by user
db.comments.find({ userId: someUserId });
//Retrieve comment count for an object/user
db.comments.countDocuments({ commentedObjectId: someId, commentedObjectType: 1 });
db.comments.countDocuments({ userId: someUserId });
Also, for comment counts, I suppose it would be better to add an extra field to each object and $inc it whenever a comment is added or deleted.
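A sketch of that counter maintenance: after inserting or deleting a comment, apply `$inc` with +1 or -1 to the commented object. `commentCount` is a hypothetical field name, and the function only builds the filter and update (for example for `db.questions.updateOne(filter, update)`, where `questions` is also a hypothetical collection).

```javascript
// Builds the counter update for the commented object when a comment is
// added (+1) or deleted (-1). `commentCount` is a hypothetical field.
function commentCountUpdate(objectId, delta) {
  if (delta !== 1 && delta !== -1) throw new RangeError("delta must be +1 or -1");
  return { filter: { _id: objectId }, update: { $inc: { commentCount: delta } } };
}
```

The comment insert and the counter update are two separate writes; if they must succeed or fail together, they need a multi-document transaction (MongoDB 4.0+ on replica sets).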