I used to use MySQL and am now new to MongoDB. While learning, I am trying to create a user profile like LinkedIn's. I am not sure of the proper way to create a one-to-many relationship in Mongoose schemas.
Context: A user can have multiple education qualifications, and the same goes for experience. Here are the two ways I tried to write the schema:
Example 1:
var userSchema = new mongoose.Schema({
name: {
first: String,
middle: String,
last: String,
},
gender: { type: Number, max: 3, min: 0, default: GENDERS.Unspecified },
age: Number,
education: [{type: Schema.Types.ObjectId, ref:'Education'}],
experience: [{type: Schema.Types.ObjectId, ref:'Experience'}],
email: String
});
Example 2:
var userSchema = new mongoose.Schema({
name: {
first: String,
middle: String,
last: String,
},
gender: { type: Number, max: 3, min: 0, default: GENDERS.Unspecified },
age: Number,
education: [{type: String, detail: {
major: String,
degree: String,
grade: String,
startdate: Date,
enddate: Date,
remarks: String
}}],
experience: [{type: String, detail: {
position: String,
company: String,
dateofjoin: String,
dateofretire: String,
location: String,
responsibilities: [{type:String}],
description: String,
}}],
email: String
});
I haven't tested the second option.
Which one is better and easier for writing queries to fetch or add data? Is there a better way to write a schema for scenarios like this?
For this particular example I would suggest embedding the documents rather than storing references, and here is why.
Usually my first step in MongoDB schema design is to consider how I will be querying the data.
Will I want to know the educational background of a user, or will I want to know all users who have a particular educational background? Having answers to these types of questions will determine how you want to setup your database.
If you already know your user, having the embedded educational background within that document is quick and easy to access. No separate queries are needed, and the schema still isn't overly complex and difficult to comprehend. This is how I would expect you would want to access the data.
Now, if you want to find all users with a particular educational background, there are obviously large drawbacks to the embedded document schema. In that case you would need to first look at each user, then loop through the education array to see all the various educations.
So the answer depends on the queries you expect to run against the data, but in this case (unless you are doing reporting across all users) embedding is probably the way to go.
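To make the two query shapes concrete, here is a minimal sketch in plain JavaScript. It uses plain objects as stand-ins for user documents; the sample names and the helpers `educationOf` and `usersWithDegree` are made up for illustration and are not part of the question's code. The second helper mirrors the "look at each user, then loop through the education array" case described above. As far as I know, MongoDB can also match on embedded array fields with dot notation (for example a filter on `education.degree`), so treat the scan as an illustration of the access pattern, not of how the server must execute it.

```javascript
// Hypothetical sample data: each user embeds its own education entries.
const users = [
  {
    name: { first: "Ada", last: "Lovelace" },
    education: [
      { degree: "BSc", major: "Mathematics" },
      { degree: "MSc", major: "Computing" },
    ],
  },
  {
    name: { first: "Alan", last: "Turing" },
    education: [{ degree: "PhD", major: "Mathematics" }],
  },
];

// Query 1: education of a known user. The data is already on the document,
// so no second lookup is needed.
function educationOf(user) {
  return user.education.map((e) => `${e.degree} ${e.major}`);
}

// Query 2: all users with a given degree. Every user's array is inspected.
function usersWithDegree(allUsers, degree) {
  return allUsers
    .filter((u) => u.education.some((e) => e.degree === degree))
    .map((u) => u.name.first);
}

console.log(educationOf(users[0]));
console.log(usersWithDegree(users, "PhD"));
```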
You may also find this discussion helpful:
MongoDB relationships: embed or reference?
Suppose I have the following schema:
const personSchema = mongoose.Schema({
firstname: String,
lastname: String,
email: String,
gender: {type: String, enum: ["Male", "Female"]},
dob: Date,
city: String,
interests: [interestsSchema],
// vs this
// interests: [{type: Schema.Types.ObjectId, ref: 'Interest'}]
});
What is the difference between the two methods here? What are the advantages and disadvantages of one vs the other?
There is obviously no perfect answer to this question. It all depends on the scenario: where and how you will use the data.
ObjectId vs Embedded
I use both embedded documents and lists of ObjectIds. For example, I have two collections, Customers and Contacts. Every Customer can have multiple contacts, and multiple Customers should be able to use the same contact.
We also want to list and manage contacts separately.
We also have an array of addresses, which is unique to each Customer document. We do not need to list or manage the addresses separately or ever reuse them, so it makes sense to embed them.
Customer
{
  businessName: string,
  contacts: [ObjectId, ObjectId],
  address: {
    invoice: [{
      street: string,
      zip: string,
      city: string
    }]
  }
}
Contact
{
  firstName: string,
  lastName: string,
  phoneNumber: string,
  email: string
}
Array size
This is just a side note, not directly relevant to the question, with some more information about arrays in MongoDB.
What you are describing here is a so-called one-to-many relationship, where one document may have any number of embedded documents.
MongoDB works best with "one-to-few" when using any type of array. The usual recommendation is to keep an array below roughly a hundred documents; beyond that, it will start to impact query times in one way or another.
The same goes for an array of ObjectIds: it should not hold more than about a hundred entries.
If you expect the array to increase beyond a hundred, you should use a separate collection with a property like personId where every interest refers to the person. This way, you don't need the array at all in the personSchema.
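As a rough sketch of that last suggestion, the snippet below models the two collections as plain in-memory arrays. The sample documents and the helpers `interestsOf` and `addInterest` are hypothetical; with Mongoose the lookup would be a query on the interests collection filtered by `personId` instead of an array filter.

```javascript
// Hypothetical in-memory stand-ins for two collections.
// Note that the person documents carry no interests array at all.
const persons = [
  { _id: "p1", firstname: "Tom" },
  { _id: "p2", firstname: "Ann" },
];
const interests = [
  { _id: "i1", personId: "p1", name: "chess" },
  { _id: "i2", personId: "p1", name: "cycling" },
  { _id: "i3", personId: "p2", name: "chess" },
];

// All interests of one person: filter the child collection by personId.
function interestsOf(personId) {
  return interests.filter((i) => i.personId === personId).map((i) => i.name);
}

// Adding an interest never touches (or grows) the person document.
function addInterest(personId, name) {
  const doc = { _id: `i${interests.length + 1}`, personId, name };
  interests.push(doc);
  return doc;
}

addInterest("p2", "go");
console.log(interestsOf("p2"));
```

The trade-off is one extra query when you need a person together with their interests, in exchange for person documents that stay small no matter how many interests accumulate.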
Database noob here using MongoDB. In my program I have users, and the core of my program is the roadmaps that I display. Each user can create roadmaps, save other users' roadmaps, and so on. Each user has fields named savedRoadmaps and createdRoadmaps which should store the roadmaps. My question is: should I store just the roadmap _ids in the savedRoadmaps and createdRoadmaps fields, or the entire roadmap?
I am asking because saving just the _id of each roadmap feels like it saves storage, but it might be inconvenient: I would have to fetch the user first, then fetch each roadmap using the IDs in the user's savedRoadmap/createdRoadmap field, versus fetching only the user and having the roadmaps already present in the savedRoadmap field.
And btw, is there any sweet and brief database design read out there, please direct me to some if you know any!
For a user, I want a name, email, password, and description of course, and also savedRoadmaps and createdRoadmaps. A user can create unlimited roadmaps and also save as many as he or she wants. For a roadmap, I want a name, category, time_completion, author, date, and a roadmap object which will contain the actual JSON string that I will display with d3. Here are my User and Roadmap schemas right now:
const RoadmapSchema = new Schema({
  author: {
    type: String,
    required: false
  },
  name: {
    type: String,
    required: true
  },
  category: {
    type: String,
    required: true
  },
  time_completion: {
    type: Number,
    required: true
  },
  date: {
    type: Date,
    default: Date.now
  },
  roadmap: {
    type: "object",
    required: true
  }
});
and User Schema:
const UserSchema = new Schema({
name: {
type: String,
required: true
},
email: {
type: String,
required: true
},
password: {
type: String,
required: true
},
date: {
type: Date,
default: Date.now
},
savedRoadmap: {
type: "object",
default: []
},
createdRoadmap: {
type: "object",
default: []
}
});
My question is, inside of the savedRoadmap and createdRoadmap fields of the User schema, should I include just the _id of a roadmap, or should I include the entire json string which represents the roadmap?
There are 3 different data-modeling techniques you can use to design your roadmaps system based on the cardinality of the relationship between users and roadmaps.
In general you need to de-normalize your data model based on the queries that are expected from your application:
One to Few: Embed the N side if the cardinality is one-to-few and there is no need to access the embedded object outside the context of the parent object
One to Many: Use an array of references to the N-side objects if the cardinality is one-to-many or if the N-side objects should stand alone for any reasons
One-to-Squillions: Use a reference to the One-side in the N-side objects if the cardinality is one-to-squillions
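The three shapes can be sketched with plain JavaScript objects as below. This is only an illustration under assumptions: the address and log-entry data are invented examples of the "few" and "squillions" cases, and `resolveSaved` and `entriesFor` are hypothetical helpers standing in for what a second query (or Mongoose's `populate`) would do.

```javascript
// One-to-few: embed the N side in the parent (hypothetical address data).
const userWithAddresses = {
  _id: "u1",
  name: "Sam",
  addresses: [{ city: "Oslo" }, { city: "Bergen" }],
};

// One-to-many: roadmaps stand alone, the user holds an array of references.
const roadmaps = [
  { _id: "r1", name: "Learn D3", author: "u1" },
  { _id: "r2", name: "Learn MongoDB", author: "u2" },
];
const userWithRefs = { _id: "u1", name: "Sam", savedRoadmap: ["r1", "r2"] };

// Resolve the references with a second lookup.
function resolveSaved(user, allRoadmaps) {
  const byId = new Map(allRoadmaps.map((r) => [r._id, r]));
  return user.savedRoadmap.map((id) => byId.get(id)).filter(Boolean);
}

// One-to-squillions: each N-side document points back at its parent,
// and the parent holds no array at all (hypothetical log entries).
const logEntries = [
  { _id: "l1", userId: "u1", message: "login" },
  { _id: "l2", userId: "u1", message: "saved r2" },
  { _id: "l3", userId: "u2", message: "login" },
];
function entriesFor(userId, entries) {
  return entries.filter((e) => e.userId === userId);
}

console.log(userWithAddresses.addresses.length);
console.log(resolveSaved(userWithRefs, roadmaps).map((r) => r.name));
console.log(entriesFor("u1", logEntries).length);
```

For the roadmaps case, the second shape seems the closest fit, since other users save the same roadmap and it has to stand alone. If `author` stored the creator's _id, the created roadmaps could also be found by querying roadmaps on that field, which is the third shape.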
And btw, is there any sweet and brief database design read out there,
please direct me to some if you know any!
Rules of Thumb for MongoDB Schema Design: Part 1
I am trying to implement a collection in meteor/mongo which is of following nature:
FIRST_NAME  LAST_NAME  CLASSES  PROFESSORS
A           B          a        b
                       c        d
                       e        f
                       g        h
M           N          c        d
                       p        q
                       x        q
                       m        n
                       r        d
As shown above, a person can take multiple classes and a class can have multiple people. One professor can also teach multiple classes. I want to make this collection searchable and sortable by all possible fields.
Searching by FIRST_NAME and LAST_NAME is easy in the model shown above. But I should also be able to see all students for the class I select. I would also like to see the list of classes sorted in alphabetical order, along with the people enrolled in each class.
Can you please let me know how to approach this in a Meteor/Mongo style? I would also be glad if you could point me to any available resources on this.
You are describing one of the typical data structures which are better suited for a relational database. But don't worry. For reasonably sized data sets it is quite workable in MongoDB too.
When modelling this type of structure in a document database you use embedding, which does lead to data duplication, but this data duplication is typically not a problem.
Pseudo-code for your model:
Collection schoolClass: { // Avoid the reserved word "class"
_id: string,
name: string,
students: [ { _id: string, firstName: string, lastName: string } ],
professor: { _id: string, firstName: string, lastName: string }
}
Collection student: {
_id: string,
firstName: string,
lastName: string,
classes: [ { _id: string, name: string } ]
}
Collection professor: {
_id: string,
firstName: string,
lastName: string,
classes: [ { _id: string, name: string } ]
}
This gives you easily searchable/sortable entry points to all objects. You only follow the "relation" _id to the next collection if you need some special data from an object. All data needed for all documents in the common queries should be present in the Collection the query is run on.
You just need to make sure you update all the relevant collections when an object changes.
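Here is a small sketch of what "update all the relevant collections" means in practice, using in-memory arrays in place of the three collections. The sample data and the helpers `renameClass` and `classNamesSorted` are hypothetical. In a real Meteor/Mongo app each loop would be an update on the corresponding collection.

```javascript
// Hypothetical in-memory stand-ins for the three collections above.
const schoolClasses = [
  {
    _id: "c1",
    name: "Geometry",
    students: [{ _id: "s1", firstName: "A", lastName: "B" }],
    professor: { _id: "p1", firstName: "D", lastName: "E" },
  },
  {
    _id: "c2",
    name: "Algebra",
    students: [{ _id: "s1", firstName: "A", lastName: "B" }],
    professor: { _id: "p1", firstName: "D", lastName: "E" },
  },
];
const students = [
  {
    _id: "s1",
    firstName: "A",
    lastName: "B",
    classes: [{ _id: "c1", name: "Geometry" }, { _id: "c2", name: "Algebra" }],
  },
];
const professors = [
  {
    _id: "p1",
    firstName: "D",
    lastName: "E",
    classes: [{ _id: "c1", name: "Geometry" }, { _id: "c2", name: "Algebra" }],
  },
];

// Renaming a class must also touch every duplicated copy of its name.
function renameClass(classId, newName) {
  for (const c of schoolClasses) {
    if (c._id === classId) c.name = newName;
  }
  for (const owner of [...students, ...professors]) {
    for (const c of owner.classes) {
      if (c._id === classId) c.name = newName;
    }
  }
}

// Alphabetical class list, read from the class collection alone.
function classNamesSorted() {
  return schoolClasses.map((c) => c.name).sort((a, b) => a.localeCompare(b));
}

renameClass("c1", "Analysis");
console.log(classNamesSorted());
console.log(students[0].classes.map((c) => c.name));
```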
A good read is https://docs.mongodb.com/manual/core/data-modeling-introduction/
I am new to mongoose as well as nosql. I am designing a database which will contain a list of people and each person could have multiple skills - like C, Java, Python. Further the person would have been using the particular skill since a particular time - eg. Since 2010.
I have created a personSchema and a skillSchema. I am not able to figure how to add the "Since" as the since is specific to a person but is also for a particular skill.
I really need the skill to be a separate schema as the list of skills would be used elsewhere.
let personSchema = new mongoose.Schema({
id: { type: String, required: true, unique: true, index: true, dropDups: true},
firstname: String,
lastname: String,
age: Number,
mobile: [Number],
skills: [{type: Schema.Types.ObjectId, ref: 'Skill'}]
});
let skillSchema = new mongoose.Schema({
skillName: String
});
Now where to store "since"?
E.g Tom is working on C++ since 2010 - The 2010 is related to both Tom and C++
skills: [
  {
    skill: {type: Schema.Types.ObjectId, ref: 'Skill'},
    since: Number
  }]
Adding 'Since' this way will make more sense as each skill reference will have its since value with it.
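To show how a document with that shape reads, here is a plain-JavaScript sketch. The ids, the sample person, and the helper `describeSkills` are made up for illustration. The in-memory lookup stands in for resolving the reference, which in Mongoose would presumably be a `populate` on the `skills.skill` path.

```javascript
// Hypothetical stand-in for the shared skills collection.
const skills = [
  { _id: "sk1", skillName: "C++" },
  { _id: "sk2", skillName: "Java" },
];

// Each array element pairs a skill reference with the person-specific year.
const tom = {
  firstname: "Tom",
  skills: [
    { skill: "sk1", since: 2010 },
    { skill: "sk2", since: 2015 },
  ],
};

// Join the references against the skills collection for display.
function describeSkills(person, allSkills) {
  const nameById = new Map(allSkills.map((s) => [s._id, s.skillName]));
  return person.skills.map(
    ({ skill, since }) => `${nameById.get(skill)} since ${since}`
  );
}

console.log(describeSkills(tom, skills));
```

The skill names stay in one place and can be reused elsewhere, while "since" lives on the person because it only has meaning for that person and skill pair.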
Hope it helps.
I took the NoSQL plunge against all my RDBMS prejudices from my past. But I trusted. Now I find myself 3 months into a project and the exact reasons we adhered to RDBMS principles seem to be biting me in the butt. I think I just discovered here on Stack Overflow that I can't work with twice-embedded arrays. I followed the NoSQL, embedded-document approach like a good kool-aid drinker and feel like I've been betrayed. Before I swear off NoSQL and go back and refactor my entire code base to adhere to a new 'normalized' model, I'd like to hear from some NoSQL champions.
Here is my model using one big document with embedded docs and the works:
var mongoose = require('mongoose'),
Schema = mongoose.Schema,
User = mongoose.model('User');
var Entry = new Schema({
text: String,
ups: Number,
downs: Number,
rankScore: Number,
posted: {
type: Date,
default: Date.now
},
postedBy: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User'
}
});
var boardSchema = new Schema({
theme: String,
created: {
type: Date,
default: Date.now
},
owner: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User'
},
entered: {
type: Boolean,
default: false
},
entries: [Entry],
participants: [{
user: { type: mongoose.Schema.Types.ObjectId, ref: 'User'},
date: { type: Date, default: Date.now },
topTen: [ { type: mongoose.Schema.Types.ObjectId, ref: 'Entry'} ]
}]
});
mongoose.model('Board', boardSchema);
Basically, I want to query the document by Board._id, then, where participants.user == req.user.id, add to the topTen[] array. Note that participants[] is an array within the document and topTen is an array within each element of participants[]. I've found other similar questions, but I was pointed to a Jira item that doesn't look like it will be implemented to allow use of the $ positional operator in multiple embedded arrays. Is there no way to do this now? Or if anyone has a suggestion for how to model my document so that I don't have to do a full rewrite with a new normalized reference model, please help!
Here are some of my query attempts from what I could find online. Nothing worked for me.
Board.update({_id: ObjectId('56910eed15c4d50e0998a2c9'), 'participants.user._id': ObjectId('56437f6a142974240273d862')}, {$set:{'participants.0.topTen.$.entry': ObjectId('5692eafc64601ceb0b64269b') }});
I read that you should avoid such 'nested' designs, but with the embedded model it's hard not to. Basically this statement says to me: don't embed, go "ref".
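One possible approach, sketched under assumptions and not tested against the schema above: adding to topTen only needs the position of one array element (the matching participant), because topTen itself is the target of the array operator rather than something to index into. So a single positional `$` may be enough. The helper `buildTopTenUpdate` below is hypothetical. It only builds the filter and update documents as plain objects, and the ids are the ones from the attempted query.

```javascript
// Build the filter and update documents for adding an entry id to the
// topTen array of one participant on one board.
function buildTopTenUpdate(boardId, userId, entryId) {
  return {
    // participants.user holds the user's id directly, so there is no "._id".
    filter: { _id: boardId, "participants.user": userId },
    // "$" refers to the first participants element matched by the filter.
    // $addToSet avoids duplicate entries; $push would allow them.
    update: { $addToSet: { "participants.$.topTen": entryId } },
  };
}

const { filter, update } = buildTopTenUpdate(
  "56910eed15c4d50e0998a2c9",
  "56437f6a142974240273d862",
  "5692eafc64601ceb0b64269b"
);
console.log(JSON.stringify(filter));
console.log(JSON.stringify(update));
// With Mongoose these would be passed along the lines of:
//   Board.update(filter, update, callback)
```

This only covers appending to topTen. Changing a specific element inside topTen would need a position in both arrays, which is the case the single `$` operator does not handle.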