I am going to build a student management system using MongoDB. I will have one table for students and another for attendance records. Can I have a key in the attendance table that references the students table? How?
The idea behind MongoDB is to eliminate (or at least minimize) relational data. Have you considered just embedding the attendance data directly into each student record? This is actually the preferred design pattern for MongoDB and can result in much better performance and scalability.
If you truly need highly relational and normalized data, you might want to reconsider using MongoDB.
The answer depends on how you intend to use the data. You really have two options: embed the attendance data, or link (reference) it. More on these approaches is detailed here: http://www.mongodb.org/display/DOCS/Schema+Design
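To answer the original question directly: yes, each attendance document can hold the `_id` of the student it belongs to, and the application resolves that reference (MongoDB 3.2 and later can also do it server-side with the `$lookup` aggregation stage, e.g. `db.attendance.aggregate([{ $lookup: { from: "students", localField: "student_id", foreignField: "_id", as: "student" } }])`). Below is a minimal in-memory sketch of the linking approach in plain JavaScript, with arrays standing in for the two collections; the field name `student_id`, the sample data and the helper `attendanceWithStudent` are hypothetical, not part of any MongoDB API.

```javascript
// Hypothetical in-memory stand-ins for the "students" and "attendance" collections.
const students = [
  { _id: 1, login: "sean", first: "Sean", last: "Hodges" },
  { _id: 2, login: "ann", first: "Ann", last: "Lee" },
];

// Each attendance record stores only a reference (student_id) to its student.
const attendance = [
  { student_id: 1, class: "Maths", when: new Date("2011-09-19T04:00:10.112Z") },
  { student_id: 1, class: "Science", when: new Date("2011-09-20T14:36:06.958Z") },
  { student_id: 2, class: "Maths", when: new Date("2011-09-19T04:05:00.000Z") },
];

// Application-side "join": index students by _id once, then resolve each reference.
function attendanceWithStudent(records, studentList) {
  const byId = new Map(studentList.map((s) => [s._id, s]));
  return records.map((r) => ({ ...r, student: byId.get(r.student_id) ?? null }));
}

const joined = attendanceWithStudent(attendance, students);
console.log(joined.map((r) => `${r.student.login}: ${r.class}`));
```

A dangling reference resolves to `null` here, since MongoDB itself does not enforce that `student_id` points at an existing student.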
For the common use case, you would probably embed this particular collection, so each student record would have an embedded "attendance" array. This works because attendance records are unlikely to be shared between students, and retrieving the attendance data is likely to require the student information as well. Retrieving the attendance data would be as simple as:
db.student.find( { login : "sean" } )
// returns:
{
  login : "sean",
  first : "Sean",
  last : "Hodges",
  attendance : [
    { class : "Maths", when : ISODate("2011-09-19T04:00:10.112Z") },
    { class : "Science", when : ISODate("2011-09-20T14:36:06.958Z") }
  ]
}
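With the embedded design, recording a new attendance entry is an update to a single student document; in the mongo shell this would typically be `db.student.updateOne({ login: "sean" }, { $push: { attendance: { class: "Maths", when: new Date() } } })`. The plain-JavaScript sketch below mimics that `$push` and a per-class count on an in-memory copy of the document so the shape of the data is easy to see; the helpers `recordAttendance` and `countAttendance` are hypothetical.

```javascript
// In-memory copy of the embedded student document shown above.
const student = {
  login: "sean",
  first: "Sean",
  last: "Hodges",
  attendance: [
    { class: "Maths", when: new Date("2011-09-19T04:00:10.112Z") },
    { class: "Science", when: new Date("2011-09-20T14:36:06.958Z") },
  ],
};

// Mimics { $push: { attendance: entry } }: append to the embedded array.
function recordAttendance(doc, className, when) {
  doc.attendance.push({ class: className, when });
  return doc;
}

// Counts how many times the student attended a given class.
function countAttendance(doc, className) {
  return doc.attendance.filter((a) => a.class === className).length;
}

recordAttendance(student, "Maths", new Date("2011-09-21T09:00:00Z"));
console.log(countAttendance(student, "Maths"));
```

Because everything lives in one document, a single read returns the student together with their whole attendance history; the trade-off is that the array keeps growing, so a very long history may be better served by the referenced design.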
Yes. There are no hard and fast rules. You have to weigh the pros and cons of either embedding or referencing data. This video will definitely help (https://www.youtube.com/watch?v=-o_VGpJP-Q0&t=21s). In your example, the phone number attribute should live in the same document (in a document database), because a person's phone number rarely changes.
I'm having a bit of trouble understanding when and why to use embedded documents in a MongoDB database.
Imagine we have three collections: users, rooms and bookings.
I have a few questions about a situation like this:
1) How would you update the embedded document? Would it be the responsibility of the application developer to find all instances of Kevin as an embedded document and update them?
2) If the solution is to use document references, is that as heavy as a relational db join? Is this just a case of the example not being a good fit for Mongo?
As always let me know if I'm being a complete idiot.
IMHO, you overdid it. Given that the questions from your use cases are:
For a given reservation, what room is booked by which user?
For a given user, what are his or her details?
How many beds does a given room provide?
I would go with the following model for rooms
{
_id: 1001,
beds: 2
}
for users
{
_id: new ObjectId(),
username: "Kevin",
mobile:"12345678"
}
and for reservations
{
_id: new ObjectId(),
date: new ISODate(),
user: "Kevin",
room: 1001
}
Now, in a reservation overview, you have all relevant information ("who", "when" and "which") by simply querying reservations, without any overhead, which answers the first question from your use cases. In a reservation details view, admittedly, you would have to do two queries, but they are lightning fast with proper indexing and, depending on your technology, can be done asynchronously, too. Note that I saved an index by using the room number as the _id. How to answer the remaining questions should be obvious.
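The overview and details views described above can be sketched with plain JavaScript `Map`s standing in for the `users` and `rooms` collections; in MongoDB the two extra lookups would be two `findOne` calls, one on `username` (which would want its own index) and one on the room's `_id`. The helpers `overview` and `reservationDetails`, the reservation id `"r1"` and the sample date are hypothetical.

```javascript
// Stand-ins for the three collections from the model above.
const rooms = new Map([[1001, { _id: 1001, beds: 2 }]]);
const usersByName = new Map([["Kevin", { username: "Kevin", mobile: "12345678" }]]);
const reservations = [
  { _id: "r1", date: new Date("2015-06-01T00:00:00Z"), user: "Kevin", room: 1001 },
];

// Overview: "who", "when" and "which" are already in the reservation itself.
function overview(list) {
  return list.map((r) => `${r.user} booked room ${r.room} on ${r.date.toISOString().slice(0, 10)}`);
}

// Details: two extra lookups, one by username and one by room _id.
function reservationDetails(reservation) {
  return {
    ...reservation,
    user: usersByName.get(reservation.user) ?? null,
    room: rooms.get(reservation.room) ?? null,
  };
}

console.log(overview(reservations));
console.log(reservationDetails(reservations[0]));
```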
So as per your original question: embedding is not necessary here, imho.
I've used MongoDB for a while, but I've only used it for CRUD operations after somebody else had already done the nitty-gritty task of designing a schema. So, basically, this is the first time I'm designing a schema and I need some suggestions.
The data I will collect from users is their regular information, their health-related information and their insurance-related information. A single user will not have multiple health or insurance records, so it is a simple one-to-one relation. But the health and insurance information will have lots of fields. So my question is: is it better to have separate collections for the health and insurance information, like this:
var userSchema = {
  name : String,
  age : Number,
  health_details : { type: Schema.Types.ObjectId, ref: 'Health' }, // one-to-one reference to a Health document
  insurance_details : { type: Schema.Types.ObjectId, ref: 'Insurance' } // one-to-one reference to an Insurance document
}
or to have a single collection with a large number of fields, like this:
var userSchema = {
name : String,
age : Number,
disease_name : String, // and many other fields related to health
insurance_company_name : String //and many other fields related to insurance
}
Generally, some of the factors you can consider while modeling 1-to-1, 1-to-many and many-to-many data in NoSQL are:
1. Data duplication
Do you expect data to be duplicated? I don't mean a one-word value like the hobby "gardening", which many users can share and which probably doesn't need its own "hobbies" collection, but something like authors and books. That case guarantees duplication.
An author can write many books. You should not embed the author even in two books, because it's hard to keep them in sync when the author's info changes. Use 1-to-many instead. The reference can go in either of the two documents: as "has many" (an array of bookIds in the author) or as "belongs to" (an authorId in each book).
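A minimal sketch of the "belongs to" variant, assuming each book stores an `authorId` while the author's details live only in the author document (the data and the helpers `booksByAuthor`, `renameAuthor` and `authorOf` are hypothetical). Because the author is stored once, a rename is a single change, and finding an author's books is a filter on `authorId`; in MongoDB that would be a `find({ authorId: ... })` backed by an index on that field.

```javascript
const authors = [{ _id: "a1", name: "Jane Doe" }];
const books = [
  { _id: "b1", title: "First Book", authorId: "a1" },
  { _id: "b2", title: "Second Book", authorId: "a1" },
];

// "Belongs to": each book points at its author.
function booksByAuthor(authorId) {
  return books.filter((b) => b.authorId === authorId);
}

// Renaming touches exactly one document, not every book.
function renameAuthor(authorId, newName) {
  const author = authors.find((a) => a._id === authorId);
  if (author) author.name = newName;
  return author;
}

// Resolving a book's author is a lookup by the stored reference.
function authorOf(book) {
  return authors.find((a) => a._id === book.authorId) ?? null;
}

renameAuthor("a1", "Jane Smith");
console.log(booksByAuthor("a1").map((b) => `${b.title} by ${authorOf(b).name}`));
```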
In the case of health and insurance, since no data duplication is expected, a single document is the better choice.
2. Read/write preference
What is the expected frequency of reads and writes of the data (not of the collection)? For example, if you query a user together with their health and insurance records much more frequently than you update them (and points 1 and 3 are not much of a problem), then this data should preferably be contained in and queried from a single document instead of three different sources.
Also, a single document is the unit for which MongoDB guarantees atomicity, which is an added benefit if you want to update the user, health and insurance data all at the same time (say, in one API call) without needing a multi-document transaction.
3. Size of the document
Consider this: many users can like a post and a user can like many posts (many-to-many). And as you need to ensure no user likes a post twice, user ids must be stored somewhere. Three available options:
keep user ids array in post document
keep post ids array in user document
create another document that contains the ids of both (solution for many-to-many only, similar to SQL)
If a post is liked by more than a million users, the post document will overflow with user references. Similarly, a user can like thousands of posts in a short period, so the second option is not feasible either. That leaves the third option, which is the best one for this case.
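The third option can be sketched as one small document per (user, post) pair. In MongoDB the "no user likes a post twice" rule would typically be enforced with a unique compound index such as `db.likes.createIndex({ userId: 1, postId: 1 }, { unique: true })`; the plain-JavaScript sketch below emulates that constraint with a `Set` of composite keys. The `likes` collection name, the field names and the helper functions are hypothetical.

```javascript
// One entry per (userId, postId) pair, standing in for a "likes" collection.
const likes = [];
// Emulates a unique compound index on { userId: 1, postId: 1 }.
const likeKeys = new Set();

function likePost(userId, postId) {
  const key = JSON.stringify([userId, postId]);
  if (likeKeys.has(key)) return false; // duplicate rejected, as a unique index would
  likeKeys.add(key);
  likes.push({ userId, postId, at: new Date() });
  return true;
}

function likeCount(postId) {
  return likes.filter((l) => l.postId === postId).length;
}

function postsLikedBy(userId) {
  return likes.filter((l) => l.userId === userId).map((l) => l.postId);
}

likePost("u1", "p1");
likePost("u2", "p1");
likePost("u1", "p1"); // ignored: u1 already liked p1
console.log(likeCount("p1"), postsLikedBy("u1"));
```

Counting a post's likes or listing a user's liked posts then becomes an indexed query on the `likes` collection, and neither the post nor the user document grows as likes accumulate.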
But a post can have many comments, and a comment belongs to only one post (1-to-many). You hardly expect more than a few hundred comments on a post, and rarely a thousand. Therefore, keeping an array of commentIds (or the embedded comments themselves) in the post is a practical solution.
In your case, I don't believe a document that does not keep a huge list of references can grow large enough to reach 16 MB (MongoDB's document size limit). You can therefore safely store the health and insurance data in the user document. But they should have keys of their own, like:
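To put a rough number on "huge": in BSON, an array is encoded like an embedded document, and each element carries a 1-byte type tag, its index as a null-terminated decimal string, and, for an ObjectId, a 12-byte value. The sketch below adds those up for an array of `n` ObjectIds (ignoring the enclosing document and field name, so it slightly underestimates the whole document) and compares the result with the 16 MB limit, taken as 16 × 1024 × 1024 bytes. It is back-of-the-envelope arithmetic, not a call into any BSON library.

```javascript
// Approximate BSON size, in bytes, of an array holding n ObjectIds.
function bsonObjectIdArrayBytes(n) {
  let bytes = 4 + 1; // int32 length prefix + trailing 0x00 terminator of the array document
  for (let i = 0; i < n; i++) {
    const keyBytes = String(i).length + 1; // index as decimal string + null terminator
    bytes += 1 + keyBytes + 12; // type byte + key + 12-byte ObjectId
  }
  return bytes;
}

const LIMIT = 16 * 1024 * 1024; // 16 MB BSON document limit (16,777,216 bytes)

for (const n of [1000, 100000, 1000000]) {
  const size = bsonObjectIdArrayBytes(n);
  console.log(n, size, size > LIMIT ? "over the limit" : "fits");
}
```

By this estimate, a million ObjectId references come to about 19.9 MB (19,888,895 bytes), already over the limit, while a hundred thousand (about 1.9 MB) still fit.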
var userSchema = {
name : String,
age : Number,
health : {
disease_name : String,
//more health information
},
insurance :{
company_name : String,
//further insurance data
}
}
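With health and insurance nested inside the user document, one update can change all three parts at once, and MongoDB applies it atomically because it targets a single document; in the mongo shell that would look something like `db.users.updateOne({ _id: userId }, { $set: { age: 31, "health.disease_name": "asthma", "insurance.company_name": "Example Mutual" } })`. The helper `applySet` below is a hypothetical, simplified imitation of `$set` with dotted paths (plain objects only, no array indexes or other operators), and the sample values are made up; it only shows how dotted keys address the nested sub-objects.

```javascript
// Simplified imitation of MongoDB's $set with dotted paths (plain objects only).
function applySet(doc, setSpec) {
  for (const [path, value] of Object.entries(setSpec)) {
    const keys = path.split(".");
    let target = doc;
    for (const key of keys.slice(0, -1)) {
      if (target[key] === undefined) {
        target[key] = {}; // create a missing intermediate sub-document
      } else if (typeof target[key] !== "object" || target[key] === null) {
        throw new Error(`cannot create field under non-object "${key}"`);
      }
      target = target[key];
    }
    target[keys[keys.length - 1]] = value;
  }
  return doc;
}

// A user document following the nested schema above (sample values are made up).
const user = {
  name: "Alex",
  age: 30,
  health: { disease_name: "none" },
  insurance: { company_name: "Acme Insurance" },
};

// One update touching user, health and insurance fields in the same document.
applySet(user, {
  age: 31,
  "health.disease_name": "asthma",
  "insurance.company_name": "Example Mutual",
});
console.log(user);
```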
That's how you should think about designing your schema, in my opinion. I would recommend reading these very helpful data-modeling guides by Couchbase: Document design considerations, Modeling documents for retrieval and Modeling relationships. Although they are written for Couchbase, the rules apply equally to MongoDB schema design, as both are NoSQL, document-oriented databases.
I am relatively new to NoSQL databases. I am designing a data structure for an e-learning web app. There will be X courses and Y users.
Every user will be able to take any number of courses.
Every course will be composed of many sections (each section may be a video or a quiz).
I will need to keep track of every section a user takes, so I think the whole course should be part of the user set (for each user), like so:
{
_id: "ed",
name: "Eduardo Ibarra",
courses: [
{
name: "Node JS",
progress: "100%",
section: [
{name: "Introduction", passed:"100%", field3:"x", field4:""},
{name: "Quiz 1", passed:"75%", questions:[...], field3:"x", field4:""},
]
},
{
name: "MongoDB",
progress: "65%",
...
}
]
}
Is this the best way to do it?
I would say: design your database around your queries. One thing is for sure: you will have to do some embedding.
If you are going to perform more queries on what a user is doing, then make the user the primary entity and embed the courses within it. You don't need to embed the entire course info, though. The info about a course is static: for example, the data about the Node JS course (the content, the author of the course, exercise files, etc.) will not change, so you can keep the courses' info separately in another collection. But how much of a course a user has completed depends on the individual user. So you should keep only the id of the course (which is stored in the separate 'course' collection) and, for each user, store the information related to that (user, course) pair embedded in the user document itself.
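A minimal sketch of that split, assuming a separate `courses` collection holding the static course content and, in each user document, an `enrollments` array that stores only a `course_id` plus per-user section records (these names and the `progress` helper are hypothetical, not from the question). Here progress is derived from the stored section records instead of being kept as a string like "65%".

```javascript
// Static course content, stored once in a separate "courses" collection.
const courses = [
  { _id: "node", name: "Node JS", sections: ["Introduction", "Quiz 1", "Streams"] },
  { _id: "mongo", name: "MongoDB", sections: ["Introduction", "Schema design"] },
];

// Per-user state only: which course, and which sections this user has records for.
const user = {
  _id: "ed",
  name: "Eduardo Ibarra",
  enrollments: [
    { course_id: "node", sections: [{ name: "Introduction", passed: 100 }, { name: "Quiz 1", passed: 75 }] },
    { course_id: "mongo", sections: [] },
  ],
};

// Share of a course's sections that have a record for this user, as a whole percentage.
function progress(userDoc, courseId) {
  const course = courses.find((c) => c._id === courseId);
  const enrollment = userDoc.enrollments.find((e) => e.course_id === courseId);
  if (!course || !enrollment || course.sections.length === 0) return 0;
  return Math.round((enrollment.sections.length / course.sections.length) * 100);
}

console.log(progress(user, "node"), progress(user, "mongo"));
```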
Now the most important question: what do you do if you have to perform queries that require a 'join' of the user and course collections? For this you can use JavaScript to first get the courses (and maybe store them in an array or list) and then fetch the users for each of those courses from the users collection, or vice versa. There are a few drivers available online to help you accomplish this; one is UnityJDBC.
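The application-side join described above can be sketched in plain JavaScript: take the users, collect every referenced course id, fetch those courses in one batch (in MongoDB, something like `db.courses.find({ _id: { $in: ids } })`), then stitch the results together. MongoDB 3.2 and later also offer the `$lookup` aggregation stage for this kind of left outer join on the server. The data and the `usersWithCourses` helper are hypothetical.

```javascript
const courses = [
  { _id: "node", name: "Node JS" },
  { _id: "mongo", name: "MongoDB" },
];
const users = [
  { _id: "ed", name: "Eduardo Ibarra", enrollments: [{ course_id: "node" }, { course_id: "mongo" }] },
  { _id: "amy", name: "Amy", enrollments: [{ course_id: "node" }] },
];

function usersWithCourses(userList, courseList) {
  // Gather every referenced course id, then "fetch" just those courses in one batch.
  const ids = new Set(userList.flatMap((u) => u.enrollments.map((e) => e.course_id)));
  const byId = new Map(courseList.filter((c) => ids.has(c._id)).map((c) => [c._id, c]));
  // Replace each reference with the course document (null if it is missing).
  return userList.map((u) => ({
    ...u,
    enrollments: u.enrollments.map((e) => ({ ...e, course: byId.get(e.course_id) ?? null })),
  }));
}

const result = usersWithCourses(users, courses);
console.log(result.map((u) => `${u.name}: ${u.enrollments.map((e) => e.course.name).join(", ")}`));
```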
From my experience, knowing what you are going to query from MongoDB is very helpful in designing your database, because the NoSQL nature of MongoDB implies that there is no single correct design. Any design is incorrect if it does not let you accomplish your task. So, clearly, knowing beforehand what you will do with the database (i.e. what you will query) is the only guide.
Say, at the beginning of a project, I want to store a collection of Companies, and within each company, a collection of Employees.
Since I'm using a document database (such as MongoDB), my structure might look something like this:
+ Companies[]
  +--Company
     +--Employees[]
        +--Employee
        +--Employee
  +--Company
     +--Employees[]
        +--Employee
What happens if, later down the track, a new requirement is to have some Employees work at multiple Companies?
How does one manage this kind of change in a document database?
Doesn't the simplicity of a document database become your worst enemy, since it creates brittle data structures which can't easily be modified?
In the example above, I'd have to run modify scripts to create a new 'Employees' collection, and move every employee into that collection, while maintaining some sort of relationship key (e.g. a CompanyID on each employee).
If I did the above thoroughly enough, I'd end up with many collections, and very little hierarchy, and documents being joined by means of keys.
In that case, am I still using the document database as I should be?
Isn't it becoming more like a relational database?
Speaking about MongoDB specifically: because the database doesn't enforce relationships the way a relational database does, you're on the hook for maintaining this sort of data integrity yourself. That's wonderfully helpful in many cases, but you end up writing more application code to handle these sorts of things.
Having said all of that, the key to using a system like MongoDB is modeling your data to fit MongoDB. What you have above makes complete sense if you're using MySQL, but with Mongo you'd absolutely get into trouble if you structure your data as if it were a relational database.
If you have Employees who can work at one or more Companies, I would structure it as:
// company records
{ _id: 12345, name : 'Apple' }
{ _id: 55555, name : 'Pixar' }
{ _id: 67890, name : 'Microsoft' }
// employees (the ObjectId values below are abbreviated placeholders; real ObjectIds are 24 hex characters)
{ _id : ObjectId('abc123'), name : "Steve Jobs", companies : [ 12345, 55555 ] }
{ _id : ObjectId('abc456'), name : "Steve Ballmer", companies : [ 67890 ] }
You'd add an index on employees.companies, which would make it very fast to get all of the employees who work for a given company, regardless of how many companies they work for. Maintaining a short list of companies per employee will be much easier than maintaining a large list of employees per company. Getting all of the data for a company and all of its employees would take two (fast) queries.
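The two queries for "a company and all of its employees" can be sketched as follows. In MongoDB the second query would be `db.employees.find({ companies: 12345 })`, which matches any employee whose `companies` array contains that value and can use a multikey index created with `db.employees.createIndex({ companies: 1 })`. The plain-JavaScript version below uses `Array.prototype.includes` in place of that array match; string ids stand in for the abbreviated ObjectIds, and `companyWithEmployees` is a hypothetical helper.

```javascript
const companies = [
  { _id: 12345, name: "Apple" },
  { _id: 55555, name: "Pixar" },
  { _id: 67890, name: "Microsoft" },
];
const employees = [
  { _id: "abc123", name: "Steve Jobs", companies: [12345, 55555] },
  { _id: "abc456", name: "Steve Ballmer", companies: [67890] },
];

// Query 1: the company itself. Query 2: every employee whose companies array contains its _id.
function companyWithEmployees(companyId) {
  const company = companies.find((c) => c._id === companyId) ?? null;
  const staff = employees.filter((e) => e.companies.includes(companyId));
  return { company, employees: staff };
}

console.log(companyWithEmployees(55555));
```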
Doesn't the simplicity of a document database become your worst enemy, since it creates brittle data structures which can't easily be modified?
The simplicity can bite you, but it's very easy to update and change at a later time. You can script changes via Javascript and run them via the Mongo shell.
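As an illustration of such a change, the sketch below converts the original nested shape (employees embedded in each company) into the two-collection shape from the answer above, merging employees who appear under several companies. It assumes, hypothetically, that an employee can be identified across companies by an `email` field; a real migration would read from and write to the collections through the Mongo shell or a driver rather than transforming in-memory arrays.

```javascript
// Original shape: each company embeds its employees.
const nestedCompanies = [
  { _id: 12345, name: "Apple", employees: [{ name: "Steve Jobs", email: "steve@example.com" }] },
  {
    _id: 55555,
    name: "Pixar",
    employees: [
      { name: "Steve Jobs", email: "steve@example.com" },
      { name: "Ed", email: "ed@example.com" },
    ],
  },
];

// Produce a flat companies collection and an employees collection with a companies array.
function migrate(nested) {
  const companies = nested.map(({ employees, ...company }) => company);
  const byEmail = new Map();
  for (const company of nested) {
    for (const emp of company.employees) {
      // Merge duplicates: one employee document listing every company they work for.
      if (!byEmail.has(emp.email)) byEmail.set(emp.email, { ...emp, companies: [] });
      byEmail.get(emp.email).companies.push(company._id);
    }
  }
  return { companies, employees: [...byEmail.values()] };
}

const migrated = migrate(nestedCompanies);
console.log(migrated.employees);
```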
My recent answer to this question covers this in the RavenDB context:
How would I model data that is hierarchical and relational in a document-oriented database system like RavenDB?
I'm trying to figure out how best to design MongoDB schemas. The MongoDB documentation recommends relying heavily on embedded documents for improved querying, but I'm wondering if my use case actually justifies referenced documents.
A very basic version of my current schema is:
(Apologies for the pseudo-format; I'm not sure how to express Mongo schemas.)
users {
email (string)
}
games {
user (reference user document)
date_started (timestamp)
date_finished (timestamp)
mode (string)
score: {
total_points (integer)
time_elapsed (integer)
}
}
Games are short (about 60 seconds long) and I expect a lot of concurrent writes to be taking place.
At some point, I'm going to want to calculate a high score list, and possibly in a segregated fashion (e.g., high score list for a particular game.mode or date)
Are embedded documents the best approach here? Or is this truly a problem that relations solve better? How would these use cases best be solved in MongoDB?
... is this truly a problem that relations solve better?
The key here is less about "is this a relation?" and more about "how am I going to access this?"
MongoDB is not "anti-reference". MongoDB does not have the benefits of joins, but it does have the benefit of embedded documents.
As long as you understand these trade-offs then it's perfectly fair to use references in MongoDB. It's really about how you plan to query these objects.
Are embedded documents the best approach here?
Maybe. Some things to consider.
Do games have value outside of the context of the user?
How many games will a single user have?
Are games transactional in nature?
How are you going to access games? Do you always need all of a user's games?
If you're planning to build leaderboards and a user can generate hundreds of game documents, then it's probably fair to have games in their own collection. Storing ten thousand instances of "game" inside each user document isn't particularly useful.
But depending on your answers to the above, you could really go either way. As the litmus test, I would try running some Map / Reduce jobs (i.e. build a simple leaderboard) to see how you feel about the structure of your data.
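A simple leaderboard over a separate `games` collection can be sketched in plain JavaScript as filter, sort, limit. In current MongoDB versions the same query is usually written as an aggregation pipeline, for example `db.games.aggregate([{ $match: { mode: "classic" } }, { $sort: { "score.total_points": -1 } }, { $limit: 10 }])`, since map-reduce has been deprecated as of MongoDB 5.0. The mode values, the sample games and the `leaderboard` helper are hypothetical.

```javascript
// A few documents shaped like the games schema in the question.
const games = [
  { user: "u1", mode: "classic", date_finished: new Date("2012-01-01T10:01:00Z"), score: { total_points: 120, time_elapsed: 60 } },
  { user: "u2", mode: "classic", date_finished: new Date("2012-01-01T10:02:00Z"), score: { total_points: 300, time_elapsed: 58 } },
  { user: "u3", mode: "blitz", date_finished: new Date("2012-01-02T09:00:00Z"), score: { total_points: 500, time_elapsed: 30 } },
  { user: "u1", mode: "classic", date_finished: new Date("2012-01-03T12:00:00Z"), score: { total_points: 250, time_elapsed: 61 } },
];

// Like $match on mode and an optional [from, to) date range, then $sort by points descending, then $limit.
function leaderboard(list, { mode, from, to, limit = 10 } = {}) {
  return list
    .filter((g) => (mode === undefined || g.mode === mode)
      && (from === undefined || g.date_finished >= from)
      && (to === undefined || g.date_finished < to))
    .sort((a, b) => b.score.total_points - a.score.total_points)
    .slice(0, limit)
    .map((g) => ({ user: g.user, points: g.score.total_points }));
}

console.log(leaderboard(games, { mode: "classic", limit: 2 }));
```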
Why would you use a relation here? If 'email' is the only user property, then denormalization and using an embedded document would be perfectly fine. If the user object contains other information, I would go for a reference.
I think you should use the "entity" and "value object" concepts from DDD: for entities use references, but for value objects use embedded documents.
You can also denormalize your objects, meaning that you duplicate some of your data, e.g.:
// root document: game
{
  // duplicate only the part of the root user document that you need
  user: { FirstName: "someName", Id: "ID" }
}
// root document: user
{
  Id: "ID",
  FirstName: "someName",
  LastName: "last name",
  ...
}
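The cost of this duplication is that when the user's `FirstName` changes, every game that copied it has to be updated too, and that is the application's job. In MongoDB this would typically be one multi-document update such as `db.games.updateMany({ "user.Id": "ID" }, { $set: { "user.FirstName": "New name" } })`; the plain-JavaScript sketch below shows the same idea on in-memory arrays, with a hypothetical `renameUser` helper and made-up game ids.

```javascript
// Root user document (the source of truth) and games holding a copied subset of it.
const users = [{ Id: "ID", FirstName: "someName", LastName: "last name" }];
const games = [
  { _id: "g1", user: { Id: "ID", FirstName: "someName" } },
  { _id: "g2", user: { Id: "ID", FirstName: "someName" } },
  { _id: "g3", user: { Id: "other", FirstName: "Someone else" } },
];

// Update the root document, then every duplicated copy; returns how many games changed.
function renameUser(userId, newFirstName) {
  const root = users.find((u) => u.Id === userId);
  if (!root) return 0;
  root.FirstName = newFirstName;
  let modified = 0;
  for (const game of games) {
    if (game.user.Id === userId) {
      game.user.FirstName = newFirstName;
      modified++;
    }
  }
  return modified;
}

console.log(renameUser("ID", "New name"));
```

Reading a game never needs a second query for the user's name, which is the point of duplicating it; the trade-off is that the copies are only as consistent as this update step, since an `updateMany` is atomic per document but not across all the games it touches.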