How do document databases deal with changing relationships between objects (or do they at all)? - mongodb

Say, at the beginning of a project, I want to store a collection of Companies, and within each company, a collection of Employees.
Since I'm using a document database (such as MongoDB), my structure might look something like this:
+ Customers[]
+--Customer
+--Employees[]
+--Employee
+--Employee
+--Customer
+--Employees[]
+--Employee
What happens if, later down the track, a new requirement is to have some Employees work at multiple Companies?
How does one manage this kind of change in a document database?
Doesn't the simplicity of a document database become your worse enemy, since it creates brittle data structures which can't easily be modified?
In the example above, I'd have to run modify scripts to create a new 'Employees' collection, and move every employee into that collection, while maintaining some sort of relationship key (e.g. a CompanyID on each employee).
If I did the above thoroughly enough, I'd end up with many collections, and very little hierarchy, and documents being joined by means of keys.
In that case, am I still using the document database as I should be?
Isn't it becoming more like a relational database?

Speaking about MongoDB specifically...because the database doesn't enforce any relationships like a relational database, you're on the hook for maintaining any sort of data integrity such as this. It's wonderfully helpful in many cases, but you end up writing more application code to handle these sorts of things.
Having said all of that, they key to using a system like MongoDB is modeling your data to fit MongoDB. What you have above makes complete sense if you're using MySQL...using Mongo you'd absolutely get in trouble if you structure your data like it's a relational database.
If you have Employees who can work at one or more Companies, I would structure it as:
// company records
{ _id: 12345, name : 'Apple' }
{ _id: 55555, name : 'Pixar' }
{ _id: 67890, name : 'Microsoft' }
// employees
{ _id : ObjectId('abc123'), name : "Steve Jobs", companies : [ 12345, 55555 ] }
{ _id : ObjectId('abc456'), name : "Steve Ballmer", companies : [ 67890 ] }
You'd add an index on employees.companies, which would make is very fast to get all of the employees who work for a given company...regardless of how many companies they work for. Maintaining a short list of companies per employee will be much easier than maintaining a large list of employees for a company. To get all of the data for a company and all of it's employees would be two (fast) queries.
Doesn't the simplicity of a document
database become your worse enemy,
since it creates brittle data
structures which can't easily be
modified?
The simplicity can bite you, but it's very easy to update and change at a later time. You can script changes via Javascript and run them via the Mongo shell.

My recent answer for this question covers this in the RavenDb context:
How would I model data that is heirarchal and relational in a document-oriented database system like RavenDB?

Related

MongoDB Schema Design suggestion

I've used MongoDB for a while but i've only used it for doing CRUD operations when somebody else has already done the nitty-gritty task of designing a schema. So, basically this is the first time i'm designing a schema and i need some suggestions.
The data i will collect from users are their regular information, their health related information and their insurance related information. A single user will not have multiple health and insurance related information so it is a simple one-to-one relation. But these health and insurance related information will have lots of fields. So my question is. is it good to have a separate collection for health and insurance related information as :
var userSchema = {
name : String,
age : Number,
health_details : [{ type: Schema.Types.ObjectId, ref: 'Health' }],//reference to healthSchema
insurance_details : [{ type: Schema.Types.ObjectId, ref: 'Insurance' }] //reference to insuranceSchema
}
or to have a single collection with large number of fields as:
var userSchema = {
name : String,
age : Number,
disease_name : String, // and many other fields related to health
insurance_company_name : String //and many other fields related to insurance
}
Generally, some of the factors you can consider while modeling 1-to-1, 1-to-many and many-to-many data in NoSql are:
1. Data duplication
Do you expect data to duplicate? And that too not in a one word way like hobby "gardening", which many users can have and which probably doesn't need "hobbies" collection, but something like author and books. This case guarantees duplication.
An author can write many books. You should not be embedding author even in two books. It's hard to maintain when author info changes. Use 1-to-many. And reference can go in either of the two documents. As "has many" (array of bookIds in author) or "belongs to" (authorId in each book).
In case of health and insurance, as data duplication is not expected, single document is a better choice.
2. Read/write preference
What is the expected frequency of reads and writes of data (not collection)? For example, you query user, his health and insurance record much more frequently than updating it (and if 1 and 3 are not much of a problem) then this data should preferably be contained in and queried from a single document instead of three different sources.
Also, one document is what Mongodb guarantees atomicity for, which will be an added benefit if you want to update user, health and insurance all at the same time (say in one API).
3. Size of the document
Consider this: many users can like a post and a user can like many posts (many-to-many). And as you need to ensure no user likes a post twice, user ids must be stored somewhere. Three available options:
keep user ids array in post document
keep post ids array in user document
create another document that contains the ids of both (solution for many-to-many only, similar to SQL)
If a post is liked by more than a million users the post document will overflow with user references. Similarly, a user can like thousands of posts in a short period, so the second option is also not feasible. Which leaves us with the third option, which is the best for this case.
But a post can have many comments and a comment belongs to only one post (1-to-many). Now, comments you hardly expect more than a few hundreds. Rarely thousand. Therefore, keeping an array of commentIds (or embedded comments itself) in post is a practical solution.
In your case, I don't believe a document which does not keep a huge list of references can grow enough to reach 16 MB (Mongo document size limit). You can therefore safely store health and insurance data in user document. But they should have keys of their own like:
var userSchema = {
name : String,
age : Number,
health : {
disease_name : String,
//more health information
},
insurance :{
company_name : String,
//further insurance data
}
}
That's how you should think about designing your schema in my opinion. I would recommend reading these very helpful guides by Couchbase for data modeling: Document design considerations, modeling documents for retrieval and modeling relationships. Although related to Couchbase, the rules are equally applicable to mongodb schema design as both are NoSql and document oriented databases.

Many to many relationship on Mongodb based e-learning webapp?

I am relatively new to No-SQL databases. I am designing a data structure for an e-learning web app. There would be X quantity of courses and Y quantity of users.
Every user will be able to take any number of courses.
Every course will be compound of many sections (each section may be a video or a quiz).
I will need to keep track of every section a user takes, so I think the whole course should be part of the user set (for each user), like so:
{
_id: "ed",
name: "Eduardo Ibarra",
courses: [
{
name: "Node JS",
progress: "100%",
section: [
{name: "Introdiction", passed:"100%", field3:"x", field4:""},
{name: "Quiz 1", passed:"75%", questions:[...], field3:"x", field4:""},
]
},
{
name: "MongoDB",
progress: "65%",
...
}
]
}
Is this the best way to do it?
I would say that design your database depending upon your queries. One thing is for sure.. You will have to do some embedding.
If you are going to perform more queries on what a user is doing, then make user as the primary entity and embed the courses within it. You don't need to embed the entire course info. The info about a course is static. For ex: the data about Node JS course - i.e. the content, author of the course, exercise files etc - will not change. So you can keep the courses' info separately in another collection. But how much of the course a user has completed is dependent on the individual user. So you should only keep the id of the course (which is stored in the separate 'course' collection) and for each user you can store the information that is related to that (User, Course) pair embedded in the user collection itself.
Now the most important question - what to do if you have to perform queries which require 'join' of user and course collections? For this you can use javascript to first get the courses (and maybe store them in an array or list etc) and then fetch the user for each of those courses from the courses collection or vice-versa. There are a few drivers available online to help you accomplish this. One is UnityJDBC which is available here.
From my experience, I understand that knowing what you are going to query from MongoDB is very helpful in designing your database because the NoSQL nature of MongoDB implies that you have no correct way for designing. Every way is incorrect if it does not allow you in accomplishing your task. So clearly, knowing beforehand what you will do (i.e. what you will query) with the database is the only guide.

Links vs References in Document databases

I am confused with the term 'link' for connecting documents
In OrientDB page http://www.orientechnologies.com/orientdb-vs-mongodb/ it states that they use links to connect documents, while in MongoDB documents are embedded.
Since in MongoDB http://docs.mongodb.org/manual/core/data-modeling-introduction/, documents can be referenced as well, I can not get the difference between linking documents or referencing them.
The goal of Document Oriented databases is to reduce "Impedance Mismatch" which is the degree to which data is split up to match some sort of database schema from the actual objects residing in memory at runtime. By using a document, the entire object is serialized to disk without the need to split things up across multiple tables and join them back together when retrieved.
That being said, a linked document is the same as a referenced document. They are simply two ways of saying the same thing. How those links are resolved at query time vary from one database implementation to another.
That being said, an embedded document is simply the act of storing an object type that somehow relates to a parent type, inside the parent. For example, I have a class as follows:
class User
{
string Name
List<Achievement> Achievements
}
Where Achievement is an arbitrary class (its contents don't matter for this example).
If I were to save this using linked documents, I would save User in a Users collection and Achievement in an Achievements collection with the List of Achievements for the user being links to the Achievement objects in the Achievements collection. This requires some sort of joining procedure to happen in the database engine itself. However, if you use embedded documents, you would simply save User in a Users collection where Achievements is inside the User document.
A JSON representation of the data for an embedded document would look (roughly) like this:
{
"name":"John Q Taxpayer",
"achievements":
[
{
"name":"High Score",
"point":10000
},
{
"name":"Low Score",
"point":-10000
}
]
}
Whereas a linked document might look something like this:
{
"name":"John Q Taxpayer",
"achievements":
[
"somelink1", "somelink2"
]
}
Inside an Achievements Collection
{
"somelink1":
{
"name":"High Score",
"point":10000
}
"somelink2":
{
"name":"High Score",
"point":10000
}
}
Keep in mind these are just approximate representations.
So to summarize, linked documents function much like RDBMS PK/FK relationships. This allows multiple documents in one collection to reference a single document in another collection, which can help with deduplication of data stored. However it adds a layer of complexity requiring the database engine to make multiple disk I/O calls to form the final document to be returned to user code. An embedded document more closely matches the object in memory, this reduces Impedance Mismatch and (in theory) reduces the number of disk I/O calls.
You can read up on Impedance Mismatch here: http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
UPDATE
I should add, that choosing the right database to implement for your needs is very important from the start. If you have a lot of questions about each database, it might make sense to contact each supplier and get some of their training material. MongoDB offers 2 free courses you can take to learn more about their product and best uses at MongoDB University. OrientDB does offer training, however it is not free. It might be best to try contacting them directly and getting some sort of pre-sales training (if you are looking to license the db), usually they will put you in touch with some sort of pre-sales consultant to help you evaluate their product.
MongoDB works like RDBMS where the object id is like a foreign key. This means a "JOIN" that is run-time expensive. OrientDB, instead, has direct links that are created only once and have a very low run-time cost.

Can one make a relational database using MongoDB?

I am going to make a student management system using MongoDB. I will have one table for students and another for attendance records. Can I have a key in the attendance table to reach the students table, as pictured below? How?
The idea behind MongoDB is to eliminate (or at least minimize) relational data. Have you considered just embedding the attendance data directly into each student record? This is actually the preferred design pattern for MongoDB and can result in much better performance and scalability.
If you truly need highly relational and normalized data, you might want to reconsider using MongoDB.
The answer depends on how you intend to use the data. You really have 2 options, embed the attendance table, or link it. More on these approaches is detailed here: http://www.mongodb.org/display/DOCS/Schema+Design
For the common use-case, you would probably embed this particular collection, so each student record would have an embedded "attendance" table. This would work because attendance records are unlikely to be shared between students, and retrieving the attendance data is likely to require the student information as well. Retrieving the attendance data would be as simple as:
db.student.find( { login : "sean" } )
{
login : "sean",
first : "Sean",
last : "Hodges",
attendance : [
{ class : "Maths", when : Date("2011-09-19T04:00:10.112Z") },
{ class : "Science", when : Date("2011-09-20T14:36:06.958Z") }
]
}
Yes. There are no hard and fast rules. You have to look at the pros and cons of either embedding or referencing data. This video will definitely help (https://www.youtube.com/watch?v=-o_VGpJP-Q0&t=21s). In your example, the phone number attribute should be on the same table (in a document database), because the phone number of a person rarely changes.

When to embed documents in Mongo DB

I'm trying to figure out how to best design Mongo DB schemas. The Mongo DB documentation recommends relying heavily on embedded documents for improved querying, but I'm wondering if my use case actually justifies referenced documents.
A very basic version of my current schema is basically:
(Apologies for the psuedo-format, I'm not sure how to express Mongo schemas)
users {
email (string)
}
games {
user (reference user document)
date_started (timestamp)
date_finished (timestamp)
mode (string)
score: {
total_points (integer)
time_elapsed (integer)
}
}
Games are short (about 60 seconds long) and I expect a lot of concurrent writes to be taking place.
At some point, I'm going to want to calculate a high score list, and possibly in a segregated fashion (e.g., high score list for a particular game.mode or date)
Is embedded documents the best approach here? Or is this truly a problem that relations solves better? How would these use cases best be solved in Mongo DB?
... is this truly a problem that relations solves better?
The key here is less about "is this a relation?" and more about "how am I going to access this?"
MongoDB is not "anti-reference". MongoDB does not have the benefits of joins, but it does have the benefit of embedded documents.
As long as you understand these trade-offs then it's perfectly fair to use references in MongoDB. It's really about how you plan to query these objects.
Is embedded documents the best approach here?
Maybe. Some things to consider.
Do games have value outside of the context of the user?
How many games will a single user have?
Is games transactional in nature?
How are you going to access games? Do you always need all of a user's games?
If you're planning to build leaderboards and a user can generate hundreds of game documents, then it's probably fair to have games in their own collection. Storing ten thousand instances of "game" inside of each users isn't particularly useful.
But depending on your answers to the above, you could really go either way. As the litmus test, I would try running some Map / Reduce jobs (i.e. build a simple leaderboard) to see how you feel about the structure of your data.
Why would you use a relation here? If the 'email' is the only user property than denormalization and using an embedded document would be perfectly fine. If the user object contains other information I would go for a reference.
I think that you should to use "entity-object" and "object-value" definitions from DDD. For entity use reference,but for "object-value" use embed document.
Also you can use denormalization of your object. i mean that you can duplicate your data. e.g.
// root document
game
{
//duplicate part that you need of root user
user: { FirstName: "Some name", Id: "some ID"}
}
// root document
user
{
Id:"ID",
FirstName:"someName",
LastName:"last name",
...
}