MongoDB Schema Design suggestion - mongodb

I've used MongoDB for a while, but only for CRUD operations after somebody else had already done the nitty-gritty work of designing the schema. This is the first time I'm designing a schema myself, and I need some suggestions.
The data I will collect from users is their general information, their health-related information and their insurance-related information. A single user will not have multiple sets of health or insurance information, so it is a simple one-to-one relation. But the health and insurance information will have lots of fields. So my question is: is it better to have separate collections for the health and insurance information, as in:
var userSchema = {
  name : String,
  age : Number,
  health_details : [{ type: Schema.Types.ObjectId, ref: 'Health' }], // reference to healthSchema
  insurance_details : [{ type: Schema.Types.ObjectId, ref: 'Insurance' }] // reference to insuranceSchema
}
or to have a single collection with a large number of fields, as in:
var userSchema = {
  name : String,
  age : Number,
  disease_name : String, // and many other fields related to health
  insurance_company_name : String // and many other fields related to insurance
}

Generally, some of the factors you can consider while modeling 1-to-1, 1-to-many and many-to-many data in NoSQL are:
1. Data duplication
Do you expect the data to be duplicated? Not a one-word value like the hobby "gardening", which many users can share and which probably doesn't need its own "hobbies" collection, but something like authors and books, where duplication is guaranteed.
An author can write many books. You should not embed the author in each of those books, because every copy has to be updated when the author's info changes. Use a 1-to-many reference instead. The reference can live in either of the two documents: as "has many" (an array of bookIds in the author) or as "belongs to" (an authorId in each book).
In the case of health and insurance, no duplication is expected, so a single document is the better choice.
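To make the two referencing styles from the author/books example concrete, here is a rough in-memory sketch. The documents and the helper booksOfAuthor are made up for illustration; with a real database each array would be a collection.

```javascript
// "Has many": the author document carries an array of book ids.
const authorHasMany = { _id: "a1", name: "Jane Writer", bookIds: ["b1", "b2"] };

// "Belongs to": each book document carries the id of its author.
const booksBelongTo = [
  { _id: "b1", title: "First Book", authorId: "a1" },
  { _id: "b2", title: "Second Book", authorId: "a1" },
  { _id: "b3", title: "Unrelated Book", authorId: "a2" },
];

// With "belongs to", all books of an author are found by filtering on authorId.
function booksOfAuthor(books, authorId) {
  return books.filter((book) => book.authorId === authorId);
}

// Either way the author's name is stored exactly once, so a rename is one write.
const titles = booksOfAuthor(booksBelongTo, authorHasMany._id).map((b) => b.title);
console.log(titles);
```

Which side holds the reference mostly depends on which direction you query more often.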
2. Read/write preference
What is the expected frequency of reads and writes of the data (not the collection)? For example, if you query a user together with their health and insurance records much more often than you update them (and if points 1 and 3 are not much of a problem), then this data should preferably be stored in and queried from a single document instead of three different sources.
Also, a single document is the unit for which MongoDB guarantees atomic writes, which is an added benefit if you want to update user, health and insurance data all at the same time (say, in one API call).
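As a sketch of what such a single atomic write could look like, the snippet below only builds the filter and update arguments. It assumes health and insurance are embedded under their own keys in the user document; buildProfileUpdate and the sample values are hypothetical.

```javascript
// Builds the arguments for one update that touches user, health and
// insurance fields together. Nothing here talks to a database.
function buildProfileUpdate(userId, changes) {
  const $set = {};
  if (changes.age !== undefined) $set.age = changes.age;
  // Dot notation addresses a field inside an embedded sub-document.
  if (changes.diseaseName !== undefined) $set["health.disease_name"] = changes.diseaseName;
  if (changes.insurer !== undefined) $set["insurance.company_name"] = changes.insurer;
  return { filter: { _id: userId }, update: { $set } };
}

const { filter, update } = buildProfileUpdate("u1", { age: 31, insurer: "Acme Mutual" });

// With the Node.js driver this would be sent as a single write, e.g.
//   await db.collection("users").updateOne(filter, update);
console.log(JSON.stringify(filter), JSON.stringify(update));
```

Because all three kinds of data live in one document, either every field in $set changes or none does.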
3. Size of the document
Consider this: many users can like a post, and a user can like many posts (many-to-many). Because you need to ensure that no user likes a post twice, the user ids must be stored somewhere. There are three options:
keep user ids array in post document
keep post ids array in user document
create another document that contains the ids of both (solution for many-to-many only, similar to SQL)
If a post is liked by more than a million users, the post document will overflow with user references. Similarly, a user can like thousands of posts in a short period, so the second option is not feasible either. That leaves the third option, which is the best fit for this case.
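The third option can be sketched roughly as follows. LikeStore is a made-up in-memory stand-in for a separate "likes" collection; in MongoDB the no-double-like rule would more likely come from a unique compound index on the two ids than from application code.

```javascript
// In-memory stand-in for a "likes" collection holding one document per
// (user, post) pair. A real collection would enforce uniqueness with e.g.
//   db.likes.createIndex({ userId: 1, postId: 1 }, { unique: true })
class LikeStore {
  constructor() {
    this.likes = new Map(); // key -> like document
  }

  // Returns true if the like was recorded, false if it already existed.
  like(userId, postId) {
    const key = JSON.stringify([userId, postId]);
    if (this.likes.has(key)) return false;
    this.likes.set(key, { userId, postId, likedAt: new Date() });
    return true;
  }

  countForPost(postId) {
    let count = 0;
    for (const doc of this.likes.values()) if (doc.postId === postId) count++;
    return count;
  }
}

const store = new LikeStore();
const firstLike = store.like("u1", "p1");
const duplicateLike = store.like("u1", "p1"); // rejected, same pair
store.like("u2", "p1");
console.log(firstLike, duplicateLike, store.countForPost("p1"));
```

Neither the post nor the user document grows when likes are added, which is the point of this option.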
But a post can have many comments, and a comment belongs to only one post (1-to-many). You would hardly expect more than a few hundred comments, rarely a thousand. Therefore, keeping an array of commentIds (or the embedded comments themselves) in the post is a practical solution.
In your case, I don't believe a document that does not hold a huge list of references can grow enough to reach 16 MB (the MongoDB document size limit). You can therefore safely store the health and insurance data in the user document. But they should have keys of their own, like:
var userSchema = {
  name : String,
  age : Number,
  health : {
    disease_name : String,
    // more health information
  },
  insurance : {
    company_name : String,
    // further insurance data
  }
}
That's how you should think about designing your schema, in my opinion. I would recommend reading these very helpful guides by Couchbase on data modeling: Document design considerations, Modeling documents for retrieval and Modeling relationships. Although they are written for Couchbase, the rules apply equally to MongoDB schema design, as both are document-oriented NoSQL databases.

Related

How to avoid inconsistent embedded documents

Having a bit of trouble understanding when and why to use embedded documents in a mongo database.
Imagine we have three collections: users, rooms and bookings.
I have a few questions about a situation like this:
1) How would you update the embedded document? Would it be the responsibility of the application developer to find all instances of kevin as an embedded document and update them?
2) If the solution is to use document references, is that as heavy as a relational db join? Is this just a case of the example not being a good fit for Mongo?
As always let me know if I'm being a complete idiot.
Imho, you overdid it. Going by your question, your use cases are:
For a given reservation, what room is booked by which user?
For a given user, what are his or her details?
How many beds does a given room provide?
I would go with the following model for rooms
{
  _id: 1001,
  beds: 2
}
for users
{
  _id: new ObjectId(),
  username: "Kevin",
  mobile: "12345678"
}
and for reservations
{
  _id: new ObjectId(),
  date: new ISODate(),
  user: "Kevin",
  room: 1001
}
Now in a reservation overview, you have all the relevant information ("who", "when" and "which") by simply querying reservations, without any overhead to answer the first question from your use cases. In a reservation details view, admittedly, you would have to do two more queries, but they are lightning fast with proper indexing and, depending on your technology, can be done asynchronously, too. Note that I saved an index by using the room number as the id. How to answer the remaining questions should be obvious.
So as per your original question: embedding is not necessary here, imho.
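For the details view, the two follow-up lookups could be run side by side, roughly like this. The arrays stand in for the three collections, findUser and findRoom are hypothetical helpers, and the sketch assumes usernames are unique (and indexed in a real database), which the model above implies but does not state.

```javascript
// In-memory stand-ins for the rooms, users and reservations collections.
const rooms = [{ _id: 1001, beds: 2 }];
const users = [{ _id: "u1", username: "Kevin", mobile: "12345678" }];
const reservations = [{ _id: "r1", date: new Date("2015-06-01"), user: "Kevin", room: 1001 }];

// Hypothetical async lookups; with a real driver these would be findOne calls.
const findUser = async (username) => users.find((u) => u.username === username) ?? null;
const findRoom = async (roomId) => rooms.find((r) => r._id === roomId) ?? null;

async function reservationDetails(reservation) {
  // The two lookups do not depend on each other, so they can run concurrently.
  const [user, room] = await Promise.all([
    findUser(reservation.user),
    findRoom(reservation.room),
  ]);
  return { date: reservation.date, user, room };
}

const detailsPromise = reservationDetails(reservations[0]);
detailsPromise.then((details) => console.log(details.user.mobile, details.room.beds));
```

The overview page needs none of this, since a reservation already names the user and the room.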

Relational queries in MongoDB

Just started out with MongoDB. I have collections called users, dishes, restaurants and ratings. I need to map the ratings to a particular dish and user.
Users
{
  _id: "12323421",
  name: "John Doe",
  ...
}
Dishes
{
  _id: "9872983749",
  name: "Apple Pie",
  restaurantID: "3432452", // Corresponds to Patisserie
  ...
}
Restaurants
{
  _id: "3432452",
  name: "Patisserie",
  ...
}
Ratings
{
  _id: "74766575",
  userID: "12323421", // Corresponds to John Doe
  dishID: "9872983749", // Corresponds to Apple Pie
  rating: 5
}
I don't know how to go about writing a few queries like:
List of dishes with at least 10 ratings, or
Restaurants whose dishes have received 10 ratings
This is pretty simple to implement in an SQL environment, but how does one use joins or nested queries in MongoDB?
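MongoDB does not support SQL-style joins or subqueries in ordinary find queries. (Since version 3.2 the aggregation framework has a $lookup stage that performs a left outer join, but it is no substitute for a schema designed around your access patterns.)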
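That said, the first query in the question (dishes with at least 10 ratings) needs no join at all, only a group-and-count over the ratings collection. A rough sketch: the pipeline uses the standard $group and $match stages, while dishesWithAtLeast and the sample data are made up so the logic can be checked without a server.

```javascript
// Pipeline as it could be passed to db.ratings.aggregate(pipeline):
// group ratings by dish, count them, keep dishes with 10 or more.
const pipeline = [
  { $group: { _id: "$dishID", count: { $sum: 1 } } },
  { $match: { count: { $gte: 10 } } },
];

// Plain-JavaScript equivalent of the same two steps.
function dishesWithAtLeast(ratings, minCount) {
  const counts = new Map();
  for (const r of ratings) counts.set(r.dishID, (counts.get(r.dishID) ?? 0) + 1);
  return [...counts]
    .filter(([, count]) => count >= minCount)
    .map(([dishID, count]) => ({ _id: dishID, count }));
}

// Made-up sample: 10 ratings for one dish, 3 for another.
const sampleRatings = [
  ...Array.from({ length: 10 }, (_, i) => ({ userID: `u${i}`, dishID: "9872983749", rating: 5 })),
  ...Array.from({ length: 3 }, (_, i) => ({ userID: `u${i}`, dishID: "1111", rating: 4 })),
];
const popular = dishesWithAtLeast(sampleRatings, 10);
console.log(JSON.stringify(popular));
```

The restaurant variant of the query is the one that needs dish data next to the ratings, which is where the schema question really starts.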
I would suggest that you take a step back and do some reading on MongoDB schema design. The Data Modeling Concepts section of the MongoDB docs is a great place to start. There are many other resources out there on the topic. The O'Reilly book MongoDB Applied Design Patterns is also a great resource.
If you head down the path of modeling your data in MongoDB in a similar manner to how you would model it in an RDBMS, you are setting yourself up for failure.
There is not always a clear "right" or "best" way to model a particular problem. It will always depend on the specific access patterns and requirements for your application.
As you mentioned in a comment, one approach would be to embed the ratings for a particular dish in the dish document. But this is problematic if you have a large number of ratings (unbounded growth is bad). A common approach here is a hybrid. For example, you could embed the most popular or the most recent ratings for a particular dish and store the other ratings in a separate collection. Again, think about how your application is going to present the data and try to model your data accordingly.
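One way the "most recent ratings" half of such a hybrid might be kept bounded is $push combined with $each and a negative $slice, which are standard update operators. The field name recentRatings and both helper functions are assumptions for illustration; the full rating would still be written to the separate ratings collection.

```javascript
// Update document: append one rating to the embedded array and keep only
// the newest `keep` entries.
function recentRatingUpdate(rating, keep = 10) {
  return { $push: { recentRatings: { $each: [rating], $slice: -keep } } };
}

// In-memory equivalent of what that update does to a dish document.
function applyRecentRating(dish, rating, keep = 10) {
  const recentRatings = [...(dish.recentRatings ?? []), rating].slice(-keep);
  return { ...dish, recentRatings };
}

let dish = { _id: "9872983749", name: "Apple Pie" };
for (let i = 1; i <= 12; i++) {
  dish = applyRecentRating(dish, { userID: `u${i}`, rating: 5 }, 10);
}
// 12 ratings were added, but the embedded array never exceeds 10 entries.
console.log(dish.recentRatings.length, dish.recentRatings[0].userID);
```

The dish page can then show recent ratings from a single document, and only a "show all ratings" view has to touch the separate collection.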

Many to many relationship on Mongodb based e-learning webapp?

I am relatively new to NoSQL databases. I am designing a data structure for an e-learning web app. There would be X courses and Y users.
Every user will be able to take any number of courses.
Every course will be composed of many sections (each section may be a video or a quiz).
I will need to keep track of every section a user takes, so I think the whole course should be part of the user document (for each user), like so:
{
  _id: "ed",
  name: "Eduardo Ibarra",
  courses: [
    {
      name: "Node JS",
      progress: "100%",
      section: [
        {name: "Introduction", passed:"100%", field3:"x", field4:""},
        {name: "Quiz 1", passed:"75%", questions:[...], field3:"x", field4:""},
      ]
    },
    {
      name: "MongoDB",
      progress: "65%",
      ...
    }
  ]
}
Is this the best way to do it?
I would say: design your database around your queries. One thing is for sure: you will have to do some embedding.
If you are going to run more queries on what a user is doing, make the user the primary entity and embed the courses within it. You don't need to embed the entire course info. The info about a course is static: for example, the data about the Node JS course (the content, the author of the course, exercise files etc.) will not change. So you can keep the course info separately in another collection. How much of a course a user has completed, however, depends on the individual user. So you should keep only the id of the course (which is stored in the separate 'course' collection), and for each user store the information that relates to that (User, Course) pair embedded in the user document itself.
Now the most important question: what do you do when a query requires a 'join' of the user and course collections? For this you can use JavaScript to first fetch the user (and maybe collect the course ids in an array or list) and then fetch the matching courses from the courses collection, or vice versa. There are a few drivers available online to help you accomplish this. One is UnityJDBC, which is available here.
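A rough sketch of that layout and of the join done in application code. The arrays stand in for the two collections, and the field names courseId and progress as well as the function coursesWithProgress are assumptions for illustration.

```javascript
// Static course data lives in its own collection.
const courses = [
  { _id: "nodejs", name: "Node JS" },
  { _id: "mongodb", name: "MongoDB" },
  { _id: "react", name: "React" },
];

// Per-user progress is embedded in the user; only the course id is stored.
const user = {
  _id: "ed",
  name: "Eduardo Ibarra",
  courses: [
    { courseId: "nodejs", progress: 100 },
    { courseId: "mongodb", progress: 65 },
  ],
};

// Application-level join. Against a real database, step 2 would be a single
// query such as db.courses.find({ _id: { $in: courseIds } }).
function coursesWithProgress(userDoc, allCourses) {
  const courseIds = userDoc.courses.map((c) => c.courseId); // step 1: collect ids
  const found = allCourses.filter((c) => courseIds.includes(c._id)); // step 2: fetch courses
  const byId = new Map(found.map((c) => [c._id, c]));
  // Step 3: merge the static course data with the per-user progress.
  return userDoc.courses.map((c) => ({
    name: byId.get(c.courseId)?.name,
    progress: c.progress,
  }));
}

const report = coursesWithProgress(user, courses);
console.log(JSON.stringify(report));
```

Renaming a course then touches one course document, while each user's progress stays with the user.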
From my experience, knowing what you are going to query from MongoDB is very helpful in designing your database, because the NoSQL nature of MongoDB means there is no single correct design. Any design is wrong if it does not let you accomplish your task. So knowing beforehand what you will do with the database (i.e. what you will query) is the only guide.

MongoDb - Modeling storage of users & post in a webapp

I'm quite new to the NoSQL world.
If I have a very simple webapp with users authenticating & publishing posts, what's the MongoDB (NoSQL) way to store users & posts in the database?
Do I have to store users & posts each in their own collection (like in relational databases)? Or store them in the same collection, as different documents? Or, finally, with redundant user info (credentials) on each post the user has published?
One way to do it is to use two collections, a posts collection and an authors collection. They could look like the following:
Posts
{
  title: "Post title",
  body: "Content of the post",
  author: "author_id",
  date: "...",
  comments: [
    {
      name: "name of the commenter",
      email: "...",
      comment: "..."
    }
  ],
  tags: [
    "tag1", "tag2", "tag3"
  ]
}
Authors
{
  "_id": "author_id",
  "password": "..."
}
Of course, you can put it all in a single collection, but @jcrade mentioned a reason why you would/should use two collections. Remember, this is NoSQL. You should design your database from an application point of view, which means asking yourself what data is consumed and how.
This post says it all:
https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1
It really depends on your application and on how many posts you expect your users to have: if it's a one-to-few relationship, then using embedded documents (inside your users model) is probably the way to go. If it's one-to-many (up to a couple of thousand), then just embed an array of IDs in your users model. If it's more than that, then use the answer provided by Horizon_Net.
Read the post, and you get a pretty good idea of what you will have to do. Good luck!
When you are modeling a NoSQL database you should think about 3 basic ideas:
Denormalization
Copy the same data into multiple documents in order to simplify or optimize query processing, or to fit the user’s data into a particular data model.
Aggregation
Embed data into documents (for example, a blog post and its comments). This affects both the performance and the consistency of updates, because MongoDB guarantees consistency for one document at a time.
Application-level joins
Create application-level joins when it's not a good idea to aggregate information (for example, each post as an independent document will be really bad because we need to access the same resource).
To answer your question:
Create two documents. One is blogPost, with all the comments and tags in it plus the user id. The second is User, with all the user information.

How do document databases deal with changing relationships between objects (or do they at all)?

Say, at the beginning of a project, I want to store a collection of Companies, and within each company, a collection of Employees.
Since I'm using a document database (such as MongoDB), my structure might look something like this:
+ Customers[]
+--Customer
   +--Employees[]
      +--Employee
      +--Employee
+--Customer
   +--Employees[]
      +--Employee
What happens if, later down the track, a new requirement is to have some Employees work at multiple Companies?
How does one manage this kind of change in a document database?
Doesn't the simplicity of a document database become your worst enemy, since it creates brittle data structures which can't easily be modified?
In the example above, I'd have to run modify scripts to create a new 'Employees' collection, and move every employee into that collection, while maintaining some sort of relationship key (e.g. a CompanyID on each employee).
If I did the above thoroughly enough, I'd end up with many collections, and very little hierarchy, and documents being joined by means of keys.
In that case, am I still using the document database as I should be?
Isn't it becoming more like a relational database?
Speaking about MongoDB specifically...because the database doesn't enforce any relationships like a relational database, you're on the hook for maintaining any sort of data integrity such as this. It's wonderfully helpful in many cases, but you end up writing more application code to handle these sorts of things.
Having said all of that, the key to using a system like MongoDB is modeling your data to fit MongoDB. What you have above makes complete sense if you're using MySQL... using Mongo you'd absolutely get in trouble if you structure your data like it's a relational database.
If you have Employees who can work at one or more Companies, I would structure it as:
// company records
{ _id: 12345, name : 'Apple' }
{ _id: 55555, name : 'Pixar' }
{ _id: 67890, name : 'Microsoft' }
// employees
{ _id : ObjectId('abc123'), name : "Steve Jobs", companies : [ 12345, 55555 ] }
{ _id : ObjectId('abc456'), name : "Steve Ballmer", companies : [ 67890 ] }
You'd add an index on employees.companies, which would make it very fast to get all of the employees who work for a given company... regardless of how many companies they work for. Maintaining a short list of companies per employee will be much easier than maintaining a large list of employees for a company. Getting all of the data for a company and all of its employees would take two (fast) queries.
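Roughly, the index and the two queries could look like this. The shell commands in the comments use the collection names from above; the in-memory function companyWithEmployees and the simplified string ids are only there so the logic can be run without a server.

```javascript
const companies = [
  { _id: 12345, name: "Apple" },
  { _id: 55555, name: "Pixar" },
  { _id: 67890, name: "Microsoft" },
];
const employees = [
  { _id: "abc123", name: "Steve Jobs", companies: [12345, 55555] },
  { _id: "abc456", name: "Steve Ballmer", companies: [67890] },
];

// Shell equivalents (an index on an array field is a multikey index):
//   db.employees.createIndex({ companies: 1 })
//   db.companies.findOne({ _id: 55555 })
//   db.employees.find({ companies: 55555 })
function companyWithEmployees(companyId) {
  const company = companies.find((c) => c._id === companyId); // query 1
  const staff = employees.filter((e) => e.companies.includes(companyId)); // query 2
  return { company, staff };
}

const pixar = companyWithEmployees(55555);
console.log(pixar.company.name, pixar.staff.map((e) => e.name));
```

Matching a scalar against an array field, as in the find above, returns every employee whose companies array contains that id.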
"Doesn't the simplicity of a document database become your worst enemy, since it creates brittle data structures which can't easily be modified?"
The simplicity can bite you, but it's very easy to update and change the structure at a later time. You can script changes in JavaScript and run them via the mongo shell.
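A migration from embedded employees to a separate collection might be scripted along these lines. The "before" documents and the function migrate are hypothetical, and matching people by name is an assumption made only to keep the sketch short; real data would need a proper key.

```javascript
// Hypothetical "before" shape: employees embedded in each company.
const before = [
  { _id: 1, name: "Apple", employees: [{ name: "Steve Jobs" }] },
  { _id: 2, name: "Pixar", employees: [{ name: "Steve Jobs" }, { name: "Ed Catmull" }] },
];

// Split the data into two collections. An employee found under several
// companies becomes one document that lists all of those company ids.
function migrate(companiesWithEmployees) {
  const byName = new Map();
  const companies = companiesWithEmployees.map(({ employees = [], ...company }) => {
    for (const employee of employees) {
      if (!byName.has(employee.name)) byName.set(employee.name, { ...employee, companies: [] });
      byName.get(employee.name).companies.push(company._id);
    }
    return company; // the company document without the embedded array
  });
  return { companies, employees: [...byName.values()] };
}

const after = migrate(before);
console.log(JSON.stringify(after.employees));
```

In the shell, the same loop would read the old documents with a cursor and write the results into the new collection.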
My recent answer for this question covers this in the RavenDb context:
How would I model data that is heirarchal and relational in a document-oriented database system like RavenDB?