I just started learning MongoDB this week. Before that I was only familiar with SQL, so it is taking me some time to convert my logic.
I have a simple question here about a one-to-many relationship.
Assume that I have a book for rent, and I want to record its rental history. Which schema should I use?
Plan A: Create one document that uses BookId as a reference and holds the entire rental history.
Plan B: Create multiple documents that use BookId as a reference. Every time the book is rented out, I create a new document for that rental.
Which plan is better? I guess Plan B is better, but I am just not sure about it.
http://docs.mongodb.org/manual/core/data-modeling/ has good advice on how to structure your collections.
Either of your schemes could work, BUT if there is a very large number of rentals per book you MUST use Plan B, because a document has a maximum size (16 MB).
You can also mix Plans A and B. For example, use Plan B but also store a limited array of recent rentals with the book. That way you can satisfy the initial query with one trip to the database, and only start fetching the separate rental records if the user scrolls down to see more.
You can often start by thinking about it as you would in SQL, then add some denormalization where you need to improve performance.
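A minimal sketch of the mixed approach, in JavaScript with plain arrays standing in for a books collection and a rentals collection so it runs without a server. The field names (bookId, renter, rentedAt, recentRentals) and the cap of three recent rentals are assumptions for illustration, not part of the original question.

```javascript
// In-memory stand-ins for the "books" and "rentals" collections (hypothetical data).
const MAX_RECENT = 3; // assumed cap on rentals embedded in the book

const books = [{ _id: 1, title: "Example Book", recentRentals: [] }];
const rentals = []; // Plan B: one document per rental

function rentBook(bookId, renter, rentedAt) {
  const rental = { _id: rentals.length + 1, bookId, renter, rentedAt };
  rentals.push(rental); // full history lives in its own collection

  // Plan A flavour: keep only the newest MAX_RECENT rentals on the book itself.
  const book = books.find((b) => b._id === bookId);
  book.recentRentals.push({ renter, rentedAt });
  book.recentRentals = book.recentRentals.slice(-MAX_RECENT);
  return rental;
}

// "Scroll down for more": older rentals come from the rentals collection, newest first.
function rentalHistory(bookId) {
  return rentals
    .filter((r) => r.bookId === bookId)
    .sort((a, b) => b.rentedAt - a.rentedAt);
}

for (let day = 1; day <= 5; day++) {
  rentBook(1, `user${day}`, new Date(2024, 0, day));
}

console.log(books[0].recentRentals.map((r) => r.renter)); // [ 'user3', 'user4', 'user5' ]
console.log(rentalHistory(1).length); // 5
```

Against a real server, the capped array would typically be maintained with an update such as db.books.updateOne({ _id: bookId }, { $push: { recentRentals: { $each: [rental], $slice: -3 } } }), where the negative $slice keeps only the last three elements.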
Assume I have data on high school students across the country. The data for each high school is unrelated to the others and never needs to be related to them (it is compartmentalized). Which option is recommended if I use MongoDB?
1) Create a single collection (single-collection inheritance) with the following attributes:
high_school_id, student_id, name, address
2) Create multiple collections (possibly thousands) with the following attributes:
student_id, name, address
Each collection name will follow the school_data_<X> format, where X is the high_school_id. So, to query, my program can construct the collection name dynamically.
I come from a MySQL/PostgreSQL background, where having thousands of tables is not common (so option 1 makes far more sense there). How is it in MongoDB?
I recommend the first option, because MongoDB has a limit on the number of collections (with the older MMAPv1 storage engine, each database's namespace file limited the total number of collections and indexes). See the docs for more about this.
You may also want to consider a third option: a single collection of students in which each student's document includes the high school data. There is nothing wrong with duplicating data; in MongoDB you should not worry about that so much, but rather think about the most convenient way of working with your data.
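To illustrate option 1 combined with the suggested third option, here is a JavaScript sketch in which a plain array stands in for a single students collection and each student embeds its high school. The sample data and the embedded high_school field layout are assumptions.

```javascript
// One "students" collection (hypothetical data); the school is embedded in each student.
const students = [
  { student_id: 1, name: "Ann", address: "1 Main St",
    high_school: { high_school_id: 10, name: "North High" } },
  { student_id: 2, name: "Bob", address: "2 Oak Ave",
    high_school: { high_school_id: 20, name: "South High" } },
  { student_id: 3, name: "Cai", address: "3 Elm Rd",
    high_school: { high_school_id: 10, name: "North High" } },
];

// Similar in spirit to db.students.find({ "high_school.high_school_id": id }).
// Instead of building a collection name like school_data_<X>, the school id
// is just an ordinary query filter.
function studentsOfSchool(highSchoolId) {
  return students.filter((s) => s.high_school.high_school_id === highSchoolId);
}

console.log(studentsOfSchool(10).map((s) => s.name)); // [ 'Ann', 'Cai' ]
```

With a real collection, an index such as db.students.createIndex({ "high_school.high_school_id": 1 }) would keep per-school queries fast, playing the role the per-school collections were meant to play.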
I am new to MongoDB and I have difficulties implementing a solution in it.
Consider a case where I have two collections, client and sales, with the following designs:
Client
==========
id
full name
mobile
gender
region
emp_status
occupation
religion
Sales
===========
id
client_id //this would be a DBRef
trans_date //date time value
products //an array of embedded documents, one per product sold, in the form {product_code, description, units, unit price, amount}
total sales
Now there is a requirement to build another collection for analytical queries that can answer the following questions:
What is the distribution of sales by gender, region and emp_status?
What are the most purchased products for clients in a particular region?
I considered creating a heavily denormalized, flat and wide collection combining the properties of the sales and client collections, so that I can use map-reduce to answer these questions.
In an RDBMS, an aggregation backed by a join would answer these questions, but I am at a loss as to how to make Map-Reduce or Aggregation help out.
Questions:
How do I implement Map-Reduce to map across 2 collections?
Is it possible to chain MapReduce operations?
Regards.
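MongoDB does not do JOINs (at least it did not when this was written; MongoDB 3.2 later added the $lookup aggregation stage, which performs a left outer join).
MapReduce always runs on a single collection: a single MapReduce job cannot select from more than one collection. Without $lookup, the same applies to aggregation.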
When you want to do some data mining (not MongoDB's strongest suit), you could create a denormalized collection of all Sales with the corresponding Client object embedded. You will have to write a little program or script that iterates over all clients and
finds all Sales documents for the client
merges the relevant fields from Client into each document
inserts the resulting document into the new collection
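The three steps above might look like this as a JavaScript sketch, with plain arrays standing in for the Client, Sales and new denormalized collections so it runs without a database. The sample data and the underscore field names (full_name, total_sales) are assumptions; a real script would use the driver's find and insert calls instead of array operations.

```javascript
// In-memory stand-ins for the Client and Sales collections (hypothetical data).
const clients = [
  { id: 1, full_name: "Ada", gender: "F", region: "North", emp_status: "employed" },
  { id: 2, full_name: "Ben", gender: "M", region: "South", emp_status: "student" },
];
const sales = [
  { id: 100, client_id: 1, total_sales: 50 },
  { id: 101, client_id: 1, total_sales: 30 },
  { id: 102, client_id: 2, total_sales: 20 },
];
const salesWithClient = []; // the new, denormalized collection

for (const client of clients) {
  // 1. find all Sales documents for the client
  const clientSales = sales.filter((s) => s.client_id === client.id);
  for (const sale of clientSales) {
    // 2. merge the relevant Client fields into each Sales document
    const merged = {
      ...sale,
      client: { gender: client.gender, region: client.region, emp_status: client.emp_status },
    };
    // 3. insert the resulting document into the new collection
    salesWithClient.push(merged);
  }
}

// With everything in one collection, "sales by region" is a single-collection
// grouping (in MongoDB, an aggregation $group or a map-reduce job).
const totalsByRegion = {};
for (const doc of salesWithClient) {
  const region = doc.client.region;
  totalsByRegion[region] = (totalsByRegion[region] || 0) + doc.total_sales;
}
console.log(totalsByRegion); // { North: 80, South: 20 }
```

The final loop answers the first analytical question for one dimension (region); grouping on client.gender or client.emp_status works the same way.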
When your Client document is small and doesn't change often, you might consider always embedding it in each Sales document. This means that you will have redundant data, which looks very evil from the viewpoint of a seasoned RDB veteran. But remember that MongoDB is not a relational database, so you should not apply all RDBMS dogmas unreflectingly. The "no redundancy" rule of database normalization is only practicable when JOINs are relatively inexpensive and painless, which isn't the case with MongoDB. Besides, sometimes you actually want redundancy in order to preserve historical data. When you want to analyze the historical development of your sales by region, you want the region where the customer resided when they bought the product, not where they reside now. When each Sale only references the current Client document, that information is lost. Sure, you could solve this with separate Address documents that have date ranges, but that would make things even more complicated.
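A small JavaScript sketch of the historical-region point: each sale copies the client's region at the time of sale, so a later move does not change where past sales are counted. The recordSale helper and the data are hypothetical.

```javascript
// Hypothetical client and an in-memory stand-in for the Sales collection.
const client = { id: 1, full_name: "Ada", region: "North" };
const salesLog = [];

function recordSale(c, amount) {
  // Copy (not reference) the fields that should stay frozen at sale time.
  salesLog.push({ client_id: c.id, client: { region: c.region }, amount });
}

recordSale(client, 50);
client.region = "South"; // the customer moves
recordSale(client, 30);

// Historical sales by region use the embedded snapshot, not the current Client.
const byRegion = {};
for (const s of salesLog) {
  byRegion[s.client.region] = (byRegion[s.client.region] || 0) + s.amount;
}
console.log(byRegion); // { North: 50, South: 30 }
```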
Another option would be to embed an array of Sales in each Client. However, MongoDB doesn't like documents that grow over time, so if your clients tend to return often, this might result in sub-par write performance.
Assuming the following schema/relationship design, what is the recommended practice for handling a cascade-delete-like operation?
Relational Schema:
+---------+                                    +--------+
| Student |-*--------1-[Enrollment]-1--------*-| Course |
+---------+                                    +--------+
MongoDB:
+---------+                    +--------+
| Student |-*----------------*-| Course |
+---------+                    +--------+
Given this classic design of enrolling students in courses, having a collection of courses in each student and vice versa seems to be an appropriate data model in MongoDB (that is, there is nothing corresponding to the relationship/enrollment table). But coming from a relational world, how should I handle the semantics of deleting a course? When a course is deleted, all the "enrollment" records should be deleted too, which means I should delete the course from the collection of each student record. It looks like I have to fire two queries: one to delete the course and another to remove it from each student's collection. Is there a way to perform this "cascade delete" semantic with a single query, without the additional one? Does the data model need to change?
NOTE: For all other use cases the above data model works just fine:
Deleting a student => just delete that student; the associated collection of courses is deleted along with it.
Student wants to drop a course => just remove it from the student's collection of courses.
Add student/course => in essence, just add it to the corresponding 'table'.
The only tricky thing is handling the deletion of a course. How should I handle this scenario in MongoDB? I come from a relational background and am unable to figure this one out.
What you are doing is the best way of doing it in Mongo. I am in a similar situation, and after going through all possible implementations of the N:M design pattern, I have also arrived at this same solution.
This is not really a MongoDB thing but more of a NoSQL concept: the data that changes less (Courses) can be kept separately. And since deleting a Course is not going to be a very frequent operation, it is feasible to go through all the student records to remove it.
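The two-query approach can be sketched in JavaScript as follows; plain arrays stand in for the courses and students collections, and the comments name the MongoDB operations each step corresponds to. The data and the deleteCourse helper are illustrative assumptions.

```javascript
// In-memory stand-ins for the courses and students collections (hypothetical data).
let courses = [
  { _id: 1, title: "Algebra" },
  { _id: 2, title: "Biology" },
];
const students = [
  { _id: "s1", name: "John", courses: [1, 2] },
  { _id: "s2", name: "Mary", courses: [2] },
];

function deleteCourse(courseId) {
  // Step 1: corresponds to db.courses.deleteOne({ _id: courseId })
  courses = courses.filter((c) => c._id !== courseId);
  // Step 2: corresponds to db.students.updateMany({ courses: courseId },
  //                                               { $pull: { courses: courseId } })
  for (const s of students) {
    s.courses = s.courses.filter((id) => id !== courseId);
  }
}

deleteCourse(2);
console.log(courses.map((c) => c.title)); // [ 'Algebra' ]
console.log(students.map((s) => s.courses)); // [ [ 1 ], [] ]
```

Against a real server these are two separate operations, so a failure between them can leave stale course ids in some students, which the application then has to tolerate or clean up.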
On the other hand, you could let it be as it is.
In your application logic, just ignore the course values in the Student document that no longer match any document in the Course collection. In that case, though, you must make sure that the ids of deleted courses are never reused.
Or just use a deleted flag on the Course document and handle everything else in your application logic.
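A JavaScript sketch of both alternatives: stale ids are skipped because they match no course, and soft-deleted courses are skipped via a flag. The field name deleted, the activeCoursesOf helper and the data are assumptions.

```javascript
// In-memory stand-in for the courses collection (hypothetical data).
const courses = [
  { _id: 1, title: "Algebra", deleted: false },
  { _id: 3, title: "Chemistry", deleted: true }, // soft-deleted via the flag
  // course 2 was hard-deleted and no longer exists
];
const student = { _id: "s1", name: "John", courses: [1, 2, 3] };

function activeCoursesOf(s) {
  const byId = new Map(courses.map((c) => [c._id, c]));
  return s.courses
    .map((id) => byId.get(id)) // a dangling id (2) maps to undefined
    .filter((c) => c !== undefined && !c.deleted); // drop missing and flagged courses
}

console.log(activeCoursesOf(student).map((c) => c.title)); // [ 'Algebra' ]
```

With a real collection, the same filter could be expressed as db.courses.find({ _id: { $in: student.courses }, deleted: { $ne: true } }).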
I'm going to answer based on the Mongo team's recommendations. I also came from relational databases and had some trouble understanding the concepts at the beginning. The Mongo team recommends an "application-driven" schema design, so you first have to figure out which pieces of data go together. Remember that Mongo has no transactions spanning multiple documents (true at the time of writing; multi-document transactions arrived in MongoDB 4.0), so even if we wanted transactional behavior we would have to implement our own solution. This means that if I have two business objects that must always be updated at the same time and I cannot tolerate a failure in this operation, I have to combine them into a single document, whose updates are atomic.
In your case you have two documents, Student and Course, and a relation between them (a student enrolls in N courses). I assume courses don't need to be altered all the time, so they can be stored in a different collection.
The key point is the relation between them: in this case you need to atomically delete a Student together with the list of courses he enrolled in.
So the most suitable solution is to embed the relation in Student and keep a separate Course collection. When you delete the student, the relation is dropped at the same time:
Student JSON:
{ _id: ObjectId('...'), name:"John", lastname:"Smith",
courses: [ 1, 100, 50, 67 ], ...
}
Courses can be kept in a separate collection.
This is the way to handle it in Mongo: data that must be updated atomically has to live in a single document. I assumed Courses is a list of courses that doesn't change much; if courses were defined per Student, we would need to change the solution a bit.
I have the following objects: Company, User and Order (which contains order lines). Users place orders with one or more order lines, and these relate to a Company. Orders can be placed for a Company only during a one-week period.
What I'm not sure about is where to put the orders: should they be a collection of their own containing a link to the User and a link to the Company, should they sit under the Company, or should they sit under the User?
Numbers-wise, I need to plan for 50k+ orders.
Query-wise, I'll mainly be looking at Orders by Company, but I would also need to find a Company's Orders for a specific User.
1) For folks coming from the SQL world (such as myself), one of the hardest things to learn about MongoDB is the new style of schema design. In the SQL world, everything goes into third normal form. Folks come to think that there is a single right way to design their schema, because there typically is one.
In the MongoDB world, there is no one best schema design. More accurately, in MongoDB schema design depends on how the application is going to access the data.
2) Here are the key questions that you need to have answered in order to design a good schema for MongoDB:
How much data do you have?
What are your most common operations? Will you be mostly inserting new data, updating existing data, or doing queries?
What are your most common queries?
How many I/O operations do you expect per second?
What you're talking about here is modeling Many-to-One relationships:
Company -> User
User -> Order
Order -> Order Lines
Company -> Order
Using SQL you would create a pair of master/detail tables with a primary key/foreign key relationship. In MongoDB, you have a number of choices: you can embed the data, you can create a linked relationship, you can duplicate and denormalize the data, or you can use a hybrid approach.
The correct approach would depend on a lot of details about the use case of your application, many of which you haven't provided.
3) This is my best guess - and it's only a guess - as to a good schema for you.
a) Have separate collections for Users, Companies, and Orders
If you're looking at 50k+ orders, there are too many to embed in a single document. Having them as a separate collection will allow you to reference them from both the Company and the User documents.
b) Have an array of references to the Order documents in both the Company and the User documents. This makes the query "Find all Orders for this Company" a single-document query.
c) If your query pattern supports it, you might also have a duplicate link from Orders back to the owning Company and/or User.
d) Assuming that the order lines are unique to the individual Order, you would embed the Order Lines in an array within the Order documents.
e) If your order lines refer back to individual Products, you might want to have a separate Product collection, and include a reference to the Product document in the order line sub-document.
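A JavaScript sketch of one reading of points (a), (c) and (d): separate companies, users and orders arrays stand in for collections, order lines are embedded in each order, and each order carries companyId and userId back-references. Field names and data are assumptions; point (b)'s reference arrays are left out for brevity.

```javascript
// In-memory stand-ins for the three collections (hypothetical data).
const companies = [{ _id: "c1", name: "Acme" }];
const users = [{ _id: "u1", name: "Alice" }, { _id: "u2", name: "Bob" }];
const orders = [
  { _id: "o1", companyId: "c1", userId: "u1",
    lines: [{ product: "pen", qty: 2 }, { product: "pad", qty: 1 }] }, // embedded order lines
  { _id: "o2", companyId: "c1", userId: "u2",
    lines: [{ product: "ink", qty: 5 }] },
];

// Main query, similar to db.orders.find({ companyId: "c1" }).
const ordersForCompany = (companyId) =>
  orders.filter((o) => o.companyId === companyId);

// Secondary query, similar to db.orders.find({ companyId: "c1", userId: "u1" }).
const ordersForCompanyAndUser = (companyId, userId) =>
  orders.filter((o) => o.companyId === companyId && o.userId === userId);

console.log(companies[0].name, ordersForCompany("c1").length); // Acme 2
console.log(users.length, ordersForCompanyAndUser("c1", "u1")[0].lines.length); // 2 2
```

With a real orders collection, a compound index such as db.orders.createIndex({ companyId: 1, userId: 1 }) could serve both queries, since a query on companyId alone can use the index prefix.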
4) Here are some good general references on MongoDB schema design.
MongoDB presentations:
http://www.10gen.com/presentations/mongosf2011/schemabasics
http://www.10gen.com/presentations/mongosv-2011/schema-design-by-example
http://www.10gen.com/presentations/mongosf2011/schemascale
Here are a couple of books about MongoDB schema design that I think you would find useful:
http://www.manning.com/banker/ (MongoDB in Action)
http://shop.oreilly.com/product/0636920018391.do
Here are some sample schema designs:
http://docs.mongodb.org/manual/use-cases/
Note that the "MongoDB in Action" book includes a sample schema for an e-commerce application, which is very similar to what you're trying to build -- I recommend you check it out.
I just started learning about NoSQL databases, especially MongoDB (no specific reason for MongoDB). I browsed a few tutorial sites, but I still can't figure out how it handles relationships between two documents/entities.
Let's say, for example:
1. One Employee works in one department
2. One Employee works in many departments
I don't know whether the term 'relationship' makes sense for MongoDB or not.
Can somebody please explain how joins and relationships work here?
The short answer: with "nosql" you wouldn't do it that way.
What you'd do instead of a join or a relationship is add the departments the user is in to the user object.
You could also add the user to a field in the "department" object, if you needed to see users from that direction.
Denormalized data like this is typical in a "nosql" database.
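A JavaScript sketch of adding the departments to the employee object, covering both cases from the question with a single departments array (one element for case 1). The data is invented, and plain department names stand in for richer embedded documents.

```javascript
// In-memory stand-in for an "employees" collection (hypothetical data).
const employees = [
  { _id: 1, name: "Dana", departments: ["Sales"] },           // case 1: one department
  { _id: 2, name: "Eli", departments: ["Sales", "Support"] }, // case 2: many departments
];

// Similar to db.employees.find({ departments: dept }): no join table needed.
const inDepartment = (dept) =>
  employees.filter((e) => e.departments.includes(dept));

console.log(inDepartment("Sales").map((e) => e.name)); // [ 'Dana', 'Eli' ]
console.log(inDepartment("Support").map((e) => e.name)); // [ 'Eli' ]
```

In MongoDB, a query such as db.employees.find({ departments: "Sales" }) matches any employee whose departments array contains "Sales", which is what makes this shape convenient.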
See this very closely related question: How do I perform the SQL Join equivalent in MongoDB?
In general, you want to denormalize the data in your collections (=tables). Your collections should be optimized so that you don't need to do joins (joins are generally not available in NoSQL databases).
In MongoDB you can either reference other collections (=tables) or embed documents in each other, whichever makes more sense in your domain. There is a size limit on documents (16 MB), so you can't just embed the Encyclopedia Britannica ;-)
It's probably best if you look for API documentation and examples for the programming language of your choice.
For Ruby, I'd recommend the Mongoid library: http://mongoid.org/docs/relations.html
Generally, if you decide to learn about NoSQL databases you should follow the "NoSQL way", i.e. learn the principles behind the movement and its approach to design, and not simply try to map an RDBMS design onto your first NoSQL project.
Simply put, you should learn how to embed and denormalize data (as Will suggested above), and not simply copy the id to simulate foreign keys.
If you do it the "foreign _id way", the next step is to search for transactions to ensure that two "rows" are inserted/updated consistently. A few steps after that, Oracle/MySQL is waiting. :)
There are some instances in which you want/need to keep the documents separate, in which case you take the _id from one object and add it as a value in the other object.
For example:
db.authors.insertOne({
  _id: ObjectId("507f1f77bcf86cd799439011"),
  name: "George R.R. Martin"
})
db.books.insertOne({
  name: "A Dance with Dragons",
  // authorId is just a copy of the author's _id; MongoDB does not enforce it
  authorId: ObjectId("507f1f77bcf86cd799439011")
})
There is no enforced relationship between books and authors; authorId is just a copy of the _id from authors.
Hope that helps.
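Because nothing links the two documents automatically, the application follows the reference itself with a second lookup. A JavaScript sketch with arrays standing in for the authors and books collections and a plain string standing in for an ObjectId; the bookWithAuthor helper is hypothetical.

```javascript
// In-memory stand-ins for the authors and books collections.
const authorId = "507f1f77bcf86cd799439011"; // a string stands in for an ObjectId
const authors = [{ _id: authorId, name: "George R.R. Martin" }];
const books = [{ name: "A Dance with Dragons", authorId }];

function bookWithAuthor(title) {
  // First lookup, similar to db.books.findOne({ name: title })
  const book = books.find((b) => b.name === title);
  if (!book) return null;
  // Second lookup, similar to db.authors.findOne({ _id: book.authorId })
  const author = authors.find((a) => a._id === book.authorId);
  return { ...book, author: author ? author.name : null }; // null if the reference dangles
}

console.log(bookWithAuthor("A Dance with Dragons").author); // George R.R. Martin
```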