Suppose I have Student and Teacher in a many to many relationship. If I just want to find out all the teachers for a given student or vice versa I generally model it by using embedded Object Ids. For example if teacher has a property studentIds which is an array of student Object Ids then that is enough information to do all the queries you need.
However suppose that a student can give a teacher a rating. How should this rating fit into the model? At the moment I do the following:
- Inside teacher, instead of storing an array of student ids, I store an array of JSON objects {studentId: ObjectId, rating: String}.
- When doing the query, I transform the array of JSON objects into an array of studentIds and fetch the full student documents as usual.
- So now I have an array of student objects and an array of JSON objects with the ratings.
- However, since the $in operator in MongoDB does not preserve ordering, I need to do my own sorting.
- At the last step, with everything in order, I can combine the student objects with the ratings to get what I want.
It works but seems somewhat convoluted. Is there a better way of doing this?
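The combine step can be sketched without any sorting: index the fetched students by id and walk the ratings in their stored order. This is a minimal in-memory sketch; the sample data and the helper name `mergeRatings` are assumptions for illustration, and plain strings stand in for ObjectIds.

```javascript
// Hypothetical data standing in for a teacher document and a `$in` query result.
const teacher = {
  _id: "t1",
  ratings: [
    { studentId: "s2", rating: "A" },
    { studentId: "s1", rating: "B" },
  ],
};
// The query result may come back in any order.
const students = [
  { _id: "s1", name: "Ann" },
  { _id: "s2", name: "Bob" },
];

// Index students by id, then walk the ratings in their stored order.
// String() is used so the lookup also works for ObjectId-like values.
function mergeRatings(ratingEntries, studentDocs) {
  const byId = new Map(studentDocs.map((s) => [String(s._id), s]));
  return ratingEntries
    .filter((r) => byId.has(String(r.studentId)))
    .map((r) => ({ student: byId.get(String(r.studentId)), rating: r.rating }));
}

const merged = mergeRatings(teacher.ratings, students);
console.log(merged.map((m) => `${m.student.name}: ${m.rating}`));
```

Because the lookup is keyed by id, the order in which the database returns the students no longer matters.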
Here are some considerations. In the end, it depends on your requirements:
Rating is optional, right?
If so, ask yourself whether you want to combine a required feature (storing teacher/student association) with a nice-to-have one. Code that implements a nice-to-have feature now writes to your most important collection. I think you can improve separation of concerns in your code with a different db schema.
Will you need more features?
Let's say you want to provide students with a list of ratings they gave, the average rating a student has given to teachers, and you want to show a development of ratings over time. This will be very messy with embedded documents. Embedded documents are less flexible.
If you need top read performance, you need to denormalize more data
If you want to stick to the embedded documents, you might want to copy more data. Let's say there's an overview of ratings per teacher where you can see the students' names. It would be helpful to embed an object
{ studentId : ObjectId,
rating: string,
studentName: string,
created: dateTime }
As alternatives, consider
TeacherRating {
StudentId: id
TeacherId: id
Rating: number
Created: DateTime
}
The teacher/student association will still be stored in the teacher object, but the ratings are in a different collection. A rating can't be created if no association between teacher and student can be found.
or
TeacherStudentClass {
StudentId: id
TeacherId: id
Class: id
ClassName: string // (denormalized, just an example)
Rating: number // (optional)
Created: DateTime
}
To find all students for a given teacher, you'd have to query the linker document first, then do a $in query on the students, and vice versa. That is one more query, but it comes with a huge gain in flexibility.
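The two-step lookup through a linker collection, and the kind of extra query it makes easy, might look like the following. This is only an in-memory sketch: arrays stand in for the collections, field names are lower-cased versions of the TeacherRating fields above, and the function names are made up for illustration.

```javascript
// Arrays standing in for the `students` and `teacherRatings` collections.
const students = [
  { _id: "s1", name: "Ann" },
  { _id: "s2", name: "Bob" },
  { _id: "s3", name: "Cy" },
];
const teacherRatings = [
  { studentId: "s1", teacherId: "t1", rating: 4, created: new Date("2020-01-10") },
  { studentId: "s3", teacherId: "t1", rating: 5, created: new Date("2020-02-01") },
  { studentId: "s1", teacherId: "t2", rating: 2, created: new Date("2020-03-05") },
];

function studentsForTeacher(teacherId) {
  // Query 1: fetch the linker documents for this teacher.
  const links = teacherRatings.filter((l) => l.teacherId === teacherId);
  // Query 2: the equivalent of find({ _id: { $in: ids } }) on students.
  const ids = new Set(links.map((l) => l.studentId));
  return students.filter((s) => ids.has(s._id));
}

// A per-student statistic that would be awkward with ratings embedded in teachers.
function averageRatingGivenBy(studentId) {
  const given = teacherRatings.filter((r) => r.studentId === studentId);
  if (given.length === 0) return null;
  return given.reduce((sum, r) => sum + r.rating, 0) / given.length;
}

console.log(studentsForTeacher("t1").map((s) => s.name));
console.log(averageRatingGivenBy("s1"));
```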
Related
Let's say I have an item model that has the following fields: _id, title, category, sub_category. What would be a more efficient or better way to handle the category and sub_category fields?
The most obvious one would be to just make them String types and add an enum validation. The other way would be to make two separate collections, Categories and a child collection SubCategories, and then reference the sub category in the item, which will also lead to the actual category.
From my understanding, it is more efficient to query by ObjectId than by String, and it is also better for indexing. However, this method also requires the use of .populate(), which would slow down the querying.
So in the end, what would be more efficient? Or is one approach better even if it is less efficient?
The best way is to implement it as you would in a graph DB:
Every element is a category.
A relationship is then required between two categories:
category1 <--- [sub-category-of] --- category2
https://www.mongodb.com/presentations/mongodb-days-silicon-valley-implementing-graph-databases-with-mongodb
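As a rough illustration of the graph idea, each "sub-category-of" relationship can be kept as an edge from child to parent, and the full category chain for an item is recovered by walking the edges. The category names and the helper `ancestorsOf` are assumptions for this sketch, and it assumes each category has at most one parent.

```javascript
// Each edge reads: from --[sub-category-of]--> to
const edges = [
  { from: "laptops", to: "computers" },
  { from: "computers", to: "electronics" },
  { from: "phones", to: "electronics" },
];

// Walk parent edges upward; the `seen` set guards against accidental cycles.
function ancestorsOf(category) {
  const parentOf = new Map(edges.map((e) => [e.from, e.to]));
  const path = [];
  const seen = new Set([category]);
  let current = parentOf.get(category);
  while (current !== undefined && !seen.has(current)) {
    path.push(current);
    seen.add(current);
    current = parentOf.get(current);
  }
  return path;
}

console.log(ancestorsOf("laptops"));
```

With this shape an item only needs to reference its most specific category; the parents follow from the edges.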
I'm starting to learn MongoDB, and at one point I asked myself how to solve the "one to many" relationship design in MongoDB. While searching, I found many comments in other posts/articles like "you are thinking relational".
OK, I agree. There will be some cases where duplication of information won't be a problem, for example the CLIENTS-ORDERS case.
But suppose you have the table ORDERS, which has an embedded DETAIL structure with the PRODUCTS that a client bought.
Then, for one reason or another, you need to change a product name (or some other piece of information) that is already embedded in several orders.
In the end, you are forced to model a one-to-many relationship in MongoDB (that is, storing an ObjectId field as a link to another collection) to solve this simple problem, aren't you?
But every article/comment I find about this says that it will hurt performance in MongoDB. It's kind of disappointing.
Is there another way to design this without a performance penalty in MongoDB?
One to Many Relations
In this relationship, many entities map to one entity, e.g.:
- a city has many persons who live in that city. Say NYC has 8 million people.
Let's assume the below data model:
//city
{
_id: 1,
name: 'NYC',
area: 30,
people: [{
_id: 1,
name: 'name',
gender: 'gender'
.....
},
....
8 million people data inside this array
....
]
}
This won't work because that document is going to be REALLY HUGE. Let's try to flip it on its head.
//people
{
_id: 1,
name: 'John Doe',
gender: gender,
city: {
_id: 1,
name: 'NYC',
area: '30'
.....
}
}
The problem with this design is that there are obviously many people living in NYC, so we've duplicated the city data many times over.
Probably, the best way to model this data is to use true linking.
//people
{
_id: 1,
name: 'John Doe',
gender: gender,
city: 'NYC'
}
//city
{
_id: 'NYC',
...
}
In this case, the people collection can be linked to the city collection. Since we don't have foreign key constraints, we have to be consistent about it ourselves. So this is a one-to-many relation, and it requires 2 collections. For small one-to-few relations (which are also one-to-many), like blog post to comments, the comments can be embedded inside the post documents as an array.
So, if it's truly one to many, 2 collections with linking work best. But for one to few, a single collection is generally enough.
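A small sketch of the linked design, with the "join" and the consistency check done in application code since the database will not enforce the reference. The arrays stand in for the two collections; the sample people and the function names are hypothetical.

```javascript
// Arrays standing in for the `city` and `people` collections.
const cities = [
  { _id: "NYC", area: 30 },
  { _id: "LA", area: 50 },
];
const people = [
  { _id: 1, name: "John Doe", city: "NYC" },
  { _id: 2, name: "Jane Roe", city: "NYC" },
  { _id: 3, name: "Sam Poe", city: "LA" },
];

// "Many" side lookup: everyone whose link points at the given city.
function peopleIn(cityId) {
  return people.filter((p) => p.city === cityId);
}

// Resolve the link for one person (a second lookup, done in application code).
function withCity(person) {
  const city = cities.find((c) => c._id === person.city) || null;
  return { ...person, city };
}

// No foreign keys: the application has to detect links that point nowhere.
function danglingReferences() {
  const known = new Set(cities.map((c) => c._id));
  return people.filter((p) => !known.has(p.city));
}

console.log(peopleIn("NYC").length, withCity(people[2]).city.area, danglingReferences().length);
```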
The problem is that you are over-normalizing your data. An order is defined by a customer who lives at a certain place at a given point in time and pays a certain price valid at the time of the order (which might change heavily over the application's lifetime and which you have to document anyway), plus several other parameters that are all valid only at a certain point in time. So to document an order (pun intended), you need to persist all data for that point in time. Let me give you an example:
{ _id: "order123456789",
date: ISODate("2014-08-01T16:25:00.141Z"),
customer: ObjectId("53fb38f0040980c9960ee270"),
items:[ ObjectId("53fb3940040980c9960ee271"),
ObjectId("53fb3940040980c9960ee272"),
ObjectId("53fb3940040980c9960ee273")
],
Total:400
}
Now, as long as neither the customer nor the details of the items change, you are able to reproduce where this order was sent, what the prices on the order were and the like. But what happens if the customer changes their address? Or if the price of an item changes? You would need to keep track of those changes in their respective documents. It would be much easier and sufficiently efficient to store the order like this:
{
_id: "order987654321",
date: ISODate("2014-08-01T16:25:00.141Z"),
customer: {
userID: ObjectId("53fb3940040980c9960ee283"),
recipientName: "Foo Bar",
address: {
street: "742 Evergreen Terrace",
city: "Springfield",
state: null
}
},
items: [
{count:1, productId:ObjectId("53fb3940040980c9960ee300"), price: 42.00 },
{count:3, productId:ObjectId("53fb3940040980c9960ee301"), price: 0.99},
{count:5, productId:ObjectId("53fb3940040980c9960ee302"), price: 199.00}
]
}
With this data model and the usage of aggregation pipelines, you have several advantages:
You don't need to independently keep track of prices and addresses or name changes or gift buys of a customer - it is already documented.
Using aggregation pipelines, you can create price trends without the need to store pricing data independently. You simply store the current price of an item in the order document.
Even complex aggregations such as price elasticity, turnover by state / city and alike can be done using pretty simple aggregations.
In general, it is safe to say that in a document-oriented database, every property or field that is subject to change in the future, and whose change would alter the semantic meaning of the document, should be stored inside the document. Everything that is subject to change in the future but doesn't touch the semantic meaning (the user's password in the example) may be linked via a GUID.
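To make the aggregation claims concrete, here is a plain JavaScript sketch of two of them (turnover by city and a price trend for one product) computed directly from self-contained order documents. In MongoDB these would be aggregation pipelines; this in-memory version only illustrates that the order documents already carry everything needed. The sample orders, the string product ids and the function names are assumptions for the example.

```javascript
// Hypothetical order documents in the embedded shape shown above (trimmed).
const orders = [
  {
    _id: "o2",
    date: new Date("2014-09-01"),
    customer: { address: { city: "Shelbyville" } },
    items: [{ count: 2, productId: "p1", price: 45 }],
  },
  {
    _id: "o1",
    date: new Date("2014-08-01"),
    customer: { address: { city: "Springfield" } },
    items: [
      { count: 1, productId: "p1", price: 42 },
      { count: 3, productId: "p2", price: 0.5 },
    ],
  },
];

// Sum count * price per city, using the address stored on each order.
function turnoverByCity(orderDocs) {
  const totals = {};
  for (const order of orderDocs) {
    const city = order.customer.address.city;
    const orderTotal = order.items.reduce((sum, i) => sum + i.count * i.price, 0);
    totals[city] = (totals[city] || 0) + orderTotal;
  }
  return totals;
}

// Price of one product over time, read from the prices frozen into each order.
function priceHistory(orderDocs, productId) {
  return orderDocs
    .flatMap((o) =>
      o.items
        .filter((i) => i.productId === productId)
        .map((i) => ({ date: o.date, price: i.price }))
    )
    .sort((a, b) => a.date - b.date);
}

console.log(turnoverByCity(orders));
console.log(priceHistory(orders, "p1").map((p) => p.price));
```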
I am running into a scenario where I am asking myself whether I need to put each entity (a Classroom has many Students) into a separate Meteor.collection object, or rather embed an array of students inside the classroom object and have one Meteor.collection Classroom object.
My instinct tells me to put Classroom and Students in their own Meteor.collections, but I am not sure how to establish a one-to-many relationship between the two Meteor collection objects.
How do more traditional one-to-many and many-to-many relationships translate into the Meteor way of doing things?
My question arises from the fact that .aggregate() is not supported, and from realizing that it's impossible to grab nested and embedded documents inside a parent document in a Meteor collection (e.g. Classroom) without doing a recursive loop.
Most of the time it is useful to put separate object types into separate collections.
Let's say we have a one to many relationship:
Classrooms.insert({
_id: "sdf8ad8asdj2jef",
name: "test classroom"
});
Students.insert({
_id: "lof8gzanasd9a7j2n",
name: "John",
classroomId: "sdf8ad8asdj2jef"
});
Get all Students in classroom sdf8ad8asdj2jef:
Students.find({classroomId: "sdf8ad8asdj2jef"});
Get the classroom with student lof8gzanasd9a7j2n:
var student = Students.findOne("lof8gzanasd9a7j2n");
var classroom = Classrooms.findOne(student.classroomId);
Putting the objects into separate collections is especially useful when you are going to use Meteor.publish() and Meteor.subscribe().
Meteor.publish() is pretty handy when you want to publish only data to the client that is really relevant to the user.
The following publishes only students who are in the room with the given classroomId.
(So the client doesn't have to download all student objects from the server database. Only those who are relevant.)
Meteor.publish("students", function (classroomId) {
return Students.find({classroomId: classroomId});
});
Many to many relationships are also not that complicated:
Classrooms.insert({
_id: "sdf8ad8asdj2jef",
name: "test classroom",
studentIds: ["lof8gzanasd9a7j2n"]
});
Students.insert({
_id: "lof8gzanasd9a7j2n",
name: "John",
classroomIds: ["sdf8ad8asdj2jef"]
});
Get all students in classroom sdf8ad8asdj2jef:
Students.find({classroomIds: "sdf8ad8asdj2jef"});
Get all classrooms with student lof8gzanasd9a7j2n:
Classrooms.find({studentIds: "lof8gzanasd9a7j2n"});
More information on MongoDB's read operations.
Separate collections for students and classrooms seem more straightforward.
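One thing the two-sided many-to-many layout requires is that both id arrays are updated together. A minimal in-memory sketch of such an enrolment helper follows; the short ids, the Maps standing in for the two collections, and the name `enroll` are assumptions. Against a real collection the same idea would typically be expressed as two updates using MongoDB's `$addToSet` operator.

```javascript
// Maps standing in for the Classrooms and Students collections.
const classrooms = new Map([
  ["c1", { _id: "c1", name: "test classroom", studentIds: [] }],
]);
const students = new Map([
  ["s1", { _id: "s1", name: "John", classroomIds: [] }],
]);

// Add the link on both sides; calling it twice must not create duplicates.
function enroll(studentId, classroomId) {
  const student = students.get(studentId);
  const classroom = classrooms.get(classroomId);
  if (!student || !classroom) {
    throw new Error("unknown student or classroom");
  }
  if (!classroom.studentIds.includes(studentId)) {
    classroom.studentIds.push(studentId);
  }
  if (!student.classroomIds.includes(classroomId)) {
    student.classroomIds.push(classroomId);
  }
}

enroll("s1", "c1");
enroll("s1", "c1"); // second call is a no-op
console.log(classrooms.get("c1").studentIds, students.get("s1").classroomIds);
```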
I think just keeping a 'classroom' or 'classroomId' field in each student document will allow you to join the two collections when necessary.
I am trying to fetch the documents from a collection based on the existence of a reference to these documents in another collection.
Let's say I have two collections Users and Courses and the models look like this:
User: {_id, name}
Course: {_id, name, user_id}
Note: this is just a hypothetical example and not an actual use case. So let's assume that duplicates are fine in the name field of Course. Let's think of Course as CourseRegistrations.
Here, I am maintaining a reference to User in Course, with user_id holding the _id of the User. Note that it's stored as a string.
Now I want to retrieve all users who are registered to a particular set of courses.
I know that it can be done with two queries: first run a query to get the user_id field from the Course collection for the set of courses, then query the User collection using $in with the user ids retrieved in the previous query. But this may not work well if the number of documents is in the tens of thousands or more.
Is there a better way to do this in just one query?
What you are describing is a typical SQL join. MongoDB had no join support when this was written (the $lookup aggregation stage, added in MongoDB 3.2, now provides a left outer join), so, as you already suggested, you can do it in 2 separate queries.
There is one more way to handle it. It's not exactly a solution, but it is a valid workaround in NoSQL databases: store the most frequently accessed fields inside the same collection.
You can store some of the user collection's fields inside the course collection as an embedded field.
Course : {
_id : 'xx',
name: 'yy',
user:{
fname : 'r',
lname :'v',
pic: 's'
}
}
This is a good approach if the subset of fields you intend to retrieve from the user collection is small. You might be wondering about the redundant user data stored in the course collection, but that's exactly what makes MongoDB powerful. It's a one-time insert, but your queries will be a lot faster.
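The price of the embedded copy is that a change to a user has to be pushed to every course that carries it. A minimal in-memory sketch of that bookkeeping follows. Note that it adds an `_id` to the embedded user so the copies can be found again, which the example document above does not have; that field, the sample data and the function names are assumptions.

```javascript
// Arrays standing in for the User and Course collections.
const users = [
  { _id: "u1", fname: "r", lname: "v", pic: "s", email: "r@example.com" },
];
const courses = [
  { _id: "c1", name: "yy", user: { _id: "u1", fname: "r", lname: "v", pic: "s" } },
  { _id: "c2", name: "zz", user: { _id: "u1", fname: "r", lname: "v", pic: "s" } },
];

// Only the frequently read subset is copied into a course.
function embeddedUser(user) {
  return { _id: user._id, fname: user.fname, lname: user.lname, pic: user.pic };
}

// Update the source of truth, then refresh every embedded copy.
// Returns the number of course documents that were touched.
function renameUser(userId, fname) {
  const user = users.find((u) => u._id === userId);
  if (!user) return 0;
  user.fname = fname;
  let touched = 0;
  for (const course of courses) {
    if (course.user && course.user._id === userId) {
      course.user = embeddedUser(user);
      touched += 1;
    }
  }
  return touched;
}

console.log(renameUser("u1", "R"), courses.map((c) => c.user.fname));
```

Reads of a course need no second query, while the rare rename does the extra work.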
Here's the deal. Let's suppose we have the following data schema in MongoDB:
items: a collection with large documents that hold some data (it's absolutely irrelevant what it actually is).
item_groups: a collection with documents that contain a list of items._id called item_groups.items plus some extra data.
So, these two are tied together with a Many-to-Many relationship. But there's one tricky thing: for a certain reason I cannot store items within item groups, so -- just as the title says -- embedding is not the answer.
The query I'm really worried about is intended to find some particular groups that contain some particular items (i.e. I've got a set of criteria for each collection). In fact it also has to say how many items within each found group fit the criteria (no matching items means the group is not found).
The only viable solution I came up with this far is to use a Map/Reduce approach with a dummy reduce function:
function map () {
// imagine that item_criteria came from the scope.
// it's a mongodb query object.
item_criteria._id = {$in: this.items};
var group_size = db.items.count(item_criteria);
// this group holds no relevant items, skip it
if (group_size == 0) return;
var key = this._id.str;
var value = {size: group_size, ...};
emit(key, value);
}
function reduce (key, values) {
// since the map function emits each group just once,
// values will always be a list with length=1
return values[0];
}
db.runCommand({
mapreduce: "item_groups",
map: map,
reduce: reduce,
query: item_groups_criteria,
scope: {item_criteria: item_criteria},
});
The problem line is:
item_criteria._id = {$in: this.items};
What if this.items.length == 5000 or even more? My RDBMS background cries out loud:
SELECT ... FROM ... WHERE whatever_id IN (over 9000 comma-separated IDs)
is definitely not a good way to go.
Thank you sooo much for your time, guys!
I hope the best answer will be something like "you're stupid, stop thinking in RDBMS style, use $its_a_kind_of_magicSphere from the latest release of MongoDB" :)
I think you are struggling with the separation of domain/object modeling from database schema modeling. I too struggled with this when trying out MongoDb.
For the sake of semantics and clarity, I'm going to substitute Groups with the word Categories
Essentially your theoretical model is a "many to many" relationship, in that each Item can belong to many Categories, and each Category can possess many Items.
This is best handled in your domain object modeling, not in DB schema, especially when implementing a document database (NoSQL). In your MongoDb schema you "fake" a "many to many" relationship, by using a combination of top-level document models, and embedding.
Embedding is hard to swallow for folks coming from SQL persistence back-ends, but it is an essential part of the answer. The trick is deciding whether or not it is shallow or deep, one-way or two-way, etc.
Top Level Document Models
Because your Category documents contain some data of their own and are heavily referenced by a vast number of Items, I agree with you that fully embedding them inside each Item is unwise.
Instead, treat both Item and Category objects as top-level documents. Ensure that your MongoDb schema allots a collection for each one so that each document has its own ObjectId.
The next step is to decide where and how much to embed... there is no right answer as it all depends on how you use it and what your scaling ambitions are...
Embedding Decisions
1. Items
At minimum, your Item objects should have a collection property for its categories. At the very least this collection should contain the ObjectId for each Category.
My suggestion would be to add to this collection, the data you use when interacting with the Item most often...
For example, if I want to list a bunch of items on my web page in a grid, and show the names of the categories they are part of. It is obvious that I don't need to know everything about the Category, but if I only have the ObjectId embedded, a second query would be necessary to get any detail about it at all.
Instead what would make most sense is to embed the Category's Name property in the collection along with the ObjectId, so that pulling back an Item can now display its category names without another query.
The biggest thing to remember is that the key/value objects embedded in your Item that "represent" a Category do not have to match the real Category document model... It is not OOP or relational database modeling.
2. Categories
In reverse you might choose to leave embedding one-way, and not have any Item info in your Category documents... or you might choose to add a collection for Item data much like above (ObjectId, or ObjectId + Name)...
In this direction, I would personally lean toward having nothing embedded... more than likely, if I want Item information for my category, I want lots of it, more than just a name... and deep-embedding a top-level document (Item) makes no sense. I would simply resign myself to querying the database for the Items where each one possesses the ObjectId of my Category in its collection of Categories.
Phew... confusing for sure. The point is, you will have some data duplication and you will have to tweak your models to your usage for best performance. The good news is that that is what MongoDb and other document databases are good at...
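A compact sketch of the shallow, one-way embedding described above: each Item carries a small `{_id, name}` summary of its Categories, which is enough to render a grid without a second query, while Category documents embed nothing and are resolved by querying Items. The sample documents and the function names are made up for illustration.

```javascript
// Top-level Category documents keep their full data in their own collection.
const categories = [
  { _id: "c1", name: "Books", description: "Printed and digital books" },
  { _id: "c2", name: "Gifts", description: "Things people give away" },
];

// Items embed only a summary of each Category, not the full document.
const items = [
  { _id: "i1", title: "Atlas", categories: [{ _id: "c1", name: "Books" }, { _id: "c2", name: "Gifts" }] },
  { _id: "i2", title: "Mug", categories: [{ _id: "c2", name: "Gifts" }] },
];

// Rendering a grid row needs no lookup in `categories`.
function gridRow(item) {
  return `${item.title} (${item.categories.map((c) => c.name).join(", ")})`;
}

// The reverse direction: query Items by the embedded Category id.
function itemsInCategory(categoryId) {
  return items.filter((i) => i.categories.some((c) => c._id === categoryId));
}

console.log(gridRow(items[0]));
console.log(itemsInCategory("c2").map((i) => i._id));
```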
Why not use the opposite design?
You are storing items and item_groups. If your first idea was to store items in item_group entries, then maybe the opposite is not a bad idea :-)
Let me explain:
In each item you store the groups it belongs to. (You are in NoSQL, data duplication is OK!)
For example, let's say you store in item entries a list called groups, and your items look like:
{ _id : ....
, name : ....
, groups : [ ObjectId(...), ObjectId(...),ObjectId(...)]
}
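Then map/reduce becomes much more powerful:
map = function() {
    var item = this; // `this` is rebound inside the forEach callback
    this.groups.forEach(function(groupKey) {
        emit(groupKey, {items: [item]});
    });
}
reduce = function(key, values) {
    // merge the partial lists; return the same shape that map emits
    var items = [];
    values.forEach(function(v) { items = items.concat(v.items); });
    return {items: items};
}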
db.runCommand({
mapreduce : "items",
map : map,
reduce : reduce,
query : {_id : {$in : [...,....,.....] }} // put your item ids here
})
You can add some parameters (finalize for instance to modify the output of the map reduce) but this might help you.
Of course you need another collection where you store the details of item_groups if you need them, but in some cases (if this information about item_groups does not exist, or doesn't change, or you don't mind not having the most up-to-date version of it) you don't need it at all!
Does that give you a hint about a solution to your problem?
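To see what the inverted design buys for the original question (which groups contain matching items, and how many), here is a plain JavaScript stand-in for the map and reduce steps. It runs in memory rather than through MongoDB's mapReduce command; the sample items and the name `groupSizes` are assumptions for the sketch.

```javascript
// Items store the ids of the groups they belong to.
const items = [
  { _id: "i1", name: "a", groups: ["g1", "g2"] },
  { _id: "i2", name: "b", groups: ["g2"] },
  { _id: "i3", name: "c", groups: ["g3"] },
];

// "Map" each matching item to its groups and "reduce" by counting per group.
// Groups with no matching item never appear in the result.
function groupSizes(itemDocs, matches) {
  const sizes = new Map();
  for (const item of itemDocs) {
    if (!matches(item)) continue;
    for (const groupId of item.groups) {
      sizes.set(groupId, (sizes.get(groupId) || 0) + 1);
    }
  }
  return sizes;
}

// The predicate plays the role of the item criteria.
const sizes = groupSizes(items, (item) => item.name !== "c");
console.log([...sizes]);
```

The item criteria are applied once per item, so no per-group `$in` list over thousands of ids is needed.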