I'm using MongoDB. I know that MongoDB isn't relational but information sometimes is. So what's the most efficient way to reference these kinds of relationships to lessen database load and maximize query speed?
Example:
* Tinder-style "matches" *
There are many users in a Users collection. They get matched to each other.
So I'm thinking:
Document 1:
{
_id: "d3fg45wr4f343",
firstName: "Bob",
lastName: "Lee",
matches: [
"ferh823u9WURF",
"8Y283DUFH3FI2",
"KJSDH298U2F8",
"shdfy2988U2Ywf"
]
}
Document 2:
{
_id: "ferh823u9WURF",
firstName: "Cindy",
lastName: "Doe",
matches: [
"d3fg45wr4f343"
]
}
Would this work OK if there were, say, 10,000 users and you were on Bob's profile page and you wanted to display the firstName of all of his matches?
Any alternative structures that would work better?
* Online Forum *
I suppose you could have the following collections:
Users
Topics
Users Collection:
{
_id: "d3fg45wr4f343",
userName: "aircon",
avatar: "234232.jpg"
}
{
_id: "23qdf3a3fq3fq3",
userName: "spider",
avatar: "986754.jpg"
}
Topics Collection Version 1
One example document in the Topics Collection:
{
title: "A spider just popped out of the AC",
dateTimeSubmitted: 201408201200,
category: 5,
posts: [
{
message: "I'm going to use a gun.",
dateTimeSubmitted: 201408201200,
author: "d3fg45wr4f343"
},
{
message: "I don't think this would work.",
dateTimeSubmitted: 201408201201,
author: "23qdf3a3fq3fq3"
},
{
message: "It will totally work.",
dateTimeSubmitted: 201408201202,
author: "d3fg45wr4f343"
},
{
message: "ur dumb",
dateTimeSubmitted: 201408201203,
author: "23qdf3a3fq3fq3"
}
]
}
Topics Collection Version 2
One example document in the Topics Collection. The author's avatar and userName are now embedded in the document. I know that:
This is not DRY.
If the author changes their avatar or userName, the change would need to be applied to every post document embedded in the Topics collection.
BUT it saves the system from querying for all the avatars and userNames via the author's ID every single time this thread is viewed on the client.
{
title: "A spider just popped out of the AC",
dateTimeSubmitted: 201408201200,
category: 5,
posts: [
{
message: "I'm going to use a gun.",
dateTimeSubmitted: 201408201200,
author: "d3fg45wr4f343",
userName: "aircon",
avatar: "234232.jpg"
},
{
message: "I don't think this would work.",
dateTimeSubmitted: 201408201201,
author: "23qdf3a3fq3fq3",
userName: "spider",
avatar: "986754.jpg"
},
{
message: "It will totally work.",
dateTimeSubmitted: 201408201202,
author: "d3fg45wr4f343",
userName: "aircon",
avatar: "234232.jpg"
},
{
message: "ur dumb",
dateTimeSubmitted: 201408201203,
author: "23qdf3a3fq3fq3",
userName: "spider",
avatar: "986754.jpg"
}
]
}
So yeah, I'm not sure which is best...
If the data is really many-to-many (in your first example, one user can have many matches and can be matched by many others), it is usually best to go with relations, i.e. references between documents.
The main arguments against relations stem from MongoDB not being a relational database, so there are no such things as foreign key constraints or join statements.
The trade-off you have to consider in those many-to-many cases (many being much more than two) is to either enforce the key constraints yourself or manage the possible data inconsistencies across the multiple documents (your last example). In most of those cases the relational approach is much more practical than the embedding approach.
Exceptions could be read-often, write-seldom cases. For a (very constructed) example: if the matches in your first example were recalculated once a day or so by wiping all previous matches and computing a new list, the data inconsistencies you would introduce could be acceptable, and the read time you save by embedding the first names of the matches could be an advantage.
But usually for many-to-many relations it is best to use a relational approach and make use of the array query features, such as {_id: {$in: matches}}, where matches is the array of IDs stored on the user document.
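As a rough sketch of that $in lookup, the snippet below mimics the query in plain JavaScript so it runs without a database. The `users` array, the `firstNamesOfMatches` helper, and every sample user other than Bob and Cindy are made up for illustration. Against a real collection the equivalent shell call would look roughly like `db.users.find({ _id: { $in: bob.matches } }, { firstName: 1 })`.

```javascript
// In-memory stand-in for the Users collection (hypothetical sample data).
const users = [
  { _id: "d3fg45wr4f343", firstName: "Bob", matches: ["ferh823u9WURF", "8Y283DUFH3FI2"] },
  { _id: "ferh823u9WURF", firstName: "Cindy", matches: ["d3fg45wr4f343"] },
  { _id: "8Y283DUFH3FI2", firstName: "Dana", matches: ["d3fg45wr4f343"] },
  { _id: "KJSDH298U2F8", firstName: "Eve", matches: [] },
];

// Mimics db.users.find({ _id: { $in: matchIds } }, { firstName: 1 }).
function firstNamesOfMatches(collection, userId) {
  const user = collection.find((u) => u._id === userId);
  if (!user) return [];
  const wanted = new Set(user.matches);
  return collection.filter((u) => wanted.has(u._id)).map((u) => u.firstName);
}

console.log(firstNamesOfMatches(users, "d3fg45wr4f343")); // [ 'Cindy', 'Dana' ]
```

Since MongoDB always indexes _id, a lookup like this should stay a single indexed query whether the collection holds 10,000 users or far more.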
In the end it all comes down to how many inconsistencies you can live with and how fast you really need to access the data (is it OK for some topics to show the old avatar for a few days if I save half a second of page load time?).
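To make the cost of the embedded version concrete, here is a minimal sketch of the fan-out that a profile change would require when userName and avatar are copied into every post. It works on an in-memory copy of the topic document. The `propagateProfile` helper and the new avatar file name are hypothetical. On a real deployment (MongoDB 3.6 or later) something similar could probably be done in one call with `updateMany` and `arrayFilters`, for example `db.topics.updateMany({ "posts.author": id }, { $set: { "posts.$[p].avatar": avatar } }, { arrayFilters: [{ "p.author": id }] })`.

```javascript
// In-memory stand-in for the Topics collection, version 2 (embedded profile data).
const topics = [
  {
    title: "A spider just popped out of the AC",
    posts: [
      { message: "I'm going to use a gun.", author: "d3fg45wr4f343", userName: "aircon", avatar: "234232.jpg" },
      { message: "I don't think this would work.", author: "23qdf3a3fq3fq3", userName: "spider", avatar: "986754.jpg" },
      { message: "It will totally work.", author: "d3fg45wr4f343", userName: "aircon", avatar: "234232.jpg" },
    ],
  },
];

// Rewrites every embedded copy of one author's profile fields.
// Returns the number of posts that had to be touched.
function propagateProfile(allTopics, authorId, changes) {
  let touched = 0;
  for (const topic of allTopics) {
    for (const post of topic.posts) {
      if (post.author === authorId) {
        Object.assign(post, changes);
        touched += 1;
      }
    }
  }
  return touched;
}

const touched = propagateProfile(topics, "d3fg45wr4f343", { avatar: "new-avatar.jpg" });
console.log(touched); // 2
```

The number of writes grows with the number of posts the user has made, which is the price paid for the cheaper reads.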
Edit
The schema design series on the MongoDB blog might be a good read for you: part1, part2 and part3
It is easy to deal with 1-1 (via refs) or 1-N (via populate virtuals) relations in MongoDB, but how do I deal with N-M relations?
Suppose I have 2 entities, teacher and classroom:
many teachers can access many classrooms
many classrooms can be accessed by many teachers
teacher.schema
{
name:String;
//classrooms:Array;
}
classrooms.schema
{
name:String;
//teachers:Array
}
Is there a direct way (similar to populate virtuals) to maintain this N-M relation, so that when a teacher is removed, the teachers list in each classroom is automatically updated too?
Should I use a third 'bridge' schema like TeacherToClassroom to record their relations?
I am thinking of something like this, like a computed value:
teacher.schema
{
name:String;
classrooms:(row)=>{
return db.classrooms.find({ teachers: row._id })
}
}
classrooms.schema
{
name:String;
teachers:{Type:ObjectId[]}
}
That way I only manage the teacher IDs in classrooms, and the classrooms property in the teacher schema is computed automatically.
The literature describes a few methods for implementing an m-n relationship in MongoDB.
The first method is two-way embedding. Looking at an example using authors and books:
{
_id: 1,
name: "Peter Griffin",
books: [1, 2]
}
{
_id: 2,
name: "Luke Skywalker",
books: [1]
}
{
_id: 1,
title: "War of the oceans",
categories: ["drama"],
authors: [1, 2]
}
{
_id: 2,
title: "Into the loop",
categories: ["scifi"],
authors: [1]
}
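With two-way embedding every link lives in two documents, so both sides have to be written together. Below is a small sketch of that bookkeeping on in-memory documents modeled on the example above. `linkAuthorToBook` is a hypothetical helper. In MongoDB each side would typically be its own `updateOne` with `$addToSet`.

```javascript
// In-memory stand-ins for the two collections.
const authors = [
  { _id: 1, name: "Peter Griffin", books: [1, 2] },
  { _id: 2, name: "Luke Skywalker", books: [1] },
];
const books = [
  { _id: 1, title: "War of the oceans", authors: [1, 2] },
  { _id: 2, title: "Into the loop", authors: [1] },
];

// Records the link on both sides, skipping duplicates (the way $addToSet would).
function linkAuthorToBook(authorList, bookList, authorId, bookId) {
  const author = authorList.find((a) => a._id === authorId);
  const book = bookList.find((b) => b._id === bookId);
  if (!author || !book) throw new Error("unknown author or book");
  if (!author.books.includes(bookId)) author.books.push(bookId);
  if (!book.authors.includes(authorId)) book.authors.push(authorId);
}

linkAuthorToBook(authors, books, 2, 2);
console.log(authors[1].books, books[1].authors); // [ 1, 2 ] [ 1, 2 ]
```

In a real database the two writes are separate operations, so a failure between them would leave the link recorded on one side only unless they are wrapped in a transaction.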
The second option is to use one-way embedding. This means you only embed one of the documents into the other. Like so (a book with a category):
{
_id: 1,
name: "drama"
}
{
_id: 1,
title: "War of the oceans",
categories: [1],
authors: [1, 2]
}
When the data you are embedding becomes larger you could use something like the bucketing pattern to split it up: https://www.mongodb.com/blog/post/building-with-patterns-the-bucket-pattern
As you can see in the above example, by embedding the documents you still only need to modify the data in one location. You do not need any intermediate (bridge) collections to do that.
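A sketch of what "one location" means with one-way embedding: only the book stores the link, so the reverse direction (all books in a category) is answered by querying the books, and dropping a category only touches the books' arrays. This is a plain-JavaScript stand-in. The helper names and the third book are hypothetical. The MongoDB equivalents would be roughly `db.books.find({ categories: 1 })` and `db.books.updateMany({ categories: 1 }, { $pull: { categories: 1 } })`.

```javascript
// In-memory stand-in for the books collection; categories holds category ids.
const books = [
  { _id: 1, title: "War of the oceans", categories: [1] },
  { _id: 2, title: "Into the loop", categories: [2] },
  { _id: 3, title: "Tides", categories: [1, 2] }, // hypothetical extra book
];

// Reverse lookup: titles of all books that reference the given category.
function booksInCategory(bookList, categoryId) {
  return bookList.filter((b) => b.categories.includes(categoryId)).map((b) => b.title);
}

// Removes the category reference from every book that carries it.
function removeCategory(bookList, categoryId) {
  for (const book of bookList) {
    book.categories = book.categories.filter((c) => c !== categoryId);
  }
}

console.log(booksInCategory(books, 1)); // [ 'War of the oceans', 'Tides' ]
removeCategory(books, 1);
console.log(booksInCategory(books, 1)); // []
```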
In some cases you might even be able to omit an entire document when it has no meaning as a stand-alone object: Absorbing N in a M:N relationship
I'm studying MongoDB and want to build a little database for a web blog page.
As is well known, in Mongo we use collections and documents, as opposed to tables and records.
I have 2 documents (entities): User (id, nikname) and Publication (id, title ...). In a relational database we would have user_id as a column inside "Publication", which would mean that a user is able to have many publications.
Example1
User
{
id: "123456",
nikname: "cool guy",
publications: [
{
id: "some id1",
title: "some title111",
text: "bla bla bla",
// any fields
},
{
id: "some id2",
title: "some title222",
text: "bla bla bla",
// any fields
},
....
]
}
Publication
{
id: "some id",
title: "some title",
text: "bla bla bla",
// any fields
}
In the example above, each user has their own array of publications.
My question is: is this a good way to do it? What if one user has 1000 publications?
Moreover, if each user has their own publications, then why do we need to store a publications table (in Mongo it is called a collection) outside the user as a separate entity?
I was also thinking about storing the user id inside each publication instead.
Example2
User
{
id: "123456",
nikname: "cool guy"
}
Publication 1
{
id: "some id",
title: "some title",
text: "bla bla bla",
// any fields
USER_ID: 123456
}
Publication 2
{
id: "some id",
title: "some title",
text: "bla bla bla",
// any fields
USER_ID: 123456
}
But Example2 does not differ from the relational approach...
So which way is better?
In short, I would like to hear the opinions of people who have worked with Mongo.
In Mongo there are 3 ways you can design your model relationships.
One to One
One to Many (Embedded Docs) : your example 1
One to Many (Document References) : your example 2
The rule of thumb is that you need to consider the data retrieval pattern of your application.
For example, if your application heavily fetches the publications related to a particular user, you can go for example 1, and you don't need to maintain publications in a separate collection (unless the application requires it). Having a lot of subdocuments is not a problem as long as a single document does not exceed the hard limits (MongoDB caps a BSON document at 16 MB).
Your example 2 is good if your application needs to query by publication as well as by user (similar to a relational model). However, I do not see it as an optimized solution.
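For the reference-based layout (example 2), fetching a user's publications is one filtered query on the publications collection rather than a join, and it can be paged, which matters once a user has 1000 publications. The in-memory sketch below mimics that. The `publicationsOf` helper, the `skip`/`limit` defaults, and the sample values are illustrative assumptions. In MongoDB the equivalent would be roughly `db.publications.find({ USER_ID: "123456" }).skip(0).limit(10)`, ideally backed by an index created with `db.publications.createIndex({ USER_ID: 1 })`.

```javascript
// In-memory stand-in for the publications collection (example 2 layout).
const publications = [
  { id: "p1", title: "some title111", USER_ID: "123456" },
  { id: "p2", title: "some title222", USER_ID: "123456" },
  { id: "p3", title: "someone else's post", USER_ID: "999999" },
];

// Returns one page of a user's publications.
function publicationsOf(all, userId, { skip = 0, limit = 10 } = {}) {
  return all.filter((p) => p.USER_ID === userId).slice(skip, skip + limit);
}

console.log(publicationsOf(publications, "123456").map((p) => p.title));
// [ 'some title111', 'some title222' ]
console.log(publicationsOf(publications, "123456", { skip: 1, limit: 1 }).length); // 1
```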
Some resource:
https://docs.mongodb.com/manual/applications/data-models-relationships/
In a Mongo environment it is beneficial to embed data in documents.
so for example an Employees document:
{
{
userid: 'someid',
username: 'user1',
isManager: true,
subordinates: [
{
userid: 'anotherid',
username: 'user2',
isManager: false
}
],
officeLocation: {
officeId: 'someofficeid',
officeName: 'Some Office'
}
},
{
userid: 'anotherid',
username: 'user2',
isManager: false,
officeLocation: {
officeId: 'someotherofficeid',
officeName: 'Some Other Office'
}
}
}
And the office document:
{
{
officeid: 'someofficeid',
officeName: 'Some Office'
},
{
officeid: 'someotherofficeid',
officeName: 'Some Other Office'
}
}
So let's assume that someone in the company decides that they don't like the name Some Other Office and they want to change it to Some Cool Office.
When they make the change in the office document how do we know to update all the embedded Some Other Office in the employee document as well?
It seems that every time that you take a piece of data from one document and embed it into an object in another document that the link between the two gets broken and then you have to write separate queries to update the data in all the different spots that you embedded that object into.
I like the idea of embedded documents rather than storing references, but without some kind of 2 way data-binding it seems impractical when it comes to updating information.
Is there any way that I would be able to bind the data two ways or is there an easier way to go about modeling my data?
Thanks
It reminds me of traditional RDBMS systems, where you decide whether to normalize or denormalize information. I'm not sure about the binding, but if you need a single source of truth for a piece of information, the better way is to never store it in two different places. So, in your case, it may be better to store the Office information in a separate document and just link to it by id.
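A minimal sketch of that "link by id" idea, using in-memory copies of the documents from the question: the employee keeps only `officeId`, and the office name is resolved at read time. `withOffice` is a hypothetical helper. In MongoDB the read-time resolution could be a second query or a `$lookup` aggregation stage.

```javascript
// Offices hold the only copy of the office name.
const offices = [
  { officeId: "someofficeid", officeName: "Some Office" },
  { officeId: "someotherofficeid", officeName: "Some Other Office" },
];

// Employees store a reference instead of an embedded officeLocation object.
const employees = [
  { userid: "someid", username: "user1", isManager: true, officeId: "someofficeid" },
  { userid: "anotherid", username: "user2", isManager: false, officeId: "someotherofficeid" },
];

// Resolves the office at read time; officeName is null if the reference is dangling.
function withOffice(employee, officeList) {
  const office = officeList.find((o) => o.officeId === employee.officeId);
  return { ...employee, officeName: office ? office.officeName : null };
}

// Renaming the office touches exactly one document.
offices[1].officeName = "Some Cool Office";
console.log(withOffice(employees[1], offices).officeName); // Some Cool Office
```

The cost is the extra lookup on every read, which is the same trade-off as normalizing in an RDBMS.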
I am trying to implement a collection in meteor/mongo which is of following nature:
FIRST_NAME   LAST_NAME   CLASSES   PROFESSORS
A            B           a         b
                         c         d
                         e         f
                         g         h
M            N           c         d
                         p         q
                         x         q
                         m         n
                         r         d
So, as above, a person can take multiple classes and a class can have multiple people. (Also, one professor can teach multiple classes.) Now, I want to make this collection searchable and sortable by all possible fields.
Searching by FIRST_NAME and LAST_NAME is easy in the model shown above. But I should also be able to see all students in the class I select. I would also like to see the list of classes sorted in alphabetical order, along with the people enrolled in each class.
Can you please let me know how to approach this in a Meteor/Mongo style? I would also be glad if you could point me to any resources on this.
You are describing one of the typical data structures which are better suited for a relational database. But don't worry. For reasonably sized data sets it is quite workable in MongoDB too.
When modelling this type of structure in a document database you use embedding, which does lead to data duplication, but this data duplication is typically not a problem.
Pseudo-code for your model:
Collection schoolClass: { // Avoid the reserved word "class"
_id: string,
name: string,
students: [ { _id: string, firstName: string, lastName: string } ],
professor: { _id: string, firstName: string, lastName: string }
}
Collection student: {
_id: string,
firstName: string,
lastName: string,
classes: [ { _id: string, name: string } ]
}
Collection professor: {
_id: string,
firstName: string,
lastName: string,
classes: [ { _id: string, name: string } ]
}
This gives you easily searchable/sortable entry points to all objects. You only follow the "relation" _id to the next collection if you need some special data from an object. All data needed for all documents in the common queries should be present in the Collection the query is run on.
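As an illustration of such an entry point, the sketch below lists classes alphabetically together with their enrolled students, reading only the schoolClass documents. It runs on in-memory sample data. The `classRoster` helper, the class names, and the professor names are invented for the example. In Meteor the sorted read would presumably be something like `SchoolClasses.find({}, { sort: { name: 1 } })`, where `SchoolClasses` is an assumed collection variable.

```javascript
// In-memory stand-in for the schoolClass collection, with embedded student copies.
const schoolClasses = [
  {
    _id: "c2",
    name: "Physics",
    students: [{ _id: "s1", firstName: "A", lastName: "B" }],
    professor: { _id: "p1", firstName: "Ada", lastName: "Q" },
  },
  {
    _id: "c1",
    name: "Algebra",
    students: [
      { _id: "s1", firstName: "A", lastName: "B" },
      { _id: "s2", firstName: "M", lastName: "N" },
    ],
    professor: { _id: "p2", firstName: "Bo", lastName: "D" },
  },
];

// Returns classes sorted by name, each with the full names of its students.
// Works on a copy so the input order is left untouched.
function classRoster(classes) {
  return [...classes]
    .sort((a, b) => a.name.localeCompare(b.name))
    .map((c) => ({
      name: c.name,
      students: c.students.map((s) => `${s.firstName} ${s.lastName}`),
    }));
}

console.log(classRoster(schoolClasses).map((c) => c.name)); // [ 'Algebra', 'Physics' ]
```

No second collection is consulted, because the student names needed for the listing are already embedded in each class document.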
You just need to make sure you update all the relevant collections when an object changes.
A good read is https://docs.mongodb.com/manual/core/data-modeling-introduction/
I'm coming from the SQL world, so naturally mongo / noSQL has been an adventure.
I'm building a page to add/edit categories, that "posts" will later be assigned to.
What I've basically created is this:
{
_id: "asdf234ljsf",
title: "CategoryOne",
sortorder: 1,
active: true,
children: [
{
title: "ChildOne",
sortorder: 1,
active: true
},
{
title: "ChildTwo",
sortorder: 2,
active: true
}
]
}
So later, when creating a "post", I would assign that post to one or more parent categories, as well as optionally to one or more child categories within the selected parent categories. If visitors to the site click on a parent category, it shows all posts within that parent category; if they select a child category, it shows only the posts within that child category.
The logic is obvious and simple, but in SQL I would have created tables like this:
table_Category ( CategoryID, Title, Sort, Active )
table_Category_Children ( ChildID, ParentID, Title, Sort, Active )
I've been reading the Discover Meteor book, and it mentions that Meteor gives us many tools that work a lot better when operating at the collection level. It also explains that DDP operates at the top level of a document, meaning that if something small changes down in a sub-collection or array, potentially unneeded data will be sent to all connected/subscribed clients.
So, this makes me think I should be organizing the categories like this:
Collection for parent categories
{
_id: "someid",
title: "CategoryOne",
sortorder: 1,
active: true
},
{
_id: "someid",
title: "CategoryTwo",
sortorder: 1,
active: true
}
Collection for Child Categories
{
_id: "someid",
parent: "idofparent",
title: "ChildOne",
sortorder: 1,
active: true
},
{
_id: "someid",
parent: "idofparent",
title: "ChildTwo",
sortorder: 1,
active: true
}
Or perhaps it's better like this:
Collection for parent categories
{
_id: "someid",
title: "CategoryOne",
sortorder: 1,
active: true,
children: [ { id: "childid" }, ... ]
}
I think understanding a best practice/method for Meteor and Mongo in this scenario will help me greatly across the board.
So, in conclusion: I have an admin page where I add/edit these categories. When clients create a post, they'll select the parent and child categories suitable for their post, and I want to make sure that I organize this properly from the beginning. Changing my thinking process from a traditional RDBMS to NoSQL is a big jump.
Thank you!
MongoDB stores all data in documents. This is a fundamental difference from relational (SQL) databases.
Imagine you have 100 parent categories and 1000 child categories: once you update a parent category, it will affect the "idofparent" of every linked child category, in a reactive way. In short, it's not sustainable.
Try to think of a way to avoid the equivalent of an SQL JOIN in MongoDB.
Restructure your data, perhaps in a way similar to this:
One big collection for all categories:
{
_id: id,
title: title,
sortorder: 1,
active: 1,
class: "parent > child", // make this a field
...
}
// class can be "parent1", "parent2", "parent1 > child1" ... you get the idea
That way each stored document is completely self-contained.
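A rough sketch of how such a path field could serve both views (everything under a parent, or one specific child) with a single anchored prefix match. It runs on in-memory sample data. The `underPath` and `escapeRegExp` helpers and the sample categories are hypothetical. In MongoDB the same filter would be a regex query along the lines of `db.categories.find({ class: /^parent1( > |$)/ })`.

```javascript
// In-memory stand-in for the single categories collection.
const categories = [
  { _id: "a", title: "CategoryOne", class: "parent1" },
  { _id: "b", title: "ChildOne", class: "parent1 > child1" },
  { _id: "c", title: "ChildTwo", class: "parent1 > child2" },
  { _id: "d", title: "CategoryTwo", class: "parent2" },
  { _id: "e", title: "CategoryTen", class: "parent10" },
];

// Escapes characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Titles of the category at `path` and everything nested below it.
// The "( > |$)" suffix keeps "parent1" from also matching "parent10".
function underPath(all, path) {
  const pattern = new RegExp(`^${escapeRegExp(path)}( > |$)`);
  return all.filter((c) => pattern.test(c.class)).map((c) => c.title);
}

console.log(underPath(categories, "parent1")); // [ 'CategoryOne', 'ChildOne', 'ChildTwo' ]
console.log(underPath(categories, "parent1 > child2")); // [ 'ChildTwo' ]
```

One caveat of this layout is that renaming a parent means rewriting the path string in every document below it.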
Or, if you absolutely need a JOIN-style relational data structure, I don't think MongoDB is the right choice for you.