I am trying to fetch the documents from a collection based on the existence of a reference to these documents in another collection.
Let's say I have two collections Users and Courses and the models look like this:
User: {_id, name}
Course: {_id, name, user_id}
Note: this is just a hypothetical example and not an actual use case. So let's assume that duplicates are fine in the name field of Course. Let's think of Course as CourseRegistrations.
Here, I am maintaining a reference to User in the Course, with user_id holding the _id of the User. Note that it's stored as a string.
Now I want to retrieve all users who are registered to a particular set of courses.
I know that it can be done with two queries. That is, first run a query and get the user_id field from the Course collection for the set of courses. Then query the User collection using $in and the user ids retrieved in the previous query. But this may not be good if the number of documents is in the tens of thousands or more.
Is there a better way to do this in just one query?
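What you are describing is a typical SQL join. MongoDB does not support joins in ordinary find queries (versions 3.2 and later add a $lookup aggregation stage that performs a left outer join). As you already suggested, you can do it in two separate queries.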
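The two-query approach can be sketched as follows. This is a minimal in-memory illustration in plain JavaScript, not driver code; the sample data and the helper name usersRegisteredTo are made up for the example. Against a real database the two steps would correspond roughly to db.Course.distinct("user_id", {name: {$in: names}}) followed by db.User.find({_id: {$in: ids}}), assuming the _id values are stored in the same type as user_id.

```javascript
// Hypothetical sample data mirroring the User and Course models above.
const users = [
  { _id: "u1", name: "Ann" },
  { _id: "u2", name: "Bob" },
  { _id: "u3", name: "Cid" },
];
const courses = [
  { _id: "c1", name: "math", user_id: "u1" },
  { _id: "c2", name: "math", user_id: "u2" },
  { _id: "c3", name: "art", user_id: "u2" },
  { _id: "c4", name: "music", user_id: "u3" },
];

function usersRegisteredTo(courseNames) {
  const wanted = new Set(courseNames);
  // Query 1: distinct user ids from the matching course documents.
  const userIds = new Set(
    courses.filter((c) => wanted.has(c.name)).map((c) => c.user_id)
  );
  // Query 2: the in-memory equivalent of an $in match on User._id.
  return users.filter((u) => userIds.has(u._id));
}

console.log(usersRegisteredTo(["math", "art"]).map((u) => u.name));
```

Collecting the ids into a set first means a user registered to several of the courses is still returned only once.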
There is one more way to handle it. It's not exactly a solution, but it is a valid workaround in NoSQL databases: store the most frequently accessed fields inside the same collection.
You can store some of the user collection's fields inside the course collection as an embedded document.
Course : {
_id : 'xx',
name : 'yy',
user : {
fname : 'r',
lname : 'v',
pic : 's'
}
}
This is a good approach if the subset of fields you intend to retrieve from the user collection is small. You might be wondering about the redundant user data stored in the course collection, but that's exactly what makes MongoDB powerful. It's a one-time insert, and your queries will be a lot faster.
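As a rough sketch of that embedding step, the snippet below copies only the frequently read fields of a user into the course document before it would be inserted. It is plain JavaScript; the helper name embedUser and the extra email field are assumptions made for illustration.

```javascript
// Hypothetical helper: copy only the frequently read user fields into the course.
function embedUser(course, user) {
  const { fname, lname, pic } = user;
  return { ...course, user: { fname, lname, pic } };
}

// The email field stands in for user data that is rarely needed with a course.
const user = { _id: "u1", fname: "r", lname: "v", pic: "s", email: "r@example.com" };
const course = embedUser({ _id: "xx", name: "yy" }, user);

console.log(course);
```

The full user document still lives in the user collection; only the subset needed for course queries is duplicated.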
I'm trying to create a database using MongoDB. Since I have no experience with NoSQL databases, I'm having some trouble designing it.
I want to make a database where one student can be part of multiple sessions, and one session can contain multiple students (many-to-many). Also, each event is linked to one student inside one session.
So far I designed it like this:
Sessions:
Sessions.students = [student_id1, student_id2]
Student:
Students.sessions = [session_id1, session_id2]
But my problem is, where should I store the relation, in sessions or in student, or in both (like above)?
And is this the correct way to create the event relationships?
Event:
Event.studentid = [student_id1]
Event.sessionid = [session_id1]
1) If you need to find both how many students are in each session and how many sessions each student attends, store the ids in both collections, as is usual for many-to-many relations in MongoDB.
2) If only one of those questions is relevant for your project, store the ids only on that side.
For example, if you need session information but not student information, store
session.student = [studentid1, id2, id3]; student.session = [] is then not required.
You can also index these fields to speed up searches, but don't create more indexes than you need.
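If the ids are stored on both sides, both arrays have to be updated together. A minimal in-memory sketch in plain JavaScript, with made-up documents and helper names (enroll, addToSet), could look like this; in MongoDB itself the $addToSet update operator plays the role of the addToSet helper.

```javascript
// Add an id to an array only if it is not present yet
// (the in-memory counterpart of MongoDB's $addToSet update operator).
function addToSet(array, value) {
  if (!array.includes(value)) array.push(value);
}

// Keep both sides of the many-to-many relation in sync.
function enroll(student, session) {
  addToSet(student.sessions, session._id);
  addToSet(session.students, student._id);
}

const student = { _id: "st1", sessions: [] };
const session = { _id: "se1", students: [] };
enroll(student, session);
enroll(student, session); // repeating the call does not create duplicates

console.log(student.sessions, session.students);
```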
You have two collections in the database, students and sessions.
In the students collection, each document contains an array of sessionIds; in the sessions collection, each document contains an array of studentIds.
To get the data as a list, you need to use aggregation.
Query for students like this:
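db.students.aggregate([
// studentId is a placeholder for the student you are looking for
{ $match: { _id: studentId } },
// one output document per entry in the sessionIds array
{ $unwind: "$sessionIds" },
// attach the matching document from sessions
{ $lookup: { from: "sessions", localField: "sessionIds", foreignField: "_id", as: "session" } }
])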
Query for sessions like
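db.sessions.aggregate([
// sessionId is a placeholder for the session you are looking for
{ $match: { _id: sessionId } },
// one output document per entry in the studentIds array
{ $unwind: "$studentIds" },
// attach the matching document from students
{ $lookup: { from: "students", localField: "studentIds", foreignField: "_id", as: "student" } }
])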
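To make the effect of $unwind and $lookup more concrete, here is a rough in-memory imitation in plain JavaScript. The sample documents are invented, and the code only mimics the shape of the result; it is not how the server implements the stages.

```javascript
// Hypothetical sample documents for the two collections.
const students = [{ _id: "st1", name: "Ann", sessionIds: ["se1", "se2"] }];
const sessions = [
  { _id: "se1", title: "Monday" },
  { _id: "se2", title: "Friday" },
];

// $unwind: one copy of the document per array element.
const unwound = students.flatMap((s) =>
  s.sessionIds.map((id) => ({ ...s, sessionIds: id }))
);

// $lookup: attach the matching documents from the other collection as an array.
const joined = unwound.map((s) => ({
  ...s,
  session: sessions.filter((x) => x._id === s.sessionIds),
}));

console.log(joined.map((d) => d.session[0].title));
```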
Let's say I have two collections in which each document may look like this:
Collection 1:
target:
_id,
comments:
[
{ _id,
message,
full_name
},
...
]
Collection 2:
user:
_id,
full_name,
username
I am paging through comments via $slice, let's say I take the first 25 entries.
For these entries I need the corresponding usernames, which I get from the second collection. What I want is to get the comments sorted by their referenced username. The problem is that I can't add the username to the comments, because usernames may change often, and if one does, I would need to update all target documents that contain the old username.
I can only imagine one way to solve this: read out all the full_names and query them in the user collection. The result would be sortable, but it is not paged, so it takes a lot of resources with large documents.
Is there anything I am missing with this problem?
Thanks in advance
If comments are an embedded array, you will have to do work on the client side to sort the comments array unless you store it in sorted order. Your application requirements for username force you to either read out all of the usernames of the users who commented to do the sort, or to store the username in the comments and have (much) more difficult and expensive updates.
Sorting and pagination don't work unless you can return the documents in sorted order. You should consider a different schema where comments form a separate collection so that you can return them in sorted order and paginate them. Store the username in each comment to facilitate the sort on the MongoDB side. Depending on your application's usage pattern this might work better for you.
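A sketch of what the separate comments collection buys you, written as plain in-memory JavaScript with invented sample data and a hypothetical helper pageOfComments. With a real collection the same idea would be expressed roughly as db.comments.find({target_id: id}).sort({username: 1}).skip(page * pageSize).limit(pageSize), assuming each comment stores target_id and username.

```javascript
// Hypothetical comments collection: one document per comment,
// each carrying the target it belongs to and the author's username.
const comments = [
  { _id: 1, target_id: "t1", username: "zoe", message: "first" },
  { _id: 2, target_id: "t1", username: "adam", message: "second" },
  { _id: 3, target_id: "t2", username: "bea", message: "other target" },
  { _id: 4, target_id: "t1", username: "mia", message: "third" },
];

function pageOfComments(targetId, page, pageSize) {
  return comments
    .filter((c) => c.target_id === targetId) // find({target_id: ...})
    .sort((a, b) => a.username.localeCompare(b.username)) // sort({username: 1})
    .slice(page * pageSize, (page + 1) * pageSize); // skip(...).limit(...)
}

console.log(pageOfComments("t1", 0, 2).map((c) => c.username));
console.log(pageOfComments("t1", 1, 2).map((c) => c.username));
```

The cost mentioned above remains: when a username changes, every comment by that user has to be updated.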
It also seems strange to sort on usernames and expect/allow usernames to change frequently. If you could drop these requirements it'd make your life easier :D
I'm new to MongoDB and just want to create a simple project to test its performance.
The project is like a simple CMS.
It has users, blogs and comments, and users can have friends.
So I designed my database like this:
user
{
_ID:
name:
birth_day:
sex:
friends:[id_1,Id_2]
}
blogs
{
title:
owner:
tags_fiends:
comments:
[
{"_id":"","content":"","date_created":""},
{"_id":"","content":"","date_created":""},
],
like: ["_id", "_id"]
}
How many collections are needed for this database? Can I use one collection for both users and blogs? Thanks in advance.
Because MongoDB is a schemaless (schema-free) database, you can build any kind of structure within a document. The following are supported:
individual elements
nested arrays
nested documents
There are a couple of things you have to consider during schema design, and because of them it is useful to keep users and blogs in separate collections. For example, if you store something in a nested array you can create an index on it to speed up searches within the array (a multikey index), but a compound index can cover at most one array field per document. So if you store friends, blogs, posts and tags all as arrays in one document, you cannot combine several of them in a single compound index and need a separate index for each.
It is also important to know that there is a size limit for each document, which is currently 16MB.
In your scenario, I would make Users a collection and reference it by _id from the blog collection.
In practise, you could make the Blogs an attribute of User, the only constraint being the max doc size of 16MB - but that's a lot of blogs (text).
To get round that (assuming you need to), a separate Blog collection referencing the user _id would be fine. You may need to denormalise the user name too if that's not your _id. This would mean you can get all the blogs for a user in a single query.
I want to store "carpool_debts" which is basically going to hold the number of days owed to other users. It looks like this:
carpool_debts{
_id,
owner,
owner_id,
creditors:[{name,
id,
amount},
{name,
id,
amount}
]}
Does that data structure look reasonable for what I want to store? Also implementing that data structure seemed cumbersome to maintain. I found it cumbersome mainly because there isn't an upsert type of function available in meteor yet. Instead of creditors being a list of sub documents would I be better off storing the creditors as a delimited string? I would like to know if I am on the right path or if I am missing something? Thanks.
You can structure mongo documents just like you would in a relational database, for example, having separate collections for creditors and owners and using carpool_debts as a link table with the amount attached:
carpool_debts{
_id,
owner_id,
creditor_id,
amount}
creditors{
_id,
name}
owners{
_id,
name}
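To illustrate what reading from this link-table layout involves, here is a small in-memory sketch in plain JavaScript. The sample rows and the helper debtsOf are invented for the example; the point is only that each debt row needs a second lookup to resolve the creditor's name.

```javascript
// Hypothetical rows for the link collection and the creditors collection.
const carpoolDebts = [
  { _id: 1, owner_id: "o1", creditor_id: "c1", amount: 2 },
  { _id: 2, owner_id: "o1", creditor_id: "c2", amount: 3 },
  { _id: 3, owner_id: "o2", creditor_id: "c1", amount: 1 },
];
const creditors = [
  { _id: "c1", name: "Kim" },
  { _id: "c2", name: "Lee" },
];

function debtsOf(ownerId) {
  return carpoolDebts
    .filter((d) => d.owner_id === ownerId) // first lookup: the link rows
    .map((d) => ({
      // second lookup: resolve the creditor name for each row
      creditor: creditors.find((c) => c._id === d.creditor_id).name,
      amount: d.amount,
    }));
}

console.log(debtsOf("o1"));
```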
However, this is not using mongodb to its full potential. Especially if this is a database with masses of data, you may want to optimise it for the most used queries, otherwise it'll be slow. For example, to optimise for looking up an owner's debt, you can add the data needed right there in the owners collection, using sub documents for creditors, and sub documents again for individual debts, similar to what you've already done:
owners{
_id,
name,
creditors: {id,
name,
debts: {
amount,
due_date}
}
}
and similarly, add the debt information on the creditors collection if you often look up the outstanding debt of creditors:
creditors{
_id,
name,
debtors: {
owner_id,
owner_name,
debts: {
amount,
due_date
}
}
}
This way, you only need to look up one record to get all the information you need. Of course, there are catches. First of all, this is not very DRY, but that's intentional. But you have to remember to update the other table(s) when something changes. If you change the name of a creditor for example, you'll need to update every owner document that has debts with this creditor (make sure you index that). This of course makes updates much slower (and the database bigger), but if you don't update very often, and look up much more often, this is not going to be a problem.
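The update cost of the denormalised layout can be sketched like this. It is plain in-memory JavaScript; the sample data, the helper renameCreditor, and the choice to model creditors as an array of sub documents are assumptions for illustration. On a real collection a similar effect could probably be achieved with updateMany and the positional operator, for example db.owners.updateMany({"creditors.id": id}, {$set: {"creditors.$.name": newName}}), given that array layout.

```javascript
// Hypothetical owner documents with the creditor name copied into each one.
const owners = [
  { _id: "o1", name: "Ann", creditors: [{ id: "c1", name: "Kim", debts: [{ amount: 2 }] }] },
  { _id: "o2", name: "Bob", creditors: [{ id: "c2", name: "Lee", debts: [{ amount: 1 }] }] },
  { _id: "o3", name: "Cid", creditors: [{ id: "c1", name: "Kim", debts: [{ amount: 4 }] }] },
];

// Renaming a creditor means touching every owner document that embeds it.
function renameCreditor(creditorId, newName) {
  let touched = 0;
  for (const owner of owners) {
    for (const creditor of owner.creditors) {
      if (creditor.id === creditorId) {
        creditor.name = newName;
        touched++;
      }
    }
  }
  return touched;
}

console.log(renameCreditor("c1", "Kimberly"));
```

The returned count shows how many embedded copies had to change for a single rename, which is the trade-off described above.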
Also if for example creditors can have thousands of individual outstanding debts, you may have to separate that into a link table, or rather, link collection, like this, so you don't exceed mongodb's maximum document size:
creditors{
_id,
name,
}
debtors: {
owner_id,
creditor_id,
debts: {
amount,
due_date
}
}
Then you have one document for each creditor-owner connection. This means more documents to look up when looking at a creditor, but still just one for looking up an owner.
This looks fine, but you could also consider separating creditors into its own collection and just storing an array of creditor_id's in the debts collection. That would reduce complexity and make finding and filtering information easier. And it would be more DRY since if there are multiple debts with the same creditor, you only have the creditor stored in a single place.
You could also consider just having each document in the debts collection be a single debt by an owner to a single creditor. Then you'd just have id, owner_id and creditor_id - like a link table in a relational database.
I'm building a database with several collections. I have unique strings that I plan on using for all the documents in the main collection. Documents in other collections will reference documents in the main collection, which means I'll have to save said id's in the other collections. However, if _id's only need to be unique across a collection and not across an entire database, then I would just make the _id's in the other collections also use the aforementioned unique strings.
Also, I assume that in order to set my own _id's, all I have to do is have an "_id":"unique_string" property as part of the document that I insert, correct? I wouldn't need to convert the "unique_string" into another format, right?
Also, hypothetically speaking, would I be able to have a variable save the string "_id" and use that instead? Just to be clear, something as follows: var id = "_id" and then later on in the code (during an insert or a query for example) have id:"unique_string".
Best, and thanks, Sami
_ids only have to be unique within a collection, not across the whole database. You can quickly verify this by inserting two documents with the same _id into two different collections.
Your other assumptions are correct, just try them and see whether they work (they will). The proof of the pudding is in the eating.
Note: use _id directly; var id = "_id" just complicates the code.
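One caveat on the variable idea, assuming the code in question is JavaScript: inside an object literal, writing id: "unique_string" creates a field literally named "id", regardless of what the variable id holds. To use the variable's value as the field name you need a computed property name, as this small sketch shows.

```javascript
const id = "_id";

// Plain key: the field is named "id", the variable is not consulted.
const literalKey = { id: "unique_string" };

// Computed key: the field is named after the variable's value, "_id".
const computedKey = { [id]: "unique_string" };

console.log(Object.keys(literalKey), Object.keys(computedKey));
```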