I'm trying to create a database using MongoDB. Since I have no experience with NoSQL database I'm having some troubles with designing the database.
I want to make a database where one student can be part of multiple sessions, and one session can contain multiple students (many-to-many). Also, each event is linked to one student inside one session.
So far I designed it like this:
Sessions:
Sessions.students = [student_id1, student_id2]
Student:
Students.sessions = [session_id1, session_id2]
But my problem is, where should I store the relation, in sessions or in student, or in both (like above)?
And is this the correct way to create the event relationships?
Event:
Event.studentid = [student_id1]
Event.sessionid = [session_id1]
1) If you need to find how many students are in each session and how many sessions each student attends.
For Many-Many relations in mongodb store id's in each other collections.
2) if one of the questions is relevant for your project then only store ids accordingly.
ex: if you need to find session information not student information then store
session.student=[studentid1, id2 id3]; student.session=[] not required
Can also use indexing for searching but dont use too much of it.
As a user you have 2 collections in the database called, students & sessions.
In students collection: each document contains an array of sessionIds & in sessions collection: each document contains array of studentIds.
To get data in list you need to use aggregation
Query for students like
students.aggregate({
//here write query using $match, $unwind, $lookup
//$unwind : sessionIds from sessions
})
Query for sessions like
sessions({
//here write query using $match, $unwind, $lookup
//$unwind : studentIds from students
})
Related
I'm developing a web application with NodeJS, MongoDB and Mongoose. It is intended to act as an interface between the user and a big data environment. The idea is that the users can execute the big data processes in a separated cluster, and the results are stored in a MongoDB collection Results. This collection may store more than 1 million of documents per user.
The document schema of this collection can be completely different between users. For instance, we have user1 and user2. Examples of document in the Resultscollection for user1 and user2:
{
user: ObjectId(user1):, // reference to user1 in the Users collection
inputFields: {variable1: 3, ...},
outputFields: { result1: 504.75 , ...}
}
{
user: ObjectId(user2):,
inputFields: {country: US, ...},
outputFields: { cost: 14354.45, ...}
}
I'm implementing a search engine in the web application so that each user can filter in the fields according to the schemas of their documents (for example, user1 must me able to filter by inputFields.variable1, and user2 by outputFields.cost). Of course I know that I must use indexes, otherwise the queries are so slow.
My first attempt was to create an index for each different field in the Results collection, but it's quite inefficient, since the database server becomes unstable because of the size of the indexes. So my second attempt was to try to reduce the amount of indexes by using partial indexes, so that I create indexes specifying the user id in the option partialFilterExpression.
The problem is that if another user has the same schema in the Results collection as any other user and I try to create the indexes for this user, MongoDB throws this exception:
Index with pattern: { inputFields.country: 1 } already exists with different options
It happens because the partial indexes cannot index the same fields even though the partialFilterExpression is different.
So my questions are: How could I allow the users to query their results efficiently in this environmnet? Is MongoDB really suitable for this use case?
Thanks
I have data across three collections and need to produce a data set which aggregates data from these collections, and filters by a date range.
The collections are:
db.games
{
_id : ObjectId,
startTime : MongoDateTime
}
db.entries
{
player_id : ObjectId, // refers to db.players['_id']
game_id : ObjectId // refers to db.games['_id']
}
db.players
{
_id : ObjectId,
screen_name,
email
}
I want to return a collection which is number of entries by player for games within a specified range. Where the output should look like:
output
{
player_id,
screen_name,
email,
sum_entries
}
I think I need to start by creating a collection of games within the date range, combined with all the entries and then aggregate over count of entries, and finally output collection with the player data, it's seems a lot of steps and I'm not sure how to go about this.
The reason why you have these problems is because you try to use MongoDB like a relational database, not like a document-oriented database. Normalizing your data over many collections is often counter-productive, because MongoDB can not perform any JOIN-operations. MongoDB works much better when you have nested documents which embed other objects in arrays instead of referencing them. A better way to organize that data in MongoDB would be to either have each game have an array of players which took part in it or to have an array in each player with the games they took part in. It's also not necessarily a mistake to have some redundant additional data in these arrays, like the names and not just the ID's.
But now you have the problem, so let's see how we can deal with it.
As I said, MongoDB doesn't do JOINs. There is no way to access data from more than one collection at a time.
One thing you can do is solving the problem programmatically. Create a program which fetches all players, then all entries for each player, and then the games referenced by the entries where startTimematches.
Another thing you could try is MapReduce. MapReduce can be used to append results to another collection. You could try to use one MapReduce job for each of the relevant collections into one and then query the resulting collection.
I am trying to fetch the documents from a collection based on the existence of a reference to these documents in another collection.
Let's say I have two collections Users and Courses and the models look like this:
User: {_id, name}
Course: {_id, name, user_id}
Note: this just a hypothetical example and not actual use case. So let's assume that duplicates are fine in the name field of Course. Let's thin Course as CourseRegistrations.
Here, I am maintaining a reference to User in the Course with the user_id holding the _Id of User. And note that its stored as a string.
Now I want to retrieve all users who are registered to a particular set of courses.
I know that it can be done with two queries. That is first run a query and get the users_id field from the Course collection for the set of courses. Then query the User collection by using $in and the user ids retrieved in the previous query. But this may not be good if the number of documents are in tens of thousands or more.
Is there a better way to do this in just one query?
What you are saying is a typical sql join. But thats not possible in mongodb. As you suggested already you can do that in 2 different queries.
There is one more way to handle it. Its not exactly a solution, but the valid workaround in NonSql databases. That is to store most frequently accessed fields inside the same collection.
You can store the some of the user collection fields, inside the course collection as embedded field.
Course : {
_id : 'xx',
name: 'yy'
user:{
fname : 'r',
lname :'v',
pic: 's'
}
}
This is a good approach if the subset of fields you intend to retrieve from user collection is less. You might be wondering the redundant user data stored in course collection, but that's exactly what makes mongodb powerful. Its a one time insert but your queries will be lot faster.
For example, we have two collections
users {userId, firstName, lastName}
votes {userId, voteDate}
I need a report of the name of all users which have more than 20 votes a day.
How can I write query to get data from MongoDB?
The easiest way to do this is to cache the number of votes for each user in the user documents. Then you can get the answer with a single query.
If you don't want to do that, the map-reduce the results into a results collection, and query that collection. You can then run incremental map-reduces that only calculate new votes to keep your results up to date: http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-IncrementalMapreduce
You shouldn't really be trying to do joins with Mongo. If you are you've designed your schema in a relational manner.
In this instance I would store the vote as an embedded document on the user.
In some scenarios using embedded documents isn't feasible, and in that situation I would do two database queries and join the results at the client rather than using MapReduce.
I can't provide a fuller answer now, but you should be able to achieve this using MapReduce. The Map step would return the userIds of the users who have more than 20 votes, the reduce step would return the firstName and lastName, I think...have a look here.
I'm building a database with several collections. I have unique strings that I plan on using for all the documents in the main collection. Documents in other collections will reference documents in the main collection, which means I'll have to save said id's in the other collections. However, if _id's only need to be unique across a collection and not across an entire database, then I would just make the _id's in the other collections also use the aforementioned unique strings.
Also, I assume that in order to set my own _id's, all I have to do is have an "_id":"unique_string" property as part of the document that I insert, correct? I wouldn't need to convert the "unique_string" into another format, right?
Also, hypothetically speaking, would I be able to have a variable save the string "_id" and use that instead? Just to be clear, something as follows: var id = "_id" and then later on in the code (during an insert or a query for example) have id:"unique_string".
Best, and thanks,Sami
_ids have to be unique in a collection. You can quickly verify this by inserting two documents with the same _id in two different collections.
Your other assumptions are correct, just try them and see whether they work (they will). The proof of the pudding is in the eating.
Note: use _id directly, var id = "_id" just compilcates the code.