Querying MongoDB collection with heterogeneous schema efficiently - mongodb

I'm developing a web application with Node.js, MongoDB and Mongoose. It is intended to act as an interface between the user and a big data environment. The idea is that users can run big data processes on a separate cluster, and the results are stored in a MongoDB collection, Results. This collection may store more than 1 million documents per user.
The document schema of this collection can be completely different between users. For instance, take user1 and user2. Example documents in the Results collection for user1 and user2:
{
user: ObjectId(user1), // reference to user1 in the Users collection
inputFields: { variable1: 3, ... },
outputFields: { result1: 504.75, ... }
}
{
user: ObjectId(user2),
inputFields: { country: "US", ... },
outputFields: { cost: 14354.45, ... }
}
I'm implementing a search engine in the web application so that each user can filter on the fields that exist in their own documents (for example, user1 must be able to filter by inputFields.variable1, and user2 by outputFields.cost). I know that I must use indexes; otherwise the queries are too slow.
My first attempt was to create an index for each distinct field in the Results collection, but that is quite inefficient: the database server becomes unstable because of the size of the indexes. My second attempt was to reduce the number of index entries by using partial indexes, creating each index with the user id in the partialFilterExpression option.
The problem is that if another user has the same schema in the Results collection as any other user and I try to create the indexes for this user, MongoDB throws this exception:
Index with pattern: { inputFields.country: 1 } already exists with different options
This happens because two partial indexes cannot share the same key pattern, even when their partialFilterExpression is different.
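The attempt described above can be sketched as follows. This is a minimal sketch, not a confirmed fix: the helper partialIndexFor and its index-name scheme are hypothetical, and the user ids are placeholder strings standing in for ObjectId values. It only builds the two arguments that createIndex takes, so the shape of a per-user partial index is visible. Whether the server accepts two such indexes on the same key pattern depends on the MongoDB version, so the distinct name is an assumption to verify against your deployment.

```javascript
// Hypothetical helper: builds the (keys, options) pair for one user's partial index.
function partialIndexFor(userId, field) {
  return {
    keys: { [field]: 1 },
    options: {
      // Assumed naming scheme so that two users never share an index name.
      name: `${field}_user_${userId}`,
      // Only documents that belong to this user enter the index.
      partialFilterExpression: { user: userId },
    },
  };
}

const spec1 = partialIndexFor("user1", "inputFields.country");
const spec2 = partialIndexFor("user2", "inputFields.country");

// With a real connection the pair would be passed on as:
//   db.Results.createIndex(spec1.keys, spec1.options)
console.log(JSON.stringify(spec1, null, 2));
console.log(JSON.stringify(spec2, null, 2));
```

Note that a query can only use a partial index when its filter includes the partialFilterExpression condition, so every search would need the user id alongside the field predicate.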
So my questions are: How can I allow users to query their results efficiently in this environment? Is MongoDB really suitable for this use case?
Thanks

Related

How to create a collection in a document in MongoDB?

I am using MongoDB for the first time, and I have some experience with NoSQL databases.
I am attempting to replicate behaviour that I have managed to achieve on Google's Cloud Firestore:
I want to create a collection within a document. I have not been able to replicate this behaviour using MongoDB as I cannot find code in the documentation. Is this behaviour even possible please?
Thanks in advance.
Edit:
Here is a screenshot of a sample document in biometric_data :
MongoDB has embedded documents which can be used to store the same data. You can try creating an array of sub-documents (each having name and data property):
{
name: "",
email: "",
...otherFields,
biometric_data: [
{
name: "glucose",
data: {
preferred_unit: "mg/dL"
// Add new properties as required
}
},
{
name: "weight",
data: {
preferred_unit: "KG"
}
}
],
...templateData
}
However, a document in MongoDB cannot exceed 16 MB. If the number of entries in biometric_data is bounded, you can use sub-documents; otherwise you may have to create another collection and store them there as documents (generally preferred for chat apps, or wherever the number of sub-documents can grow very large).
Sub-collections (in Firestore) allow you to structure data hierarchically, making data easier to access. For example, users and posts collections can be structured in either of the ways below:
With sub-collection
users -> {userId} -> posts -> {postId}
Root level collections
users -> {userId}
posts -> {postId}
If you use root-level collections, though, you must add a userId to each posts document to identify the owner of the post.
If you use nested documents in MongoDB, you are likely to hit the 16 MB document limit as soon as any user adds many posts. Similarly, if the biometric_data array can hold many documents, it is best to create another collection.
Firestore's sub-collections and their documents do not count towards the 1 MB maximum size of the parent document, but nested documents in MongoDB do count towards the 16 MB limit.
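The "another collection" option can be sketched like this. It is only an illustration under stated assumptions: the helper splitBiometricData and the userId back-reference field are hypothetical names, and the _id is a placeholder string where a real document would carry an ObjectId. It shows how the embedded array from the example above would turn into one slim user document plus one document per entry.

```javascript
// Hypothetical helper: splits a user document with an embedded biometric_data array
// into a slim user document and one document per entry for a separate collection.
function splitBiometricData(userDoc) {
  const { biometric_data = [], ...user } = userDoc;
  const entries = biometric_data.map((entry) => ({
    userId: user._id, // back-reference that replaces the nesting
    name: entry.name,
    data: entry.data,
  }));
  return { user, entries };
}

const { user, entries } = splitBiometricData({
  _id: "u1", // placeholder id
  name: "",
  email: "",
  biometric_data: [
    { name: "glucose", data: { preferred_unit: "mg/dL" } },
    { name: "weight", data: { preferred_unit: "KG" } },
  ],
});

console.log(user);    // the user document without the embedded array
console.log(entries); // documents destined for the separate collection
```

Each entry now has its own 16 MB budget, at the cost of a second query (filtered by userId) to read a user's biometric data.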
Also check out:
Firestore - proper NoSQL structure for user-specific data
Is mongodb sub documents equivalent to Firestore subcollections?

Many to many relations MongoDB

I'm trying to create a database using MongoDB. Since I have no experience with NoSQL databases, I'm having some trouble designing it.
I want a database where one student can be part of multiple sessions, and one session can contain multiple students (many-to-many). Also, each event is linked to one student inside one session.
So far I designed it like this:
Sessions:
Sessions.students = [student_id1, student_id2]
Student:
Students.sessions = [session_id1, session_id2]
But my problem is, where should I store the relation, in sessions or in student, or in both (like above)?
And is this the correct way to create the event relationships?
Event:
Event.studentid = [student_id1]
Event.sessionid = [session_id1]
1) If you need to find both how many students are in each session and how many sessions each student attends, store the ids on both sides: for many-to-many relations in MongoDB, each collection keeps the ids of the other.
2) If only one of those questions is relevant for your project, store the ids only on that side.
For example, if you need session information but not student information, store
session.student = [studentid1, id2, id3]; student.session = [] is not required.
You can also index these fields for searching, but don't create too many indexes.
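Storing the ids on both sides means that every enrollment touches two documents. A minimal sketch of what that could look like is below. The helper enrollmentUpdates is hypothetical, the field names sessions and students follow the question, and the ids are placeholder strings. It only builds the filter and update documents, so it runs without a database.

```javascript
// Hypothetical helper: the two updates needed to link one student and one session
// when the ids are stored on both sides of the relation.
function enrollmentUpdates(studentId, sessionId) {
  return {
    students: {
      filter: { _id: studentId },
      // $addToSet avoids a duplicate id if the link already exists.
      update: { $addToSet: { sessions: sessionId } },
    },
    sessions: {
      filter: { _id: sessionId },
      update: { $addToSet: { students: studentId } },
    },
  };
}

const ops = enrollmentUpdates("student_id1", "session_id1");

// With a real connection these would be applied as:
//   db.students.updateOne(ops.students.filter, ops.students.update)
//   db.sessions.updateOne(ops.sessions.filter, ops.sessions.update)
console.log(JSON.stringify(ops, null, 2));
```

The two updates are separate writes, so they are not atomic together unless they run inside a transaction. That is one reason to store the ids on a single side when only one direction is ever queried.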
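You have two collections in the database, students and sessions.
In the students collection, each document contains an array of sessionIds; in the sessions collection, each document contains an array of studentIds.
To get the joined data as a list, you need to use aggregation.
Query for students like
db.students.aggregate([
// add a $match stage here to narrow down the students
// join every id in sessionIds to its document in sessions ("sessionDocs" is just an example name)
{ $lookup: { from: "sessions", localField: "sessionIds", foreignField: "_id", as: "sessionDocs" } }
// optionally add { $unwind: "$sessionDocs" } to get one result per student/session pair
])
Query for sessions like
db.sessions.aggregate([
// add a $match stage here to narrow down the sessions
// join every id in studentIds to its document in students ("studentDocs" is just an example name)
{ $lookup: { from: "students", localField: "studentIds", foreignField: "_id", as: "studentDocs" } }
// optionally add { $unwind: "$studentDocs" } to get one result per session/student pair
])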

query too large issue with mongodb

Let's say we have a collection of users, and each user is followed by other users. If I want to find the users that are NOT following me, I need to do something like:
db.users.find({_id: { $nin : followers_ids } } ) ;
If followers_ids is huge, say 100k users, MongoDB starts complaining that the query is too large. Sending that much data over the network just to run the query is not good either. What are the best practices for running this query without sending all these ids over the network?
I recommend that you limit the number of query results to reduce network demand. According to the docs:
MongoDB cursors return results in groups of multiple documents. If you know the number of results you want, you can reduce the demand on network resources by issuing the limit() method.
This is typically used in conjunction with sort operations. For example, if you need only 50 results from your query to the users collection, you would issue the following command:
db.users.find({ _id: { $nin: followers_ids } }).sort({ timestamp: -1 }).limit(50)
You can then use the cursor to retrieve more user documents as needed.
Recommendation to Restructure Followers Schema
I would recommend that you restructure your user documents if the number of followers will grow large. Your current user schema may look like this:
{
_id: ObjectId("123"),
username: "jobs",
email: "stevej#apple.com",
followers: [
ObjectId("12345"),
ObjectId("12375"),
ObjectId("12395"),
]
}
The good thing about this schema is that whenever this user does anything, all of the users you need to notify are right there inside the document. The downside is that if you need to find everyone a user is following, you have to query the entire users collection. Your user document will also become larger and more volatile as the followers grow.
You may want to normalize your followers further. You can keep a separate collection that maps each followee to its followers, with documents that look like this:
{
_id: ObjectId("123"),//Followee's "_id"
followers: [
ObjectId("12345"),
ObjectId("12375"),
ObjectId("12395"),
]
}
This will keep your user documents slender, but it takes an extra query to get the followers. Because the "followers" array changes in size, you can enable the usePowerOf2Sizes allocation strategy (an option of the legacy MMAPv1 storage engine) to reduce fragmentation and document moves.
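The "extra query" can be pictured with a small in-memory simulation. This is only a sketch under assumptions: the arrays stand in for the users collection and a hypothetical followers collection, the ids are placeholder strings instead of real ObjectIds, and the comments name the MongoDB queries each step would correspond to.

```javascript
// In-memory stand-ins for the two collections (placeholder string ids).
const usersCollection = [
  { _id: "123", username: "jobs" },
  { _id: "12345", username: "follower_a" },
  { _id: "12375", username: "follower_b" },
  { _id: "99999", username: "stranger" },
];
const followersCollection = [{ _id: "123", followers: ["12345", "12375"] }];

function followersOf(followeeId) {
  // Query 1, equivalent to db.followers.findOne({ _id: followeeId })
  const link = followersCollection.find((doc) => doc._id === followeeId);
  const ids = new Set(link ? link.followers : []);
  // Query 2, equivalent to db.users.find({ _id: { $in: [...ids] } })
  return usersCollection.filter((user) => ids.has(user._id));
}

const names = followersOf("123").map((user) => user.username);
console.log(names);
```

The user documents stay small because the growing array lives in its own document, and only code that actually needs the followers pays for the second lookup.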

Design MongoDb Schema For My Social

I'm new to MongoDB, and I just want to create a simple project to test its performance.
The project is like a simple CMS:
it has users, blogs and comments, and users can have friends.
So I designed my database like this:
user
{
_ID:
name:
birth_day:
sex:
friends:[id_1,Id_2]
}
blogs
{
title:
owner:
tags_friends:
comments:
[
{"_id":"","content":"","date_created":""},
{"_id":"","content":"","date_created":""}
],
"like": ["_id", "_id"]
}
How many collections are needed for this database? Can I use one collection for both users and blogs? Thanks in advance.
Because MongoDB is a schemaless (schema-free) database, you can build any kind of structure within a document. The following are supported:
individual elements
nested arrays
nested documents
There are a couple of things to consider during schema design that make it useful to keep users and blogs in separate collections. For example, if you store something in a nested array, you can index it to speed up searches within that array, but a compound multikey index (an index over array content) can cover at most one array field per document. So if you store friends, blogs, posts and tags all in arrays, a single compound index can include only one of them.
It is also important to know that there is a size limit for each document, currently 16 MB.
In your scenario, I would make Users a collection and reference it by _id from the blog collection.
In practice, you could make the blogs an attribute of the user, the only constraint being the maximum document size of 16 MB, but that's a lot of blogs (text).
To get round that (assuming you need to), a separate Blog collection referencing the user _id would be fine. You may need to denormalise the user name too if that's not your _id. This would mean you can get all the blogs for a user in a single query.
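A minimal sketch of that separate blog collection, assuming hypothetical helper names (buildBlog, blogsByOwnerFilter) and a hypothetical ownerName field for the denormalised user name. The ids are placeholder strings. It only builds the document and the query filter, so it runs without a database.

```javascript
// Hypothetical helper: a blog document that references its owner by _id and
// carries a denormalised copy of the owner's name for display.
function buildBlog(user, title) {
  return {
    title,
    owner: user._id,      // reference into the users collection
    ownerName: user.name, // denormalised copy, must be updated if the name changes
    comments: [],
  };
}

// Hypothetical helper: the filter that fetches every blog of one user.
function blogsByOwnerFilter(userId) {
  return { owner: userId };
}

const author = { _id: "id_1", name: "alice" }; // placeholder user
const blog = buildBlog(author, "First post");
const filter = blogsByOwnerFilter(author._id);

// With a real connection: db.blogs.insertOne(blog); db.blogs.find(filter)
console.log(blog, filter);
```

An index on owner would keep the per-user query fast. The price of the copied name is that a rename has to be propagated to that user's blog documents.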

Mongoid: retrieving documents whose _id exists in another collection

I am trying to fetch the documents from a collection based on the existence of a reference to these documents in another collection.
Let's say I have two collections Users and Courses and the models look like this:
User: {_id, name}
Course: {_id, name, user_id}
Note: this is just a hypothetical example and not the actual use case, so let's assume that duplicates are fine in the name field of Course. Think of Course as CourseRegistrations.
Here, I am maintaining a reference to User in Course, with user_id holding the _id of the User. Note that it's stored as a string.
Now I want to retrieve all users who are registered for a particular set of courses.
I know that it can be done with two queries: first run a query to get the user_id field from the Course collection for the set of courses, then query the User collection using $in with the user ids retrieved by the previous query. But this may not work well if the number of documents is in the tens of thousands or more.
Is there a better way to do this in just one query?
What you are describing is a typical SQL join. MongoDB does not support joins in an ordinary find() query, so, as you already suggested, you can do it in two separate queries.
There is one more way to handle it. It's not exactly a solution, but it is a valid workaround in NoSQL databases: store the most frequently accessed fields inside the same collection.
You can store some of the user collection's fields inside the course collection as an embedded field.
Course: {
_id: 'xx',
name: 'yy',
user: {
fname: 'r',
lname: 'v',
pic: 's'
}
}
This is a good approach if the subset of fields you intend to retrieve from the user collection is small. You might be wondering about the redundant user data stored in the course collection, but that is the trade-off that makes this fast in MongoDB: the data is inserted once, and your queries will be a lot faster.
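The embedding workaround can be sketched as below. It is an illustration only: the helpers embedUser and usersForCoursesQuery are hypothetical names, and the sample values are placeholders. The first helper copies the chosen user fields into a course document. The second builds the filter and projection for the single query that then answers "which users are registered for these courses".

```javascript
// Hypothetical helper: copies a small, fixed subset of user fields into the course document.
function embedUser(course, user) {
  const { fname, lname, pic } = user;
  return { ...course, user: { fname, lname, pic } };
}

// Hypothetical helper: filter and projection for reading the embedded users
// of a set of courses in one query.
function usersForCoursesQuery(courseNames) {
  return {
    filter: { name: { $in: courseNames } },
    projection: { user: 1, _id: 0 },
  };
}

const course = embedUser(
  { _id: "xx", name: "yy" },
  { _id: "u1", fname: "r", lname: "v", pic: "s", email: "not copied" }
);
const query = usersForCoursesQuery(["yy", "zz"]);

// With a real connection in the mongo shell: db.courses.find(query.filter, query.projection)
console.log(course, query);
```

Only the listed fields are copied, so anything else about the user still needs a lookup in the user collection, and the embedded copy has to be refreshed whenever one of those fields changes.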