How to model a recursive hasMany relationship in MongoDB? - mongodb

In my app there are users. Each user may have many friends (other users). If user A has friend B, then user B always has friend A. I will need to query the users collection to get, for example, all friends of user A. I will also need a geospatial index for this query, to get all friends of user A within a given radius of user A.
I am having trouble modeling this structure in MongoDB.
For now I have this (in Mongoose):
{
created: { type: Date, default: Date.now },
phone_number: { type: String, unique: true },
location: { type: [Number], index: '2dsphere' },
friends: [{ phone_number: String }]
}
So each user contains an array of other users' phone numbers (a phone number identifies each user). But I don't think this is a good idea, as a user may have zero or many friends, so the friends array will be mutable and may grow significantly.
What would be the best way to model this structure?

Two approaches:
Join Collection
Similar to the relational approach: a collection whose documents represent friendships (essentially two object ids and possibly metadata about the relationship).
Arrays on each user
Create an array and push the object ids of the friends onto the array.
When a friendship is created you would need to modify both friends (push each friend onto the other's friend array). It would be the same for friendship dissolution.
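The two-sided update described above can be sketched as a small helper that builds both update specs. This is only an illustration: the helper name, its return shape, and the assumption that `friends` holds plain ids (rather than the `{ phone_number }` objects in the question's schema) are hypothetical.

```javascript
// Hypothetical helper: builds the two update specs needed to keep a
// symmetric friendship consistent on both user documents.
// Assumes each user document has a `friends` array of user ids.
function buildFriendshipUpdates(userIdA, userIdB, operator = '$addToSet') {
  return [
    { filter: { _id: userIdA }, update: { [operator]: { friends: userIdB } } },
    { filter: { _id: userIdB }, update: { [operator]: { friends: userIdA } } },
  ];
}

// Creating a friendship: $addToSet avoids duplicate entries in the array.
const addOps = buildFriendshipUpdates('idOfA', 'idOfB');

// Dissolving it: $pull removes the id from each user's array.
const removeOps = buildFriendshipUpdates('idOfA', 'idOfB', '$pull');
```

Each spec could then be passed to something like Mongoose's `Model.updateOne(op.filter, op.update)`. Note that the two writes are separate operations, so they are not atomic as a pair unless they are wrapped in a transaction.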
Which one?
The join collection approach is slower, as it requires multiple queries to get the friendship data, as opposed to having it persisted with the users themselves (taking advantage of data locality). However, if the number of relationships grows in an unbounded fashion, the array approach is not feasible. MongoDB documents have a 16 MB size limit, and there is a practical upper bound of 1000 or so items after which working with arrays becomes slow and unwieldy.

Related

Mongo: Two collections with pagination (in single list in html)

Currently in our system we have two separate collections, invites and users. We can send an invite to someone, and that invite has some information attached to it and is stored in the invites collection. If the user registers, his account information is stored in the users collection.
Not every user has to have an invite, and not every invite has to have a user. We check whether a user has an invite (or vice versa) by email address, which in that case is stored in both collections.
Originally our dashboard had a user overview: a page where you can see the current users and paginate through them.
Now we want to have one single page (and single table) in which we can view both the invites and the users and paginate through them.
Let's say our data looks like this:
invites: [
{ _id: "5af42e75583c25300caf5e5b", email: "john@doe.com", name: "John" },
{ _id: "53fbd269bde85f02007023a1", email: "jane@doe.com", name: "Jane" },
...
]
users: [
{ _id: "53fe288be081540200733892", email: "john@doe.com", firstName: "John" },
{ _id: "53fd103de08154020073388d", email: "steve@doe.com", firstName: "Steve" },
...
]
Points to note.
Some users can be matched with an invite based on the email (but that is not required)
Some invites never register
The field names are not always exactly the same
Is it possible to make a paginated list of all emails and sort on them? So if there is an email that starts with an a in collection invites, that is picked before the email that starts with a b in collection users etc. And then use offset / limit to paginate through it.
Basically, I want to "merge" the two collections in something that would be akin to a MySQL view and be able to query on that as if the entire thing was one collection.
I'd prefer a solution that doesn't change the data structure of the collections (a projected view or something similar would be fine), since parts of the code already rely on the given structure. In light of this new requirement, though, that structure might not be the best approach.

Schema on mongodb for reducing API calls with two collections

Not quite sure what the best practice is if I have two collections, a user collection and a picture collection - I do not want to embed all my pictures into my user collection.
My client searches for pictures under a certain criteria. Let's say he gets 50 pictures back from the search (i.e. one single mongodb query). Each picture is associated to one user. I want the user name displayed as well. I assume there is no way to do a single search performance wise on the user collection returning the names of each user for each picture, i.e. I would have to do 50 searches. Which means, I could only avoid this extra performance load by duplicating data (next to the user_id, also the user_name) in my pictures collection?
Same question the other way around. If my client searches for users and say 50 users are returned from the search through one single query. If I want the last associated picture + title also displayed next to the user data, I would again have to add that to the users collection, otherwise I assume I need to do 50 queries to return the picture data?
Let's say the schema for your picture collection is as follows:
Picture Document
{
_id: ObjectId(123),
url: 'img1.jpg',
title: 'img_one',
userId: ObjectId(342)
}
1) Your picture query will return documents that look like the above. You don't have to make 50 calls to get the users associated with the images. You can simply make one other query to the Users Collection, using the user ids taken from the picture documents, like so:
db.users.find({_id: {$in: [userid_1, userid_2, userid_3, ..., userid_n]}})
You will receive an array of user documents with the user information. You'll have to handle their display on the client afterwards. At most you'll need 2 calls.
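The client-side handling mentioned above can be sketched in plain JavaScript. This is a minimal illustration, not part of the original answer: the helper names and the `userName` output field are hypothetical, and the sample ids are plain strings standing in for ObjectIds.

```javascript
// Collect the distinct user ids referenced by a page of picture documents,
// keeping the original id values so they can be used in an $in query.
function collectUserIds(pictures) {
  const idByKey = new Map();
  for (const picture of pictures) {
    idByKey.set(String(picture.userId), picture.userId);
  }
  return [...idByKey.values()];
}

// Attach each picture's user name once the users have been fetched.
// Pictures whose user was not returned get userName: null.
function attachUserNames(pictures, users) {
  const nameById = new Map(users.map((u) => [String(u._id), u.name]));
  return pictures.map((p) => ({
    ...p,
    userName: nameById.get(String(p.userId)) ?? null,
  }));
}

// Sample data (made up for illustration).
const samplePictures = [
  { _id: 'p1', title: 'img_one', userId: 'u1' },
  { _id: 'p2', title: 'img_two', userId: 'u2' },
  { _id: 'p3', title: 'img_three', userId: 'u1' },
];
const userIdsForQuery = collectUserIds(samplePictures); // ['u1', 'u2']
// userIdsForQuery would go into: db.users.find({ _id: { $in: userIdsForQuery } })
const sampleUsers = [
  { _id: 'u2', name: 'Bob' },
  { _id: 'u1', name: 'Ann' },
];
const picturesWithNames = attachUserNames(samplePictures, sampleUsers);
```

Building the lookup `Map` once keeps the merge linear in the number of pictures, regardless of the order in which the users come back.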
Alternatively
You could design the schema as such:
Picture Document
{
_id: ObjectId(123),
url: 'img1.jpg',
title: 'img_one',
userId: ObjectId(342),
user_name: "user associated"
}
If you design it this way, you would only require one call, but the user name won't stay in sync with the user collection documents. For example, let's say a user changes their name. A picture that was saved before may still have the old user name.
2) You could design your User Collection as such:
User Document
{
_id: ObjectId(342),
name: "Steve jobs",
last_assoc_img: {
img_id: ObjectId(342),
url: 'img_one',
title: 'last image title'
}
}
You could use the same principles as mentioned above.
Assuming that you have a user id associated with every user and you're also storing that id in the picture document, then your user <=> picture is a loosely coupled relationship.
In order to not have to make 50 separate calls, you can use the $in operator given that you are able to pull out those ids and put them into a list to run the second query. Your query will basically be in English: "Look at the collection, if it's in the list of ids, give it back to me."
If you intend to do this a lot and want it to scale, I'd recommend either a relational database or a NoSQL database that can handle joins, so that you aren't forced into an embedded document schema.

Find a set of documents in collection A, based on an array from collection B

The following data needs to be stored in MongoDB:
A collection of persons (approximately 100-2000) together with their relevant attributes.
Another collection of queues (approximately 5-50).
Information about the relationship between persons and queues. Each person can stand in line in several queues, and each queue can hold several persons. The order of the persons waiting in a queue is important.
Currently this is what I have in mind:
Persons:
{
_id: ObjectId("507c35dd8fada716c89d0001"),
first_name: 'john',
email: 'john.doe@doe.com',
id_number: 8101011234,
...
},
Queues:
{
_id: ObjectId("507c35dd8fada716c89d0011"),
title: 'A title for this queue',
people_waiting: [
ObjectId("507c35dd8fada716c89d0001"),
ObjectId("507c35dd8fada716c89d0002"),
ObjectId("507c35dd8fada716c89d0003"),
...
]
},
In a web page, I want to list (in order) all persons standing in a certain queue. I'm thinking that I first need to query the 'people_waiting' array from the 'Queues' collection, and then loop through this array and query each item from the 'Persons' collection.
But that seems to be a lot of queries to generate this list, and I wonder if there is a smarter way to write or combine the queries than the one described above.
A find query can only target one collection at a time in MongoDB, so it does take two queries. But you can use $in instead of looping through the array and querying each person individually.
In the shell:
queue = db.Queues.findOne({_id: idOfQueue});
peopleWaiting = db.Persons.find({_id: {$in: queue.people_waiting}}).toArray();
But peopleWaiting will not be sorted by the order of the ids in the queue and there's no support for doing that in a MongoDB query. So you'd have to reorder peopleWaiting in your code to match the order in queue.people_waiting.
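The reordering step could look something like the sketch below. The function name is hypothetical, and it assumes ids can be compared by their string form (for the driver's ObjectId, `String(id)` yields the hex string); the sample ids here are plain strings.

```javascript
// Reorder fetched documents to match the order of ids in the queue.
// Ids with no matching document are skipped.
function orderByIds(docs, orderedIds) {
  const docById = new Map(docs.map((doc) => [String(doc._id), doc]));
  return orderedIds
    .map((id) => docById.get(String(id)))
    .filter((doc) => doc !== undefined);
}

// Sample data (made up): the $in query returned persons in arbitrary order.
const queueOrder = ['id3', 'id1', 'id2'];
const fetchedPersons = [
  { _id: 'id1', first_name: 'john' },
  { _id: 'id2', first_name: 'jane' },
  { _id: 'id3', first_name: 'steve' },
];
const orderedPersons = orderByIds(fetchedPersons, queueOrder);
// orderedPersons is now: steve, john, jane
```

Indexing the documents in a `Map` first avoids rescanning the result array for every id in the queue.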

MongoDB document model size/performance limits? A collection with an object that possibly houses 100k+ names?

I'm trying to build an event website that will host videos and such. I've set up a collection with the event name, event description, and an object with some friendly info about the people "attending". If things go well there might be 100-200k people attending, and those people should have access to whoever else is in the event (clicking on the friendly name will find the user's id and subsequently their full profile). Is that asking too much of Mongo? Or is there a better way to go about doing something like that? It seems like that could get rather large rather quickly.
{
_id : ...., // event Id,
'name' : // event name
'description' : //event description
'attendees' :{
{'username': user's friendly name, 'avatarlink': avatar url},
{'username': user's friendly name, 'avatarlink': avatar url},
{'username': user's friendly name, 'avatarlink': avatar url},
{'username': user's friendly name, 'avatarlink': avatar url}
}
}
Thanks for the suggestions!
In MongoDB many-to-many (or one-to-many) modeling in general, you should take a different approach depending on whether the many are few (usually up to a few dozen) or "really" many, as in your case.
It will be better for you not to use embedding in your case, and instead to normalize. If you embed users in your events collection, adding attendees to a certain event will increase the array size. Since documents are updated in place, if the document no longer fits its allocated space on disk, it will have to be moved on disk, a very expensive operation which will also cause fragmentation. There are a few techniques to deal with moves, but none is ideal.
Having an array of ObjectIds as attendees will be better in that documents will grow much less dramatically, but it still poses a few problems. How will you find all events a user has participated in? You can have a multi-key index on attendees, but once a certain document moves, the index will have to be updated for each user entry (the index contains a pointer to the document's place on disk). In your case, where you plan to have up to 200K users, it will be very, very painful.
Embedding is a very cool feature of MongoDB or any other document-oriented database, but it's naive to think it always comes without a price.
I think you should really rethink your schema: having an events collection, a users collection and a user_event collection with a structure similar to this:
{
_id : ObjectId(),
user_id : ObjectId(),
event_id : ObjectId()
}
Normalization is not a dirty word
Perhaps you should consider modeling your data in two collections and your attendees field in an event document would be an array of user ids.
Here's a sample of the schema:
db.events
{
_id : ...., // event Id,
'name' : // event name
'description' : //event description
'attendees' :[ObjectId('userId1'), ObjectId('userId2') ...]
}
db.users
{
_id : ObjectId('userId1'),
username: 'user friendly name',
avatarLink: 'url to avatar'
}
Then you could do 2 separate queries
db.events.find({_id: ObjectId('eventId')});
db.users.find({_id: {$in: [ObjectId('userId1'), ObjectId('userId2')]}});

How should the following many to many relationship be modeled in MongoDB?

Suppose I have Student and Teacher in a many to many relationship. If I just want to find out all the teachers for a given student or vice versa I generally model it by using embedded Object Ids. For example if teacher has a property studentIds which is an array of student Object Ids then that is enough information to do all the queries you need.
However suppose that a student can give a teacher a rating. How should this rating fit into the model? At the moment I do the following:
Inside teacher, instead of storing an array of student ids, I store an array of JSON objects {studentId: ObjectId, rating: String}
When doing the query, I transform the array of JSON objects into an array of studentIds and extract the full information as usual
So now I have an array of student objects and an array of JSON objects with the ratings
However, since the $in operator in MongoDB does not preserve ordering, I need to do my own sorting
At the last step, with everything in order, I can combine student objects with ratings to get what I want
It works but seems somewhat convoluted. Is there a better way of doing this?
Here are some considerations. In the end, it depends on your requirements:
Rating is optional, right?
If so, ask yourself whether you want to combine a required feature (storing teacher/student association) with a nice-to-have one. Code that implements a nice-to-have feature now writes to your most important collection. I think you can improve separation of concerns in your code with a different db schema.
Will you need more features?
Let's say you want to provide students with a list of ratings they gave, the average rating a student has given to teachers, and you want to show a development of ratings over time. This will be very messy with embedded documents. Embedded documents are less flexible.
If you need top read performance, you need to denormalize more data
If you want to stick to the embedded documents, you might want to copy more data. Let's say there's an overview of ratings per teacher where you can see the students' names. It would be helpful to embed an object
{ studentId : ObjectId,
rating: string,
studentName: string,
created: dateTime }
As alternatives, consider
TeacherRating {
StudentId: id
TeacherId: id
Rating: number
Created: DateTime
}
The teacher/student association will still be stored in the teacher object, but the ratings are in a different collection. A rating can't be created if no association between teacher and student can be found.
or
TeacherStudentClass {
StudentId: id
TeacherId: id
Class: id
ClassName: string // (denormalized, just an example)
Rating: number // (optional)
Created: DateTime
}
To find all students for a given teacher, you'd have to query the linker document first, then do a $in query on the students, and vice versa. That is one more query, but it comes with a huge gain in flexibility.
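As a rough sketch of that two-step lookup, the helpers below build the $in filter from linker documents and then pair each rating with its student in memory. The helper names are hypothetical; the field names follow the TeacherRating example above, and the sample ids are plain strings.

```javascript
// Step 1 (not shown): query the linker collection for one teacher,
// e.g. with a filter like { TeacherId: teacherId }.

// Step 2: build the filter for the students query from the linker documents.
function buildStudentFilter(links) {
  return { _id: { $in: links.map((link) => link.StudentId) } };
}

// Step 3: pair each linker document with its student, keeping the order
// of the linker documents and skipping links whose student was not found.
function combineWithRatings(links, students) {
  const studentById = new Map(students.map((s) => [String(s._id), s]));
  return links
    .filter((link) => studentById.has(String(link.StudentId)))
    .map((link) => ({
      student: studentById.get(String(link.StudentId)),
      rating: link.Rating,
    }));
}

// Sample data (made up for illustration).
const sampleLinks = [
  { StudentId: 's2', TeacherId: 't1', Rating: 4 },
  { StudentId: 's1', TeacherId: 't1', Rating: 5 },
];
const studentFilter = buildStudentFilter(sampleLinks);
const sampleStudents = [
  { _id: 's1', name: 'Ann' },
  { _id: 's2', name: 'Bob' },
];
const ratedStudents = combineWithRatings(sampleLinks, sampleStudents);
```

Because the result is driven by the linker documents, it does not matter in which order the $in query returns the students.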