We currently have two separate collections in our system: invites and users. We can send an invite to someone; that invite has some information attached to it and is stored in the invites collection. If the user registers, their account information is stored in the users collection.
Not every user has to have an invite, and not every invite has to have a user. We check whether a user has an invite (or vice versa) by email address, which in that case is stored in both collections.
Originally our dashboard had a user overview, with a page where you can see the current users and paginate through them.
Now we want to have one single page (and single table) in which we can view both the invites and the users and paginate through them.
Let's say our data looks like this:
invites: [
{ _id: "5af42e75583c25300caf5e5b", email: "john@doe.com", name: "John" },
{ _id: "53fbd269bde85f02007023a1", email: "jane@doe.com", name: "Jane" },
...
]
users: [
{ _id: "53fe288be081540200733892", email: "john@doe.com", firstName: "John" },
{ _id: "53fd103de08154020073388d", email: "steve@doe.com", firstName: "Steve" },
...
]
Points to note.
Some users can be matched with an invite based on the email (but that is not required)
Some invites never register
The field names are not always exactly the same
Is it possible to make a paginated list of all emails and sort on them? So if there is an email that starts with an a in collection invites, that is picked before the email that starts with a b in collection users etc. And then use offset / limit to paginate through it.
Basically, I want to "merge" the two collections in something that would be akin to a MySQL view and be able to query on that as if the entire thing was one collection.
Preferably, I'm looking for a solution that doesn't change the data structure in the collections (a projected view or something similar would be fine), because parts of the code already rely on the given structure. In light of this new requirement, though, that structure might not be the best approach.
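One possible direction, sketched under assumptions: MongoDB 4.4 and later has a `$unionWith` aggregation stage that can combine both collections in one pipeline, which can then be sorted and paginated. The helper name `buildMergedPagePipeline` and the output field `source` are made up for this example; the collection and field names come from the sample data above. An email that exists in both collections would appear twice unless the pipeline also groups on it.

```javascript
// Builds a pipeline intended for db.invites.aggregate(pipeline).
// Assumes MongoDB 4.4+ ($unionWith). The output shape is an assumption.
function buildMergedPagePipeline(offset, limit) {
  return [
    // Normalize invites to a common shape.
    { $project: { email: 1, name: 1, source: { $literal: "invite" } } },
    // Append users, renaming firstName to name so both sides match.
    {
      $unionWith: {
        coll: "users",
        pipeline: [
          { $project: { email: 1, name: "$firstName", source: { $literal: "user" } } },
        ],
      },
    },
    // Sort across both collections; _id breaks ties so paging stays stable.
    { $sort: { email: 1, _id: 1 } },
    { $skip: offset },
    { $limit: limit },
  ];
}

// Third page of 10 rows.
const pipeline = buildMergedPagePipeline(20, 10);
```

The same stages, without `$skip` and `$limit`, might also work as the definition of a read-only view, so existing code could keep using the original collections unchanged.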
I'm trying to model bidirectional friendships in MongoDB. "Bidirectional" means, like Facebook but unlike Twitter, if you're friends with Sam then Sam must also be friends with you. In MongoDB, the usually-recommended solution (example) seems to be something like this:
Create a User collection (aka nodes to use the proper graph theory term) and a Friendship collection (aka edges)
Each User document contains an embedded friends array, each element of which contains the ObjectID of each friend. Each element can also cache read-only info about each friend (e.g. name, photo URL) that can avoid cross-document queries for common use-cases like "display my friend list".
Adding friendships involves inserting a new Friendship document and then $push-ing the Friendship's Object ID into a friends array in both users, as well as read-only cached info about friends (e.g. name) to avoid multi-document queries when displaying friend lists.
I'm considering a different design where, instead of a separate Friendship collection, edge data will be stored (duplicated) in both nodes of the bidirectional relationship. Like this:
{
_id: new ObjectID("111111111111111111111111"),
name: "Joe",
pictureUrl: "https://foo.com/joe.jpg",
invites: [
... // similar schema to friends array below
],
friends: [
{
friendshipId: new ObjectID("123456789012345678901234"),
lastMeeting: new Date("2019-02-07T20:35:55.256+00:00"),
user1: {
userId: new ObjectID("111111111111111111111111"),
name: "Joe", // cached, read-only data to avoid multi-doc reads
pictureUrl: "https://foo.com/joe.jpg",
},
user2: {
userId: new ObjectID("222222222222222222222222"),
name: "Bill", // cached, read-only data to avoid multi-doc reads
pictureUrl: "https://foo.com/bill.jpg",
},
}
]
},
{
_id: new ObjectID("222222222222222222222222"),
name: "Bill",
pictureUrl: "https://foo.com/bill.jpg",
invites: [
... // similar schema to friends array below
],
friends: [
{
friendshipId: new ObjectID("123456789012345678901234"),
lastMeeting: new Date("2019-02-07T20:35:55.256+00:00"), // shared data about the edge
user1: { // data specific to each friend
userId: new ObjectID("111111111111111111111111"),
name: "Joe", // cached, read-only data to avoid multi-doc reads
pictureUrl: "https://foo.com/joe.jpg",
},
user2: { // data specific to each friend
userId: new ObjectID("222222222222222222222222"),
name: "Bill", // cached, read-only data to avoid multi-doc reads
pictureUrl: "https://foo.com/bill.jpg",
},
}
]
}
Here's how I'm planning to deal with the following:
Reads - all reads for common operations happen only from individual User documents. High-level info about friends (e.g. name, picture URL) are cached inside the friends array.
Inviting a friend - add a new document to an invites array embedded in both users (contents not shown above) that's similar in structure and functionality to the friends array shown above
Invitation accepted - using updateMany, add a new identical embedded document to the friends array of both users, and $pull an element from invites array of both users. Initially I'll use multi-document transactions for these updates, but because adding friendships isn't time-critical, this could be adapted to use eventual consistency.
Friendship revoked - use updateMany with a filter for {'friends.friendshipId': new ObjectID("123456789012345678901234")} to $pull the friendship subdocument from both users' friends arrays. Like above, this could use multi-document transactions initially, and eventual consistency later if needed for scale.
Updates to cached data - if a user changes info cached in friends (e.g. name or picture URL), this is an uncommon operation that can proceed slowly and one-document-at-a-time, so eventual consistency is fine.
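The accept and revoke steps above could be expressed as `updateMany` arguments roughly like this. It is only a sketch: the helper names and the `inviteId` field are hypothetical (the invite schema is not shown in the post), and plain strings stand in for ObjectIDs.

```javascript
// Arguments for users.updateMany(filter, update) when an invite is accepted.
// `inviteId` is a hypothetical field name; the invite schema is not shown.
function acceptInviteUpdate(userIdA, userIdB, inviteId, friendship) {
  return {
    filter: { _id: { $in: [userIdA, userIdB] } },
    update: {
      $push: { friends: friendship },    // identical embedded doc for both users
      $pull: { invites: { inviteId } },  // remove the pending invite from both
    },
  };
}

// Arguments for users.updateMany(filter, update) when a friendship is revoked.
function revokeFriendshipUpdate(friendshipId) {
  return {
    filter: { "friends.friendshipId": friendshipId },
    update: { $pull: { friends: { friendshipId } } },
  };
}

// Intended use with a driver collection (not executed here):
//   const { filter, update } = revokeFriendshipUpdate(id);
//   await users.updateMany(filter, update, { session });
```

Because a single `updateMany` is not atomic across the two user documents, wrapping it in a transaction (or retrying until both sides match) is still needed, as the post already anticipates.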
I have two basic concerns that I'd like your advice about:
What are the problems and pitfalls with the approach described above? I know about the obvious things: extra storage, slower updates, need to add queuing and retry logic to support eventual consistency, and the risk of edge data getting out-of-sync between its two copies. I think I'm OK with these problems. But are there other, non-obvious problems that I will likely run into?
Instead of having a user1 and user2 field for each node of the edge, would it be better to use a 2-element array instead? Why or why not? Here's an example of what I mean:
friends: [
{
friendshipId: new ObjectID("123456789012345678901234"),
lastMeeting: new Date("2019-02-07T20:35:55.256+00:00"),
users: [
{
userId: new ObjectID("111111111111111111111111"),
name: "Joe", // cached, read-only data to avoid multi-doc reads
pictureUrl: "https://foo.com/joe.jpg",
},
{
userId: new ObjectID("222222222222222222222222"),
name: "Bill", // cached, read-only data to avoid multi-doc reads
pictureUrl: "https://foo.com/bill.jpg",
},
],
}
]
BTW, I know that graph databases and even relational databases are better at modeling relationships compared to MongoDB. But for a variety of reasons I've settled on MongoDB for now, so please limit answers to MongoDB solutions rather than pointing me to using a graph or relational database. Thanks!
My application is a college search. There will be two kinds of users: end users who search for colleges, and college owners.
I maintain a common 'users' collection for both kinds of users. Its schema looks like this:
{
_id : ObjectId(),
display_name: String,
email: String,
password_hash: String,
type :Number
}
Now an admin of this application wants to push a notification to the users, this notification may be for all the users or for a particular user or for the owners of the colleges or for a group of users.
A notification document consists of three things like below,
{
_id : ObjectId(),
notification_message: String,
date: Date
}
What I want is an API that returns all the notifications of a particular user, along with a flag on each notification indicating whether it has been read.
How should I design my database for this?
I thought of two options:
1. Maintain a notification collection and repeat the notification document for every intended user, with a 'mark as read' field on each copy. The document would look like this:
{
_id : ObjectId(),
notification_message: String,
date: Date,
to_user_id : ObjectId(),
mark_as_read: Boolean
}
Pros: Querying for the notifications of a particular user will be faster.
Cons: The notification collection will grow drastically, because the same notification is repeated for every user it is intended for.
2. Maintain a notification collection and, for each notification document, keep an array of the users the notification is intended for and an array of the users who have read it.
Pros: Notification collection size may be less
Cons: Notification document may hit the maximum size of 16 MB
These are only two things I could think of, is there any other better way to do this? Your help will be very much appreciated.
How about this?
{
sender: Object, // user object
receiver: [ String ], // array of user _ids
message: String, // any description of the notification message
read_by: [{ readerId: String, read_at: Date }],
created_at: Date,
}
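Given documents shaped like the suggestion above, the per-user list with a read/unread flag could be derived as in this sketch. It runs on plain in-memory arrays; the function name `notificationsFor` and the `read` output field are illustrative, and user ids are strings as in the schema above.

```javascript
// Returns the notifications addressed to userId, each with a read flag.
function notificationsFor(userId, notifications) {
  return notifications
    .filter((n) => n.receiver.includes(userId))
    .map((n) => ({
      message: n.message,
      created_at: n.created_at,
      // Read if this user has an entry in read_by.
      read: n.read_by.some((r) => r.readerId === userId),
    }));
}

// Made-up sample data for illustration.
const sampleNotifications = [
  {
    receiver: ["u1", "u2"],
    message: "Welcome",
    read_by: [{ readerId: "u1", read_at: new Date() }],
    created_at: new Date(),
  },
  { receiver: ["u2"], message: "For owners", read_by: [], created_at: new Date() },
];

const forU1 = notificationsFor("u1", sampleNotifications);
```

On the database side the filter step corresponds to querying `{ receiver: userId }`, since matching a scalar against an array field matches any element. Both arrays still grow with the audience, so the 16 MB concern from option 2 applies here as well.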
Not quite sure what the best practice is if I have two collections, a user collection and a picture collection - I do not want to embed all my pictures into my user collection.
My client searches for pictures under a certain criteria. Let's say he gets 50 pictures back from the search (i.e. one single mongodb query). Each picture is associated to one user. I want the user name displayed as well. I assume there is no way to do a single search performance wise on the user collection returning the names of each user for each picture, i.e. I would have to do 50 searches. Which means, I could only avoid this extra performance load by duplicating data (next to the user_id, also the user_name) in my pictures collection?
Same question the other way around. If my client searches for users and say 50 users are returned from the search through one single query. If I want the last associated picture + title also displayed next to the user data, I would again have to add that to the users collection, otherwise I assume I need to do 50 queries to return the picture data?
Let's say the schema for your picture collection is as follows:
Picture Document
{
_id: ObjectId(123),
url: 'img1.jpg',
title: 'img_one',
userId: ObjectId(342)
}
1) Your picture query will return documents that look like the above. You don't have to make 50 calls to get the users associated with the images. You can make one additional query to the Users collection using the user ids taken from the picture documents, like this:
db.users.find({_id: {$in: [userId_1, userId_2, userId_3, ..., userId_n]}})
You will receive an array of user documents with the user information. You'll have to match them to the pictures on the client afterwards. At most you'll need 2 calls.
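The client-side part of the two-call approach could look like this sketch. The helper names and the `userName` output field are made up, and ids are plain strings here; real ObjectId values would need converting (for example with `toString()`) before being used as Set or Map keys.

```javascript
// Step 1: collect the distinct user ids referenced by the pictures.
function userIdsOf(pictures) {
  return [...new Set(pictures.map((p) => p.userId))];
}

// Step 2 (database, not executed here):
//   db.users.find({ _id: { $in: userIdsOf(pictures) } })

// Step 3: attach each picture's user name through a lookup map.
function attachUserNames(pictures, users) {
  const nameById = new Map(users.map((u) => [u._id, u.name]));
  return pictures.map((p) => ({ ...p, userName: nameById.get(p.userId) }));
}
```

Deduplicating the ids first keeps the `$in` list short when many pictures belong to the same user.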
Alternatively
You could design the schema as such:
Picture Document
{
_id: ObjectId(123),
url: 'img1.jpg',
title: 'img_one',
userId: ObjectId(342),
user_name: "user associated"
}
If you design it this way, you only need 1 call, but the user name can get out of sync with the documents in the user collection. For example, if a user changes their name, a picture saved earlier may still carry the old user name.
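If the duplicated name is kept, one way to limit the drift is to rewrite it whenever a user renames. A sketch of the arguments for such an update, assuming the `userId` and `user_name` fields from the picture document above; the helper name is hypothetical.

```javascript
// Arguments for pictures.updateMany(filter, update) after a user renames.
function renameInPicturesUpdate(userId, newName) {
  return {
    filter: { userId: userId },                // every picture owned by this user
    update: { $set: { user_name: newName } },  // refresh the cached copy
  };
}

// Intended use (not executed here):
//   const { filter, update } = renameInPicturesUpdate(id, "New Name");
//   db.pictures.updateMany(filter, update)
```

This trades a slower, rarer write for the single-query read.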
2) You could design your User Collection as such:
User Document
{
_id: ObjectId(342),
name: "Steve jobs",
last_assoc_img: {
img_id: ObjectId(342),
url: 'img_one',
title: 'last image title'
}
}
You could use the same principles as mentioned above.
Assuming that you have a user id associated with every user and you're also storing that id in the picture document, then your user <=> picture is a loosely coupled relationship.
In order to not have to make 50 separate calls, you can use the $in operator given that you are able to pull out those ids and put them into a list to run the second query. Your query will basically be in English: "Look at the collection, if it's in the list of ids, give it back to me."
If you intend on doing this a lot and intend for it to scale, I'd either recommend using a relational database or a NoSQL database that can handle joins to not force you into an embedded document schema.
I'm trying to build an event website that will host videos and such. I've set up a collection with the event name, event description, and an object with some friendly info of people "attending". If things go well there might be 100-200k people attending, and those people should have access to whoever else is in the event. (clicking on the friendly name will find the user's id and subsequently their full profile) Is that asking too much of mongo? Or is there a better way to go about doing something like that? It seems like that could get rather large rather quick.
{
_id : ...., // event Id,
'name' : ...., // event name
'description' : ...., // event description
'attendees' : [
{'username': "user's friendly name", 'avatarlink': "avatar url"},
{'username': "user's friendly name", 'avatarlink': "avatar url"},
{'username': "user's friendly name", 'avatarlink': "avatar url"},
{'username': "user's friendly name", 'avatarlink': "avatar url"}
]
}
Thanks for the suggestions!
In MongoDB many-to-many (or one-to-many) modeling in general, you should take a different approach depending on whether the "many" are few (usually up to a few dozen) or "really" many, as in your case.
It will be better for you not to use embedding in your case, and instead normalize. If you embed users in your events collection, adding attendees to an event will increase the array size. Since documents are updated in place (as with the MMAPv1 storage engine), a document that no longer fits its allocated space on disk has to be moved, a very expensive operation that also causes fragmentation. There are a few techniques to deal with moves, but none is ideal.
Having an array of ObjectIds as attendees will be better in that documents will grow much less dramatically, but it still raises a few problems. How will you find all the events a user has participated in? You can have a multi-key index on attendees, but once a document moves, the index has to be updated for each user entry (the index contains a pointer to the document's place on disk). In your case, where you plan to have up to 200K users, that will be very, very painful.
Embedding is a very cool feature of MongoDB and of any other document-oriented database, but it's naive to think it (sometimes) comes without a price.
I think you should really rethink your schema: having an events collection, a users collection and a user_event collection with a structure similar to this:
{
_id : ObjectId(),
user_id : ObjectId(),
event_id : ObjectId()
}
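With a link collection like the one above, both directions of the relationship become simple indexed queries. The sketch below only builds the query arguments; the helper names, the page size, and the suggested indexes are assumptions, not part of the answer.

```javascript
// Suggested indexes (assumption):
//   { event_id: 1, user_id: 1 } with unique: true, and { user_id: 1 }

// Arguments for paging through the attendees of one event,
// e.g. db.collection("user_event").find(filter, options) in the Node.js driver.
function attendeesPage(eventId, page, pageSize = 50) {
  return {
    filter: { event_id: eventId },
    options: { sort: { user_id: 1 }, skip: page * pageSize, limit: pageSize },
  };
}

// Filter for every event a given user attends.
function eventsOfUser(userId) {
  return { user_id: userId };
}
```

The user ids from one page can then be resolved to names and avatars with a single `$in` query on the users collection, so a page of attendees costs two queries regardless of how large the event is.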
Normalization is not a dirty word
Perhaps you should consider modeling your data in two collections and your attendees field in an event document would be an array of user ids.
Here's a sample of the schema:
db.events
{
_id : ...., // event Id,
'name' : ...., // event name
'description' : ...., // event description
'attendees' :[ObjectId('userId1'), ObjectId('userId2') ...]
}
db.users
{
_id : ObjectId('userId1'),
username: 'user friendly name',
avatarLink: 'url to avatar'
}
Then you could do 2 separate queries:
db.events.find({_id: ObjectId('eventId')});
db.users.find({_id: {$in: [ObjectId('userId1'), ObjectId('userId2')]}});
In my app there are users. Each user may have many friends (other users). If user A has a friend B, then user B has a friend A, always. I will have to query the collection of users to get, for example, all friends of user A. I will also have to use a geospatial index for this query, to get all friends of user A within a given radius of user A.
I have some problem when trying to "model" this structure in MongoDB.
For now I have this (in Mongoose):
{
created: { type: Date, default: Date.now },
phone_number: { type: String, unique: true },
location: { type: [Number], index: '2dsphere' },
friends: [{ phone_number: String }]
}
So each user contains an array of other users' phone numbers (a phone number identifies each user). But I don't think it's a good idea, as one user may have zero or many friends, so the friends array will be mutable and may grow significantly.
What would be the best way to model this structure?
Two approaches:
Join Collection
Similar to the relational approach: a collection whose documents represent friendships (essentially two object ids and possibly metadata about the relationship).
Arrays on each user
Create an array on each user and push the object ids of the friends onto it.
When a friendship is created you would need to modify both friends (push each friend onto the other's friend array). It would be the same for friendship dissolution.
Which one?
The join collection approach is slower, as it requires multiple queries to get the friendship data, as opposed to having it persisted with the users themselves (taking advantage of data locality). However, if the number of relationships grows in an unbounded fashion, the array approach is not feasible. MongoDB documents have a 16 MB limit, and there is a practical upper bound of 1000 or so items after which working with arrays becomes slow and unwieldy.
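For the geospatial part of the question, either approach ends with a list of friend ids that can be combined with a radius filter. A sketch, assuming `location` holds `[longitude, latitude]` pairs as in the Mongoose schema from the question; the helper name is made up, and using `_id` values as friend ids is an assumption (the original schema stores phone numbers).

```javascript
// Approximate equatorial radius of the Earth, in kilometers.
const EARTH_RADIUS_KM = 6378.1;

// Filter for users.find(): friends located within radiusKm of a center point.
// $centerSphere expects the radius in radians, hence the division.
function friendsNearFilter(friendIds, center, radiusKm) {
  return {
    _id: { $in: friendIds },
    location: {
      $geoWithin: { $centerSphere: [center, radiusKm / EARTH_RADIUS_KM] },
    },
  };
}

// Friends within 5 km of a made-up point (longitude, latitude).
const nearFilter = friendsNearFilter(["id1", "id2"], [21.01, 52.23], 5);
```

With the join collection, `friendIds` comes from a first query on the friendships; with embedded arrays it is read straight from the user's own document.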