My application is a college search site with two kinds of users: end users who search for colleges, and college owners.
I maintain a common 'users' collection for both kinds of users. Its schema looks like this:
{
_id : ObjectId(),
display_name: String,
email: String,
password_hash: String,
type: Number
}
Now an admin of the application wants to push notifications to users. A notification may be intended for all users, for a particular user, for the college owners, or for a group of users.
A notification document consists of three fields, like below:
{
_id : ObjectId(),
notification_message: String,
date: Date
}
What I want is to hit an API that returns all the notifications of a particular user, along with a flag on each notification indicating whether it has been read or is unread.
How should my database design look for this?
I thought of two options:
1. Maintain a notification collection and repeat the notification document for every intended user of the notification, with an extra 'mark_as_read' field in each copy. The document would look like this:
{
_id : ObjectId(),
notification_message: String,
date: Date,
to_user_id : ObjectId(),
mark_as_read: Boolean
}
Pros: Querying for the notifications of a particular user will be fast.
Cons: The 'notification' collection will grow drastically, because the same notification is repeated for every user it is intended for.
2. Maintain a notification collection in which each notification document holds an array of the intended users and an array of the users who have read it.
Pros: The notification collection stays smaller.
Cons: A notification document may hit the maximum document size of 16 MB.
These are the only two approaches I could think of. Is there a better way to do this? Your help will be very much appreciated.
How about this?
{
sender: Object, // user object
receiver: [String], // array of user _ids
message: String, // any description of the notification message
read_by: [{ readerId: String, read_at: Date }],
created_at: Date
}
Related
Currently our system has two separate collections: invites and users. We can send an invite to someone; that invite has some information attached to it and is stored in the invites collection. If the user registers, their account information is stored in the users collection.
Not every user has to have an invite, and not every invite has to have a user. We check whether a user has an invite (or vice versa) on the email address, which in that case is stored in both collections.
Our dashboard originally had a user overview page where you can see the current users and paginate through them.
Now we want to have one single page (and single table) in which we can view both the invites and the users and paginate through them.
Let's say our data looks like this:
invites: [
{ _id: "5af42e75583c25300caf5e5b", email: "john@doe.com", name: "John" },
{ _id: "53fbd269bde85f02007023a1", email: "jane@doe.com", name: "Jane" },
...
]
users: [
{ _id: "53fe288be081540200733892", email: "john@doe.com", firstName: "John" },
{ _id: "53fd103de08154020073388d", email: "steve@doe.com", firstName: "Steve" },
...
]
Points to note.
Some users can be matched with an invite based on the email (but that is not required)
Some invites never register
The field names are not always exactly the same
Is it possible to make a paginated list of all the emails and sort on them? So an email starting with 'a' in the invites collection comes before an email starting with 'b' in the users collection, and so on, and then we use offset/limit to paginate through the result.
Basically, I want to "merge" the two collections into something akin to a MySQL view and be able to query it as if the entire thing were one collection.
I'm preferably looking for a solution that does not change the data structure in the collections (a projected view or something similar is fine), because parts of the code already rely on the current structure. Although, in light of this new requirement, that structure might not be the best approach.
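One way such a view-like merge could be sketched is with an aggregation, assuming MongoDB 4.4+ for $unionWith; the projected field names are taken from the samples above:
// Normalize field names on each side, merge, sort on email, then paginate
db.users.aggregate([
  { $project: { email: 1, name: "$firstName", source: { $literal: "user" } } },
  { $unionWith: {
      coll: "invites",
      pipeline: [{ $project: { email: 1, name: 1, source: { $literal: "invite" } } }]
  } },
  { $sort: { email: 1 } },
  { $skip: 20 },  // e.g. page 2 with 20 rows per page
  { $limit: 20 }
])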
I'm trying to build an event website that will host videos and such. I've set up a collection with the event name, event description, and an object with some friendly info about the people "attending". If things go well there might be 100-200k people attending, and those people should have access to whoever else is in the event (clicking on the friendly name will find the user's id and subsequently their full profile). Is that asking too much of Mongo? Or is there a better way to go about doing something like that? It seems like it could get rather large rather quickly.
{
_id: ..., // event id
'name': ..., // event name
'description': ..., // event description
'attendees': [
{'username': user's friendly name, 'avatarlink': avatar url},
{'username': user's friendly name, 'avatarlink': avatar url},
{'username': user's friendly name, 'avatarlink': avatar url},
{'username': user's friendly name, 'avatarlink': avatar url}
]
}
Thanks for the suggestions!
For many-to-many (or one-to-many) modeling in MongoDB in general, you should take a different approach depending on whether the "many" side is small (up to a few dozen, usually) or "really" many, as in your case.
It will be better for you not to use embedding here, and to normalize instead. If you embed users in your events collection, adding attendees to an event will grow the array. Since documents are updated in place, once a document no longer fits in its allocated space on disk it has to be moved, a very expensive operation which also causes fragmentation. There are a few techniques to deal with moves, but none is ideal.
Having an array of ObjectIds as attendees is better in that documents grow much less dramatically, but it still poses a few problems. How will you find all the events a user has participated in? You can have a multikey index on attendees, but once a document moves, the index has to be updated for each user entry (the index contains a pointer to the document's place on disk). In your case, where you plan for up to 200K users, that will be very painful.
Embedding is a very cool feature of MongoDB (or any other document-oriented database), but it's naive to think it never comes at a price.
I think you should really rethink your schema: having an events collection, a users collection and a user_event collection with a structure similar to this:
{
_id : ObjectId(),
user_id : ObjectId(),
event_id : ObjectId()
}
Normalization is not a dirty word
Perhaps you should consider modeling your data in two collections and your attendees field in an event document would be an array of user ids.
Here's a sample of the schema:
db.events
{
_id : ...., // event Id,
'name' : // event name
'description' : //event description
'attendees' :[ObjectId('userId1'), ObjectId('userId2') ...]
}
db.users
{
_id : ObjectId('userId1'),
username: 'user friendly name',
avatarLink: 'url to avatar'
}
Then you could do 2 separate queries
db.events.find({_id: ObjectId('eventId')});
db.users.find({_id: {$in: [ObjectId('userId1'), ObjectId('userId2')]}});
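In practice the second query would usually be driven by the attendees array returned from the first; a small sketch, reusing the 'eventId' placeholder from above:
// Fetch the event, then fetch its attendees' profiles
var event = db.events.findOne({_id: ObjectId('eventId')});
db.users.find({_id: {$in: event.attendees}});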
In my app there are users. Each user may have many friends (other users). If user A has friend B, then user B has friend A, always. I will have to query the users collection to get all the friends of user A, for example, and I will also have to use a geospatial index for this query to get all the friends of user A within a given radius of user A.
I'm having some trouble trying to model this structure in MongoDB.
For now I have this (in Mongoose):
{
created: { type: Date, default: Date.now },
phone_number: { type: String, unique: true },
location: { type: [Number], index: '2dsphere' },
friends: [{ phone_number: String }]
}
So each user contains an array of other users' phone numbers (a phone number identifies each user). But I don't think it's a good idea, as a user may have zero or many friends, so the friends array will be mutable and may grow significantly.
What would be the best way to model this structure?
Two approaches:
Join Collection
Similar to the relational approach: a collection whose documents represent friendships (essentially two object ids and possibly some metadata about the relationship).
Arrays on each user
Create an array and push the object id's of the friends onto the array.
When a friendship is created you would need to modify both friends (push each friend onto the other's friend array). It would be the same for friendship dissolution.
Which one?
The join collection approach is slower, as it requires multiple queries to get the friendship data, as opposed to having it persisted with the user themselves (taking advantage of data locality). However, if the number of relationships grows in an unbounded fashion, the array approach is not feasible. MongoDB documents have a 16 MB limit, and there is a practical upper bound of 1000 or so items, after which working with arrays becomes slow and unwieldy.
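Both approaches might look roughly like this; a sketch where the friendships collection name and the userIdA/userIdB variables are assumptions:
// Join collection: one document per friendship
db.friendships.insert({ user_a: userIdA, user_b: userIdB, created: new Date() });
// Arrays on each user: add each friend to the other's array (two updates)
db.users.update({ _id: userIdA }, { $addToSet: { friends: userIdB } });
db.users.update({ _id: userIdB }, { $addToSet: { friends: userIdA } });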
I am designing a news feed for a blog site. I am trying to design the feed so that blogs with recent activity from your friends stay at the top of your feed, while blogs you have no participation in fall towards the bottom of the list. Basically, think of your Facebook feed, but for blogs.
Here is the current design I have but I'm open to suggestions to make this easier to select from:
{
_id: 1,
author: {first: "John", last: "Doe", id: 123},
title: "This is a test post.",
body: "This is the body of my post.",
date: new Date("Feb 1, 2013"),
edited: new Date("Feb 2, 2013"),
comments: [
{
author: {first: "Jane", last: "Doe", id: 124},
date: new Date("Feb 2, 2013"),
comment: "Awesome post."
},
],
likes: [
{
who: {first: "Black", last: "Smith", id: 125},
when: new Date("Feb 3, 2013")
}
],
tagged: [
{
who: {first: "Black", last: "Smith", id: 126},
when: new Date("Feb 4, 2013")
}
]}
Question 1: Assuming my friends have the ids 124 and 125, how do I select the feed so that the ordering of this post in the results is driven by their activity, not by user 126, who was tagged in the post later?
Question 2: Is this single collection of blogs a good design or should I normalize actions into a separate collection?
So the document you show represents one blog post, and those are its comments, tags, likes, etc.? If that's the case, this isn't too bad.
1.
db.posts.find({'$or':[{'comments.author.id':{$in:[some list of friends]}}, {'likes.who.id':{$in:[some list of friends]}}, {'tagged.who.id':{$in:[some list of friends]}}]}).sort({date:-1})
This will give you the posts your friends have activity on, sorted by the post's date descending. I don't think MongoDB yet supports advanced sorting (like sorting on the min/max of the dates in comments, likes, or tags), so sorting on one of comments, likes, or tags, or on the post date, is your best bet with this model.
2.
Personally, I would set up a separate collection to dump a user's feed events into. Then, as events happen, just push the event into the array of events in the document.
They will automatically be sorted, and you can just slice the array and cap it as needed.
However, with documents that grow like that you need to be careful to preallocate a sizable amount of space, or you will encounter slow document moves on disk.
See the blurb on updates
Edit: additional comments
There are two ways to do it: either a collection where every document is a single feed event, or one where every document is a user's entire feed. Each has advantages and disadvantages. If you are OK with capping it at, say, 1000 recent feed events, I would use the document-per-entire-feed method.
So I would create a document structure like
{userid:1, feed:[(feed objects)]}
where feed is an array of feed event objects. These should be subdocuments like
{id: (a user's id), name: (a user's name), type: (an int for like/comment/tag), date: (some ISO date), postName: (the name of the post acted on), postId: (the id of the post acted on)}
To update this feed you just need to push a new feed document onto the feed array when the feed event happens. So if user A likes a post, push the feed document onto all of user A's friends' feeds.
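That push, with the cap mentioned above, might look like this; a sketch where the feeds collection name and the friendIds/feedEvent variables are assumptions:
// Append the event to each friend's feed, keeping only the 1000 most recent entries
db.feeds.update(
  { userid: { $in: friendIds } },
  { $push: { feed: { $each: [feedEvent], $slice: -1000 } } },
  { multi: true }
);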
This works well for small feeds. If you need a very large feed, I would recommend using a document per feed entry, sharding on the recipient user's id, and indexing the date field. This is getting closer to how the very large feeds at Twitter/Facebook work, though they use MySQL, which is arguably better than MongoDB for this specific use case.
Suppose you have a large number of users (M) and a large number of documents (N) and you want each user to be able to mark each document as read or unread (just like any email system). What's the best way to represent this in MongoDB? Or any other document database?
There are several questions on StackOverflow asking this question for relational databases but I didn't see any with recommendations for document databases:
What's the most efficient way to remember read/unread status across multiple items?
Implementing an efficient system of "unread comments" counters
Typically the answers involve a table listing everything a user has read (i.e. tuples of user id, document id), with a possible optimization of a cut-off date, so that mark-all-as-read can wipe the table and start again, knowing that anything prior to that date counts as 'read'.
So, MongoDB / NOSQL experts, what approaches have you seen in practice to this problem and how did they perform?
{
_id: messagePrefs_uniqueId,
type: 'prefs',
timestamp: unix_timestamp,
ownerId: receipientId,
messageId: messageId,
read: true / false
}
{
_id: message_uniqueId,
timestamp: unix_timestamp,
type: 'message',
contents: 'this is the message',
senderId: senderId,
recipients: [receipientId1, receipientId2]
}
Say you have 3 messages you want to retrieve preferences for; you can get them via something like:
db.messages.find({
messageId : { $in : [messageId1,messageId2,messageId3]},
ownerId: receipientId,
type:'prefs'
})
If all you need is read/unread, you could combine this with MongoDB's upsert capability, so that you don't create a prefs document for each message unless the user actually reads it: you create the prefs object with your own unique id and upsert it into MongoDB (a sketch is at the end of this answer). If you want more flexibility (like, say, tags or folders), you'll probably want to create the pref for each recipient of the message. For example, you could add:
tags: ['inbox','tech stuff']
to the prefs object, and then to get all the prefs of the messages tagged with 'tech stuff' you'd do something like:
db.messages.find({type: 'prefs', ownerId: recipientId, tags: 'tech stuff'})
You could then use the messageIds you find within the prefs to query and find all the messages that correspond:
db.messages.find({type: 'message', _id: { $in : [array of messageIds from prefs]}})
It might be a little tricky if you want to do something like efficiently counting how many messages each 'tag' contains. If it's only a handful of tags, you can just add .count() to the end of each query. If it's hundreds or thousands, you might do better with a server-side map/reduce script, or an object that keeps track of message counts per tag per user.
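The upsert mentioned earlier might look like this; a sketch where the recipientId and messageId variables are assumptions and prefs live in the same messages collection as above:
// Mark a message as read, creating the prefs document only when the user first reads it
db.messages.update(
  { type: 'prefs', ownerId: recipientId, messageId: messageId },
  { $set: { read: true, timestamp: Math.floor(Date.now() / 1000) } },
  { upsert: true }
);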
If you're only storing a simple boolean value, like read/unread, another method is to embed an array in each Document that contains a list of the Users who have read it.
{
_id: 'document#42',
...
read_by: ['user#83', 'user#2702']
}
You should then be able to index that field, making for fast queries for Documents-read-by-User and Users-who-read-Document.
db.documents.find({read_by: 'user#83'})
db.documents.find({_id: 'document#42'}, {read_by: 1})
However, I find that I'm usually querying for all Documents that have not been read by a particular User, and I can't think of any solution that can make use of the index in this case. I suspect it's not possible to make this fast without having both read_by and unread_by arrays, so that every User is included in every Document (or join table), but that would have a large storage cost.
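For illustration, the "not read by this user" query would look like the following; it is correct (it matches documents where no element of read_by equals the user), but $ne cannot use the read_by index efficiently:
// Documents NOT read by a given user; effectively scans most of the collection/index
db.documents.find({read_by: {$ne: 'user#83'}})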