Apply limit to each condition in an $or query? - mongodb

I have a chatroom application written in Meteor, using MongoDB. Each chatroom contains many messages, and a user can join multiple chatrooms. I'd like to create a query that fetches the 200 most recent messages from each chatroom for all the chatrooms that a given user is in. I'm doing something like this:
// These are the ids of the chatrooms the user is currently in
var conditions = [{chatroomId: 1}, {chatroomId: 2}];
Messages.find({$or: conditions}, {sort: {createdAt: -1}, limit: 200});
However, naturally, this limit applies to the entire query; so the user might end up with 180 messages from one room and 20 from another. Worse, as new messages are added, it's inconsistent which room has old messages culled away.
I could increase the limit as the user joins more chatrooms, but that could lead to them having 5 messages in two chatrooms and 590 in the third.
Is there a way to apply a slice or limit to each condition in the or? I'm publishing this as a Meteor publication, so I need to return a single cursor.

You can't apply a limit to only a portion of the result set, so the only way to accomplish this is with multiple subscriptions. Here's some sample code using template subscriptions. You'll need to modify it for your specific use case.
server
// Returns a limited set of messages for one specific chat room.
Meteor.publish('messages', function(chatroomId, limit) {
  check(chatroomId, String);
  check(limit, Number);
  var options = {sort: {createdAt: -1}, limit: limit};
  return Messages.find({chatroomId: chatroomId}, options);
});
client
Template.myTemplate.helpers({
  messages: function() {
    // Read all of the messages (you may want to restrict these by room),
    // sort them, and return the first 100 across all.
    return _.chain(Messages.find().fetch())
      .sortBy(function(message){return -message.createdAt.getTime();})
      .first(100)
      .value();
  }
});
Template.myTemplate.onCreated(function() {
  // Subscribe to multiple chat rooms - here we are assuming the ids are
  // stored in the current context - modify this as needed for your use case.
  this.subscribe('messages', this.chatroomId1, 100);
  this.subscribe('messages', this.chatroomId2, 100);
});
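The helper above returns the newest 100 messages across all subscribed rooms. If the template should show the newest N per room instead, the grouping can be done on the fetched documents. Here is a minimal sketch in plain JavaScript, assuming each message has a chatroomId and a createdAt Date as in the question; latestPerRoom is a hypothetical helper name, not part of Meteor.

```javascript
// Hypothetical helper: keep only the newest `limit` messages for each room.
// Assumes every message has a `chatroomId` and a `createdAt` Date.
function latestPerRoom(messages, limit) {
  const byRoom = new Map();
  for (const message of messages) {
    if (!byRoom.has(message.chatroomId)) byRoom.set(message.chatroomId, []);
    byRoom.get(message.chatroomId).push(message);
  }
  const result = {};
  for (const [roomId, roomMessages] of byRoom) {
    result[roomId] = roomMessages
      .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime()) // newest first
      .slice(0, limit);
  }
  return result;
}
```

In a template helper this could be called as latestPerRoom(Messages.find().fetch(), 100), which returns an object keyed by room id.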

I believe that simply having a dynamic number of .find()s, one for each room, would address the situation.

Related

Mongoose prevent duplicate id numbers based on document.count()

My documents all have sequential numbers, saved as a String as an ID (it's padded with 0s). When creating a new record, I first do a request for Comment.count(). Using the number returned from that, I generate the ID string. I then create an object, and save it as a new document.
var commentNumber = (result[1] + 1).toString().padStart(4, '0');
var newComment = new this({
  html: processedHtml,
  number: commentNumber
});
newComment.save(function(err, result) {
  if (err) return callback(err);
  return callback(null, result);
});
The problem is, if two comments are submitted at the same time, they will get the same ID (this happens if I make 2 requests on submission instead of 1, they will both have the same ID).
How can I prevent this?
One simple option would be to create a unique index on number so that one of the requests fails.
Another would be to store the current number count elsewhere. If you wanted to use mongo, you could have a doc with commentCount in a different collection, do a findOneAndUpdate with $inc, and use the returned value. This still leads to a weird race condition where a user might only see comments 1 and 3 when comment 2 takes longer to create than comment 3.
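The counter-document idea could look roughly like this with Mongoose. This is a sketch under assumptions, not a drop-in implementation: Counter is a hypothetical model for documents shaped like { _id: 'comments', seq: 12 }, and nextCommentNumber and formatCommentNumber are illustrative names. It relies on a single-document $inc being applied atomically, so two concurrent requests should receive different numbers.

```javascript
// Pads a sequence number to four digits, as in the question.
function formatCommentNumber(seq) {
  return String(seq).padStart(4, '0');
}

// `Counter` is assumed to be a Mongoose model (hypothetical) whose documents
// look like { _id: 'comments', seq: 12 }.
async function nextCommentNumber(Counter) {
  const counter = await Counter.findOneAndUpdate(
    { _id: 'comments' },
    { $inc: { seq: 1 } },        // the increment happens on the server
    { new: true, upsert: true }  // return the updated doc; create it on first use
  );
  return formatCommentNumber(counter.seq);
}
```

The returned string can then be used as the number field of the new comment. A unique index on number is still worth keeping as a safety net.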
I think the approach of storing the comment number on the document is fundamentally flawed: it creates weird race conditions, strange error handling, and complex deletes. If possible, it's better to calculate the number of comments on the way out.
As far as ordering goes, MongoDB ObjectIds start with a 4-byte creation timestamp (one-second resolution), so you can sort documents by _id to get approximate creation order.
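To illustrate the timestamp inside an _id, here is a small sketch in plain JavaScript that decodes it from the hex string. objectIdToDate is a hypothetical helper name; drivers expose the same information through ObjectId's getTimestamp().

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId are a Unix timestamp in seconds.
function objectIdToDate(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdToDate('507f1f77bcf86cd799439011').toISOString());
// prints 2012-10-17T21:13:27.000Z
```

Because the resolution is one second, two comments created within the same second are not guaranteed to sort in insertion order across different servers.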

Consolidating collections for a time-line type view

Given a Meteor application that has multiple collections that need to be displayed together in a paged Facebook-style timeline view, I'm trying to decide on the best way to handle the publication of this data.
The requirements are as follows:
Documents from different collections may be intermingled in the timeline view.
The items should be sorted by a common field (the date, for example)
There should be a paged-display limit with a "Load More..." button
To solve this problem I can see two possible approaches...
Approach 1 - Overpublish
Currently I have different collections for each type of data. This poses a problem for the efficient publishing of the information that I need. For example, if the current display limit is 100 then I need to publish 100 elements of each type of collection in order to be sure of displaying the latest 100 elements of the screen.
An example may make this clearer. Assume that the timeline display shows results from collections A, B, C and D. Potentially only one of those collections may have any data, so to be sure that I have enough data to display 100 items I'll need to fetch 100 items from each collection. In that case, however, I could be fetching and sending 400 items instead!
That's really not good at all.
Then, on the client side, I need to handle merging these collections so that I show the documents in order, which probably isn't a trivial task.
Approach 2 - Combine all the collections
The second approach that occurs to me is to have one enormous server-side collection of generic objects. That is, instead of having collections A, B, C, and D, I'd have a master collection M with a type field that describes the type of data held by the document.
This would allow me to trivially retrieve the latest documents without over-publishing.
However I'm not yet sure what the full repercussions of this approach would be, especially with packages such as aldeed:autoform and aldeed:simple-schema.
My questions are:
Does anyone here have any experience with these two approaches? If so, what other issues should I be aware of?
Can anyone here suggest an alternative approach?
I'd use the second approach, but do not put everything in there...
What I mean is that for your timeline you need events, so you'd create an events collection that stores the basic information for each event (date, owner_id, etc.). You'd also store the type of event and an id that matches a document in another collection. That keeps your events small enough to publish everything that is needed, and you can then grab more details if there is a need.
You could then, either just publish your events, or publish the cursors of the other collections at the same time using the _id's to not over-publish. That event collection will become very handy for matching documents like if the user wants to see what in his timeline is related to user X or city Y...
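A sketch of what such event documents and the follow-up lookup might look like, in plain JavaScript. The field names (type, refId, ownerId, date) and the helper idsByType are assumptions for illustration; the answer does not prescribe them.

```javascript
// Hypothetical event documents: small records that point at the full document elsewhere.
const events = [
  { _id: 'e1', type: 'A', refId: 'a1', ownerId: 'u1', date: new Date('2015-03-01T10:00:00Z') },
  { _id: 'e2', type: 'B', refId: 'b1', ownerId: 'u2', date: new Date('2015-03-01T11:00:00Z') },
  { _id: 'e3', type: 'A', refId: 'a2', ownerId: 'u1', date: new Date('2015-03-02T09:00:00Z') },
];

// Group the referenced ids by type, so that each detail collection can be
// queried once with a selector such as { _id: { $in: ids } }.
function idsByType(eventDocs) {
  const grouped = {};
  for (const event of eventDocs) {
    (grouped[event.type] = grouped[event.type] || []).push(event.refId);
  }
  return grouped;
}

console.log(idsByType(events));
```

Each array in the result can be passed to the matching collection's publication, so only the documents that appear in the timeline are sent.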
I hope it helps you out.
I finally came up with a completely different approach.
I've created a server publication that returns the list of items ids and types to be displayed. The client can then fetch these from the relevant collections.
This allows me to maintain separate collections for each type, thus avoiding issues related to trying to maintain a Master collection type. Our data-model integrity is preserved.
At the same time I don't have to over-publish the data to the client. The workload on the server to calculate the ID list is minimal, and in my opinion that small cost is far outweighed by avoiding the disadvantages of the other two approaches.
The basic publication looks like this (in Coffeescript):
Meteor.publish 'timeline', (options, limit) ->
  check options, Object
  check limit, Match.Optional Number
  sub = this
  limit = Math.min limit ? 10, 200
  # We use the peerlibrary:reactive-mongo to enable meteor reactivity on the server
  @ids = {}
  tracker = Tracker.autorun =>
    # Run a find operation on the collections that can be displayed in the timeline,
    # and add the ids to an array
    collections = ['A', 'B']
    items = []
    for collectionName in collections
      collection = Mongo.Collection.get collectionName
      collection.find({}, { fields: { updatedOn: 1 }, limit: limit, sort: { updatedOn: -1 }}).forEach (item) ->
        item.collection = collectionName
        items.push item
    # Sort the array newest-first and crop it to the required length
    items = items.sort (a, b) -> new Date(b.updatedOn) - new Date(a.updatedOn)
    items = items[0...limit]
    newIds = {}
    # Add/Remove the ids from the 'timeline' collection
    for doc in items
      id = doc._id
      newIds[id] = true
      # Add this id to the publication if we didn't have it before
      if not @ids[id]?
        @ids[id] = moment doc.updatedOn
        sub.added 'timeline', id, { collection: doc.collection, docId: id, updatedOn: doc.updatedOn }
      # If the update time has changed then it needs republishing
      else if not moment(doc.updatedOn).isSame @ids[id]
        @ids[id] = doc.updatedOn
        sub.changed 'timeline', id, { collection: doc.collection, docId: id, updatedOn: doc.updatedOn }
    # Check for items that are no longer in the result
    for id of @ids
      if not newIds[id]?
        sub.removed 'timeline', id
        delete @ids[id]
  sub.onStop ->
    tracker.stop()
  sub.ready()
Note that I'm using peerlibrary:reactive-publish for the server-side autorun.
The queries fetch just the latest ids from each collection; the publication then places them into a single array, sorts them by date, and crops the array to the current limit.
The resulting ids are then added to the timeline collection, which provides for a reactive solution on the client.
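Stripped of the Meteor plumbing, the merge step described here can be sketched in plain JavaScript. buildTimeline is a hypothetical name, and the input shape (a map from collection name to an array of { _id, updatedOn } documents) is an assumption that stands in for the per-collection find() calls.

```javascript
// Merge the newest `limit` items of each source into one timeline of at most `limit` items.
// `sources` maps a collection name to an array of { _id, updatedOn } documents.
function buildTimeline(sources, limit) {
  const items = [];
  for (const [collection, docs] of Object.entries(sources)) {
    // Taking `limit` from every source is enough: no source can contribute more than that.
    const newest = [...docs]
      .sort((a, b) => b.updatedOn - a.updatedOn)
      .slice(0, limit);
    for (const doc of newest) {
      items.push({ collection: collection, docId: doc._id, updatedOn: doc.updatedOn });
    }
  }
  // Newest first across all sources, then crop to the page size.
  return items.sort((a, b) => b.updatedOn - a.updatedOn).slice(0, limit);
}
```

Only the ids and timestamps travel through this step, which is why the server-side cost stays small even when several collections feed the timeline.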
On the client it's simply a matter of subscribing to this collection and then subscribing to the individual items themselves. Something like this:
Template.timelinePage.onCreated ->
  @autorun =>
    @limit = parseInt(Router.current().params['limit']) || 10
    sub = @subscribe 'timeline', {}, @limit
    if sub.ready()
      items = Timeline.find().fetch()
      # The collection names must match the ones published by the server ('A', 'B')
      As = _.pluck _.where(items, { collection: 'A' }), 'docId'
      @aSub = @subscribe 'a', { _id: { $in: As }}
      Bs = _.pluck _.where(items, { collection: 'B' }), 'docId'
      @bSub = @subscribe 'b', { _id: { $in: Bs }}
Finally, the template can iterate over the timeline subscription and display the appropriate item based on its type.

Limiting documents in MongoDB dynamically

I'm looking to limit the number of documents in a collection dynamically. For example, I will store up to 1000 documents for GOLD members, and 500 for SILVER members. When the maximum is reached for a particular user and a new document is created by them, the oldest document belonging to that user should be deleted.
Capped collections aren't suitable for something like this so I was wondering if it is up to me to add my own logic to implement this "queue type" feature or whether there is some tried and tested approach out there somewhere?
Use a pre save middleware that performs the "capping" on every save.
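schema.pre('save', async function() {
  if (!this.isNew) return; // only cap when a document is being created
  var Model = this.constructor;
  var max = 50; // cap for this user, e.g. 1000 for GOLD or 500 for SILVER
  // Assumes each document stores its owner in a userId field (hypothetical name).
  // Everything beyond the newest (max - 1) documents is removed to make room.
  var stale = await Model.find({ userId: this.userId })
    .sort({ _id: -1 }) // newest first
    .skip(max - 1)
    .select('_id');
  await Model.deleteMany({ _id: { $in: stale.map(function(d) { return d._id; }) } });
});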
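The eviction rule from the question (1000 documents for GOLD, 500 for SILVER, oldest removed first) can be isolated in a small pure function that such a hook might call. This is a sketch under assumptions: idsToEvict and MAX_DOCS are hypothetical names, and each document is assumed to carry an _id and a createdAt Date.

```javascript
// Per-tier caps taken from the question.
const MAX_DOCS = { GOLD: 1000, SILVER: 500 };

// Given a user's existing documents, return the ids to delete so that one
// new document fits under the cap for the user's tier.
function idsToEvict(existingDocs, tier) {
  const max = MAX_DOCS[tier];
  const excess = existingDocs.length + 1 - max; // +1 for the incoming document
  if (excess <= 0) return [];
  return [...existingDocs]
    .sort((a, b) => a.createdAt - b.createdAt) // oldest first
    .slice(0, excess)
    .map((doc) => doc._id);
}
```

Keeping the rule separate from the database calls makes it easy to check the boundary cases (exactly at the cap, one below it) without a running MongoDB.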

mongodb limit in the embedded document

I need to create a message system, where a person can have a conversation with many users.
For example, I start to speak with user2, user3 and user4, so any one of them can see the whole conversation, and if the conversation is not private, any participant can add another person to it at any time.
Here is my idea how to do this.
I am using Mongo and my idea is to use dialog as an instance instead of message.
The schema is listed as follows:
{
  _id : ...., // dialog Id
  'private' : 0, // is the conversation private
  'participants' : [1, 3, 5, 6], // people who are in the conversation
  'msgs' : [
    {
      'mid' : ..., // id of a message
      'pid' : 1, // person who wrote a message
      'msg' : 'tafasd' // message
    },
    ....
    {
      'mid' : ..., // id of a message
      'pid' : 1, // person who wrote a message
      'msg' : 'tafasd' // message
    }
  ]
}
I can see some pros for this approach
- in a big database it will be easy to find messages for some particular conversation.
- it will be easy to add people to the conversation.
But here is a problem for which I can't find a solution:
When a conversation becomes too long (take Skype as an example), the client does not show the whole conversation; it shows the latest part and loads additional messages afterwards.
In other situations skip and limit solve this, but how can I do it here?
If this is impossible what suggestions do you have?
The MongoDB docs explain how to select a subrange of an array field with the $slice projection operator.
db.dialogs.find({_id: dialogId}, {msgs: {$slice: 5}}) // first 5 messages
db.dialogs.find({_id: dialogId}, {msgs: {$slice: -5}}) // last 5 messages
db.dialogs.find({_id: dialogId}, {msgs: {$slice: [20, 10]}}) // skip 20, limit 10
db.dialogs.find({_id: dialogId}, {msgs: {$slice: [-20, 10]}}) // start 20 from the end, limit 10
You can use this technique to only select the messages that are relevant to your UI. However, I'm not sure that this is a good schema design. You may want to consider separating out "visible" messages from "archived" messages. It might make the querying a bit easier/faster.
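For a "load older messages" UI, the $slice arguments can be derived from a page number counted back from the newest message. The sketch below builds the projection and, purely for illustration, emulates the slice on a plain array. Both function names are hypothetical, and the emulation reflects my reading of the operator rather than MongoDB's own code.

```javascript
// Build a find() projection that returns one page of `msgs`, counting pages
// back from the newest message (page 0 = the latest `pageSize` messages).
function messagesPageProjection(page, pageSize) {
  return { msgs: { $slice: [-(page + 1) * pageSize, pageSize] } };
}

// Plain-JavaScript emulation of $slice: [skip, limit], for illustration only.
// A negative skip counts from the end and is clamped to the start of the array.
function applySlice(array, skip, limit) {
  const start = skip < 0 ? Math.max(array.length + skip, 0) : skip;
  return array.slice(start, start + limit);
}
```

The projection would be passed as the second argument to find() on the dialogs collection, for example messagesPageProjection(1, 10) for the second-newest page. Because of the clamping, the oldest page can repeat messages already shown when the array length is not a multiple of the page size, so the client may need to de-duplicate by mid.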
There are caveats if your conversation will have many many messages:
You will notice a significant performance reduction when slicing large message arrays, because MongoDB loads the whole document and slices the array only just before returning it to the driver.
There is a document size limit (16MB for now) that could be reached with this approach.
My suggestions are:
Use two collections: one for conversations and the other for messages.
Use a DBRef in each message pointing to its conversation (index this field together with the message timestamp so that older ranges can be selected on user request).
Additionally, use a separate capped collection for every conversation. It will be easy to find by name if you name it something like "conversation_"
Result:
You will have to write every message twice, but into separate collections, which is normal.
When you want to show your conversation you will need just to select all the data from one collection in natural sort order which is very fast.
Your capped collections will automatically keep the latest messages and delete old ones.
You may show older messages on the user request by querying main messages collection.
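Querying the main messages collection for an older range might be sketched as follows. The field names conversationId and createdAt, and the helper olderMessagesQuery, are assumptions for illustration; a plain id field is used here in place of a DBRef to keep the example short.

```javascript
// Build the filter and options for "load older messages" in one conversation.
// `before` is the timestamp of the oldest message the client already has.
function olderMessagesQuery(conversationId, before, pageSize) {
  return {
    filter: { conversationId: conversationId, createdAt: { $lt: before } },
    options: { sort: { createdAt: -1 }, limit: pageSize },
  };
}

const query = olderMessagesQuery('c42', new Date('2014-01-01T00:00:00Z'), 20);
console.log(JSON.stringify(query));
```

A compound index on { conversationId: 1, createdAt: -1 } would match this access pattern, which is the indexing the suggestion above has in mind.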

MongoDB: Calling Count() vs tracking counts in a collection

I am moving our messaging system to MongoDB and am curious what approach to take with respect to various stats, like the number of messages per user. In our MS SQL database I have a table that holds different counts per user, and they get updated by triggers on the corresponding tables, so I can, for example, know how many unread messages UserA has without calling an expensive SELECT Count(*) operation.
Is the count function in MongoDB also expensive?
I started reading about map/reduce, but my site is under high load, so statistics have to update in real time, and my understanding is that map/reduce is a time-consuming operation.
What would be the best (performance-wise) approach on gathering various aggregate counts in MongoDB?
If you've got a lot of data, then I'd stick with the same approach and increment an aggregate counter whenever a new message is added for a user, using a collection something like this:
counts
{
userid: 123,
messages: 10
}
Unfortunately (or fortunately?) there are no triggers in MongoDB, so you'd increment the counter from your application logic:
db.counts.update( { userid: 123 }, { $inc: { messages: 1 } } )
This'll give you the best performance, and you'd probably also put an index on the userid field for fast lookups:
db.counts.createIndex( { userid: 1 } )
MongoDB is a good fit for data denormalization. And if your site is under high load, you need to precalculate almost everything, so use $inc to increment the message count, no doubt.