Limiting documents in MongoDB dynamically

I'm looking to limit the number of documents in a collection dynamically. For example, I will store up to 1000 documents for GOLD members and 500 for SILVER members. When the maximum is reached for a particular user and they create a new document, the oldest document belonging to that user should be deleted.
Capped collections aren't suitable for something like this, so I was wondering whether it is up to me to add my own logic to implement this "queue type" feature, or whether there is a tried and tested approach out there?

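Use a pre-save middleware that performs the "capping" on every save. The sketch below assumes Mongoose 5 or later and hypothetical userId, tier and createdAt fields on the schema.
schema.pre('save', async function () {
  if (!this.isNew) return; // only inserts can push a user over the cap
  const Model = this.constructor;
  const max = this.tier === 'GOLD' ? 1000 : 500; // limits from the question
  const count = await Model.countDocuments({ userId: this.userId });
  if (count < max) return;
  // Find this user's oldest documents, leaving room for the new one.
  const oldest = await Model.find({ userId: this.userId })
    .sort({ createdAt: 1 })
    .limit(count - max + 1)
    .select('_id');
  await Model.deleteMany({ _id: { $in: oldest.map((doc) => doc._id) } });
});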
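
The eviction rule behind such a hook can be sketched without a database. This is a minimal illustration, assuming each document carries a createdAt timestamp and that the tier limits are the ones from the question; TIER_LIMITS and idsToEvict are hypothetical names, not part of Mongoose or MongoDB.

```javascript
// Hypothetical tier limits taken from the question (GOLD: 1000, SILVER: 500).
const TIER_LIMITS = { GOLD: 1000, SILVER: 500 };

// Given one user's existing documents, return the _ids that must be deleted
// so that there is room for exactly one new document under the tier's cap.
function idsToEvict(userDocs, tier) {
  const max = TIER_LIMITS[tier];
  if (max === undefined) throw new Error("unknown tier: " + tier);
  const excess = userDocs.length - max + 1; // +1 leaves room for the new doc
  if (excess <= 0) return [];
  return userDocs
    .slice() // do not mutate the caller's array
    .sort((a, b) => a.createdAt - b.createdAt) // oldest first
    .slice(0, excess)
    .map((doc) => doc._id);
}
```

Keeping the rule in a pure function like this makes it easy to unit test before wiring it into a save hook.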

Related

MongoDB - Preserve order when using sample

I want to have pagination and sampling.
I have a client that requests entities in a random order.
I want to implement a new feature - pagination.
The only way I can think of is to use $match to ensure that the entities to be served do not have the IDs of entities already served, but this sounds like an expensive operation.
Is there a way to randomise the order of the entities in a collection and save that order somewhere, so that I can use skip on it?

How about using a combination of $sample and $out?
db.collection.aggregate([
  { $sample: { size: 300 } }, // get 300 random documents
  { $out: "temp" }            // write results to a new "temp" collection
])
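
Once the sampled documents sit in their own collection, paging amounts to reading that fixed snapshot with skip and limit. The sketch below models the idea in plain JavaScript, with an in-memory array standing in for the "temp" collection; makeSnapshot and getPage are hypothetical helpers, and the shuffle only stands in for $sample.

```javascript
// Shuffle a copy of the ids once (Fisher-Yates). This plays the role of
// $sample + $out: the random order is decided a single time and then stored.
function makeSnapshot(ids, random = Math.random) {
  const snapshot = ids.slice();
  for (let i = snapshot.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [snapshot[i], snapshot[j]] = [snapshot[j], snapshot[i]];
  }
  return snapshot;
}

// Equivalent of skip(page * size).limit(size) on the stored snapshot.
function getPage(snapshot, page, size) {
  return snapshot.slice(page * size, page * size + size);
}
```

Because every page is cut from the same stored order, no entity is served twice and none is skipped as the client pages through.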

query too large issue with mongodb

Let's say we have a collection of users, and each user is followed by other users. If I want to find the users that are NOT following me, I need to do something like:
db.users.find({_id: { $nin : followers_ids } } ) ;
If followers_ids is huge, say 100k users, MongoDB will start saying the query is too large. Sending that much data over the network just to make the query is not good either. What are the best practices for running this query without sending all these ids over the network?

I recommend that you limit the number of query results to reduce network demand. According to the docs:
MongoDB cursors return results in groups of multiple documents. If you know the number of results you want, you can reduce the demand on network resources by issuing the limit() method.
This is typically used in conjunction with sort operations. For
example, if you need only 50 results from your query to the users
collection, you would issue the following command:
db.users.find({ _id: { $nin : followers_ids } }).sort( { timestamp : -1 } ).limit(50)
You can then use the cursor to retrieve more user documents as needed.
Recommendation to Restructure Followers Schema
I would recommend that you restructure your user documents if the number of followers will grow large. Currently your user schema may look like this:
{
_id: ObjectId("123"),
username: "jobs",
email: "stevej@apple.com",
followers: [
ObjectId("12345"),
ObjectId("12375"),
ObjectId("12395"),
]
}
The good thing about this schema is that whenever this user does anything, all of the users you need to notify are right here inside the document. The downside is that if you need to find everyone a user is following, you will have to query the entire users collection. Also, your user document will become larger and more volatile as the followers grow.
You may want to further normalize your followers. You can keep a collection that maps each followee to its followers, with documents that look like this:
{
_id: ObjectId("123"),//Followee's "_id"
followers: [
ObjectId("12345"),
ObjectId("12375"),
ObjectId("12395"),
]
}
This will keep your user documents slender, but it will take an extra query to get the followers. As the "followers" array changes in size, you can enable the usePowerOf2Sizes allocation strategy to reduce fragmentation and moves.

Apply limit to each condition in an $or query?

I have a chatroom application written in Meteor, using MongoDB. Each chatroom contains many messages, and a user can join multiple chatrooms. I'd like to create a query that fetches the 200 most recent messages from each chatroom for all the chatrooms that a given user is in. I'm doing something like this:
// These are the ids of the chatrooms the user is currently in
var conditions = [{chatroomId: 1}, {chatroomId: 2}];
Messages.find({$or: conditions}, {sort: {createdAt: -1}, limit: 200});
However, naturally, this limit applies to the entire query; so the user might end up with 180 messages from one room and 20 from another. Worse, as new messages are added, it's inconsistent which room has old messages culled away.
I could increase the limit as the user joins more chatrooms, but that could lead to them having 5 messages in two chatrooms and 590 in the third.
Is there a way to apply a slice or limit to each condition in the or? I'm publishing this as a Meteor publication, so I need to return a single cursor.

You can't apply a limit to only a portion of the result set, so the only way to accomplish this is with multiple subscriptions. Here's some sample code using template subscriptions. You'll need to modify it for your specific use case.
server
// Returns a limited set of messages for one specific chat room.
Meteor.publish('messages', function(chatroomId, limit) {
check(chatroomId, String);
check(limit, Number);
var options = {sort: {createdAt: -1}, limit: limit};
return Messages.find({chatroomId: chatroomId}, options);
});
client
Template.myTemplate.helpers({
messages: function() {
// Read all of the messages (you may want to restrict these by room),
// sort them, and return the first 100 across all.
return _.chain(Messages.find().fetch())
.sortBy(function(message){return -message.createdAt.getTime();})
.first(100)
.value();
}
});
Template.myTemplate.onCreated(function() {
// Subscribe to multiple chat rooms - here we are assuming the ids are
// stored in the current context - modify this as needed for your use case.
this.subscribe('messages', this.chatroomId1, 100);
this.subscribe('messages', this.chatroomId2, 100);
});

I believe that simply having a dynamic number of .find()s, one for each room, would address the situation.
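
The effect of issuing one find per room can be illustrated in plain JavaScript. This is only a model of the result, with an array standing in for the Messages collection; latestPerRoom is a hypothetical helper, not a Meteor API.

```javascript
// Return up to `limit` most recent messages for each room id, mimicking one
// find({chatroomId: id}, {sort: {createdAt: -1}, limit: limit}) per room.
function latestPerRoom(messages, roomIds, limit) {
  const result = {};
  for (const roomId of roomIds) {
    result[roomId] = messages
      .filter((m) => m.chatroomId === roomId) // filter() returns a new array
      .sort((a, b) => b.createdAt - a.createdAt) // newest first
      .slice(0, limit);
  }
  return result;
}
```

Unlike a single $or query with one global limit, a busy room cannot crowd out a quiet one here, because each room is cut to its own limit independently.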

How can I implement an ordered array with mongodb without race-conditions?

I'm new to mongodb, maybe this is a trivial question. I have two mongodb collections: user and post. A user can create and follow multiple posts, and posts are listed sorted by last modification date. There may be a very large number of users following a specific post, so I don't want to keep the list of followers in each post document. On the other hand, one user will probably not follow more than a few thousand posts, so I decided to keep the list of followed posts' objectids in each user document.
In order to be able to quickly list the 50 most recently modified posts for a given user, I chose to keep the last_updated_at field along with the post objectid.
The post document is fairly basic:
{
"_id" : ObjectId("5163deebe4d809d55d27e847"),
"title" : "All about music",
"comments": [...]
...
}
The user document looks like this:
{
"_id": ObjectId("5163deebe4d809d55d27e846"),
"posts": [{
"post": ObjectId("5163deebe4d809d55d27e847"),
"last_updated_at": ISODate("2013-04-09T11:27:07.184Z")
}, {
"post": ObjectId("5163deebe4d809d55d27e847"),
"last_updated_at": ISODate("2013-04-09T11:27:07.187Z")
}]
...
}
When a user creates or follows a post, I can simply $push the post's ObjectId and last_updated_at to the end of the posts list in the user's document. When a post is modified (for example, when a comment is added to the post), I update the last_updated_at field for that post in all the followers' user documents. That's pretty heavy, but I don't know how to avoid it.
When I want to get the list of the 50 most recently updated posts for a user, I unfortunately need to get the whole list of followed posts, then sort by last_updated_at in memory, then keep only the first 50 posts.
So I tried to change the implementation to reorder the list when a post is modified: I $push it to the end of the list, and $pull it from wherever it is. Since this is a two-step procedure, there's a race condition where I might get the same post twice in the list. Is there no better way to maintain a sorted array in MongoDB?

Data model adjustment
Since you may have frequent updates to the latest posts for a given user, you probably want to avoid the overhead of rewriting data unnecessarily to maintain a sorted array.
A better approach to consider would be to flatten the data model and use a separate collection instead of an ordered array:
create a separate collection with the updated post stream: (userID, postID, lastUpdated)
when a post is updated, you can then do a simple update() with the multi:true and upsert:true options and $set the last_updated_at to the new value.
to retrieve the last 50 updated posts for a given userID you can do a normal find() with sort and limit options.
to automatically clean up the "old" documents you could even set a TTL expiry for this collection so the updates are removed from the activity stream after a certain number of days
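
The flattened model above can be sketched in plain JavaScript, with an array of { userId, postId, lastUpdated } entries standing in for the separate collection. touchPost and latestForUser are hypothetical helpers that only mirror the update and find calls described; the upsert and TTL parts are not modeled.

```javascript
// Models update({postId: id}, {$set: {lastUpdated: when}}, {multi: true}):
// every follower's entry for the post gets the new timestamp.
function touchPost(stream, postId, when) {
  for (const entry of stream) {
    if (entry.postId === postId) entry.lastUpdated = when;
  }
}

// Models find({userId: id}).sort({lastUpdated: -1}).limit(n).
function latestForUser(stream, userId, n) {
  return stream
    .filter((e) => e.userId === userId)
    .sort((a, b) => b.lastUpdated - a.lastUpdated) // newest first
    .slice(0, n)
    .map((e) => e.postId);
}
```

In a real collection, an index on the user id and timestamp fields would let the sort and limit be served without sorting in memory.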
Pushing to fixed-size & sorted arrays in MongoDB 2.4
If you do want to maintain ordered arrays, MongoDB 2.4 added two helpful features related to this use case:
Ability to push to fixed-sized arrays
Ability to push to arrays sorted by embedded document fields
So you can achieve your outcome of pushing to a fixed-size array holding the 50 most recently updated items. In MongoDB 2.4 the $slice value must be zero or negative and keeps the last N elements, so sort ascending and keep the tail:
db.user.update(
  // Criteria
  { _id: ObjectId("5163deebe4d809d55d27e846") },
  // Update
  { $push: {
    posts: {
      // Push one or more updates onto the posts array
      $each: [
        {
          "post": ObjectId("5163deebe4d809d55d27e847"),
          "last_updated_at": ISODate()
        }
      ],
      // Sort by last_updated_at ascending (oldest first)
      $sort: { last_updated_at: 1 },
      // Keep only the last 50 items, i.e. the most recent ones
      $slice: -50
    }
  }}
)
The $push will update the list in sorted order, with the $slice trimming the list to the 50 most recent items. Since the posts aren't unique, you'll still need to $pull the original from the list first, e.g.:
db.user.update(
// Criteria
{ _id: ObjectId("5163deebe4d809d55d27e846") },
// Update
{
$pull: {
posts: { post: ObjectId("5163deebe4d809d55d27e847") }
}
}
)
A benefit of this approach is that array manipulation is being done on the server, but as with sorting the array in your application you may still be updating the document more than is required.
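
The combined effect of the $pull followed by a sorted, sliced $push can be modeled in plain JavaScript. The sketch assumes an ascending sort with a negative $slice, so the newest entries sit at the tail of the array; bumpPost is a hypothetical helper and the ids are plain strings for brevity.

```javascript
// Model of {$pull: {posts: {post: id}}} followed by
// {$push: {posts: {$each: [entry], $sort: {last_updated_at: 1}, $slice: -max}}}.
function bumpPost(posts, entry, max) {
  const withoutOld = posts.filter((p) => p.post !== entry.post); // $pull
  const pushed = withoutOld.concat([entry]); // $each
  pushed.sort((a, b) => a.last_updated_at - b.last_updated_at); // $sort ascending
  // A negative $slice keeps the last `max` elements; slice(-0) would keep
  // everything, so the zero case is handled explicitly.
  return max === 0 ? [] : pushed.slice(-max);
}
```

Note that this model runs both steps as one function call, whereas the two update() calls above are separate operations on the server.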

MongoDB - Query embedded documents

I have a collection named Events. Each Event document has a collection of Participants as embedded documents.
My question: is there a way to query an Event and get only the Participants whose Age is, for example, greater than 18?

When you query a collection in MongoDB, by default it returns the entire document which matches the query. You could slice it and retrieve a single subdocument if you want.
If all you want is the Participants who are older than 18, it would probably be best to do one of two things:
Store them in a subdocument inside the event document called "Over18" or something similar. Insert them into that subdocument (and possibly the other one if you want), and then when you query the collection, you can instruct the database to return only the "Over18" subdocument. The downside is that you store your participants in two different subdocuments and you have to work out their age before inserting. This may or may not be feasible depending on your application. If you need to be able to check arbitrary ages (i.e. sometimes it's 18, but sometimes it's 21 or 25, etc.), then this will not work.
Query the collection, retrieve the Participants subdocument, and then filter it in your application code. Despite what some people may believe, this isn't terrible, because you don't want your database to be doing too much work all the time. Offloading the computation to your application could actually benefit your database, because it can then spend more time querying and less time filtering. It leads to better scalability in the long run.
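
The second option, filtering in application code, can be as small as the sketch below. It assumes the event document has a participants array whose entries carry a numeric age field; participantsOlderThan is a hypothetical helper.

```javascript
// Return only the participants of one event document older than minAge.
// A missing participants array is treated as empty.
function participantsOlderThan(event, minAge) {
  return (event.participants || []).filter((p) => p.age > minAge);
}
```

Because the threshold is a parameter, the same function covers 18, 21, 25 or any other age without changing the stored documents.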

Short answer: no. I tried to do the same a couple of months back, but MongoDB does not support it (at least in versions <= 1.8). The same question has surely been asked in their Google Group. You can either store the participants in a separate collection, or fetch the whole documents and filter them on the client. Far from ideal, I know. I'm still trying to figure out the best way around this limitation.

For future reference: This will be possible in MongoDB 2.2 using the new aggregation framework, by aggregating like this:
db.events.aggregate([
  { $unwind: '$participants' },
  { $match: { 'participants.age': { $gt: 18 } } },
  { $project: { participants: 1 } }
])
This will return a list of n documents, where n is the number of participants older than 18 and each entry looks like this (note that the "participants" field now holds a single entry instead of an array):
{
_id: objectIdOfTheEvent,
participants: { firstName: 'only one', lastName: 'participant'}
}
It could probably even be flattened on the server to return a list of participants. See the official documentation for more information.
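
To make the shape of that result concrete, the unwind, match and project steps can be modeled in plain JavaScript. This is a sketch only, assuming each participant has a numeric age field; unwindAndMatch is a hypothetical helper and does not call MongoDB.

```javascript
// For every event, emit one output document per matching participant.
function unwindAndMatch(events, minAge) {
  const out = [];
  for (const event of events) {
    for (const participant of event.participants || []) { // $unwind
      if (participant.age > minAge) { // $match on participants.age
        out.push({ _id: event._id, participants: participant }); // $project
      }
    }
  }
  return out;
}
```

Mapping the result with `doc => doc.participants` gives the flattened list of participants mentioned above.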
