Is there a way to limit the number of records in certain collection - mongodb

Assuming I insert the following records (e.g. foo1, foo2, foo3, foo4, .... foo10)
I would like the collection to retain only 5 records at any point in time (e.g. it could be foo1, ... foo5 OR foo2, ... foo6 or foo6, ... foo10)
How should I achieve this?

Sounds like you're looking for a capped collection:
Capped collections are fixed sized collections...
[...]
Once the space is fully utilized, newly added objects will replace the oldest objects in the collection.
You can achieve this using a command similar to
db.createCollection("collectionName",{capped:true,size:10000,max:5})
where 10000 is the size in bytes and 5 is the maximum number of documents you would restrict in the collection.

By using
db.createCollection("collectionName",{capped:true,size:10000,max:1000})
command, limits can be put on the no. of records in collection

Capped Collections sound great but it limits remove operation.
In a capped Collection you can only insert new elements and old elements will be removed automatically when certain size is reached.
However if you want to insert and delete any document in any order and limit the maximum number of document in a collection, mongoDB does not offer a direct solution.
When I encounter this problem I use an extra count variable in another collection.
This collection has a validation rule that avoids count variables to become negative.
Count variable should always be non negative integer number.
"count": {
"$gte": 0
}
The algorithm was simple.
Decrement the count by one.
If it succeed insert the document.
If it fails it means there is no space left.
Vice versa for deletion.
Also you can use transactions to prevent failures(Count is decremented but service is failed just before insertion operation).

Related

How does MongoDB order their docs in one collection? [duplicate]

This question already has answers here:
How does MongoDB sort records when no sort order is specified?
(2 answers)
Closed 7 years ago.
In my User collection, MongoDB usually orders each new doc in the same order I create them: the last one created is the last one in the collection. But I have detected another collection where the last one I created has the 6 position between 27 docs.
Why is that?
Which order follows each doc in MongoDB collection?
It's called natural order:
natural order
The order in which the database refers to documents on disk. This is the default sort order. See $natural and Return in Natural Order.
This confirms that in general you get them in the same order you inserted, but that's not guaranteed–as you noticed.
Return in Natural Order
The $natural parameter returns items according to their natural order within the database. This ordering is an internal implementation feature, and you should not rely on any particular structure within it.
Index Use
Queries that include a sort by $natural order do not use indexes to fulfill the query predicate with the following exception: If the query predicate is an equality condition on the _id field { _id: <value> }, then the query with the sort by $natural order can use the _id index.
MMAPv1
Typically, the natural order reflects insertion order with the following exception for the MMAPv1 storage engine. For the MMAPv1 storage engine, the natural order does not reflect insertion order if the documents relocate because of document growth or remove operations free up space which are then taken up by newly inserted documents.
Obviously, like the docs mentioned, you should not rely on this default order (This ordering is an internal implementation feature, and you should not rely on any particular structure within it.).
If you need to sort the things, use the sort solutions.
Basically, the following two calls should return documents in the same order (since the default order is $natural):
db.mycollection.find().sort({ "$natural": 1 })
db.mycollection.find()
If you want to sort by another field (e.g. name) you can do that:
db.mycollection.find().sort({ "name": 1 })
For performance reasons, MongoDB never splits a document on the hard drive.
When you start with an empty collection and start inserting document after document into it, mongoDB will place them consecutively on the disk.
But what happens when you update a document and it now takes more space and doesn't fit into its old position anymore without overlapping the next? In that case MongoDB will delete it and re-append it as a new one at the end of the collection file.
Your collection file now has a hole of unused space. This is quite a waste, isn't it? That's why the next document which is inserted and small enough to fit into that hole will be inserted in that hole. That's likely what happened in the case of your second collection.
Bottom line: Never rely on documents being returned in insertion order. When you care about the order, always sort your results.
MongoDB does not "order" the documents at all, unless you ask it to.
The basic insertion will create an ObjectId in the _id primary key value unless you tell it to do otherwise. This ObjectId value is a special value with "monotonic" or "ever increasing" properties, which means each value created is guaranteed to be larger than the last.
If you want "sorted" then do an explicit "sort":
db.collection.find().sort({ "_id": 1 })
Or a "natural" sort means in the order stored on disk:
db.collection.find().sort({ "$natural": 1 })
Which is pretty much the standard unless stated otherwise or an "index" is selected by the query criteria that will determine the sort order. But you can use that to "force" that order if query criteria selected an index that sorted otherwise.
MongoDB documents "move" when grown, and therefore the _id order is not always explicitly the same order as documents are retrieved.
I could find out more about it thanks to the link Return in Natural Order provided by Ionică Bizău.
"The $natural parameter returns items according to their natural order within the database.This ordering is an internal implementation feature, and you should not rely on any particular structure within it.
Typically, the natural order reflects insertion order with the following exception for the MMAPv1 storage engine. For the MMAPv1 storage engine, the natural order does not reflect insertion order if the documents relocate because of document growth or remove operations free up space which are then taken up by newly inserted documents."

Get top 50 records for a certain value w/ mongo and meteor

In my meteor project, I have a leaderboard of sorts, where it shows players of every level on a chart, spread across every level in the game. For simplicitys sake, lets say there are levels 1-100. Currently, to avoid overloading meteor, I just tell the server to send me every record newer than two weeks old, but that's not sufficient for an accurate leaderboard.
What I'm trying to do is show 50 records representing each level. So, if there are 100 records at level 1, 85 at level 2, 65 at level 3, and 45 at level 4, I want to show the latest 50 records from each level, making it so I would have [50, 50, 50, 45] records, respectively.
The data looks something like this:
{
snapshotDate: new Date(),
level: 1,
hp: 5000,
str: 100
}
I think this requires some mongodb aggregation, but I couldn't quite figure out how to do this in one query. It would be trivial to do it in two, though - select all records, group by level, sort each level by date, then take the last 50 records from each level. However, I would prefer to do it in one operation, if I could. Is it currently possible to do something like this?
Currently there is no way to pick up n top records of a group, in the aggregation pipeline. There is an unresolved open ticket regarding this: https://jira.mongodb.org/browse/SERVER-9377.
There are two solutions to this:
Keep your document structure as it is now and aggregate, but,
grab the n top records and slice off the remaining records for each group, in the client side.
Code:
var top_records = [];
db.collection.aggregate([
// The sort operation needs to come before the $group,
// because once the records are grouped by level,
// there exists only one document per group.
{$sort:{"snapshotDate":-1}},
// Maintain all the records in an array in sorted order.
{$group:{"_id":"$level","recs":{$push:"$$ROOT"}}},
],{allowDiskUse: true}).forEach(function(level){
level.recs.splice(50); //Keep only the top 50 records.
top_records.push(level);
})
Remember that this loads all the documents for each level and removes the unwanted records in the client side.
Alter your document structure to accomplish what you really need. If
you only need the top n records always, keep them always in sorted
order in the root document.This is accomplished using a sorted capped array.
Your document would look like this:
{
level:1,
records:[{snapshotDate:2,hp:5000,str:100},
{snapshotDate:1,hp:5001,str:101}]
}
where, records is an capped array of size n and always has sub documents sorted in descending order of their snapshotDate.
To make the records array work that way, we always perform an update operation when we need to insert documents to it for any level.
db.collection.update({"level":1},
{$push:{
recs:{
$each:[{snapshotDate:1,hp:5000,str:100},
{snapshotDate:2,hp:5001,str:101}],
$sort:{"snapshotDate":-1},
$slice:50 //Always trim the array size to 50.
}
}},{upsert:true})
What this does is, is always keeps the size of the records array to 50 and always sorts the records whenever new sub documents are inserted at a level.
A simple find, db.collection.find({"level":{$in:[1,2,..]}}), would give you the top 50 records in order, for each selected level.

Possible to retrieve multiple random, non-sequential documents from MongoDB?

I'd like to retrieve a random set of documents from a MongoDB database. So far after lots of Googling, I've only seen ways to retrieve one random document OR a set of documents starting at a random skip position but where the documents are still sequential.
I've tried mongoose-simple-random, and unfortunately it doesn't retrieve a "true" random set. What it does is skip to a random position and then retrieve n documents from that position.
Instead, I'd like to retrieve a random set like MySQL does using one query (or a minimal amount of queries), and I need this list to be random every time. I need this to be efficient -- relatively on par with such a query with MySQL. I want to reproduce the following but in MongoDB:
SELECT * FROM products ORDER BY rand() LIMIT 50;
Is this possible? I'm using Mongoose, but an example with any adapter -- or even a straight MongoDB query -- is cool.
I've seen one method of adding a field to each document, generating a random value for each field, and using {rand: {$gte:rand()}} each query we want randomized. But, my concern is that two queries could theoretically return the same set.
You may do two requests, but in an efficient way :
Your first request just gets the list of all "_id" of document of your collections. Be sure to use a mongo projection db.products.find({}, { '_id' : 1 }).
You have a list of "_id", just pick N randomly from the list.
Do a second query using the $in operator.
What is especially important is that your first query is fully supported by an index (because it's "_id"). This index is likely fully in memory (else you'd probably have performance problems). So, only the index is read while running the first query, and it's incredibly fast.
Although the second query means reading actual documents, the index will help a lot.
If you can do things this way, you should try.
I don't think MySQL ORDER BY rand() is particularly efficient - as I understand it, it essentially assigns a random number to each row, then sorts the table on this random number column and returns the top N results.
If you're willing to accept some overhead on your inserts to the collection, you can reduce the problem to generating N random integers in a range. Add a counter field to each document: each document will be assigned a unique positive integer, sequentially. It doesn't matter what document gets what number, as long as the assignment is unique and the numbers are sequential, and you either don't delete documents or you complicate the counter document scheme to handle holes. You can do this by making your inserts two-step. In a separate counter collection, keep a document with the first number that hasn't been used for the counter. When an insert occurs, first findAndModify the counter document to retrieve the next counter value and increment the counter value atomically. Then insert the new document with the counter value. To find N random values, find the max counter value, then generate N distinct random numbers in the range defined by the max counter, then use $in to retrieve the documents. Most languages should have random libraries that will handle generating the N random integers in a range.

Remove given number of records in Mongodb

I have Too much records in my Collection, can I have only desired number of records and remove others without any condition?
I have a collection called Products with around 10,0000 of records and its slowing down my Local application, I am thinking to shrink this huge amount of records to something around 1000, How can do it?
OR
How to copy a collection with limited number of records?
If you want to copy collection with limited number of records without any filter condition, for loop can be used . It copies 1000 document form originalCollection to copyCollection.
db.originalCollection.find().limit(1000).forEach( function(doc)
{db.copyCollection.insert(doc)} );

Mongodb store and select order

Basic question. Does mongodb find command will always return documents in the order they where added to collection? If no how is it possible to implement selection docs in the right order?
Sort? But what if docs where added simultaneously and say created date is the same, but there was an order still.
Well, yes and ... not exactly.
Documents are default sorted by natural order. Which is initially the order the documents are stored on disk, which is indeed the order in which the documents had been added to a collection.
This order however, is not deterministic, as document may be moved on disk once these documents grow after update operations, and can't be fit into current space anymore. This way the initial (insert) order may change.
The way to guarantee insert order sort is sort by {_id : 1} as long as the _id is of type ObjectId. This will return your documents sorted in ascending order.
Write operations do not take place simultaneously. Write locks are imposed in database level (V 2.4 and on). The first four bytes of _id is insert timestamp, and 3 last digits is a random counter used to distinguish (and sort) between ObjectId instances with same timestamp.
_id field is indexed by default