I want to build a query for a very dynamic collection.
An example:
I have a collection with documents like
{
    _id: ObjectId(),
    value: x,
    // some other data
}
The example dataset has the values
value: 1
value: 1
value: 2
value: 3
value: 3
As you can see the same value can be there multiple times.
But if I run the following query, it returns only the first document with value: 3:
{ $sort: { value: 1 } },
{ $limit: 4 }
What I want is at least 4 documents that include every occurrence of the values they contain, so I want all documents where value: 3.
Sorry, the question might be a bit misleading. I want a complete result, so all documents with value: 3. It is for a public transport database and the value is the departure time. I want at least the next 30 departures, but if the 30th and 31st depart at the same time, I want the 31st as well.

I now use a small Python function that extends the limit as I want. Since the query returns a cursor, I do not waste resources. I do not specify a limit in the query; the cursor only needs to be sorted by value.
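def extend_limit(cursor, original_limit):
    """Take original_limit documents, then keep taking those tied on 'value'."""
    result = []
    try:
        while original_limit > 0:
            original_limit -= 1
            result.append(next(cursor))
        if not result:
            return result
        last_element = result[-1]
        # keep appending documents that share the last document's value
        while True:
            next_element = next(cursor)
            if last_element['value'] != next_element['value']:
                break
            result.append(next_element)
    except StopIteration:
        pass  # the cursor ran out of documents
    return result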
Thanks to Adam Comerford

There is no need to use aggregation here, just do a normal find with a projection, a sort and a limit:
db.collection.find({}, {_id : 0, value : 1}).sort({value : 1}).limit(4)
I'd recommend that you actually query on some criteria (rather than empty in my example) and that the criteria have an appropriate index that includes the sorted field if possible (for performance reasons).
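Neither approach above returns the tied documents from the database itself. One way that would (an assumption, not something proposed in the thread) is two queries: read the value of the 4th document in sort order (for example with `skip(3).limit(1)`), then find every document whose `value` is less than or equal to that cutoff. The sketch below only simulates that logic on a plain Python list; `top_n_with_ties` is a hypothetical helper, and against a real collection each step would be a separate sorted, indexed query.

```python
from typing import Any, Dict, List


def top_n_with_ties(docs: List[Dict[str, Any]], n: int, key: str = "value") -> List[Dict[str, Any]]:
    """Return the first n documents by `key`, plus any later documents tied with the n-th."""
    if n <= 0:
        return []
    ordered = sorted(docs, key=lambda d: d[key])  # stands in for .sort({value: 1})
    if len(ordered) <= n:
        return ordered
    cutoff = ordered[n - 1][key]  # step 1: value of the n-th document (skip(n - 1).limit(1))
    return [d for d in ordered if d[key] <= cutoff]  # step 2: everything up to the cutoff


docs = [{"value": v} for v in [1, 1, 2, 3, 3]]
print([d["value"] for d in top_n_with_ties(docs, 4)])  # [1, 1, 2, 3, 3]
```

With the example dataset from the question, asking for 4 documents returns all 5, because the 4th and 5th both have value 3.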


After adding a TTL index in MongoDB I cannot insert more than once (Go)

So I have created a TTL index using an IndexModel, but when I insert something into a collection I cannot insert again, and it returns this as an error: (0xd3cc80,0xc0004385b0).
Changing the value of "exp" makes it possible to insert once more but after that it doesn't work anymore.
indexOptions := options.Index().SetExpireAfterSeconds(int32(url.Expires))
_, err := c.Coll.Indexes().CreateOne(
	c.Ctx,
	mongo.IndexModel{Keys: bson.M{"expires": 1}, Options: indexOptions},
)
if err != nil {
	return nil, err
}
_, err = c.Coll.InsertOne(c.Ctx, url)
return url, err
Note that url.Expires is a Unix timestamp in the future (about one day from now).
Side question: when is a TTL index supposed to delete documents? I ran a test where I set it to 20 seconds and it still hasn't deleted anything.
Edit 1 | Indexes:
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { exp: 1 }, name: 'exp_1', expireAfterSeconds: 20 },
  {
    v: 2,
    key: { expires: 1 },
    name: 'expires_1',
    expireAfterSeconds: 1672819674
  },
  {
    v: 2,
    key: { expires: -1 },
    name: 'expires_-1',
    expireAfterSeconds: 1672906158
  },
  {
    v: 2,
    key: { expires: 3000 },
    name: 'expires_3000',
    expireAfterSeconds: 1672906336
  },
  {
    v: 2,
    key: { created_at: 1 },
    name: 'created_at_1',
    expireAfterSeconds: 1672835415
  }
]
These are the indexes. The issue seems to be that the key's value is always a fixed value, e.g. 1, -1, etc. How would I solve this? Is there an automatic way to increment the number for each inserted element?
Edit 2:
The issue is that it is the same value: by changing it to a random number or something unique it works, but this feels way too hacky. Is there any better way of doing this?
As far as I can tell, the problem has nothing to do with uniqueness of the values in the documents. Indeed I don't believe the problem has anything to do with inserting data at all. Rather the issue appears to be that you are creating duplicate indexes.
The code you shared shows two things:
The function first attempts to create a new index using an expiry duration provided by one of the fields in the url object.
It then attempts to insert a document into the collection using the same url object.
There is almost no situation in which this block of code should be creating the index at all, let alone doing so while using a duration value provided in the same object that is being inserted into the database. Index creation is a maintenance-type task that is typically done infrequently on a collection; it is not intended to be done via the same code that writes to the database. Moreover, the TTL index is intended to set a consistent expiry behavior for all documents in the collection. It is the value of the corresponding field in each document that controls its specific expiry.
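To make that last point concrete: according to the MongoDB documentation, a document expires once the date in its indexed field plus the index's `expireAfterSeconds` is in the past; a document whose indexed field is not a date (or an array of dates) never expires; and the background TTL task only runs every 60 seconds, so deletion is never instantaneous (which may also bear on the side question). Setting `expireAfterSeconds` to 0 makes each document expire at the date stored in its own field. The Python sketch below only models that rule; `expires_at` is a hypothetical helper, the dates are made-up sample values, and it assumes the `expires` field would be stored as a date rather than the Unix timestamp integer described in the question.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def expires_at(field_value: Any, expire_after_seconds: int) -> Optional[datetime]:
    """When a TTL index would treat a document as expired, or None if never."""
    if not isinstance(field_value, datetime):
        # e.g. an int Unix timestamp: TTL indexes ignore non-date values
        return None
    return field_value + timedelta(seconds=expire_after_seconds)


# One static index definition, e.g. {expires: 1} with expireAfterSeconds: 0;
# each document's own `expires` date decides when it expires.
doc_a = {"expires": datetime(2023, 1, 5, 12, 0, tzinfo=timezone.utc)}
doc_b = {"expires": datetime(2023, 1, 6, 12, 0, tzinfo=timezone.utc)}
print(expires_at(doc_a["expires"], 0))  # 2023-01-05 12:00:00+00:00
print(expires_at(1672906158, 0))        # None
```

The index definition never changes between inserts; only the per-document dates differ.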
I would suggest the following:
Remove all code related to index creation from this function.
Select the single index from the following four "duplicates" that correctly defines the behavior that you are looking to have and remove the other three. If none of them do, then remove all four and create the single new index that you desire.
{ v: 2, key: { exp: 1 }, name: 'exp_1', expireAfterSeconds: 20 },
{
  v: 2,
  key: { expires: 1 },
  name: 'expires_1',
  expireAfterSeconds: 1672819674
},
{
  v: 2,
  key: { expires: -1 },
  name: 'expires_-1',
  expireAfterSeconds: 1672906158
},
{
  v: 2,
  key: { expires: 3000 },
  name: 'expires_3000',
  expireAfterSeconds: 1672906336
}
Double-check that you want the final index listed in your question, which is a separate TTL index on created_at. If not, remove that one as well.
These are the indexes, the issue seems to be the fact that the key's value is always a set value e.g.: 1, -1 etc... How would I solve this? Is there an automatic way to increment the number for each inserted element?
You are conflating two ideas here. The definition of the index is not the same thing as the value of the field in the documents. You will have one TTL index for this field for the collection and it will have a static definition. You will not be incrementing anything here. The 1 (or -1) value in the index definition represents the ordering (ascending or descending). It has nothing to do with the value of the field in the documents, and it should really never be anything other than 1 or -1 for standard indexes.
Open Questions
We could do a much better job of answering this question if there were more and consistent troubleshooting information provided. Some helpful clarifications would include:
What version is the database?
What version is the Go driver?
Is the field that you are interested in named exp or expires? You reference both field names when describing your situation and also have indexes on both.
What is the stringified version of the error message and what line is generating it? The hex values you referenced are unhelpful for our purposes.

Show recent chat messages in MongoDB [duplicate]

I can't find this documented anywhere. By default, the find() operation gets the records from the beginning. How can I get the last N records in MongoDB?
Edit: also I want the returned result ordered from less recent to most recent, not the reverse.
If I understand your question, you need to sort in ascending order.
Assuming you have some id or date field called "x", you would do:
db.collection.find().sort({x: 1})
The 1 will sort ascending (oldest to newest) and -1 will sort descending (newest to oldest).
If you use the auto-created _id field, it has a date embedded in it, so you can use that to order by:
db.collection.find().sort({_id: 1})
That will return all your documents sorted from oldest to newest.
Natural Order
You can also use the natural order mentioned above:
db.collection.find().sort({$natural: 1})
Again, use 1 or -1 depending on the order you want.
Use .limit()
Lastly, it's good practice to add a limit when doing this sort of wide-open query, so you could do either:
db.collection.find().sort({x: 1}).limit(N)
db.collection.find().sort({$natural: 1}).limit(N)
The last N added records, from less recent to most recent, can be seen with this query:
db.collection.find().skip(db.collection.count() - N)
If you want them in the reverse order:
db.collection.find().sort({ $natural: -1 }).limit(N)
If you install Mongo-Hacker you can also use:
If you get tired of writing these commands all the time, you can create custom functions in your ~/.mongorc.js. For example:
function last(N) {
    return db.collection.find().skip(db.collection.count() - N);
}
Then, from a mongo shell, just type last(N).
Sorting, skipping and so on can be pretty slow depending on the size of your collection.
Better performance can be achieved if your collection is indexed on some criteria; then you can use the min() cursor method:
First, index your collection with db.collectionName.createIndex( yourIndex )
You can use ascending or descending order, which is handy because you always want the "N last items": if you index in descending order, it is the same as getting the "first N items".
Then you find the first item of your collection and use its index field values as the min criteria in a search like:
Here's the reference for min() cursor: https://docs.mongodb.com/manual/reference/method/cursor.min/
To get the last N records, you can execute the query below:
db.yourcollectionname.find({$query: {}, $orderby: {$natural : -1}}).limit(yournumber)
If you want only the last record:
db.yourcollectionname.findOne({$query: {}, $orderby: {$natural : -1}})
Note: in place of $natural you can use one of the fields from your collection.
db.collection.find().sort({$natural: -1 }).limit(5)
You can use an aggregation for the latest n entries of a subset of documents in a collection. Here's a simplified example without grouping (which you would be doing between stages 4 and 5 in this case).
This returns the latest 20 entries (based on a field called "timestamp"), sorted ascending. It then projects each document's _id, timestamp and whatever_field_you_want_to_show into the results.
var pipeline = [
    { "$match": { //stage 1: filter out a subset
        "first_field": "needs to have this value",
        "second_field": "needs to be this"
    } },
    { "$sort": { //stage 2: sort the remainder last-first
        "timestamp": -1
    } },
    { "$limit": 20 }, //stage 3: keep only 20 of the descending order subset
    { "$sort": {
        "timestamp": 1 //stage 4: sort back to ascending order
    } },
    { "$project": { //stage 5: add any fields you want to show in your results
        "_id": 1,
        "timestamp": 1,
        "whatever_field_you_want_to_show": 1
    } }
];
yourcollection.aggregate(pipeline, function resultCallBack(err, result) {
    // account for (err)
    // do something with (result)
});
So the result would look something like:
[
    {
        "_id" : ObjectId("5ac5b878a1deg18asdafb060"),
        "timestamp" : "2018-04-05T05:47:37.045Z",
        "whatever_field_you_want_to_show" : -3.46000003814697
    },
    {
        "_id" : ObjectId("5ac5b878a1de1adsweafb05f"),
        "timestamp" : "2018-04-05T05:47:38.187Z",
        "whatever_field_you_want_to_show" : -4.13000011444092
    }
]
Hope this helps.
You can try this method:
Get the total number of records in the collection with:
db.dbcollection.count()
Then use skip:
db.dbcollection.find().skip(db.dbcollection.count() - 1).pretty()
You can't "skip" based on the size of the collection, because it will not take the query conditions into account.
The correct solution is to sort from the desired end-point, limit the size of the result set, then adjust the order of the results if necessary.
Here is an example, based on real-world code.
var query = collection.find( { conditions } ).sort({$natural : -1}).limit(N);
query.exec(function(err, results) {
    if (err) {
        // handle the error
    }
    else if (results.length == 0) {
        // no results found
    }
    else {
        results.reverse(); // put the results into the desired order
        results.forEach(function(result) {
            // do something with each result
        });
    }
});
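The sort, limit, reverse pattern can be modelled in plain Python with a list standing in for the collection. This is only a sketch: `last_n` and the `timestamp` key are illustrative assumptions. `heapq.nlargest` plays the role of `.sort({timestamp: -1}).limit(N)`, and the final reversal restores oldest-to-newest order, like `results.reverse()` in the snippet above.

```python
import heapq
from typing import Any, Dict, Iterable, List


def last_n(docs: Iterable[Dict[str, Any]], n: int, key: str = "timestamp") -> List[Dict[str, Any]]:
    """Last n documents by `key`, returned oldest to newest."""
    newest_first = heapq.nlargest(n, docs, key=lambda d: d[key])  # like sort({key: -1}).limit(n)
    newest_first.reverse()  # like results.reverse(): back to ascending order
    return newest_first


messages = [{"timestamp": t, "text": f"msg {t}"} for t in [3, 1, 5, 2, 4]]
print([m["timestamp"] for m in last_n(messages, 3)])  # [3, 4, 5]
```

Unlike `skip(count() - N)`, this respects any query conditions applied before the sort, which is the point made in the answer above.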
You can use sort(), limit() and skip() to get the last N records, starting from any skipped value:
db.collections.find().sort({key: value}).limit(int value).skip(some int value);
Look under Querying: Sorting and Natural Order, http://www.mongodb.org/display/DOCS/Sorting+and+Natural+Order
as well as sort() under Cursor Methods
You may want to use the find options:
db.collection.find({}, {sort: {createdAt: -1}, skip:2, limit: 18}).fetch();
Use .sort() and .limit() for that
Use Sort in ascending or descending order and then use limit
db.collection.find({}).sort({ any_field: -1 }).limit(last_n_records);
If you use MongoDB Compass, you can use the sort field to filter.
Use the $slice operator to limit array elements:
GeoLocation.find({},{name: 1, geolocation:{$slice: -5}})
    .then((result) => {
        // use result here
    })
    .catch((err) => {
        res.status(500).json({ success: false, msg: `Something went wrong. ${err}` });
    });
where geolocation is an array of data, from which we get the last 5 records.
db.collection.find().hint( { $natural : -1 } ).sort({ field: 1 }).limit(n) // use 1 or -1 for the sort direction
According to the MongoDB documentation:
You can specify { $natural : 1 } to force the query to perform a forwards collection scan.
You can also specify { $natural : -1 } to force the query to perform a reverse collection scan.
Last function should be sort, not limit.

documents with tags in mongodb: getting tag counts

I have a collection1 of documents with tags in MongoDB. The tags are an embedded array of strings:
{
    name: 'someObj',
    tags: ['tag1', 'tag2', ...]
}
I want to know the count of each tag in the collection. Therefore I have another collection2 with tag counts:
{ tag: 'tag1', score: 2 }
{ tag: 'tag2', score: 10 }
Now I have to keep both in sync. It is rather trivial when inserting to or removing from collection1. However when I update collection1 I do the following:
1.) get the old document
var oldObj = collection1.findOne({ _id: id });
2.) calculate the difference between old and new tag arrays
var removedTags = $(oldObj.tags).not(obj.tags).get();
var insertedTags = $(obj.tags).not(oldObj.tags).get();
3.) update the old document
collection1.update(
    { _id: id },
    { $set: obj }
);
4.) update the scores of inserted & removed tags
// increment score of each inserted tag
insertedTags.forEach(function(val, idx) {
    // $inc will set score = 1 on insert
    collection2.update(
        { tag: val },
        { $inc: { score: 1 } },
        { upsert: true }
    );
});
// decrement score of each removed tag
removedTags.forEach(function(val, idx) {
    // $inc will set score = -1 on insert
    collection2.update(
        { tag: val },
        { $inc: { score: -1 } },
        { upsert: true }
    );
});
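The bookkeeping in steps 2 and 4 boils down to two set differences and a counter update. The minimal Python sketch below uses a `Counter` to stand in for collection2 and a hypothetical helper `update_tag_scores`; it assumes tags are unique within a document, which is also what `$addToSet` would enforce.

```python
from collections import Counter
from typing import Iterable


def update_tag_scores(scores: Counter, old_tags: Iterable[str], new_tags: Iterable[str]) -> None:
    """Adjust per-tag scores after a document's tags change from old_tags to new_tags."""
    old, new = set(old_tags), set(new_tags)
    for tag in new - old:  # insertedTags: $inc score by 1 (upsert creates missing tags)
        scores[tag] += 1
    for tag in old - new:  # removedTags: $inc score by -1
        scores[tag] -= 1


scores = Counter({"tag1": 2, "tag2": 10})
update_tag_scores(scores, ["tag1", "tag2"], ["tag2", "tag3"])
print(sorted(scores.items()))  # [('tag1', 1), ('tag2', 10), ('tag3', 1)]
```

Tags present in both the old and the new array (tag2 here) are left untouched, which is why only the differences need an update.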
My questions:
A) Is this approach of keeping book of scores separately efficient? Or is there a more efficient one-time query to get the scores from collection1?
B) Even if keeping book separately is the better choice: can that be done in fewer steps, e.g. by letting MongoDB calculate which tags are new / removed?
The solution, as nickmilion correctly states, would be an aggregation. However, I would do it with a twist: we'll save its results in a collection. What we will do is trade real-time results for an extreme speed boost.
How I would do it
More often than not, the need for real time results is overestimated. Hence, I'd go with precalculated stats for the tags and renew it every 5 minutes or so. That should be well enough, since most of such calls are requested async by the client and hence some delay in case the calculation has to be made on a specific request is negligible.
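db.collection1.aggregate([{$unwind: "$tags"}, {$group: { _id:"$tags", score:{"$sum":1} } }, {$out: "tagStats"}])
db.tagStats.update({'lastRun':{$exists:true}}, {'lastRun':new Date()}, {upsert:true})
db.tagStats.ensureIndex({lastRun:1}, {sparse:true})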
OK, here is the deal. First, we unwind the tags array, group it by the individual tags and increment the score for each occurrence of the respective tag. Next, we upsert lastRun in the tagStats collection, which we can do since MongoDB is schemaless. Then we create a sparse index, which only holds values for documents in which the indexed field exists. Since ensureIndex is an extremely cheap operation when the index already exists, we can simply keep this call in our code and don't need to create the index manually. With this procedure, the following query
db.tagStats.find(
{lastRun:{ $lte: new Date( ISODate().getTime() - 300000 ) } },
{_id:0, lastRun:1}
)
becomes a covered query: A query which is answered from the index, which tends to reside in RAM, making this query lightning fast (slightly less than 0.5 msecs median in my tests). So what does this query do? It will return a record when the last run of the aggregation was run more than 5 minutes ( 5*60*1000 = 300000 msecs) ago. Of course, you can adjust this to your needs.
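The 5-minute threshold is plain arithmetic: 5 × 60 × 1000 = 300000 ms. Below is a Python sketch of the same staleness check; `has_to_run` is a hypothetical name, and treating a missing `lastRun` (`None`) as "run now" is an assumption of this sketch, so that the very first aggregation is not skipped when no `lastRun` document exists yet.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_AGE = timedelta(milliseconds=5 * 60 * 1000)  # 300000 ms, as in the query above


def has_to_run(last_run: Optional[datetime], now: datetime) -> bool:
    """True if the stats were never computed or are at least 5 minutes old."""
    if last_run is None:
        return True
    return last_run <= now - MAX_AGE  # mirrors {lastRun: {$lte: now - 300000 ms}}


now = datetime(2024, 1, 1, 12, 0, 0)
print(has_to_run(datetime(2024, 1, 1, 11, 54, 0), now))  # True
print(has_to_run(datetime(2024, 1, 1, 11, 58, 0), now))  # False
```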
Now, we can wrap it up:
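var hasToRun = db.tagStats.find(
    {lastRun:{ $lte: new Date( ISODate().getTime() - 300000 ) } },
    {_id:0, lastRun:1}
);
if (hasToRun.hasNext()) {
    db.collection1.aggregate([{$unwind: "$tags"}, {$group: {_id:"$tags", score:{"$sum":1} } }, {$out: "tagStats"}]);
    db.tagStats.update({'lastRun':{$exists:true}}, {'lastRun':new Date()}, {upsert:true});
}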
// For all stats
var tagsStats = db.tagStats.find({score:{$exists:true}});
// score for a specific tag
var scoreForTag = db.tagStats.find({score:{$exists:true},_id:"tag1"});
Alternative approach
If real time results really matter and you need the stats for all the tags, simply use the aggregation without saving it to another collection:
db.collection1.aggregate([{$unwind: "$tags"}, {$group: { _id:"$tags", score:{"$sum":1} } }])
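What `$unwind` followed by `$group` with `$sum: 1` computes can be modelled in Python: unwinding yields one item per array element, and grouping counts those items per tag. `tag_counts` is a hypothetical helper and the sample documents are made up; by default `$unwind` also drops documents without a tags array, which the sketch mirrors.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def tag_counts(docs: Iterable[Dict[str, Any]]) -> Counter:
    """Rough equivalent of [{$unwind: "$tags"}, {$group: {_id: "$tags", score: {$sum: 1}}}]."""
    unwound = (tag for doc in docs for tag in doc.get("tags", []))  # $unwind: one item per element
    return Counter(unwound)  # $group by tag, $sum: 1 per occurrence


docs = [
    {"name": "a", "tags": ["tag1", "tag2"]},
    {"name": "b", "tags": ["tag2"]},
    {"name": "c"},  # no tags array: contributes nothing
]
print(sorted(tag_counts(docs).items()))  # [('tag1', 1), ('tag2', 2)]
```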
If you only need the results for one specific tag at a time, a real time approach could be to use a special index, create a covered query and simply count the results:
var numberOfOccurences = db.tags.find({tags:"tag1"},{_id:0,tags:1}).count();
Answering your questions:
B) You don't have to calculate the diff yourself; use $addToSet.
A) You can get the counts via the aggregation framework with a combination of $unwind and $count.

Find largest document size in MongoDB

Is it possible to find the largest document size in MongoDB?
db.collection.stats() shows average size, which is not really representative because in my case sizes can differ considerably.
You can use a small shell script to get this value.
Note: this will perform a full collection scan, which will be slow on large collections.
let max = 0, id = null;
db.test.find().forEach(doc => {
    const size = Object.bsonsize(doc);
    if (size > max) {
        max = size;
        id = doc._id;
    }
});
print(id, max);
Note: this will attempt to store the whole result set in memory (from .toArray). Be careful with big data sets. Do not use in production! Abishek's answer has the advantage of working over a cursor instead of an in-memory array.
If you also want the _id, try this. Given a collection called "requests" :
// Creates a sorted list, then takes the max
db.requests.find().toArray().map(function(request) { return {size:Object.bsonsize(request), _id:request._id}; }).sort(function(a, b) { return a.size-b.size; }).pop();
// { "size" : 3333, "_id" : "someUniqueIdHere" }
Starting Mongo 4.4, the new aggregation operator $bsonSize returns the size in bytes of a given document when encoded as BSON.
Thus, in order to find the bson size of the document whose size is the biggest:
// { "_id" : ObjectId("5e6abb2893c609b43d95a985"), "a" : 1, "b" : "hello" }
// { "_id" : ObjectId("5e6abb2893c609b43d95a986"), "c" : 1000, "a" : "world" }
// { "_id" : ObjectId("5e6abb2893c609b43d95a987"), "d" : 2 }
db.collection.aggregate([
  { $group: {
    _id: null,
    max: { $max: { $bsonSize: "$$ROOT" } }
  }}
])
// { "_id" : null, "max" : 46 }
$group groups all items together
$max takes the maximum of the documents' $bsonSize
$$ROOT represents the current document for which we get the BSON size
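To see where 46 comes from: a BSON document is a 4-byte length, its elements and a trailing zero byte, and each element is a type byte, the field name as a NUL-terminated string, and the value. The sketch below computes that for flat documents only. It assumes the sample data was inserted from the legacy `mongo` shell, which stores numbers as 64-bit doubles by default, and it represents an ObjectId as 12 raw bytes; `bson_size` is a hypothetical helper, not a driver API.

```python
from typing import Any, Dict


def value_size(value: Any) -> int:
    """Encoded size in bytes of a BSON value (a small subset of types)."""
    if isinstance(value, bool):
        return 1  # boolean
    if isinstance(value, float):
        return 8  # double
    if isinstance(value, int):
        return 4 if -2**31 <= value < 2**31 else 8  # int32 or int64
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # int32 length + bytes + NUL
    if isinstance(value, bytes) and len(value) == 12:
        return 12  # treated as an ObjectId
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def bson_size(doc: Dict[str, Any]) -> int:
    """Size of a flat document encoded as BSON: length + elements + trailing NUL."""
    elements = sum(1 + len(key.encode("utf-8")) + 1 + value_size(v) for key, v in doc.items())
    return 4 + elements + 1


oid = bytes(12)  # placeholder ObjectId
docs = [
    {"_id": oid, "a": 1.0, "b": "hello"},
    {"_id": oid, "c": 1000.0, "a": "world"},
    {"_id": oid, "d": 2.0},
]
print([bson_size(d) for d in docs])  # [46, 46, 33]
```

If the numbers had been stored as 32-bit integers instead, the first document would be 42 bytes, so the exact figure depends on how the data was inserted.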
Finding the largest documents in a MongoDB collection can be ~100x faster than the other answers using the aggregation framework and a tiny bit of knowledge about the documents in the collection. Also, you'll get the results in seconds, vs. minutes with the other approaches (forEach, or worse, getting all documents to the client).
You need to know which field(s) in your document might be the largest ones, which you almost always will know. There are only two practical¹ MongoDB types that can have variable sizes: strings and arrays.
The aggregation framework can calculate the length of each. Note that you won't get the size in bytes for arrays, but the length in elements. However, what matters more typically is which the outlier documents are, not exactly how many bytes they take.
Here's how it's done for arrays. As an example, let's say we have a collection of users in a social network and we suspect the array friends.ids might be very large (in practice you should probably keep a separate field like friendsCount in sync with the array, but for the sake of example, we'll assume that's not available):
db.users.aggregate([
  { $match: {
    'friends.ids': { $exists: true }
  }},
  { $project: {
    sizeLargestField: { $size: '$friends.ids' }
  }},
  { $sort: {
    sizeLargestField: -1
  }}
])
The key is to use the $size aggregation pipeline operator. It only works on arrays though, so what about text fields? We can use the $strLenBytes operator. Let's say we suspect the bio field might also be very large:
db.users.aggregate([
  { $match: {
    bio: { $exists: true }
  }},
  { $project: {
    sizeLargestField: { $strLenBytes: '$bio' }
  }},
  { $sort: {
    sizeLargestField: -1
  }}
])
You can also combine $size and $strLenBytes using $sum to calculate the size of multiple fields. In the vast majority of cases, 20% of the fields will take up 80% of the size (if not 10/90 or even 1/99), and large fields must be either strings or arrays.
¹ Technically, the rarely used binData type can also have variable size.
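A rough Python model of combining the two measures: arrays contribute their element count (like `$size`), strings their UTF-8 byte length (like `$strLenBytes`), and the per-field values are summed (like `$sum`). The field paths, sample users and helpers `get_path`, `size_proxy` and `largest` are illustrative assumptions. The real operators reject a missing or wrongly typed field, which is why the answer filters with `$exists` first; this sketch simply counts such fields as 0. As the answer says, this ranks outliers rather than measuring bytes.

```python
from typing import Any, Dict, Iterable, List


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'friends.ids'; None if any part is missing."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def size_proxy(doc: Dict[str, Any], fields: Iterable[str]) -> int:
    total = 0
    for path in fields:
        value = get_path(doc, path)
        if isinstance(value, list):
            total += len(value)  # like $size
        elif isinstance(value, str):
            total += len(value.encode("utf-8"))  # like $strLenBytes
    return total  # like $sum of the per-field sizes


def largest(docs: List[Dict[str, Any]], fields: List[str]) -> List[Dict[str, Any]]:
    return sorted(docs, key=lambda d: size_proxy(d, fields), reverse=True)  # like $sort: -1


users = [
    {"_id": 1, "bio": "hi", "friends": {"ids": [2, 3]}},
    {"_id": 2, "bio": "héllo", "friends": {"ids": []}},
    {"_id": 3, "friends": {"ids": list(range(10))}},
]
print([u["_id"] for u in largest(users, ["friends.ids", "bio"])])  # [3, 2, 1]
```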
Well, this is an old question, but I thought I'd share my two cents about it.
My approach - use Mongo mapReduce function
First - let's get the size for each document
db.collection.mapReduce(
    function() { emit(this._id, Object.bsonsize(this)) }, // map the result to be an id / size pair for each document
    function(key, val) { return val }, // val = document size value (single value for each document)
    {
        query: {}, // query all documents
        out: { inline: 1 } // just return result (don't create a new collection for it)
    }
)
This will return all document sizes, although it's worth mentioning that saving the output as a collection is a better approach (the inline result is an array of results inside the results field).
Second - let's get the max size of document by manipulating this query
db.collection.mapReduce(
    function() { emit(0, Object.bsonsize(this))}, // mapping a fake id (0) and use the document size as value
    function(key, vals) { return Math.max.apply(Math, vals) }, // use Math.max function to get max value from vals (each val = document size)
    { query: {}, out: { inline: 1 } } // same as first example
)
Which will provide you a single result with value equals to the max document size
In short:
you may want to use the first example, save its output as a collection (change the out option to the name of the collection you want) and apply further aggregations to it (max size, min size, etc.)
you may want to use a single query (the second option) for getting a single stat (min, max, avg, etc.)
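The two mapReduce calls can be modelled in a few lines of Python; `map_reduce` here is a hypothetical in-memory stand-in, not a MongoDB or driver API, and the sizes are made-up values. It mirrors two documented behaviours: emitted values are grouped by key, and reduce is not called for a key with only a single value, which is why the first example's `function(key, val) { return val }` works. Because MongoDB may reduce partial results again, the reduce function must give the same answer when fed its own output, which `Math.max` (here `max`) does. Note that map-reduce is deprecated as of MongoDB 5.0 in favour of the aggregation pipeline.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

Emit = Callable[[Any, Any], None]


def map_reduce(docs: Iterable[Dict[str, Any]],
               map_fn: Callable[[Dict[str, Any], Emit], None],
               reduce_fn: Callable[[Any, List[Any]], Any]) -> Dict[Any, Any]:
    """Toy in-memory map-reduce: group emitted values by key, then reduce."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for doc in docs:
        map_fn(doc, lambda key, value: groups[key].append(value))
    # a key with a single emitted value is returned as-is, without calling reduce_fn
    return {key: vals[0] if len(vals) == 1 else reduce_fn(key, vals)
            for key, vals in groups.items()}


sizes = {"a": 120, "b": 4096, "c": 512}  # hypothetical _id -> bsonsize values
docs = [{"_id": doc_id} for doc_id in sizes]

# first example: emit(this._id, size), one value per key
per_doc = map_reduce(docs, lambda d, emit: emit(d["_id"], sizes[d["_id"]]),
                     lambda key, vals: vals)
# second example: emit(0, size) for every document, reduce with max
max_size = map_reduce(docs, lambda d, emit: emit(0, sizes[d["_id"]]),
                      lambda key, vals: max(vals))
print(per_doc, max_size)  # {'a': 120, 'b': 4096, 'c': 512} {0: 4096}
```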
If you're working with a huge collection, loading it all at once into memory will not work, since you'll need more RAM than the size of the entire collection for that to work.
Instead, you can process the entire collection in batches using the following package I created:
All you have to do is provide the MongoDB connection string and collection name. The script will output the top X largest documents when it finishes traversing the entire collection in batches.
Inspired by Elad Nana's package, but usable in a MongoDB console:
function biggest(collection, limit=100, sort_delta=100) {
    var documents = [];
    var cursor = collection.find().readPref("nearest");
    while (cursor.hasNext()) {
        var doc = cursor.next();
        var size = Object.bsonsize(doc);
        if (documents.length < limit || size > documents[limit-1].size) {
            documents.push({ id: doc._id.toString(), size: size });
        }
        if (documents.length > (limit + sort_delta) || !cursor.hasNext()) {
            documents.sort(function (first, second) {
                return second.size - first.size;
            });
            documents = documents.slice(0, limit);
        }
    }
    return documents;
}; biggest(db.collection)
Uses cursor
Gives a list of the limit biggest documents, not just the biggest
Sort & cut output list to limit every sort_delta
Use nearest as read preference (you might also want to use rs.slaveOk() on the connection to be able to list collections if you're on a slave node)
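The `biggest` function keeps an over-full list and periodically sorts and truncates it. A common alternative (swapped in here, not what that answer uses) is a bounded min-heap, which always holds exactly `limit` entries and handles each new document in O(log limit) time. This Python sketch takes (id, size) pairs such as `Object.bsonsize` would produce in the shell; `top_k_largest` and the sample data are hypothetical.

```python
import heapq
from typing import Hashable, Iterable, List, Tuple


def top_k_largest(items: Iterable[Tuple[Hashable, int]], k: int) -> List[Tuple[Hashable, int]]:
    """Return the k (id, size) pairs with the largest sizes, biggest first."""
    if k <= 0:
        return []
    heap: List[Tuple[int, int, Hashable]] = []  # min-heap of (size, arrival order, id)
    for seq, (doc_id, size) in enumerate(items):
        entry = (size, seq, doc_id)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif size > heap[0][0]:
            heapq.heapreplace(heap, entry)  # drop the current smallest, keep the new one
    # biggest first; among equal sizes, the earlier document comes first
    return [(doc_id, size) for size, _, doc_id in sorted(heap, key=lambda e: (-e[0], e[1]))]


items = [("a", 10), ("b", 50), ("c", 30), ("d", 50), ("e", 5)]
print(top_k_largest(items, 3))  # [('b', 50), ('d', 50), ('c', 30)]
```

The arrival counter in each heap entry breaks ties, so document ids never have to be compared with each other.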
As Xavier Guihot already mentioned, a new $bsonSize aggregation operator was introduced in Mongo 4.4, which can give you the size of the object in bytes. In addition to that just wanted to provide my own example and some stats.
Usage example:
// I had an `orders` collection in the following format
[
  {
    "uuid": "64178854-8c0f-4791-9e9f-8d6767849bda",
    "status": "new",
  },
  {
    "uuid": "5145d7f1-e54c-44d9-8c10-ca3ce6f472d6",
    "status": "complete",
  }
]
// and I've run the following query to get documents' size
db.orders.aggregate(
  [
    { $match: { status: "complete" } }, // pre-filtered only completed orders
    { $project: {
      uuid: 1,
      size: { $bsonSize: "$$ROOT" } // added object size
    } },
    { $sort: { size: -1 } }
  ],
  { allowDiskUse: true } // required as I had huge amount of data
)
As a result, I received a list of documents by size in descending order.
For the collection of ~3M records and ~70GB size in total, the query above took ~6.5 minutes.

