I'm building an API with a particular endpoint that returns various statistics for the entire database.
For this, I have an aggregation pipeline that takes 1 second to complete.
Instead of running this aggregation for every request, I want to store the results in a collection c as the aggregated data changes rarely and is accessed frequently.
I will also define a few indexes on c as I need to return only documents that match some criteria passed to the endpoint.
When the source data is changed, I'd run the aggregation again and replace the contents of collection c.
In MongoDB 3.0, the docs about the $out operator of the aggregation pipeline state that:
The $out operation does not change any indexes that existed on the previous collection
I'm confused: does this mean that MongoDB won't update the indexes on c when its contents are replaced?
P.S.: I know that MapReduce might be an alternative; I tried that first, but I did not manage to get the results I wanted; my current approach works and given the approaching deadline I'd like to simply "cache" the aggregated data instead of reimplementing this from scratch.
EDIT
What I'm asking is if the indexes will reflect the new documents after the replacement of the collection or if they will be "stale".
The index will be updated when you execute your aggregate query.
$out creates the collection when your aggregate query succeeds.
MongoDB replaces the contents of the collection created by $out when you execute the aggregate again.
When the collection is replaced, the indexes associated with the collection are updated as well.
You can test this by following the below steps
Create a smaller collection say 'books'
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }
Run your aggregate with an output collection (step 2)
db.books.aggregate( [{ $group : { _id : "$author", books: { $push: "$title" } } },{ $out : "authors" }] )
Create Index on books in the new collection authors
db.authors.createIndex({books:1})
Query your author collection
db.authors.find({books:'The Banquet'}).explain()
and look for the winning plan
Add another record
db.books.insert({ "_id" : 7101, "title" : "Wings of Fire", "author" : "APJ Abdul Kalam", "copies" : 1 })
Execute the aggregate query given in step 2 again
Now do a find for the new book which we added
db.authors.find({books:'Wings of Fire'}).explain()
You will find that the winning plan contains IXSCAN, which means the index is used for this search, so MongoDB has updated the index for the new record.
MongoDB will preserve the existing indexes
Replace Existing Collection
If the collection specified by the $out operation already exists, then upon completion of the aggregation, the $out stage atomically replaces the existing collection with the new results collection. The $out operation does not change any indexes that existed on the previous collection. If the aggregation fails, the $out operation makes no changes to the pre-existing collection.
Reference:
http://docs.mongodb.org/manual/reference/operator/aggregation/out/
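As a rough sketch of the caching approach from the question, the refresh step could be wrapped in a small helper. The names refreshCache and byAuthor are hypothetical, and the helper assumes a shell-style collection handle whose aggregate() runs the pipeline right away; with a driver, the returned cursor may need to be consumed before $out takes effect.

```javascript
// Hypothetical helper: re-run an aggregation and replace the cache collection.
// `source` is assumed to be a shell-style collection handle (e.g. db.books).
function refreshCache(source, stages, targetName) {
  // $out has to be the last stage, so it is appended to a copy of the stages.
  const pipeline = stages.concat([{ $out: targetName }]);
  return source.aggregate(pipeline);
}

// Example stages, taken from the walkthrough above.
const byAuthor = [
  { $group: { _id: "$author", books: { $push: "$title" } } },
];

// In the shell (not executed here):
//   db.authors.createIndex({ books: 1 });        // define the index once
//   refreshCache(db.books, byAuthor, "authors"); // repeat after each data change
```

Because $out keeps the indexes that already exist on the target collection, the createIndex call should only be needed once, not after every refresh.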
Related
I am trying to create a new field, and set its value to that of an existing array object that resides in the same document.
I have tried 2 approaches:
db.collection.aggregate( [ { $addFields: { "newField": "$oldField"} } ] )
This works great, but only updates 20 documents, not all documents in the collection.
db.collection.update(
{},
{ $set: {"newField": "$oldField"} },
false,
true
)
This updates all documents in the collection, but sets them all to the string "$oldField", and not the value of the object oldField.
How can I update all documents in my collection, adding a new field and setting its value to that of an existing field, which is an array?
Thank you!
Aggregate doesn't change the database unless you are using a stage like $out or $merge as final stage. Aggregate is returning the data to the client, not to the database.
Updates can be done in 2 ways
the older update operators (they are simple and fast)
a pipeline update (powerful: it can use all aggregation expression operators, but not all aggregation stages)
In your case you need to refer to a field so you need a pipeline update.
It's the same as the aggregation; the pipeline is simply passed as the update argument of the update method.
Query
(you can use updateMany instead of multi:true)
db.collection.update({},
[{"$addFields": {"newField": "$oldField"}}],
{"multi": true})
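The updateMany variant mentioned above might look like the sketch below. The helper name copyField is hypothetical, and it assumes a shell-style collection handle on MongoDB 4.2 or later, where updates accept a pipeline.

```javascript
// Hypothetical helper: copy the value of one field into another for every
// document. `coll` is assumed to be a shell-style collection handle.
function copyField(coll, from, to) {
  // The update is an array (a pipeline), so "$" + from is resolved as a
  // field path instead of being stored as a literal string.
  const pipeline = [{ $set: { [to]: "$" + from } }];
  return coll.updateMany({}, pipeline);
}

// In the shell (not executed here):
//   copyField(db.collection, "oldField", "newField");
```

Inside a pipeline update, $set is an alias for $addFields, so either spelling should behave the same.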
I have a collection where the _id is a text field. I need to update this field by adding a prefix such as xxx# to it, so if a field value was "abc" it must now be "xxx#abc".
How can I do that with MongoDB?
Since the _id field of a MongoDB document is immutable, you can't perform this operation with .update(). Because you need to change every document, try rewriting the entire collection using the aggregation $out stage:
db.collection.aggregate([{$addFields : {_id : {$concat : ['xxx#','$_id']}}}, {$out : 'collection'}])
Note: $out replaces the existing collection (or creates a new one if no collection with the given name exists), so test this query thoroughly: run the aggregation without the $out stage first, and add $out only once the results look right. Alternatively, write to a new collection and rename it once you are satisfied. Additionally, if you're using MongoDB version >= 4.2 you can take advantage of $merge.
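The more cautious route from the note (write to a new collection, then rename) could be sketched as follows. The function name prefixIdPipeline and the collection name collection_prefixed are hypothetical, and the sketch assumes every _id is a string, since $concat only works on strings.

```javascript
// Hypothetical helper: build a pipeline that prefixes every _id and writes
// the result to `targetName` instead of overwriting the source collection.
function prefixIdPipeline(prefix, targetName) {
  return [
    { $addFields: { _id: { $concat: [prefix, "$_id"] } } },
    { $out: targetName },
  ];
}

const pipeline = prefixIdPipeline("xxx#", "collection_prefixed");

// In the shell (not executed here):
//   db.collection.aggregate(pipeline.slice(0, 1));  // dry run without $out
//   db.collection.aggregate(pipeline);              // write the new collection
//   db.collection_prefixed.renameCollection("collection", true); // swap it in
```

Indexes on the original collection are not carried over to the new one, so they would have to be recreated before the rename.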
I need documents sorted by creation time (from oldest to newest).
Since ObjectID saves timestamp by default, we can use it to get documents sorted by creation time with CollectionName.find().sort({_id: 1}).
Also, I noticed that regular CollectionName.find() query always returns the documents in same order as CollectionName.find().sort({_id: 1}).
My question is:
Is CollectionName.find() guaranteed to return documents in same order as CollectionName.find().sort({_id: 1}) so I could leave sorting out?
No. Well, not exactly.
A db.collection.find() will give you the documents in the order they appear in the data files most of the time, though this isn't guaranteed.
Result Ordering
Unless you specify the sort() method or use the $near operator, MongoDB does not guarantee the order of query results.
As long as your data files are relatively new and few updates happen, the documents might (and most of the time will) be returned in what appears to be _id order, since ObjectId is monotonically increasing.
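To illustrate why ObjectIds generated one after another tend to sort by creation time, here is a small sketch that decodes the embedded timestamp in plain JavaScript. The function name objectIdSeconds is hypothetical; the two hex strings are the ObjectIds from the testing script further down. In the shell, ObjectId(...).getTimestamp() gives the same information.

```javascript
// Decode the creation time embedded in an ObjectId's hex string.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdSeconds(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

const joe = "55814b944e019172b7d358a0";
const bob = "55814ba44e019172b7d358a1";

// Creation time of the first document as an ISO date string.
console.log(new Date(objectIdSeconds(joe) * 1000).toISOString());

// Seconds that passed between the two inserts.
console.log(objectIdSeconds(bob) - objectIdSeconds(joe));
```

Since the timestamp occupies the leading bytes, comparing two such ids compares their creation seconds first, which is why sort({_id: 1}) approximates creation order.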
Later in the lifecycle, old documents may have been moved from their old position (because they increased in size and documents are never partitioned) and new ones are written in the place formerly occupied by another document. In this case, a newer document may be returned in a position between two old documents.
There is nothing wrong with sorting documents by _id, since the index will be used for that, adding only some latency for document retrieval.
However, I would strongly recommend against using the ObjectId for date operations for several reasons:
ObjectIds cannot be used for date comparison queries. So you couldn't query for all documents created between date x and date y. To achieve that, you'd have to load all documents, extract the date from the ObjectId and compare it, which is extremely inefficient.
If the creation date matters, it should be explicitly addressable in the documents
I see ObjectIds as a choice of last resort for the _id field and tend to use other values (compound on occasions) as _ids, since the field is indexed by default and it is very likely that one can save precious RAM by using a more meaningful value as id.
You could use the following for example which utilizes DBRefs
{
_id: {
creationDate: new ISODate(),
user: {
"$ref" : "creators",
"$id" : "mwmahlberg",
"$db" : "users"
}
}
}
And do a quite cheap sort by using
db.collection.find().sort({"_id.creationDate": 1})
Is CollectionName.find() guaranteed to return documents in same order as CollectionName.find().sort({_id: 1})
No, it's not! If you didn't specify any order, then a so-called "natural" ordering is used. Meaning that documents will be returned in the order in which they physically appear in data files.
Now, if you only insert documents and never modify them, this natural order will coincide with ascending _id order. Imagine, however, that you update a document in such a way that it grows in size and has to be moved to a free slot inside of a data file (usually this means somewhere at the end of the file). If you were to query documents now, they wouldn't follow any sensible (to an external observer) order.
So, if you care about order, make it explicit.
Source: http://docs.mongodb.org/manual/reference/glossary/#term-natural-order
natural order
The order in which the database refers to documents on disk. This is the default sort order. See $natural and Return in Natural Order.
Testing script (for the confused)
> db.foo.insert({name: 'Joe'})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({name: 'Bob'})
WriteResult({ "nInserted" : 1 })
> db.foo.find()
{ "_id" : ObjectId("55814b944e019172b7d358a0"), "name" : "Joe" }
{ "_id" : ObjectId("55814ba44e019172b7d358a1"), "name" : "Bob" }
> db.foo.update({_id: ObjectId("55814b944e019172b7d358a0")}, {$set: {answer: "On a sharded collection the $natural operator returns a collection scan sorted in natural order, the order the database inserts and stores documents on disk. Queries that include a sort by $natural order do not use indexes to fulfill the query predicate with the following exception: If the query predicate is an equality condition on the _id field { _id: <value> }, then the query with the sort by $natural order can use the _id index. You cannot specify $natural sort order if the query includes a $text expression."}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.foo.find()
{ "_id" : ObjectId("55814ba44e019172b7d358a1"), "name" : "Bob" }
{ "_id" : ObjectId("55814b944e019172b7d358a0"), "name" : "Joe", "answer" : "On a sharded collection the $natural operator returns a collection scan sorted in natural order, the order the database inserts and stores documents on disk. Queries that include a sort by $natural order do not use indexes to fulfill the query predicate with the following exception: If the query predicate is an equality condition on the _id field { _id: <value> }, then the query with the sort by $natural order can use the _id index. You cannot specify $natural sort order if the query includes a $text expression." }
I'm sure this is an easy one, but I just wanted to make sure. Is find() with some search and projection criterion same as applying a sort({$natural:1}) on it?
Also, what is the default natural sort order? How is it different from a sort({_id:1}), say?
db.collection.find() returns the same result as db.collection.find().sort({$natural:1}).
{"$natural" : 1} forces the find query to do a collection scan (the default order); when used in a sort, it specifies on-disk order.
When you update a document, MongoDB may move it to another location on disk.
For example, insert the documents below:
{
_id : 0,
},
{
_id : 1,
}
then update:
db.collection.update({ _id : 0} , { $set : { blob : BIG DATA}})
And when you perform the find query you will get
{
"_id" : 1
},
{
"_id" : 0,
"blob" : BIG DATA
}
As you can see, the order of the documents has changed, so the default order is not by _id.
If you don't specify the sort then mongodb find() will return documents in the order they are stored on disk. Document storage on disk may coincide with insertion order, but that's not always going to be true. It is also worth noting that the location of a document on disk may change. For instance, in case of an update, mongodb may move a document from one place to another if needed.
If the query uses an index, the default order will be the order in which the index entries are found.
The $natural is the order in which documents are found on disk.
It is recommended that you specify the sort explicitly to be sure of the sorting order.
Whenever we do db.Collection.find().sort(), only our output is sorted, not the collection itself,
i.e. if I do db.collection.find() then I see the original collection, not the sorted one.
Is there any way to sort the collection itself instead of just sorting the output?
Exporting the sorted result into an entirely new collection would also work.
I have a numbered _id field (like _id:1, _id:2, _id:3 and so on).
Although I do not see any reason for doing this (an index on the field you are going to sort by will make the sort fast), here is a solution for your problem:
You have your test collection this way
{ "_id" : ObjectId("5273f6987c6c502364ddfe94"), "n" : 5 }
{ "_id" : ObjectId("5273f6e57c6c502364ddfe95"), "n" : 14}
{ "_id" : ObjectId("5273f6ee7c6c502364ddfe96"), "n" : -5}
Then the following command will create a sorted collection for you
db.test.find().sort({n : 1}).forEach(function(e){
db.testSorted.insert(e);
})
You can achieve exactly the same result with this (which I assume might perform faster, but I have not done any testing):
db.testSorted.insert(db.test.find().sort({n : 1}).toArray());
And just to make this answer complete, although I understand that this is overkill, you can also do this with the aggregation framework's $out stage.
Just to highlight: with all of this you can solve a bigger problem, namely saving some modification or subset of an existing collection into another collection.
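The $out variant is not spelled out above; a minimal sketch, assuming the same test collection and a hypothetical helper name sortedCopyPipeline, might look like this:

```javascript
// Hypothetical helper: build a pipeline that sorts the documents and writes
// them to `targetName` via $out.
function sortedCopyPipeline(sortSpec, targetName) {
  return [{ $sort: sortSpec }, { $out: targetName }];
}

const pipeline = sortedCopyPipeline({ n: 1 }, "testSorted");

// In the shell (not executed here):
//   db.test.aggregate(pipeline);
//   db.testSorted.find().sort({ n: 1 });  // still sort explicitly when reading
```

The copy is written in sorted order, but a later find() without sort() is still not guaranteed to return the documents in that order.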
Documents in a collection are stored in natural order, which is affected by document moves (when the document grows larger than the current record space allocated) and deletions (free space can be reused for inserted/moved documents). There is currently (as of MongoDB 2.4) no option to control the order of documents on disk aside from using a capped collection, which is a fixed-size collection that maintains insertion order but is subject to a number of restrictions.
An index is the appropriate way to efficiently return documents in an expected sort order. For more information see: Using Indexes to Sort Query Results in the MongoDB manual.
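As a small sketch of the index-based approach, assuming a shell-style collection handle and a hypothetical helper name findSortedBy:

```javascript
// Hypothetical helper: make sure an ascending index exists on `field`, then
// return a cursor sorted by it. `coll` is assumed to be a shell-style
// collection handle such as db.test.
function findSortedBy(coll, field) {
  // Creating an index that already exists with the same keys is a no-op.
  coll.createIndex({ [field]: 1 });
  return coll.find().sort({ [field]: 1 });
}

// In the shell (not executed here):
//   findSortedBy(db.test, "n").forEach(printjson);
```

In practice the index would be created once at deployment time instead of on every query; it is included here only to keep the sketch self-contained.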
A related feature is a clustered index, which would store documents on disk to match an index ordering. This is not a current feature of MongoDB, although it has been requested (see SERVER-3294).