MongoDB Sorting: Equivalent Aggregation Query

I have the following students collection:
{ "_id" : ObjectId("5f282eb2c5891296d8824130"), "name" : "Rajib", "mark" : "1000" }
{ "_id" : ObjectId("5f282eb2c5891296d8824131"), "name" : "Rahul", "mark" : "1200" }
{ "_id" : ObjectId("5f282eb2c5891296d8824132"), "name" : "Manoj", "mark" : "1000" }
{ "_id" : ObjectId("5f282eb2c5891296d8824133"), "name" : "Saroj", "mark" : "1400" }
I need to sort the collection by the 'mark' field in descending order, but the 'mark' field should not appear in the final result. The result should be:
{ "name" : "Saroj" }
{ "name" : "Rahul" }
{ "name" : "Rajib" }
{ "name" : "Manoj" }
I tried the following query, and it works fine:
db.students.find({},{"_id":0,"name":1}).sort({"mark":-1})
My MongoDB version is v4.2.8. My question is: what is the equivalent aggregation query? I tried the following two queries, but neither gave me the desired result:
db.students.aggregate([{"$project":{"name":1,"_id":0}},{"$sort":{"mark":-1}}])
db.students.aggregate([{"$project":{"name":1,"_id":0,"mark":1}},{"$sort":{"mark":-1}}])

Why does it work in find()?
As per the cursor.sort() documentation, when a set of results is both sorted and projected, the MongoDB query engine always applies the sorting first.
Why doesn't it work in aggregate()?
As per the Aggregation Pipeline documentation, the MongoDB aggregation pipeline consists of stages. Each stage transforms the documents as they pass through the pipeline, and each stage only sees the output of the stage before it. Pipeline stages do not need to produce one output document for every input document; e.g., some stages may generate new documents or filter out documents.
What you need to change:
Change the pipeline order. If you do not include the mark field in $project, it is no longer available to later stages, so $sort has no mark value to sort on. (Your second query does sort correctly, but because its $project keeps mark, mark also shows up in the output.) Sort first, then project:
db.students.aggregate([
{ "$sort": { "mark": -1 } },
{ "$project": { "name": 1, "_id": 0 } }
])
Playground: https://mongoplayground.net/p/xtgGl8AReeH
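To make the stage-order point concrete, here is a minimal sketch that models the two pipelines as plain JavaScript functions over an array, with no MongoDB involved. The helper names (sortByMarkDesc, projectName, run) are hypothetical, and the comparison only approximates MongoDB's ordering of missing fields. Note that MongoDB does not guarantee the relative order of Rajib and Manoj, since both have "1000"; this model keeps input order for ties because JavaScript's sort is stable.

```javascript
// Plain-JavaScript model of the two stage orders; no MongoDB involved.
// Each stage is a function from an array of documents to a new array, so a
// stage only ever sees what the previous stage produced.
const students = [
  { name: "Rajib", mark: "1000" },
  { name: "Rahul", mark: "1200" },
  { name: "Manoj", mark: "1000" },
  { name: "Saroj", mark: "1400" },
];

// Models { $sort: { mark: -1 } }. A missing mark is treated as the lowest value,
// roughly how MongoDB orders a missing field (like null) below strings.
function sortByMarkDesc(docs) {
  const rank = (doc) => (doc.mark === undefined ? null : doc.mark);
  return [...docs].sort((a, b) => {
    const x = rank(a);
    const y = rank(b);
    if (x === y) return 0;
    if (x === null) return 1; // missing values go last in descending order
    if (y === null) return -1;
    return x < y ? 1 : -1; // plain string comparison, like the stored "mark" strings
  });
}

// Models { $project: { name: 1, _id: 0 } }: keep only name.
const projectName = (docs) => docs.map((doc) => ({ name: doc.name }));

// Feed the documents through the stages in order.
const run = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const projectThenSort = run(students, [projectName, sortByMarkDesc]);
const sortThenProject = run(students, [sortByMarkDesc, projectName]);

console.log(projectThenSort.map((d) => d.name)); // input order: mark was already gone
console.log(sortThenProject.map((d) => d.name)); // Saroj, Rahul, Rajib, Manoj
```

In the first order every document reaches the sort stage without a mark, so all of them compare as equal and nothing is reordered.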

Related

Count recurrences in a collection and merge aggregation result to another collection

I need to count the recurrences of each value in collection A, so I run:
db.collectionA.aggregate( [ { $group : { _id : "$name", count : { $sum : 1 } } } ] )
and get something like:
{
"_id": "Bruce",
"count": 2
},
{
"_id": "Alfred",
"count": 3
}
Then I need to take this result and populate a field in collection B. I imagine something like a forEach, but I don't know how to implement it:
db.collectionB.findAndModify({query: {"name": forEach of the previous result},
update: {"nameRecurrences": value of the count}})
First, adjust your aggregation pipeline:
db.collectionA.aggregate( [
{ $group : { _id : '$name', name : {$first : "$name"}, nameRecurrences :{$sum: 1 } } }, // renamed field `nameRecurrences` to match with field name in `collection-B`
{$project : {_id : 0}} ] ) // Removing _id to avoid conflicts on merge
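As a rough illustration of what this $group stage computes, the sketch below models it in plain JavaScript: one output document per distinct name, with $sum: 1 adding one per input document. The collectionA contents are hypothetical, chosen only to reproduce the counts shown in the question, and countByName is an illustrative helper name.

```javascript
// Hypothetical sample data: the question only shows the output, not collectionA itself.
const collectionA = [
  { name: "Bruce" }, { name: "Alfred" }, { name: "Bruce" },
  { name: "Alfred" }, { name: "Alfred" },
];

// Models { $group: { _id: "$name", name: { $first: "$name" }, nameRecurrences: { $sum: 1 } } }
// followed by { $project: { _id: 0 } }.
function countByName(docs) {
  const groups = new Map(); // group key (_id) -> accumulated output document
  for (const doc of docs) {
    if (!groups.has(doc.name)) {
      groups.set(doc.name, { name: doc.name, nameRecurrences: 0 }); // $first: "$name"
    }
    groups.get(doc.name).nameRecurrences += 1; // $sum: 1 adds one per document
  }
  return [...groups.values()]; // no _id in the output, as with $project: { _id: 0 }
}

console.log(countByName(collectionA)); // Bruce: 2, Alfred: 3
```

MongoDB itself does not guarantee the order in which $group emits its groups; the Map here simply keeps first-seen order.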
On MongoDB 4.2 and later, you can use the $merge aggregation stage to merge the results of a pipeline on one collection into another collection.
Just add the following stage as the last stage of the pipeline:
{$merge : { into: { db: "dbName", coll: "collectionB" }, on: "name", whenNotMatched: "discard"}} // Remember to create a unique index on the `name` field in `collectionB`
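The sketch below is a simplified in-memory model of that $merge stage, meant only to show which documents get touched. whenMatched is not set in the stage above, so it uses its default, "merge", which copies the result's fields onto the matching target document while keeping the target's other fields. The collectionB contents and the mergeOnName helper are hypothetical.

```javascript
// Hypothetical in-memory model of:
//   { $merge: { into: "collectionB", on: "name", whenNotMatched: "discard" } }
// whenMatched stays at its default, "merge".
function mergeOnName(targetDocs, results) {
  // Lookup by name; this mirrors the unique index on `name` that $merge requires.
  const byName = new Map(targetDocs.map((doc) => [doc.name, doc]));
  for (const result of results) {
    const match = byName.get(result.name);
    if (match === undefined) continue; // whenNotMatched: "discard"
    Object.assign(match, result); // whenMatched: "merge" keeps the target's other fields
  }
  return targetDocs;
}

// Hypothetical collectionB contents with an extra field that must survive.
const collectionB = [
  { name: "Bruce", city: "Gotham" },
  { name: "Alfred", city: "London" },
];
mergeOnName(collectionB, [
  { name: "Bruce", nameRecurrences: 2 },
  { name: "Alfred", nameRecurrences: 3 },
  { name: "Selina", nameRecurrences: 1 }, // no match in collectionB, so it is discarded
]);
console.log(collectionB);
```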
Since you're using MongoDB 3.6.16, $merge is not available:
If collectionB is being created from scratch, you can use $out, which writes the results to a new collection (and replaces the collection if it already exists). But if it's an existing collection whose documents have many fields besides name and nameRecurrences, you can do the update in code instead.
Since each document needs its own filter and update, you can use .bulkWrite() to update many documents in a single call:
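// Run the aggregation from above and load its results into an array
const aggregationResult = db.collectionA.aggregate([
{ $group : { _id : '$name', name : {$first : "$name"}, nameRecurrences :{$sum: 1 } } },
{$project : {_id : 0}} ]).toArray()
// Build one updateOne operation per name
let bulkArr = []
for (const i of aggregationResult){
bulkArr.push( { updateOne : {
"filter" : { "name" : i.name },
"update" : { $set : { "nameRecurrences" : i.nameRecurrences } }
} })
}
// Skip the call when there is nothing to update
if (bulkArr.length > 0) db.collectionB.bulkWrite(bulkArr)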

$Avg aggregation in Mongodb [duplicate]

For a given record id, how do I get the average of a sub document field if I have the following in MongoDB:
/* 0 */
{
"item" : "1",
"samples" : [
{
"key" : "test-key",
"value" : "1"
},
{
"key" : "test-key2",
"value" : "2"
}
]
}
/* 1 */
{
"item" : "1",
"samples" : [
{
"key" : "test-key",
"value" : "3"
},
{
"key" : "test-key2",
"value" : "4"
}
]
}
I want to get the average of the values where key = "test-key" for a given item id (in this case 1). So the average should be (1 + 3) / 2 = 2.
You'll need to use the aggregation framework. The aggregation will end up looking something like this:
db.stack.aggregate([
{ $match: { "samples.key" : "test-key" } },
{ $unwind : "$samples" },
{ $match : { "samples.key" : "test-key" } },
{ $project : { "new_key" : "$samples.key", "new_value" : "$samples.value" } },
{ $group : { _id : "$new_key", answer : { $avg : "$new_value" } } }
])
The best way to think of the aggregation framework is like an assembly line. The query itself is an array of JSON documents, where each sub-document represents a different step in the assembly.
Step 1: $match
The first step is a basic filter, like a WHERE clause in SQL. We place this step first to filter out all documents that do not contain an array element whose key is test-key. Placing this at the beginning of the pipeline allows the aggregation to use indexes.
Step 2: $unwind
The second step, $unwind, separates each element of the "samples" array into its own document so we can operate on the elements individually. If you run the query with just that step, you'll see what I mean.
Long story short:
{ name : "bob",
children : [ { "name" : "mary" }, { "name" : "sue" } ]
}
becomes two documents:
{ name : "bob", children : { "name" : "mary" } }
{ name : "bob", children : { "name" : "sue" } }
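A minimal sketch of what $unwind does, modelled in plain JavaScript. The unwind helper is hypothetical and covers only the default behaviour: it assumes the field is an array when present, and documents whose field is missing, null, or an empty array produce no output.

```javascript
// Minimal model of { $unwind: "$<field>" }: one output document per array element,
// with the array field replaced by that single element (not a one-element array).
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] ?? []).map((element) => ({ ...doc, [field]: element }))
  );
}

const bob = [{ name: "bob", children: [{ name: "mary" }, { name: "sue" }] }];
console.log(unwind(bob, "children"));
```

Each output document is a shallow copy of the original with only the unwound field changed, which is why name: "bob" appears in both results.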
Step 3: $match
The third step, $match, is an exact duplicate of the first $match stage, but it has a different purpose. Since it follows $unwind, this stage filters out the former array elements (now separate documents) that don't match the filter criteria. In this case, we keep only documents where samples.key = "test-key".
Step 4: $project (Optional)
The fourth step, $project, restructures the document. In this case, I pulled the items out of the array so I could reference them directly. Using the example above:
{ name : "bob", children : { "name" : "mary" } }
becomes
{ new_name : "bob", new_child_name : "mary" }
Note that this step is entirely optional; later stages could be completed even without this $project after a few minor changes. In most cases $project is entirely cosmetic; aggregations have numerous optimizations under the hood such that manually including or excluding fields in a $project should not be necessary.
Step 5: $group
Finally, $group is where the magic happens. The _id value is what you would call the "group by" key in the SQL world. The second field, answer, averages the new_value field defined in the $project step. You can easily substitute $sum to perform a sum; a count is typically done like this: my_count : { $sum : 1 }.
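The five steps can be traced end to end with a small plain-JavaScript model. This is a sketch under one stated assumption: the "value" fields are stored as numbers here, whereas the question stores them as strings. averageForKey is a hypothetical helper, and each step is commented with the stage it stands in for.

```javascript
// Hypothetical numeric version of the question's data (the question stores "value" as strings).
const stack = [
  { item: "1", samples: [{ key: "test-key", value: 1 }, { key: "test-key2", value: 2 }] },
  { item: "1", samples: [{ key: "test-key", value: 3 }, { key: "test-key2", value: 4 }] },
];

function averageForKey(docs, key) {
  // Step 1, $match: keep documents with at least one matching array element
  const matched = docs.filter((d) => d.samples.some((s) => s.key === key));
  // Step 2, $unwind: one document per array element
  const unwound = matched.flatMap((d) => d.samples.map((s) => ({ ...d, samples: s })));
  // Step 3, $match again: now filters the individual former array elements
  const filtered = unwound.filter((d) => d.samples.key === key);
  // Step 4, $project: pull the fields out of the sub-document
  const projected = filtered.map((d) => ({ new_key: d.samples.key, new_value: d.samples.value }));
  // Step 5, $group: group by new_key and average new_value
  const groups = new Map();
  for (const { new_key, new_value } of projected) {
    const g = groups.get(new_key) ?? { sum: 0, count: 0 };
    g.sum += new_value;
    g.count += 1;
    groups.set(new_key, g);
  }
  return [...groups].map(([id, g]) => ({ _id: id, answer: g.sum / g.count }));
}

console.log(averageForKey(stack, "test-key")); // [ { _id: 'test-key', answer: 2 } ]
```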
The most important thing to note here is that the majority of the work being done is to format the data to a point where performing the operation is simple.
Final Note
Lastly, note that this would not work on the example data as provided, since samples.value is stored as text, which can't be used in arithmetic operations ($avg ignores non-numeric values, so answer would come back as null). If you're interested, changing the type of a field is described here: MongoDB How to change the type of a field
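The string problem can be illustrated with a small model of how $avg treats its inputs: non-numeric values are skipped, and if no numeric value remains the result is null. mongoAvg is a hypothetical helper, not a MongoDB API. Converting the stored values to numbers (as the linked question describes, or on MongoDB 4.0+ with the $toDouble operator inside the pipeline) is what makes the average come out as 2.

```javascript
// Minimal model of $avg: non-numeric inputs (such as the strings "1" and "3")
// are ignored, and if nothing numeric is left the result is null.
function mongoAvg(values) {
  const numbers = values.filter((v) => typeof v === "number");
  if (numbers.length === 0) return null;
  return numbers.reduce((sum, n) => sum + n, 0) / numbers.length;
}

console.log(mongoAvg(["1", "3"])); // null: both values are strings
console.log(mongoAvg(["1", "3"].map(Number))); // 2 once converted to numbers
```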

How to find a document with maximum field value in mongodb?

I have a number of MongoDB documents of the following form:
{
"auditedId" : "53d0f648e4b064e8d746b31c",
"modifications" : [
{
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31d"),
"modified" : "2014-07-22 18:33:05"
},
{
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31e"),
"modified" : "2014-07-24 14:15:27"
},
{
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31f"),
"modified" : "2014-07-24 12:04:24"
}
]
}
For each of these documents, I want to find the "auditRecordId" value that corresponds to the latest modification. In the given example, I want to retrieve:
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31e")
Or, even better:
{
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31e"),
"modified" : "2014-07-24 14:15:27"
}
Is there any way to do this without writing map-reduce functions?
Whenever you have an array in your document, the aggregate method is your friend :)
db.foo.aggregate([
// De-normalize the 'modifications' array
{"$unwind":"$modifications"},
// Sort by 'modifications.modified' descending
{"$sort":{"modifications.modified":-1}},
// Pick the first one i.e., the max
{"$limit":1}
])
Output:
{
"result" : [
{
"_id" : ObjectId("53d12be57a462c7459b6f1c7"),
"auditedId" : "53d0f648e4b064e8d746b31c",
"modifications" : {
"auditRecordId" : ObjectId("53d0f648e4b064e8d746b31e"),
"modified" : "2014-07-24 14:15:27"
}
}
],
"ok" : 1
}
The query above uses $limit just to illustrate the $unwind operator; it returns only the single latest modification across the whole collection. If you have multiple documents of the above form and want to retrieve the latest modification in each, replace $limit with a $group stage that uses the $first operator:
db.foo.aggregate([
{"$unwind":"$modifications"},
{"$sort":{"modifications.modified":-1}},
{"$group":{
"_id" : "$auditedId",
"modifications" : {$first:"$modifications"}}}
])
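The $unwind, $sort, $group/$first pipeline can be sketched in plain JavaScript to show why $first picks the latest modification: after the descending sort, the first document seen for each auditedId is the newest one. ObjectIds are written as plain strings here, which is an assumption of the sketch, and latestPerDocument is a hypothetical helper. The sort relies on the "YYYY-MM-DD HH:MM:SS" strings ordering chronologically when compared as plain strings.

```javascript
// Sample document from the question, with ObjectIds written as plain strings.
const foo = [{
  auditedId: "53d0f648e4b064e8d746b31c",
  modifications: [
    { auditRecordId: "53d0f648e4b064e8d746b31d", modified: "2014-07-22 18:33:05" },
    { auditRecordId: "53d0f648e4b064e8d746b31e", modified: "2014-07-24 14:15:27" },
    { auditRecordId: "53d0f648e4b064e8d746b31f", modified: "2014-07-24 12:04:24" },
  ],
}];

function latestPerDocument(docs) {
  // $unwind: one document per modification
  const unwound = docs.flatMap((d) => d.modifications.map((m) => ({ ...d, modifications: m })));
  // $sort: { "modifications.modified": -1 } using plain string comparison
  unwound.sort((a, b) => {
    const x = a.modifications.modified;
    const y = b.modifications.modified;
    return x < y ? 1 : x > y ? -1 : 0;
  });
  // $group by auditedId with $first: keep the first (newest) entry per group
  const groups = new Map();
  for (const d of unwound) {
    if (!groups.has(d.auditedId)) {
      groups.set(d.auditedId, { _id: d.auditedId, modifications: d.modifications });
    }
  }
  return [...groups.values()];
}

console.log(latestPerDocument(foo)[0].modifications); // the ...31e record, 2014-07-24 14:15:27
```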

Save Subset of MongoDB Collection to Another Collection

I have a collection like this:
{date: 20120101}
{date: 20120103}
{date: 20120104}
{date: 20120005}
{date: 20120105}
How do I save a subset of those documents with the date '20120105' to another collection?
For example, something like: db.subset.save(db.full_set.find({date: "20120105"}));
I would advise using the aggregation framework:
db.full_set.aggregate([ { $match: { date: "20120105" } }, { $out: "subset" } ])
It works about 100 times faster than forEach, at least in my case. This is because the entire aggregation pipeline runs in the mongod process, whereas a solution based on find() and insert() has to send all of the documents from the server to the client and then back. This has a performance penalty, even if the server and client are on the same machine.
Here's the shell version:
db.full_set.find({date:"20120105"}).forEach(function(doc){
db.subset.insert(doc);
});
Note: As of MongoDB 2.6, the aggregation framework makes it possible to do this faster; see melan's answer for details.
Actually, there is an equivalent of SQL's insert into ... select from in MongoDB. First, convert the matching documents into an array; then insert the array into the target collection:
db.subset.insert(db.full_set.find({date:"20120105"}).toArray())
The most general solution is this:
Make use of the aggregation (answer given by @melan):
db.full_set.aggregate({$match:{your query here...}},{$out:"sample"})
db.sample.copyTo("subset")
This works even when there are documents in "subset" before the operation and you want to preserve those "old" documents and just insert a new subset into it.
Care must be taken, because the copyTo() command replaces the documents with the same _id. Also note that copyTo() has been deprecated since MongoDB 3.0 and was removed in 4.2.
There's no direct equivalent of SQL's insert into ... select from ....
You have to take care of it yourself. Fetch documents of interest and save them to another collection.
You can do it in the shell, but I'd use a small external script in Ruby. Something like this:
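require 'mongo'

# Connect with the Ruby driver 2.x API (host/port assumed to be the local default)
client = Mongo::Client.new(['127.0.0.1:27017'], database: 'mydb')
source = client[:source_collection]
target = client[:target_collection]

# Copy every matching document into the target collection
source.find(date: "20120105").each do |doc|
  target.insert_one(doc)
end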
MongoDB's aggregate() together with the $out stage lets you save a subset into a new collection. Here are the details:
$out takes the documents returned by the aggregation pipeline and writes them to a specified collection.
The $out operation creates a new collection in the current database if one does not already exist.
The collection is not visible until the aggregation completes.
If the aggregation fails, MongoDB does not create the collection.
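Those guarantees can be pictured with a tiny in-memory model: the whole pipeline runs first, and only when every stage has succeeded is the output collection written, in one step. This is a sketch, not how mongod is implemented. fakeDb (a Map from collection name to documents) and aggregateWithOut are hypothetical, and the model also reflects that $out replaces an existing collection of the same name rather than appending to it.

```javascript
// Hypothetical in-memory model of $out's all-or-nothing write. `fakeDb` maps
// collection names to arrays of documents; stages are plain functions.
function aggregateWithOut(fakeDb, sourceName, stages, outName) {
  // Run the whole pipeline first; the output collection is not touched meanwhile.
  const results = stages.reduce((docs, stage) => stage(docs), fakeDb.get(sourceName) ?? []);
  // Only after every stage succeeded is the output written, in a single step.
  // An existing collection with that name is replaced, not appended to.
  fakeDb.set(outName, results);
  return results;
}

const fakeDb = new Map([["full_set", [{ date: "20120104" }, { date: "20120105" }]]]);
const matchDate = (docs) => docs.filter((d) => d.date === "20120105");

aggregateWithOut(fakeDb, "full_set", [matchDate], "subset");
console.log(fakeDb.get("subset")); // [ { date: '20120105' } ]

try {
  aggregateWithOut(fakeDb, "full_set", [() => { throw new Error("stage failed"); }], "broken");
} catch (e) {
  console.log(fakeDb.has("broken")); // false: the failed aggregation created nothing
}
```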
Syntax:
{ $out: "<output-collection>" }
Example
A collection books contains the following documents:
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
The following aggregation operation pivots the data in the books collection to have titles grouped by authors and then writes the results to the authors collection.
db.books.aggregate( [
{ $group : { _id : "$author", books: { $push: "$title" } } },
{ $out : "authors" }
] )
After the operation, the authors collection contains the following documents:
{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }
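For readers who want to see the $group/$push step on its own, here is a minimal plain-JavaScript model that rebuilds the authors documents from the books documents above. groupTitlesByAuthor is a hypothetical helper; $push appends values in the order documents reach the stage, and MongoDB does not guarantee the order of the output groups themselves.

```javascript
const books = [
  { _id: 8751, title: "The Banquet", author: "Dante", copies: 2 },
  { _id: 8752, title: "Divine Comedy", author: "Dante", copies: 1 },
  { _id: 8645, title: "Eclogues", author: "Dante", copies: 2 },
  { _id: 7000, title: "The Odyssey", author: "Homer", copies: 10 },
  { _id: 7020, title: "Iliad", author: "Homer", copies: 10 },
];

// Models { $group: { _id: "$author", books: { $push: "$title" } } }
function groupTitlesByAuthor(docs) {
  const groups = new Map(); // author -> { _id, books }
  for (const { author, title } of docs) {
    if (!groups.has(author)) groups.set(author, { _id: author, books: [] });
    groups.get(author).books.push(title); // $push appends in input order
  }
  return [...groups.values()];
}

const authors = groupTitlesByAuthor(books);
console.log(authors);
```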
For the question asked, the following query creates a new collection named 'col_20120105' in your database:
db.full_set.aggregate([
{ $match : { date : "20120105" } },
{ $out : "col_20120105" }
]);
You can also use the $merge aggregation pipeline stage (MongoDB 4.2+), which writes into an existing collection instead of replacing it:
db.full_set.aggregate([
{$match: {...}},
{ $merge: {
into: { db: 'your_db', coll: 'your_another_collection' },
on: '_id',
whenMatched: 'keepExisting',
whenNotMatched: 'insert'
}}
])
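To show how the whenMatched/whenNotMatched options above decide each document's fate, here is a small in-memory model. It is a sketch only: mergeKeepExisting and the sample target contents are hypothetical. Unlike $out, nothing already in the target collection is removed, and unlike copyTo(), a document whose _id already exists in the target is left as it is.

```javascript
// Hypothetical in-memory model of
//   { $merge: { into: ..., on: "_id", whenMatched: "keepExisting", whenNotMatched: "insert" } }
function mergeKeepExisting(target, results) {
  const ids = new Set(target.map((d) => d._id));
  for (const doc of results) {
    if (ids.has(doc._id)) continue; // whenMatched: "keepExisting" leaves the target doc alone
    target.push(doc); // whenNotMatched: "insert"
    ids.add(doc._id);
  }
  return target;
}

// Hypothetical target collection that already holds one copied document.
const anotherCollection = [{ _id: 1, date: "20120105", note: "already copied" }];
mergeKeepExisting(anotherCollection, [
  { _id: 1, date: "20120105" },
  { _id: 2, date: "20120105" },
]);
console.log(anotherCollection);
```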