Will aggregation pipeline use index for derived field? - mongodb

In my aggregation pipeline I have this:
{
"$project" => {
account_name_i: { "$toLower" => "$account_name" },
}
}
{
"$sort" => {
account_name_i: 1
}
}
and I have an index { account_name: 1 }.
My question: will $sort use the index on account_name? If not, is there another way to achieve this in an aggregation pipeline?

No. The aggregation pipeline can only use a standard index on a $match or $sort phase that's at the beginning of a pipeline. The rules for using indexes with aggregation pipelines are described in detail in the manual.
You want to sort on the lower-case version of account_name, most likely as a case-insensitive sort. To achieve this with an index, store a lower-case-normalized copy of account_name in each document
{ "account_name" : "TruMAn's HABerdaShEry", "account_name_lc" : "truman's haberdashery" }
and index the normalized field ({ "account_name_lc" : 1 }). A $sort on account_name_lc placed at the beginning of the pipeline can then use that index.
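The normalization step described above could be sketched in plain JavaScript as follows. This is only a minimal sketch: the helper withNormalizedField, the field names account_name and account_name_lc, and the collection name accounts are all illustrative assumptions, not part of any real schema.

```javascript
// Hypothetical helper: returns a copy of doc with a lower-cased copy of
// doc[sourceField] stored under targetField. Both names are caller-chosen.
function withNormalizedField(doc, sourceField, targetField) {
  const copy = Object.assign({}, doc);
  copy[targetField] = String(doc[sourceField]).toLowerCase();
  return copy;
}

// Illustrative field names (assumptions, not taken from a real schema).
const stored = withNormalizedField(
  { account_name: "TruMAn's HABerdaShEry" },
  "account_name",
  "account_name_lc"
);

// Index spec for the normalized field, and a pipeline whose first stage
// sorts on it, which makes the sort eligible to use that index.
const indexSpec = { account_name_lc: 1 };
const pipeline = [{ $sort: { account_name_lc: 1 } }];

// In the mongo shell this might be applied as (collection name assumed):
//   db.accounts.createIndex(indexSpec)
//   db.accounts.aggregate(pipeline)
```

The normalized field has to be kept in sync whenever the original name changes, which is the cost of this approach.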

Aggregation stages such as $group or $project produce a new set of documents from the collection.
No index is defined on that new set of documents, so the stages that follow cannot use the collection's indexes.
Note: if $match or $sort is the first stage of the pipeline, it can use the collection's indexes, because the input to that stage is the collection itself.

Kind of, though only with preprocessing. You can write the result of the $project stage to a collection with $out, index that collection, and then query it with a sort:
db.yourSource.aggregate([
{
$project: {
account_name_i: { $toLower: "$account_name" }
}
},
{
$out: "intermediate"
}
])
db.intermediate.createIndex({account_name_i:1})
db.intermediate.find({}).sort({account_name_i:1})

Related

Use MongoDB $sort twice

Is there a way to use the $sort operator twice within a single aggregation pipeline?
I know that using a singular $sort with two keys works properly, i.e. sort by the first key, then the second.
My current project requires multiple $sort stages to exist, for example
db.collection.aggregate([
{
$sort: {
"age": 1
}
},
{
$sort: {
"score": -1
}
}
])
Currently, the second stage doesn't respect the result of the first stage. Is there any workaround for that?
Is it possible to, for example, assign each document a new field 'index' after the first stage, storing its index within the current array of results, and use that field in the second $sort stage?
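A second $sort stage re-sorts its entire input by its own key, so the order from the first stage is not guaranteed to survive. Use multiple keys in a single $sort stage instead: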
db.collection.aggregate([
{
"$sort": {
"age": 1,
"score": -1
}
}
])
I have set up a Mongo Playground example you can refer to:
https://mongoplayground.net/p/ZaRX_XNSXhu
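To illustrate what a two-key sort computes, here is a plain JavaScript emulation on an in-memory array. It is a sketch of the ordering only, not MongoDB code; the function name and the sample documents are made up for the example.

```javascript
// Emulates { $sort: { age: 1, score: -1 } } on an in-memory array.
function sortByAgeThenScore(docs) {
  // slice() copies the array so the input is left untouched.
  return docs.slice().sort(function (a, b) {
    if (a.age !== b.age) return a.age - b.age; // age ascending
    return b.score - a.score;                  // score descending on ties
  });
}

// Made-up sample data.
const people = [
  { name: "a", age: 30, score: 5 },
  { name: "b", age: 20, score: 7 },
  { name: "c", age: 30, score: 9 },
  { name: "d", age: 20, score: 3 }
];

const sortedNames = sortByAgeThenScore(people).map(function (d) {
  return d.name;
});
// sortedNames is ["b", "d", "c", "a"]
```

The second key only decides the order among documents that tie on the first key, which is the behaviour the question was trying to get from two separate stages.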

SELECT avg(rate) FROM ratings WHERE sid=1 in MongoDB

How to implement equivalent of this SQL command in MongoDB?
SELECT avg(rate) FROM ratings WHERE sid=1
There is no need for grouping.
Yes, MongoDB has an aggregation framework in which you build a pipeline of stages for your query.
db.collection.aggregate([
{
$match: {
"sid": 1
}
},
{
// $avg is an accumulator here; _id: null puts all matched documents in one group
$group: {
_id: null,
avgRate: { $avg: "$rate" }
}
}
])
In a SQL query the WHERE clause is applied first, which is why the $match stage comes first. $match in MongoDB is roughly equivalent to WHERE in SQL, and $avg, used as an accumulator in a $group stage, works like AVG in SQL.
To solve this, use $avg within the $group aggregation pipeline element. Basic pipeline flow:
match on sid=1 (your WHERE clause)
group by sid (only one sid is left to group by at this point, because the others were filtered out by the match), and compute the average within the group
Your pipeline would look something like:
db.rates.aggregate(
[
{ $match: {"sid":1}},
{ $group: { _id: "$sid", rateAvg: {$avg: "$rate" } }}
])
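The match-then-average flow above can be checked against a small plain JavaScript emulation. This is a sketch with made-up sample data; the function name averageRateForSid is hypothetical, and returning null for "no matching documents" is a choice made for this example (the pipeline itself would simply return no result document).

```javascript
// Emulates: $match { sid } followed by $group with { $avg: "$rate" }.
function averageRateForSid(ratings, sid) {
  // $match: keep only the documents with the requested sid.
  const matched = ratings.filter(function (r) { return r.sid === sid; });
  // Assumption for this sketch: signal "no documents" with null.
  if (matched.length === 0) return null;
  // $avg: sum of rate divided by the number of matched documents.
  const total = matched.reduce(function (sum, r) { return sum + r.rate; }, 0);
  return total / matched.length;
}

// Made-up sample data.
const ratings = [
  { sid: 1, rate: 4 },
  { sid: 1, rate: 2 },
  { sid: 2, rate: 5 },
  { sid: 1, rate: 3 }
];

const avgForSid1 = averageRateForSid(ratings, 1);
// avgForSid1 is 3, since (4 + 2 + 3) / 3 = 3
```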

Using $match on computed field Mongodb

I have this aggregation query in MongoDB:
db.questions.aggregate([
{ $project:{question:1,detail:1, choices:1, answer:1,
percent_false:{
$multiply:[100,{$divide:["$answear_false",{$add:["$answear_false","$answear_true"]}]}]},
percent_true:{
$multiply:[100,{$divide:["$answear_true",{$add:["$answear_false","$answear_true"]}]}]} }}, {$match:{status:'active'} }
]).pretty()
I want to use $match on the two computed fields "percent_true" and "percent_false", like this:
$match : {percent_true:{$gte:20}}
How can I do this?
Since the aggregation framework works in stages, you can treat computed fields as if they were normal fields; from the $match stage's perspective, they are. Keep in mind that $match only sees the fields that the preceding $project outputs, so status has to be included in the projection for the status condition to match anything.
{ $project:{
question:1, detail:1, choices:1, answer:1, status:1,
percent_false:{
$multiply:[100,{$divide:["$answear_false",{$add:["$answear_false","$answear_true"]}]}]
},
percent_true:{
$multiply:[100,{$divide:["$answear_true",{$add:["$answear_false","$answear_true"]}]}]
}
}
},
{$match:{
status:'active',
// documents reaching $match already carry percent_true, so it can be matched like any other field
percent_true:{$gte:20}
}
}
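The stage-by-stage idea can be emulated in plain JavaScript: first compute the derived fields (the $project step), then filter on them (the $match step). This is only a sketch with made-up sample documents; it assumes the two counters are never both zero, and the function name is hypothetical.

```javascript
// Emulates a $project that computes percentages, followed by a $match on
// status and on the computed percent_true field.
function activeWithPercentTrueAtLeast(questions, minPercent) {
  return questions
    .map(function (q) { // $project step
      const total = q.answear_false + q.answear_true; // assumed non-zero
      return {
        question: q.question,
        status: q.status, // must be carried through to be matchable later
        percent_true: 100 * (q.answear_true / total),
        percent_false: 100 * (q.answear_false / total)
      };
    })
    .filter(function (q) { // $match step
      return q.status === "active" && q.percent_true >= minPercent;
    });
}

// Made-up sample data.
const sampleQuestions = [
  { question: "q1", status: "active", answear_true: 1, answear_false: 3 },
  { question: "q2", status: "active", answear_true: 1, answear_false: 9 },
  { question: "q3", status: "closed", answear_true: 5, answear_false: 5 }
];

const keptQuestions = activeWithPercentTrueAtLeast(sampleQuestions, 20)
  .map(function (q) { return q.question; });
// keptQuestions is ["q1"]
```

Only q1 survives: q2 is active but below the threshold, and q3 is above it but not active.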

MongoDB's Aggregation Framework: project only matching element of an array

I have a "class" document as:
{
className: "AAA",
students: [
{name:"An", age:"13"},
{name:"Hao", age:"13"},
{name:"John", age:"14"},
{name:"Hung", age:"12"}
]
}
I want to get the student whose name is "An", returning only the matching element of the "students" array. I can do that with find():
>db.class.find({"students.name":"An"}, {"students.$":true})
{
"_id" : ObjectId("548b01815a06570735b946c1"),
"students" : [
{
"name" : "An",
"age" : "13"
}
]}
That works, but when I do the same with aggregation as follows, I get an error:
db.class.aggregate([
{$match:{"students.name":'An'}},
{$project:{"students.$":true}}
])
Error is:
uncaught exception: aggregate failed: {
"errmsg" : "exception: FieldPath field names may not start with '$'.",
"code" : 16410,
"ok" : 0
}
Why can't I use "$" for an array in the $project stage of aggregate(), when I can use it in the projection of find()?
From the docs:
Use $ in the projection document of the find() method or the findOne()
method when you only need one particular array element in selected
documents.
The positional operator $ cannot be used in an aggregation pipeline projection stage. It is not recognized there.
This makes sense because, when you run a projection as part of a find query, the input to the projection is a single document that has matched the query. The context of the match is still known during projection, so for each matching document the projection is applied then and there, before the next match is found.
db.class.find({"students.name":"An"}, {"students.$":true})
In case of:
db.class.aggregate([
{$match:{"students.name":'An'}},
{$project:{"students.$":true}}
])
The aggregation pipeline is a sequence of stages. Each stage is unaware of, and independent of, the stages before and after it. Conceptually, documents pass through one stage before they are handed to the next. Here the first stage is $match, which filters the documents by the match condition; the input to the projection stage is then simply the set of documents that passed that filter.
So a positional operator in the projection stage makes no sense, since in the current stage it doesn't know on what basis the fields had been filtered. Therefore, $ operators are not allowed as part of the field paths.
Why does the below work?
db.class.aggregate([
{ $match: { "students.name": "An" } },
{ $unwind: "$students" },
{ $project: { "students": 1 } }
])
As you see, the projection stage gets a set of documents as input, and projects the required fields. It is independent of its previous and next stages.
Try using the unwind operator in the pipeline: http://docs.mongodb.org/manual/reference/operator/aggregation/unwind/#pipe._S_unwind
Your aggregation would look like
db.class.aggregate([
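{ $match: { "students.name": "An" } },
{ $unwind: "$students" },
// after $unwind, keep only the unwound documents for the matching student
{ $match: { "students.name": "An" } },
{ $project: { "students": 1 } }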
])
You can use $filter to select a subset of an array to return, based on a specified condition.
db.class.aggregate([
{
$match:{
"className": "AAA"
}
},
{
$project: {
students: {
$filter: {
input: "$students",
as: "stu",
cond: { $eq: [ "$$stu.name", "An" ] }
}
}
}
}
])
The example above filters the students array to include only the elements whose name equals "An".
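What the $filter expression computes for a single document can be emulated in plain JavaScript with Array.prototype.filter. This is a sketch only; the function name onlyStudentsNamed is hypothetical and the document is the sample from the question.

```javascript
// Emulates a $project whose students field is a $filter over the original
// students array, keeping only elements with the requested name.
function onlyStudentsNamed(classDoc, name) {
  return {
    students: classDoc.students.filter(function (stu) {
      return stu.name === name; // plays the role of cond: { $eq: [...] }
    })
  };
}

// Sample document from the question.
const classDoc = {
  className: "AAA",
  students: [
    { name: "An", age: "13" },
    { name: "Hao", age: "13" },
    { name: "John", age: "14" },
    { name: "Hung", age: "12" }
  ]
};

const filtered = onlyStudentsNamed(classDoc, "An");
// filtered.students is [ { name: "An", age: "13" } ]
```

Unlike the $unwind approach, the result stays one document per class, with the array reduced to its matching elements.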

Get first element in array and return using Aggregate?

How can I get and return the first element in an array using a Mongo aggregation?
I tried using this code:
db.my_collection.aggregate([
{ $project: {
resp : { my_field: { $slice: 1 } }
}}
])
but I get the following error:
uncaught exception: aggregate failed: {
"errmsg" : "exception: invalid operator '$slice'",
"code" : 15999,
"ok" : 0
}
Note that 'my_field' is an array of 4 elements, and I only need to return the first element.
Since MongoDB 3.2, you can use $arrayElemAt to get the first element of an array:
db.my_collection.aggregate([
{ $project: {
resp : { $arrayElemAt: ['$my_field',0] }
}}
])
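For intuition, the indexing behaviour of $arrayElemAt can be emulated in plain JavaScript. This is a sketch, not MongoDB code: the function below is a stand-in written for this example, and it uses undefined to represent the case where the operator produces no value.

```javascript
// Emulates { $arrayElemAt: [ arr, idx ] }:
// a non-negative idx counts from the start, a negative idx from the end.
function arrayElemAt(arr, idx) {
  const i = idx < 0 ? arr.length + idx : idx;
  return arr[i]; // undefined when the index is out of range
}

const firstItem = arrayElemAt(["A", "B", "C"], 0);  // "A"
const lastItem = arrayElemAt(["A", "B", "C"], -1);  // "C"
const missing = arrayElemAt(["A", "B", "C"], 5);    // undefined
```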
In MongoDB versions before 3.2, the $slice operator is not available in the $project stage of the aggregation pipeline.
So what you could do is
first $unwind the my_field array, then group the documents back together and take the $first element of each group.
db.my_collection.aggregate([
{$unwind:"$my_field"},
{$group:{"_id":"$_id","resp":{$first:"$my_field"}}},
{$project:{"_id":0,"resp":1}}
])
Or using the find() command, where you could make use of the $slice operator in the projection part.
db.my_collection.find({},{"my_field":{$slice:1}})
Update: based on your comments, say you want only the second item of the array, for the record whose _id is id.
var field = 2;
var id = ObjectId("...");
Then the aggregation below returns the 2nd item of the my_field array for the record whose _id is id.
db.my_collection.aggregate([
{$match:{"_id":id}},
{$unwind:"$my_field"},
{$skip:field-1},
{$limit:1}
])
The above logic cannot be applied to more than one record, since that would require a $group stage after $unwind. The $group stage emits a single document per group, so $skip and $limit in later stages can no longer select an item within each record's array.
A small variation on the find() query above would yield you the expected result as well.
db.my_collection.find({},{"my_field":{$slice:[field-1,1]}})
Apart from these, there is always a way to do it in the client side, though a bit costly if the number of records is very large:
var field = 2;
db.my_collection.find().map(function(doc){
return doc.my_field[field-1];
})
Choosing from the above options depends upon your data size and app design.
Starting in MongoDB 4.4, the aggregation operator $first can be used to access the first element of an array:
// { "my_field": ["A", "B", "C"] }
// { "my_field": ["D"] }
db.my_collection.aggregate([
{ $project: { resp: { $first: "$my_field" } } }
])
// { "resp" : "A" }
// { "resp" : "D" }
The $slice operator is scheduled to be made available in the $project operation in Mongo 3.1.4, according to this ticket: https://jira.mongodb.org/browse/SERVER-6074
This will make the problem go away.
This version is currently only a developer release and is not yet stable (as of July 2015). Expect this around October/November time.
From MongoDB 3.1.6 onwards:
db.my_collection.aggregate([
{
"$project": {
"newArray" : { "$slice" : [ "$oldarray" , 0, 1 ] }
}
}
])
where 0 is the start index and 1 is the number of elements to return.
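The three-argument form used above can be emulated in plain JavaScript for non-negative start positions. This is a sketch under that assumption only; negative positions are not handled here, and the function name sliceFrom is made up for the example.

```javascript
// Emulates { $slice: [ arr, position, n ] } for position >= 0:
// returns up to n elements starting at position.
function sliceFrom(arr, position, n) {
  return arr.slice(position, position + n);
}

const firstOnly = sliceFrom(["A", "B", "C", "D"], 0, 1);  // ["A"]
const middleTwo = sliceFrom(["A", "B", "C", "D"], 1, 2);  // ["B", "C"]
const pastEnd = sliceFrom(["A", "B", "C", "D"], 10, 1);   // []
```

Note that the result is always an array, so taking the "first element" this way yields a one-element array rather than the element itself.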