MongoDB's Aggregation Framework: project only matching element of an array

I have a "class" document like this:
{
className: "AAA",
students: [
{name:"An", age:"13"},
{name:"Hao", age:"13"},
{name:"John", age:"14"},
{name:"Hung", age:"12"}
]
}
I want to get the student whose name is "An", returning only the matching element of the "students" array. I can do that with find() like this:
>db.class.find({"students.name":"An"}, {"students.$":true})
{
"_id" : ObjectId("548b01815a06570735b946c1"),
"students" : [
{
"name" : "An",
"age" : "13"
}
]}
That works, but when I do the same thing with the aggregation framework, as follows, I get an error:
db.class.aggregate([
{$match:{"students.name":'An'}},
{$project:{"students.$":true}}
])
The error is:
uncaught exception: aggregate failed: {
"errmsg" : "exception: FieldPath field names may not start with '$'.",
"code" : 16410,
"ok" : 0
}
Why? I can't use "$" on an array in the $project stage of aggregate(), but I can use it in the projection of find().

From the docs:
Use $ in the projection document of the find() method or the findOne() method when you only need one particular array element in selected documents.
The positional operator $ cannot be used in an aggregation pipeline projection stage. It is not recognized there.
This makes sense: when you run a projection as part of a find query, the input to the projection is a single document that has just matched the query, so the context of the match is still known during projection. For each document that matches the query, the projection is applied then and there, before the next match is found.
db.class.find({"students.name":"An"}, {"students.$":true})
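To make that concrete, here is a plain JavaScript sketch (runnable in Node, not on the server) of what find() with a positional projection effectively does. The function name findWithPositional and the sample _id are hypothetical; the point is only that the index found while matching is reused immediately by the projection.

```javascript
// Plain JavaScript model of find({"students.name": name}, {"students.$": true}).
// Runs client-side (e.g. in Node); it is not how the server is implemented.
const classes = [
  {
    _id: 1, // hypothetical _id, for illustration only
    className: "AAA",
    students: [
      { name: "An", age: "13" },
      { name: "Hao", age: "13" },
      { name: "John", age: "14" },
      { name: "Hung", age: "12" },
    ],
  },
];

function findWithPositional(docs, name) {
  const out = [];
  for (const doc of docs) {
    // The query finds the index of the first matching element...
    const idx = doc.students.findIndex((s) => s.name === name);
    if (idx === -1) continue; // document does not match the query
    // ...and the projection reuses that index right away.
    out.push({ _id: doc._id, students: [doc.students[idx]] });
  }
  return out;
}

console.log(JSON.stringify(findWithPositional(classes, "An")));
```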
In the case of:
db.class.aggregate([
{$match:{"students.name":'An'}},
{$project:{"students.$":true}}
])
The aggregation pipeline is a sequence of stages. Each stage is unaware of and independent from the stages before and after it; it only sees the documents that the previous stage hands to it. Here the first stage is $match, which filters the documents based on the match condition. The input to the projection stage is then just the set of documents that passed the $match stage.
So a positional operator in the projection stage makes no sense: the current stage does not know on what basis the documents were filtered. Therefore, $ is not allowed as part of a field path there.
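A minimal sketch of that stage independence, again in plain JavaScript rather than on the server: each stage is modelled as a function from documents to documents. The names matchStage, projectStage and runPipeline are hypothetical and only illustrate why the projection cannot know which array element matched.

```javascript
// Plain JavaScript model of a two-stage pipeline; each stage maps an array
// of documents to an array of documents. Illustrative only.
const docs = [
  { _id: 1, className: "AAA", students: [{ name: "An" }, { name: "Hao" }] },
  { _id: 2, className: "BBB", students: [{ name: "John" }] },
];

// "$match": the predicate exists only inside this stage.
const matchStage = (input) =>
  input.filter((d) => d.students.some((s) => s.name === "An"));

// "$project": receives plain documents and cannot see which element the
// previous stage matched, so it can only keep or drop whole fields.
const projectStage = (input) =>
  input.map((d) => ({ _id: d._id, students: d.students }));

const runPipeline = (stages, input) =>
  stages.reduce((acc, stage) => stage(acc), input);

console.log(JSON.stringify(runPipeline([matchStage, projectStage], docs)));
```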
Why does the following work?
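db.class.aggregate([
  { $match: { "students.name": "An" } },
  { $unwind: "$students" },
  // match again so that only the unwound element for "An" remains
  { $match: { "students.name": "An" } },
  { $project: { "students": 1 } }
])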
As you can see, the projection stage receives a set of documents as input and projects the required fields. It is independent of its previous and next stages.
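What $unwind does to the example document can be approximated client-side. This is a simplified model (the unwind helper is hypothetical, and options such as preserveNullAndEmptyArrays are ignored). It also shows that after unwinding, every student becomes its own document, so a second $match on "students.name" is what leaves only "An".

```javascript
// Simplified client-side model of { $unwind: "$<field>" } (no options).
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Missing or null values produce no output document.
    if (value === undefined || value === null) continue;
    // A non-array value is treated as a one-element array (MongoDB 3.2+);
    // an empty array yields no output documents.
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) out.push({ ...doc, [field]: item });
  }
  return out;
}

const classDoc = {
  _id: 1, // hypothetical _id
  className: "AAA",
  students: [
    { name: "An", age: "13" },
    { name: "Hao", age: "13" },
    { name: "John", age: "14" },
    { name: "Hung", age: "12" },
  ],
};

const unwound = unwind([classDoc], "students");
// A second match on students.name after $unwind keeps only "An".
const onlyAn = unwound.filter((d) => d.students.name === "An");
console.log(unwound.length, JSON.stringify(onlyAn));
```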

Try using the $unwind stage in the pipeline: http://docs.mongodb.org/manual/reference/operator/aggregation/unwind/#pipe._S_unwind
Your aggregation would look like this:
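db.class.aggregate([
  { $match: { "students.name": "An" } },
  { $unwind: "$students" },
  // match again so that only the unwound element for "An" remains
  { $match: { "students.name": "An" } },
  { $project: { "students": 1 } }
])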

You can use $filter to select a subset of an array to return, based on the specified condition.
db.class.aggregate([
  {
    $match: {
      "className": "AAA"
    }
  },
  {
    $project: {
      students: {
        $filter: {
          input: "$students",
          as: "stu",
          cond: { $eq: [ "$$stu.name", "An" ] }
        }
      }
    }
  }
])
The example above filters the students array to include only the elements whose name is equal to "An".
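For one document, the $filter expression computes roughly the following. This plain JavaScript sketch uses a hypothetical filterStudents helper; unlike the positional $ in find(), $filter keeps every matching element, not just the first.

```javascript
// Client-side model of
// { students: { $filter: { input: "$students", as: "stu",
//                          cond: { $eq: ["$$stu.name", name] } } } }
function filterStudents(doc, name) {
  return {
    _id: doc._id,
    // "stu" is bound to each element in turn; keep it when cond is true.
    students: doc.students.filter((stu) => stu.name === name),
  };
}

const classDoc = {
  _id: 1, // hypothetical _id
  className: "AAA",
  students: [
    { name: "An", age: "13" },
    { name: "Hao", age: "13" },
    { name: "John", age: "14" },
    { name: "Hung", age: "12" },
  ],
};

console.log(JSON.stringify(filterStudents(classDoc, "An")));
```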

Related

Using cond to specify _id fields for group in mongodb aggregation

I'm new to Mongo. I'm trying to group across different subfields of a document based on a condition. The condition is a regex on a field value. It looks like this:
db.collection.aggregate([{
{
"$group": {
"$cond": [{
"upper.leaf": {
$not: {
$regex: /flower/
}
}
},
{
"_id": {
"leaf": "$upper.leaf",
"stem": "$upper.stem"
}
},
{
"_id": {
"stem": "$upper.stem",
"petal": "$upper.petal"
}
}
]
}
}])
I'm using API v4.0. The docs show $cond as: { $cond: [ <boolean-expression>, <true-case>, <false-case> ] }
The error I get with the above code is: "Syntax error: dotted field name 'upper.leaf' can not used in a sub object."
Reading up on that, I tried $let to re-assign the dotted field name, but I started hitting various syntax errors with no obvious issue in the query.
I also tried using $project to rename the fields, but got: Field names may not start with '$'
Any thoughts on the best approach here? I can always address this at the application level and split my query in two, but it would be attractive to solve it natively in Mongo.
Your $group syntax is wrong. It should be:
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
You tried to do this:
{
$group:
<expression>
}
Even if your expression evaluated to the same document, that is still invalid syntax for $group (check the documentation for where expressions are allowed).
Another problem is that you use the query operator for regex ($regex) and not the aggregation regex operators. You can't do that: in an aggregation you can use only aggregation operators. $match is the exception: it accepts query operators, and it also accepts aggregation expressions if you wrap them in $expr.
I think you need something like this:
[{
  "$group": {
    "_id": {
      "$cond": [
        { "$not": [ { "$regexMatch": { "input": "$upper.leaf", "regex": /flower/ } } ] },
        { "leaf": "$upper.leaf", "stem": "$upper.stem" },
        { "stem": "$upper.stem", "petal": "$upper.petal" }
      ]
    }
  }
}]
It's similar code, but the expression is now the value of "_id", and it uses $regexMatch, which is an aggregation operator. Note that $regexMatch was added in MongoDB 4.2, so it is not available on 4.0.
I haven't tested the code.
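To sanity-check the grouping logic without a server, here is a plain JavaScript sketch of the same conditional key. The sample documents and the count field are hypothetical additions (the $group above has no accumulator); the sketch only illustrates how each document picks one of the two _id shapes.

```javascript
// Client-side model of the conditional _id above, plus a hypothetical count.
const docs = [
  { upper: { leaf: "maple", stem: "s1", petal: "p1" } },
  { upper: { leaf: "maple", stem: "s1", petal: "p2" } },
  { upper: { leaf: "sunflower", stem: "s2", petal: "p3" } },
];

function groupKey(doc) {
  const u = doc.upper;
  // Mirrors $cond + $not + $regexMatch: leaf does NOT match /flower/.
  return !/flower/.test(u.leaf)
    ? { leaf: u.leaf, stem: u.stem }
    : { stem: u.stem, petal: u.petal };
}

function groupCount(docs) {
  const groups = new Map(); // serialized key -> number of documents
  for (const doc of docs) {
    const key = JSON.stringify(groupKey(doc));
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return [...groups].map(([k, count]) => ({ _id: JSON.parse(k), count }));
}

console.log(JSON.stringify(groupCount(docs)));
```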

MongoDB aggregation field with underscore

I have a field _type_ in my documents like this:
{
"name" : "0",
"_type_" : "product"
}
I need to run an aggregation on that field:
db.readImport.aggregate([
{
$match: {
"$_type_": "product"
}
},
...
]);
If the field had no underscores it would work, but as it is I get:
unknown top level operator: $_type_
How can I access the field _type_ with $?
You don't need $ in the $match stage:
db.readImport.aggregate([
{
$match: { "_type_": "product" }
},
...
]);
because the $match stage accepts a plain query as its parameter. Other aggregation stages, such as $group, accept expressions, and expressions use field paths (such as "$_type_") to access fields of the input documents.
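A small JavaScript sketch of the distinction, with hypothetical helpers: resolveFieldPath treats a string that starts with $ as a path into the document (plain dotted paths only, no $$variables or array traversal), while matchesQuery treats query keys as field names and, like the server, rejects a top-level key that starts with $.

```javascript
// Simplified client-side models; they are not full implementations.

// Expression side: a string starting with "$" is a field path.
function resolveFieldPath(doc, path) {
  if (typeof path !== "string" || !path.startsWith("$")) return path; // literal
  return path
    .slice(1)
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Query side ($match): keys are field names. A top-level key starting with
// "$" is read as an operator, which is why "$_type_" is rejected.
function matchesQuery(doc, query) {
  return Object.entries(query).every(([field, expected]) => {
    if (field.startsWith("$")) {
      throw new Error("unknown top level operator: " + field);
    }
    return doc[field] === expected; // equality on top-level fields only
  });
}

const doc = { name: "0", _type_: "product" };
console.log(resolveFieldPath(doc, "$_type_"));         // prints "product"
console.log(matchesQuery(doc, { _type_: "product" })); // prints true
```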

Converting some fields in Mongo from String to Array

I have a collection of documents where a "tags" field was switched over from being a space separated list of tags to an array of individual tags. I want to update the previous space-separated fields to all be arrays like the new incoming data.
I'm also having problems with the $type selector because it is applying the type operation to individual array elements, which are strings. So filtering by type just returns everything.
How can I get every document that looks like the first example into the format for the second example?
{
"_id" : ObjectId("12345"),
"tags" : "red blue green white"
}
{
"_id" : ObjectId("54321"),
"tags" : [
"red",
"orange",
"black"
]
}
We can't use the $type operator to filter our documents here because the type of the elements in our array is "string" and as mentioned in the documentation:
When applied to arrays, $type matches any inner element that is of the specified BSON type. For example, when matching for $type : 'array', the document will match if the field has a nested array. It will not return results where the field itself is an array.
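Fortunately, MongoDB also provides the $exists operator, which can be used here with a numeric array index: "tags.0" exists only when "tags" is a non-empty array, so { "tags.0": { "$exists": false } } excludes the documents that have already been converted.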
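Why both conditions are needed can be sketched in plain JavaScript. The helpers below are hypothetical and simplified (top-level "tags" only): typeMatchesString mimics how $type matches an array when any element has that type, and tagsZeroExists mimics { "tags.0": { $exists: true } }.

```javascript
// Simplified client-side models of the two query conditions on "tags".

// { tags: { $type: 2 } }: matches a string, or an array containing a string.
function typeMatchesString(doc) {
  const v = doc.tags;
  return typeof v === "string" ||
    (Array.isArray(v) && v.some((x) => typeof x === "string"));
}

// { "tags.0": { $exists: true } }: true for a non-empty array
// (embedded documents with a "0" key are ignored in this sketch).
function tagsZeroExists(doc) {
  return Array.isArray(doc.tags) && doc.tags.length > 0;
}

// Both conditions together select only the old, space-separated strings.
function needsConversion(doc) {
  return typeMatchesString(doc) && !tagsZeroExists(doc);
}

const oldDoc = { tags: "red blue green white" };
const newDoc = { tags: ["red", "orange", "black"] };
console.log(typeMatchesString(newDoc), needsConversion(oldDoc), needsConversion(newDoc));
```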
Now how can we update those documents?
In MongoDB 3.2 and earlier, the only option we have is mapReduce(), but first let's look at the alternative available in the upcoming release of MongoDB.
Starting from MongoDB 3.4, we can $project our documents and use the $split operator to split our string into an array of substrings.
Note that to split only the "tags" values that are strings, we need the logical $cond operator. The condition here is an $eq that evaluates to true when the $type of the field is "string". The $type aggregation operator is new in 3.4.
Finally, we can overwrite the old collection using the $out pipeline stage. But we need to explicitly include the other fields in the $project stage; otherwise they are dropped.
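db.collection.aggregate(
  [
    { "$project": {
      // also list every other field you want to keep, e.g. "otherField": 1 (placeholder name)
      "tags": {
        "$cond": [
          { "$eq": [
            { "$type": "$tags" },
            "string"
          ]},
          { "$split": [ "$tags", " " ] },
          "$tags"
        ]
      }
    }},
    // $out replaces the contents of "collection" with the pipeline output
    { "$out": "collection" }
  ]
)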
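Per document, the $cond/$split projection computes the equivalent of this JavaScript sketch (convertTags is a hypothetical name). One difference worth noting: the object spread keeps every other field, which the $project stage only does if those fields are listed explicitly.

```javascript
// Client-side model of the $project above, applied to one document:
// split "tags" when it is a string, otherwise leave it unchanged.
function convertTags(doc) {
  return {
    ...doc, // keeps the other fields; $project would need them listed
    tags: typeof doc.tags === "string" ? doc.tags.split(" ") : doc.tags,
  };
}

console.log(JSON.stringify(convertTags({ _id: 1, tags: "red blue green white" })));
console.log(JSON.stringify(convertTags({ _id: 2, tags: ["red", "orange", "black"] })));
```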
With mapReduce, we need to use String.prototype.split() to emit the array of substrings in our map function. We also need to filter our documents using the "query" option. From there we need to iterate over the "results" array and $set the new value of "tags" with bulk operations: the bulkWrite() method, new in 3.2, or the now-deprecated Bulk() API if we are on 2.6 or 3.0.
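db.collection.mapReduce(
  // map: emit the space-separated string as an array of tags
  function() { emit(this._id, this.tags.split(" ")); },
  // reduce: never called here, because each _id is emitted only once
  function(key, value) {},
  {
    "out": { "inline": 1 },
    "query": {
      // skip documents whose "tags" is already a non-empty array
      "tags.0": { "$exists": false },
      // BSON type 2 = string
      "tags": { "$type": 2 }
    }
  }
)['results']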
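The bulk update step is not shown above. A minimal sketch, assuming the inline results have the usual { _id, value } shape, is to turn them into updateOne operations for bulkWrite(). The helper name toBulkOps is hypothetical; in the mongo shell you would pass its output to db.collection.bulkWrite().

```javascript
// Hypothetical helper: turn inline mapReduce results ({ _id, value } pairs,
// where value is the split tag array) into updateOne operations for bulkWrite().
function toBulkOps(results) {
  return results.map(({ _id, value }) => ({
    updateOne: {
      filter: { _id: _id },
      update: { $set: { tags: value } },
    },
  }));
}

// In the mongo shell this would be something like:
//   db.collection.bulkWrite(toBulkOps(results))
// where results is the array returned by the mapReduce call above.
const sample = [{ _id: 1, value: ["red", "blue", "green", "white"] }];
console.log(JSON.stringify(toBulkOps(sample)));
```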

Using $match on computed field Mongodb

I have this aggregation query in MongoDB:
db.questions.aggregate([
{ $project:{question:1,detail:1, choices:1, answer:1,
percent_false:{
$multiply:[100,{$divide:["$answear_false",{$add:["$answear_false","$answear_true"]}]}]},
percent_true:{
$multiply:[100,{$divide:["$answear_true",{$add:["$answear_false","$answear_true"]}]}]} }}, {$match:{status:'active'} }
]).pretty()
I want to use $match on the two computed fields "percent_true" and "percent_false", like this:
$match : {percent_true:{$gte:20}}
How can I do that?
Since the aggregation framework works in stages, you can treat the computed fields as normal fields, because from the $match stage's perspective they are normal fields.
db.questions.aggregate([
{ $project:{
question:1, detail:1, choices:1, answer:1,
status:1, // keep status so that the $match below can still filter on it
percent_false:{
$multiply:[100,{$divide:["$answear_false",{$add:["$answear_false","$answear_true"]}]}]
},
percent_true:{
$multiply:[100,{$divide:["$answear_true",{$add:["$answear_false","$answear_true"]}]}]}
}
},
{$match:{
status:'active',
percent_true:{$gte:20}
//When documents get fed to $match they already have a percent_true field, so you can match on them as normal
}
}
])
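The same two-stage logic can be sketched in plain JavaScript to check the numbers (the sample documents and function names are hypothetical). Note that status has to be carried through the projection for the later filter to see it; also, on the server $divide raises an error when the total is 0, which this sketch does not model.

```javascript
// Client-side model of the $project + $match pair above.
// Sample documents are hypothetical.
const questions = [
  { question: "q1", status: "active", answear_true: 1, answear_false: 3 },
  { question: "q2", status: "active", answear_true: 1, answear_false: 9 },
  { question: "q3", status: "closed", answear_true: 5, answear_false: 5 },
];

// $project: compute the percentages and keep status for the next stage.
function projectPercents(doc) {
  const total = doc.answear_false + doc.answear_true;
  return {
    question: doc.question,
    status: doc.status,
    percent_false: 100 * (doc.answear_false / total),
    percent_true: 100 * (doc.answear_true / total),
  };
}

// $match: the computed field is just another field at this point.
const result = questions
  .map(projectPercents)
  .filter((d) => d.status === "active" && d.percent_true >= 20);

console.log(JSON.stringify(result));
```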

Get first element in array and return using Aggregate?

How can I get and return the first element in an array using a Mongo aggregation?
I tried using this code:
db.my_collection.aggregate([
{ $project: {
resp : { my_field: { $slice: 1 } }
}}
])
but I get the following error:
uncaught exception: aggregate failed: {
"errmsg" : "exception: invalid operator '$slice'",
"code" : 15999,
"ok" : 0
}
Note that 'my_field' is an array of 4 elements, and I only need to return the first element.
Since 3.2, we can use $arrayElemAt to get the first element in an array:
db.my_collection.aggregate([
{ $project: {
resp : { $arrayElemAt: ['$my_field',0] }
}}
])
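$arrayElemAt also accepts negative indexes. A plain JavaScript sketch of its behaviour (arrayElemAt is a hypothetical helper name): non-negative indexes count from the start, negative ones from the end, and an out-of-range index produces no value, modelled here as undefined.

```javascript
// Client-side model of { $arrayElemAt: [ <array>, <idx> ] }.
function arrayElemAt(arr, idx) {
  const i = idx < 0 ? arr.length + idx : idx; // negative idx counts from the end
  // Out of range: the server returns no value; modelled here as undefined.
  return i >= 0 && i < arr.length ? arr[i] : undefined;
}

const myField = ["A", "B", "C", "D"];
console.log(arrayElemAt(myField, 0), arrayElemAt(myField, -1)); // A D
```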
Currently, the $slice operator is unavailable in the $project stage of the aggregation pipeline.
So what you could do is first $unwind the my_field array, then group the elements back together and take the $first element of each group.
db.my_collection.aggregate([
{$unwind:"$my_field"},
{$group:{"_id":"$_id","resp":{$first:"$my_field"}}},
{$project:{"_id":0,"resp":1}}
])
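A client-side sketch of the $unwind + $group/$first approach, with a hypothetical helper name. It makes one side effect visible: a document whose my_field is an empty array produces nothing after $unwind, so it gets no resp at all. The server does not guarantee the order of $group output, so the sketch's output order should not be relied on.

```javascript
// Client-side model of
// [ {$unwind: "$my_field"}, {$group: {_id: "$_id", resp: {$first: "$my_field"}}} ]
// simplified to array-valued my_field only.
function firstViaUnwindGroup(docs) {
  const groups = new Map(); // _id -> first unwound element
  for (const doc of docs) {
    if (!Array.isArray(doc.my_field)) continue; // simplification
    for (const item of doc.my_field) {
      // $unwind emits one document per element; $first keeps the first seen.
      if (!groups.has(doc._id)) groups.set(doc._id, item);
    }
  }
  // The final $project drops _id and keeps resp.
  return [...groups.values()].map((resp) => ({ resp }));
}

const docs = [
  { _id: 1, my_field: ["A", "B", "C", "D"] },
  { _id: 2, my_field: [] }, // vanishes at $unwind, so it gets no resp
  { _id: 3, my_field: ["E"] },
];
console.log(JSON.stringify(firstViaUnwindGroup(docs)));
```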
Or use the find() command, where you can use the $slice operator in the projection:
db.my_collection.find({},{"my_field":{$slice:1}})
Update: based on your comments, say you want only the second item in the array, for the record whose _id is id:
var field = 2;
var id = ObjectId("...");
Then the aggregation command below gives you the 2nd item in the my_field array of the record whose _id is id.
db.my_collection.aggregate([
{$match:{"_id":id}},
{$unwind:"$my_field"},
{$skip:field-1},
{$limit:1}
])
The above logic cannot be applied to more than one record, since that would involve a $group operator after $unwind. $group produces a single record for all the records in a particular group, which makes $limit or $skip applied in later stages ineffective.
A small variation on the find() query above would yield you the expected result as well.
db.my_collection.find({},{"my_field":{$slice:[field-1,1]}})
Apart from these, you can always do it on the client side, though that gets costly if the number of records is very large:
var field = 2;
db.my_collection.find().map(function(doc){
return doc.my_field[field-1];
})
Choosing from the above options depends upon your data size and app design.
Starting Mongo 4.4, the aggregation operator $first can be used to access the first element of an array:
// { "my_field": ["A", "B", "C"] }
// { "my_field": ["D"] }
db.my_collection.aggregate([
{ $project: { resp: { $first: "$my_field" } } }
])
// { "resp" : "A" }
// { "resp" : "D" }
The $slice operator is scheduled to be made available in the $project operation in Mongo 3.1.4, according to this ticket: https://jira.mongodb.org/browse/SERVER-6074
This will make the problem go away.
This version is currently only a development release and is not yet stable (as of July 2015). Expect it around October/November.
From Mongo 3.1.6 onwards:
db.my_collection.aggregate([
{
"$project": {
"newArray" : { "$slice" : [ "$oldarray" , 0, 1 ] }
}
}
])
where 0 is the start index and 1 is the number of elements to slice. Note that $slice returns an array (here, a one-element array) rather than the element itself.
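The three-argument form can be sketched in JavaScript as follows (slice3 is a hypothetical name): position may be negative to count from the end, clamped to the start of the array, and n is the maximum number of elements returned. The server additionally requires n to be positive, which the sketch does not check.

```javascript
// Client-side model of { $slice: [ <array>, <position>, <n> ] }.
function slice3(arr, position, n) {
  // Negative position counts from the end, clamped to the start of the array;
  // a position past the end yields an empty array.
  const start = position < 0 ? Math.max(arr.length + position, 0) : position;
  return arr.slice(start, start + n);
}

console.log(JSON.stringify(slice3(["A", "B", "C"], 0, 1)));  // ["A"]
console.log(JSON.stringify(slice3(["A", "B", "C"], -1, 1))); // ["C"]
```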