MongoDB aggregation group - mongodb

I need few values to pass through the pipeline(without changes) to the subsequent pipeline aggregator while using a group
For eg if my input is
{ "_id" : 1, "tags" : [ "a", "b", "b", "c" ], "text" : "a" },
{ "_id" : 2, "tags" : [ "a", "c" ], "text" : "b" }
and I like the output to be
{ _id: 1, tags : [a,b,c], text : a} , { _id: 2, tags : [a,c], text : b}
I did an unwind and group and do $addToSet to remove the dups. My question is how do I make
text appear in the output as is. the text key and values need to just pass through this pipeline, However i am forced to use accumulators
I tried this code so far
use test
var unwind = { $unwind : "$tags"};
var group = { $group : { _id : "$_id" , tags : { $addToSet : "$tags" }
, text : { $first : "$text" }}};
db.test.aggregate(unwind,group)
and it works fine but that is not the intention of $first and it is suggested to be used with sort. What is the right way to do this ?

There's nothing wrong with using $first for this. As an alternative, you could add text to your _id, but that's hardly better as then you'd need to add a $project to the end of your pipeline to move it back out of _id.
You have to do the same sort of mildly awkward thing with SQL group bys.

All fields but the _id field must use an accumulator with $group, as specified in the $group documentation page. The problem is that it's unclear what it would mean to "pass" a value through the pipeline with $group, since MongoDB has no way of knowing that the value of that field is going to be the same for all documents you are grouping together. You have to choose what value you want for that field with the accumulator.

Related

Sorting on index of array mongodb

I have a collection where i have objects like:
{
"_id" : ObjectId("5ab212249a639865c58b744e"),
"levels" : [
{
"levelId" : 0,
"siteId" : "5a0ff11dc7bd083ea6a706b1",
"title" : "Hospital Services"
},
{
"levelId" : 1,
"siteId" : "5a0ff220c7bd083ea6a706d0",
"title" : "Reference Testing"
},
{
"levelId" : 2,
"siteId" : "5a0ff24fc7bd083ea6a706da",
"title" : "Des Moines(Reference Testing)"
}
]
}
I want to sort on the title field of 2nd object of levels array e.g. levels.2.title
Currently my mongo query looks like:
db.getCollection('5aaf63a69a639865c58b2ab9').aggregate([
{$sort : {'levels.2.title':1}}
])
But it is not giving desired results.
Please help.
You can try below query in 3.6.
db.col.aggregate({$sort:{"levels.2.title":1}});
This aggregation and find semantics are different in 3.4. More on jira here
So
db.col.find().sort({"levels.2.title":1})
works as expected and aggregation sort is not working as expected.
Use below aggregation in 3.4.
Use $arrayElemAt to project the second element in $addFields to keep the computed value as the extra field in the document followed by $sort sort on field.
$project with exclusion to drop the sort field to get expected output.
db.col.aggregate([
{"$addFields":{ "sort_element":{"$arrayElemAt":["$levels", 2]}}},
{"$sort":{"sort_element.title":-1}},
{"$project":{"sort_element":0}}
])
Also, You can use $let expression to output the title field directly in $addFields stage.
db.col.aggregate([
{"$addFields":{ "sort_field":{"$let:{"vars":{"ele":{$arrayElemAt":["$levels", 2]}}, in:"$$ele.title"}}}},
{"$sort":{"sort_field":-1}},
{"$project":{"sort_field":0}}
])

removing object from nested array of objects mongodb

I've got collection with volunteer information in it, and it lists the volunteers as an array of objects. I can display all the shifts for each volunteer, but removing one from the array is proving difficult for me:
Sample data:
"_id" : ObjectId("59180305c19dbaa4ecd9ee59"),
"where" : "Merchandise tent",
"description" : "Sell gear at the merchandise tent.",
"shifts" : [
{
"dateNeeded" : ISODate("2017-06-23T00:00:00Z"),
"timeslot" : "8:00 - NOON",
"needed" : 2,
"_id" : ObjectId("591807546a71c3a57d1a2105"),
"volunteers" : [
{
"fullname" : "Mary Mack",
"phone" : "1234567890",
"email" : "mary#gmail.com",
"_id" : ObjectId("591ce45bc7e8a8c7b742474c")
}
]
},
The data I have available for this is:
_id, where, shifts.timeslot, shifts.dateNeeded, volunteers.email
Can someone help me? Lets say Mary Mack wants to unVolunteer for the 8 - Noon shift at the merchandise tent. She may be listed under other shifts as well, but we only want to remove her from this shift.
You can do this by specifying something to match the "document" and then the required "shifts" array entry as the query expression for an .update(). Then apply the positional $ operator for the matched array index with $pull:
db.collection.update(
{ "_id": ObjectId("59180305c19dbaa4ecd9ee59"), "shifts.timeslot": "8:00 - NOON" },
{ "$pull": { "shifts.$.volunteers": { "fullname": "Mary Mack" } } }
)
That is okay in this instance since you are only trying to "match" on the "outer" array in the nested structure and the $pull has query arguments of it's own to identify the array entry to remove.
You really should be careful using "nested arrays" though. As whilst a $pull operation like this works, updates to the "inner" array are not really possible since the positional $ operator will only match the "first" element that meets the condition. So your example of "Mary Mack" in multiple shifts would only ever match in the first "shifts" array entry found.
Try this
db.example.update(
{},
{ $unset: {"Mary Mack":1}},
false, true
)

$Avg aggregation in Mongodb [duplicate]

For a given record id, how do I get the average of a sub document field if I have the following in MongoDB:
/* 0 */
{
"item" : "1",
"samples" : [
{
"key" : "test-key",
"value" : "1"
},
{
"key" : "test-key2",
"value" : "2"
}
]
}
/* 1 */
{
"item" : "1",
"samples" : [
{
"key" : "test-key",
"value" : "3"
},
{
"key" : "test-key2",
"value" : "4"
}
]
}
I want to get the average of the values where key = "test-key" for a given item id (in this case 1). So the average should be $avg (1 + 3) = 2
Thanks
You'll need to use the aggregation framework. The aggregation will end up looking something like this:
db.stack.aggregate([
{ $match: { "samples.key" : "test-key" } },
{ $unwind : "$samples" },
{ $match : { "samples.key" : "test-key" } },
{ $project : { "new_key" : "$samples.key", "new_value" : "$samples.value" } },
{ $group : { `_id` : "$new_key", answer : { $avg : "$new_value" } } }
])
The best way to think of the aggregation framework is like an assembly line. The query itself is an array of JSON documents, where each sub-document represents a different step in the assembly.
Step 1: $match
The first step is a basic filter, like a WHERE clause in SQL. We place this step first to filter out all documents that do not contain an array element containing test-key. Placing this at the beginning of the pipeline allows the aggregation to use indexes.
Step 2: $unwind
The second step, $unwind, is used for separating each of the elements in the "samples" array so we can perform operations across all of them. If you run the query with just that step, you'll see what I mean.
Long story short :
{ name : "bob",
children : [ {"name" : mary}, { "name" : "sue" } ]
}
becomes two documents :
{ name : "bob", children : [ { "name" : mary } ] }
{ name : "bob", children : [ { "name" : sue } ] }
Step 3: $match
The third step, $match, is an exact duplicate of the first $match stage, but has a different purpose. Since it follows $unwind, this stage filters out previous array elements, now documents, that don't match the filter criteria. In this case, we keep only documents where samples.key = "test-key"
Step 4: $project (Optional)
The fourth step, $project, restructures the document. In this case, I pulled the items out of the array so I could reference them directly. Using the example above..
{ name : "bob", children : [ { "name" : mary } ] }
becomes
{ new_name : "bob", new_child_name : mary }
Note that this step is entirely optional; later stages could be completed even without this $project after a few minor changes. In most cases $project is entirely cosmetic; aggregations have numerous optimizations under the hood such that manually including or excluding fields in a $project should not be necessary.
Step 5: $group
Finally, $group is where the magic happens. The _id value what you will be "grouping by" in the SQL world. The second field is saying to average over the value that I defined in the $project step. You can easily substitute $sum to perform a sum, but a count operation is typically done the following way: my_count : { $sum : 1 }.
The most important thing to note here is that the majority of the work being done is to format the data to a point where performing the operation is simple.
Final Note
Lastly, I wanted to note that this would not work on the example data provided since samples.value is defined as text, which can't be used in arithmetic operations. If you're interested, changing the type of a field is described here: MongoDB How to change the type of a field

Average a Sub Document Field Across Documents in Mongo

For a given record id, how do I get the average of a sub document field if I have the following in MongoDB:
/* 0 */
{
"item" : "1",
"samples" : [
{
"key" : "test-key",
"value" : "1"
},
{
"key" : "test-key2",
"value" : "2"
}
]
}
/* 1 */
{
"item" : "1",
"samples" : [
{
"key" : "test-key",
"value" : "3"
},
{
"key" : "test-key2",
"value" : "4"
}
]
}
I want to get the average of the values where key = "test-key" for a given item id (in this case 1). So the average should be $avg (1 + 3) = 2
Thanks
You'll need to use the aggregation framework. The aggregation will end up looking something like this:
db.stack.aggregate([
{ $match: { "samples.key" : "test-key" } },
{ $unwind : "$samples" },
{ $match : { "samples.key" : "test-key" } },
{ $project : { "new_key" : "$samples.key", "new_value" : "$samples.value" } },
{ $group : { `_id` : "$new_key", answer : { $avg : "$new_value" } } }
])
The best way to think of the aggregation framework is like an assembly line. The query itself is an array of JSON documents, where each sub-document represents a different step in the assembly.
Step 1: $match
The first step is a basic filter, like a WHERE clause in SQL. We place this step first to filter out all documents that do not contain an array element containing test-key. Placing this at the beginning of the pipeline allows the aggregation to use indexes.
Step 2: $unwind
The second step, $unwind, is used for separating each of the elements in the "samples" array so we can perform operations across all of them. If you run the query with just that step, you'll see what I mean.
Long story short :
{ name : "bob",
children : [ {"name" : mary}, { "name" : "sue" } ]
}
becomes two documents :
{ name : "bob", children : [ { "name" : mary } ] }
{ name : "bob", children : [ { "name" : sue } ] }
Step 3: $match
The third step, $match, is an exact duplicate of the first $match stage, but has a different purpose. Since it follows $unwind, this stage filters out previous array elements, now documents, that don't match the filter criteria. In this case, we keep only documents where samples.key = "test-key"
Step 4: $project (Optional)
The fourth step, $project, restructures the document. In this case, I pulled the items out of the array so I could reference them directly. Using the example above..
{ name : "bob", children : [ { "name" : mary } ] }
becomes
{ new_name : "bob", new_child_name : mary }
Note that this step is entirely optional; later stages could be completed even without this $project after a few minor changes. In most cases $project is entirely cosmetic; aggregations have numerous optimizations under the hood such that manually including or excluding fields in a $project should not be necessary.
Step 5: $group
Finally, $group is where the magic happens. The _id value what you will be "grouping by" in the SQL world. The second field is saying to average over the value that I defined in the $project step. You can easily substitute $sum to perform a sum, but a count operation is typically done the following way: my_count : { $sum : 1 }.
The most important thing to note here is that the majority of the work being done is to format the data to a point where performing the operation is simple.
Final Note
Lastly, I wanted to note that this would not work on the example data provided since samples.value is defined as text, which can't be used in arithmetic operations. If you're interested, changing the type of a field is described here: MongoDB How to change the type of a field

matching fields internally in mongodb

I am having following document in mongodb
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"students" : [
{
"id" : 23,
"class" : "a"
},
{
"id" : 55,
"class" : "b"
}
]
}
{
"_id" : ObjectId("517b9d05254e385a07fc4e71"),
"studentId" : 55,
"students" : [
{
"id" : 33,
"class" : "c"
}
]
}
Note: Not an actual data but schema is exactly same.
Requirement: Finding the document which matches the studentId and students.id(id inside the students array using single query.
I have tried the code like below
db.data.aggregate({$match:{"students.id":"$studentId"}},{$group:{_id:"$student"}});
Result: Empty Array, If i replace {"students.id":"$studentId"} to {"students.id":33} it is returning the second document in the above shown json.
Is it possible to get the documents for this scenario using single query?
If possible, I'd suggest that you set the condition while storing the data so that you can do a quick truth check (isInStudentsList). It would be super fast to do that type of query.
Otherwise, there is a relatively complex way of using the Aggregation framework pipeline to do what you want in a single query:
db.students.aggregate(
{$project:
{studentId: 1, studentIdComp: "$students.id"}},
{$unwind: "$studentIdComp"},
{$project : { studentId : 1,
isStudentEqual: { $eq : [ "$studentId", "$studentIdComp" ] }}},
{$match: {isStudentEqual: true}})
Given your input example the output would be:
{
"result" : [
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"isStudentEqual" : true
}
],
"ok" : 1
}
A brief explanation of the steps:
Build a projection of the document with just studentId and a new field with an array containing just the id (so the first document it would contain [23, 55].
Using that structure, $unwind. That creates a new temporary document for each array element in the studentIdComp array.
Now, take those documents, and create a new document projection, which continues to have the studentId and adds a new field called isStudentEqual that compares the equality of two fields, the studentId and studentIdComp. Remember that at this point there is a single temporary document that contains those two fields.
Finally, check that the comparison value isStudentEqual is true and return those documents (which will contain the original document _id and the studentId.
If the student was in the list multiple times, you might need to group the results on studentId or _id to prevent duplicates (but I don't know that you'd need that).
Unfortunately it's impossible ;(
to solve this problem it is necessary to use a $where statement
(example: Finding embeded document in mongodb?),
but $where is restricted from being used with aggregation framework
db.data.find({students: {$elemMatch: {id: 23}} , studentId: 23});