Given the following documents:
{ "_id" : ObjectId("585901b7875bab86885cf54f"), "foo" : 24, "bar" : [ 1, 2, 5, 6 ] }
{ "_id" : ObjectId("585901be875bab86885cf550"), "foo" : 42, "bar" : [ 3, 4 ] }
I want to get all the unique values in the bar field, something like:
{"_id": "something", "bar": [1, 2, 3, 4, 5, 6]}
This is what I tried:
db.stuff.aggregate([{
    $group: {
        _id: null,
        bar: {
            $addToSet: { $each: "$bar" }
        }
    }
}])
But it complains that $each is not a recognized operator.
This does work:
db.stuff.aggregate([{
    $group: {
        _id: null,
        bar: {
            $addToSet: "$bar"
        }
    }
}])
But it obviously produces the wrong result:
{ "_id" : null, "bar" : [ [ 3, 4 ], [ 1, 2, 5, 6 ] ] }
EDIT
I managed to have the result I want by adding a first $unwind stage:
db.stuff.aggregate([
    { $unwind: "$bar" },
    {
        $group: {
            _id: null,
            bar: {
                $addToSet: "$bar"
            }
        }
    }
])
=> { "_id" : null, "bar" : [ 4, 3, 5, 2, 6, 1 ] }
Is it possible at all to make it in one single pipeline stage?
distinct() works with array fields as well, so it will do this beautifully:
db.stuff.distinct('bar')
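Against the two sample documents above, that should return something like (element order is not guaranteed):

[ 1, 2, 3, 4, 5, 6 ]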
The aggregation framework is overkill for this and will not perform as well.
I am checking the data types of one field in several documents that belong to a collection. Fortunately, MongoDB has documentation related to that topic (MongoLink). The problem is that I do not understand the output of the aggregation operation.
This is the collection
{ _id: 0, a : 8 }
{ _id: 1, a : [ 41.63, 88.19 ] }
{ _id: 2, a : { a : "apple", b : "banana", c: "carrot" } }
{ _id: 3, a : "caribou" }
{ _id: 4, a : NumberLong(71) }
{ _id: 5 }
This is the aggregation operation:
db.coll.aggregate([{
    $project: {
        a: { $type: "$a" }
    }
}])
and the result is
{ _id: 0, "a" : "double" }
{ _id: 1, "a" : "array" }
{ _id: 2, "a" : "object" }
{ _id: 3, "a" : "string" }
{ _id: 4, "a" : "long" }
{ _id: 5, "a" : "missing" }
The bits that I do not understand are: (i) the plain letter "a" (as opposed to "$a") in the aggregation expression, and (ii) the letter "a" that appears in the result, which I guess is not related to the "$a" in the aggregation expression.
Best Regards
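As an aside, a minimal sketch that may help untangle the two "a"s: the bare "a" on the left of the $project expression is just the name of the output field, while "$a" on the right is the path to the input field. The name typeOfA below is made up purely to show that they are independent:

db.coll.aggregate([{
    $project: {
        typeOfA: { $type: "$a" }   // output field name is arbitrary; "$a" reads the input field
    }
}])
// => { _id: 0, "typeOfA" : "double" }, { _id: 1, "typeOfA" : "array" }, ...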
With a MongoDB collection test containing the following documents:
{ "_id" : 1, "color" : "blue", "items" : [ 1, 2, 0 ] }
{ "_id" : 2, "color" : "red", "items" : [ 0, 3, 4 ] }
if I sort them in reverse order based on the second element of the items array, using
db.test.find().sort({"items.1": -1})
they will be correctly sorted as:
{ "_id" : 2, "color" : "red", "items" : [ 0, 3, 4 ] }
{ "_id" : 1, "color" : "blue", "items" : [ 1, 2, 0 ] }
However, when I attempt to sort them using the aggregate function:
db.test.aggregate([{$sort: {"items.1": -1} }])
they are not sorted correctly, even though the query is accepted as valid:
{
"result" : [
{
"_id" : 1,
"color" : "blue",
"items" : [
1,
2,
0
]
},
{
"_id" : 2,
"color" : "red",
"items" : [
0,
3,
4
]
}
],
"ok" : 1
}
Why is this?
The aggregation framework just does not "deal with" arrays in the same way as .find() queries do in general. This is not only true of operations like .sort(), but also of other operators, notably $slice, though that one is about to get a fix (more on that later).
So it is pretty much impossible to address anything using the "dot notation" form with an index of an array position, as you have. But there is a way around this.
What you "can" do is basically work out what the "nth" array element actually is as a value, and then return that as a field that can be sorted:
db.test.aggregate([
{ "$unwind": "$items" },
{ "$group": {
"_id": "$_id",
"items": { "$push": "$items" },
"itemsCopy": { "$push": "$items" },
"first": { "$first": "$items" }
}},
{ "$unwind": "$itemsCopy" },
{ "$project": {
"items": 1,
"itemsCopy": 1,
"first": 1,
"seen": { "$eq": [ "$itemsCopy", "$first" ] }
}},
{ "$match": { "seen": false } },
{ "$group": {
"_id": "$_id",
"items": { "$first": "$items" },
"itemsCopy": { "$push": "$itemsCopy" },
"first": { "$first": "$first" },
"second": { "$first": "$itemsCopy" }
}},
{ "$sort": { "second": -1 } }
])
It's a horrible and "iterative" approach where you essentially "step through" each array element by getting the $first match per document from the array after processing with $unwind. Then, after another $unwind, you test whether each array element is the same as the one(s) already "seen" from the identified array positions.
It's terrible, and worse for the more positions you want to move along, but it does get the result:
{ "_id" : 2, "items" : [ 0, 3, 4 ], "itemsCopy" : [ 3, 4 ], "first" : 0, "second" : 3 }
{ "_id" : 1, "items" : [ 1, 2, 0 ], "itemsCopy" : [ 2, 0 ], "first" : 1, "second" : 2 }
{ "_id" : 3, "items" : [ 2, 1, 5 ], "itemsCopy" : [ 1, 5 ], "first" : 2, "second" : 1 }
Fortunately, upcoming releases of MongoDB (as currently available in development releases) get a "fix" for this. It may not be the "perfect" fix that you desire, but it does solve the basic problem.
There is a new $slice operator available for the aggregation framework there, and it will return the required element(s) of the array from the indexed positions:
db.test.aggregate([
{ "$project": {
"items": 1,
"slice": { "$slice": [ "$items",1,1 ] }
}},
{ "$sort": { "slice": -1 } }
])
Which produces:
{ "_id" : 2, "items" : [ 0, 3, 4 ], "slice" : [ 3 ] }
{ "_id" : 1, "items" : [ 1, 2, 0 ], "slice" : [ 2 ] }
{ "_id" : 3, "items" : [ 2, 1, 5 ], "slice" : [ 1 ] }
So you can note that, as a "slice", the result is still an "array"; however, the $sort in the aggregation framework has always used the "first position" of the array in order to sort the contents. That means that with a singular value extracted from the indexed position (just like the long procedure above), the result will be sorted as you expect.
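For what it's worth, those newer releases also have an $arrayElemAt operator, which returns the element itself rather than a one-element array; a minimal sketch, assuming that operator is available:

db.test.aggregate([
    { "$project": {
        "items": 1,
        // the second element as a plain value, not a one-element array
        "second": { "$arrayElemAt": [ "$items", 1 ] }
    }},
    { "$sort": { "second": -1 } }
])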
The bottom line here is that this is just how it works. Either live with the sort of operations shown above to work with an indexed position of the array, or "wait" until a brand new shiny version comes to your rescue with better operators.
My collection contains documents of the following form:
{
'game_name': 'football',
'scores': [4,1,2,7,6,0,5]
}
How do I find the 'scores' of all such objects in the collection and sort them in ascending order?
If I understand your question correctly:
db.yourcollection.aggregate([
    {
        $unwind: "$scores"
    }, {
        $sort: {
            "scores": 1
        }
    }, {
        $group: {
            _id: null,
            scores: { $addToSet: "$scores" }
        }
    }, {
        $project: {
            _id: 0,
            scores: 1
        }
    }
])
Result is:
{
"result" : [
{
"scores" : [
7,
5,
4,
6,
2,
1,
0
]
}
],
"ok" : 1
}
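Note that $addToSet does not guarantee any particular order, which is why the scores above do not come back ascending even though the pipeline sorts before grouping. If the order matters, pushing after the $sort generally preserves it; a minimal sketch of that variation:

db.yourcollection.aggregate([
    { $unwind: "$scores" },
    { $sort: { "scores": 1 } },
    { $group: {
        _id: null,
        // $push keeps the sorted order; $addToSet does not
        scores: { $push: "$scores" }
    }},
    { $project: { _id: 0, scores: 1 } }
])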
I have a document similar to the following, from which I want to return the sub-fields of the current top level field as the top level fields in every document of the results array:
{
field1: {
subfield1: {},
subfield2: [],
subfield3: 44,
subfield5: xyz
},
field2: {
othercontent: {}
}
}
I want the results of my aggregation query to return the following (the contents of field1 as the top level document):
{
subfield1: {},
subfield2: [],
subfield3: 44,
subfield5: xyz
}
Can this be done with $project and the aggregation framework without defining every sub-field to return as a top-level field?
You can use the $replaceRoot aggregation operator since MongoDB 3.4:
db.getCollection('sample').aggregate([
{
$replaceRoot: {newRoot: "$field1"}
}
])
Provides output as expected:
{
"subfield" : {},
"subfield2" : [],
"subfield3" : 44,
"subfield5" : "xyz"
}
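One caveat worth noting: $replaceRoot errors if newRoot resolves to a missing field or to anything other than a document, so if some documents may lack field1 it can help to filter those out first; a minimal sketch:

db.getCollection('sample').aggregate([
    // keep only documents where field1 exists and is an embedded document
    { $match: { "field1": { $type: "object" } } },
    { $replaceRoot: { newRoot: "$field1" } }
])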
It's generally hard to make MongoDB deal with ambiguous or parameterized JSON keys. I ran into a similar issue, and the best solution was to modify the schema so that the members of the subdocument became elements in an array.
However, I think this will get you close to what you want (all code should run directly in the Mongo shell). Assuming you have documents like:
db.collection.insert({
"_id": "doc1",
"field1": {
"subfield1": {"key1": "value1"},
"subfield2": ["a", "b", "c"],
"subfield3": 1,
"subfield4": "a"
},
"field2": "other content"
})
db.collection.insert({
"_id": "doc2",
"field1": {
"subfield1": {"key2": "value2"},
"subfield2": [1, 2, 3],
"subfield3": 2,
"subfield4": "b"
},
"field2": "yet more content"
})
Then you can run an aggregation command that promotes the content of field1 while ignoring the rest of the document:
db.collection.aggregate({
"$group":{
"_id": "$_id",
"value": {"$push": "$field1"}
}})
This makes all the subfield* keys into top-level fields of an object, and that object is the only element in an array. It's clumsy, but workable:
"result" : [
{
"_id" : "doc2",
"value" : [
{
"subfield1" : {"key2" : "value2"},
"subfield2" : [1, 2, 3],
"subfield3" : 2,
"subfield4" : "b"
}
]
},
{
"_id" : "doc1",
"value" : [
{
"subfield1" : {"key1" : "value1"},
"subfield2" : ["a","b","c"],
"subfield3" : 1,
"subfield4" : "a"
}
]
}
],
"ok" : 1
Starting with Mongo 4.2, the $replaceWith aggregation operator can be used to replace a document with another (in our case with a sub-document), as syntactic sugar for $replaceRoot:
// { field1: { a: 1, b: 2, c: 3 }, field2: { d: 4, e: 5 } }
// { field1: { a: 6, b: 7 }, field2: { d: 8 } }
db.collection.aggregate({ $replaceWith: "$field1" })
// { a: 1, b: 2, c: 3 }
// { a: 6, b: 7 }
I have a list of employees, each who belong to a department and a company.
An employee also has a salary history. The last value is their current salary.
Example:
{
name: "Programmer 1"
employee_id: 1,
dept_id: 1,
company_id: 1,
salary: [50000,50100,50200]
},
{
name: "Programmer 2"
employee_id: 2,
dept_id: 1,
company_id: 1,
salary: [50000,50200,50300]
},
{
name: "Manager"
employee_id: 3,
dept_id: 2,
company_id: 1,
salary: [60000,60500,61000]
},
{
name: "Contractor (different company)"
employee_id: 4,
dept_id: 1,
company_id: 2,
salary: [60000,60500,75000]
}
I want to find the current average salary for employees, grouped by dept_id and company_id.
Something like:
db.employees.aggregate(
{ $project : { employee_id: 1, dept_id: 1, company_id: 1, salaries: 1}},
{ $unwind : "$salaries" },
{
"$group" : {
"_id" : {
"dept_id" : "$dept_id",
"company_id" : "$company_id",
},
current_salary_avg : { $avg : "$salaries.last()" }
}
}
);
In this case it would be
Company 1, Group 1: 50250
Company 1, Group 2: 61000
Company 2, Group 1: 75000
I've seen examples doing something similar with $unwind, but I'm struggling with getting the last value of salary. Is $slice the correct operator in this case, and if so, how do I use it with $project?
In this case you need to set up your pipeline as follows:
unwind the salary list to get all the salaries for each employee
group by employee, dept and company and get the last salary
group by dept and company and get the average salary
The code for this aggregation pipeline is :
use test;
db.employees.aggregate( [
{$unwind : "$salary"},
{
"$group" : {
"_id" : {
"dept_id" : "$dept_id",
"company_id" : "$company_id",
"employee_id" : "$employee_id",
},
"salary" : {$last: "$salary"}
}
},
{
"$group" : {
"_id" : {
"company_id" : "$_id.company_id",
"dept_id" : "$_id.dept_id",
},
"current_salary_avg" : {$avg: "$salary"}
}
},
{$sort :
{
"_id.company_id" : 1,
"_id.dept_id" : 1,
}
},
]);
Assuming that you have imported the data with:
mongoimport --drop -d test -c employees <<EOF
{ name: "Programmer 1", employee_id: 1, dept_id: 1, company_id: 1, salary: [50000,50100,50200]}
{ name: "Programmer 2", employee_id: 2, dept_id: 1, company_id: 1, salary: [50000,50200,50300]}
{ name: "Manager", employee_id: 3, dept_id: 2, company_id: 1, salary: [60000,60500,61000]}
{ name: "Contractor (different company)", employee_id: 4, dept_id: 1, company_id: 2, salary: [60000,60500,75000]}
EOF
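Once imported, the pipeline above should produce the averages expected in the question, along the lines of:

{ "_id" : { "company_id" : 1, "dept_id" : 1 }, "current_salary_avg" : 50250 }
{ "_id" : { "company_id" : 1, "dept_id" : 2 }, "current_salary_avg" : 61000 }
{ "_id" : { "company_id" : 2, "dept_id" : 1 }, "current_salary_avg" : 75000 }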
Now (since MongoDB 3.2) you can use $slice in aggregation. To return elements from either the start or end of the array: { $slice: [ <array>, <n> ] }
To return elements from the specified position in the array: { $slice: [ <array>, <position>, <n> ] }.
And a couple of examples from the mongo page:
{ $slice: [ [ 1, 2, 3 ], 1, 1 ] } // [ 2 ]
{ $slice: [ [ 1, 2, 3 ], -2 ] } // [ 2, 3 ]
{ $slice: [ [ 1, 2, 3 ], 15, 2 ] } // [ ]
{ $slice: [ [ 1, 2, 3 ], -15, 2 ] } // [ 1, 2 ]
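Tying that back to the question: on versions where $slice is available in $project, you can grab the last salary without unwinding the whole array; a minimal sketch (the extra $unwind simply turns the one-element slice into a plain number so $avg can use it):

db.employees.aggregate([
    { $project: {
        dept_id: 1,
        company_id: 1,
        // last element of the salary array, i.e. the current salary
        current_salary: { $slice: [ "$salary", -1 ] }
    }},
    { $unwind: "$current_salary" },
    { $group: {
        _id: { dept_id: "$dept_id", company_id: "$company_id" },
        current_salary_avg: { $avg: "$current_salary" }
    }}
])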