Find max element inside an array - mongodb

REF: MongoDB Document from array with field value max
Answers in Finding highest value from sub-arrays in documents and MongoDB find by max value in array of documents suggest using sort + limit(1); however, this is really slow. Surely there is a way to use the $max operator.
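For reference, the sort-based pattern from those answers looks roughly like this (a sketch; the array and age field names follow the sample document shown next):
db.collection.aggregate([
  { $unwind: "$array" },            // one document per array element
  { $sort: { "array.age": -1 } },   // sort every element by age, descending
  { $limit: 1 }                     // keep only the element with the highest age
])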
Suppose one gets a document like this in an aggregate match:
{
  _id: "notImportant",
  array: [
    { name: "Peter", age: 17 },
    { name: "Carl", age: 21 },
    { name: "Ben", age: 15 }
  ]
}
And you want to find the entire array element (not just the one value) whose age is highest.
How do you do that with the $max operator?
I tried
{ $unwind: "$array" },
{ $project: { "_id": 0, "name": "$array.name", "age": "$array.age" } }
so I get
{ _id: null, name: "Peter", age: 17 }
{ _id: null, name: "Carl", age: 21 }
{ _id: null, name: "Ben", age: 15 }
Then I tried matching on age:
age: { $eq: { $max: "$age" } }
but that gives me no results.
In other words, what I need to get is the name and all the other fields that belong to the oldest person in the array. There are many thousands of persons with lots of attributes, it all runs on a Raspberry Pi, and I need to do this operation on a few dozen such arrays. With sorting this takes about 40 seconds in total, so I would really like to avoid sort.

Can you try this aggregation with $reduce:
db.t63.aggregate([
  { $addFields: {
      array: {
        $reduce: {
          input: "$array",
          initialValue: { age: 0 },
          in: { $cond: [ { $gte: [ "$$this.age", "$$value.age" ] }, "$$this", "$$value" ] }
        }
      }
  }}
])
Output:
{ "_id" : "notImportant", "array" : { "name" : "Carl", "age" : 21 } }

If you want all the elements that have the highest value, you should use a filter.
So basically, instead of using unwind, project, etc., just use the $project stage below:
$project: {
  age: {
    $filter: {
      input: "$array",
      as: "item",
      cond: { $eq: [ "$$item.age", { $max: "$array.age" } ] }
    }
  }
}
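Against the sample document above, that stage should return the full matching element(s), e.g.:
{ "_id" : "notImportant", "age" : [ { "name" : "Carl", "age" : 21 } ] }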

You can use the below aggregation:
db.collection.aggregate([
  { "$project": {
      "max": {
        "$arrayElemAt": [
          "$array",
          { "$indexOfArray": [ "$array.age", { "$max": "$array.age" } ] }
        ]
      }
  }}
])
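With the sample document from the question, this should return the whole element holding the maximum age:
{ "_id" : "notImportant", "max" : { "name" : "Carl", "age" : 21 } }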

Related

How to combine results of two MongoDB aggregation pipeline queries and perform another aggregation query on the combined result without using $facet?

My first query returns the following result, after various aggregation pipeline stages:
{ "group" : "A", "count" : 6, "total" : 20 },
{ "group" : "B", "count" : 2, "total" : 50 }
My second query returns the following result, after various aggregation pipeline stages:
{ "group": "A", "count": 4, "total": 80 },
{ "group": "C", "count": 12, "total": 60 }
Both queries are performed on the same collection, but they group and transform the data differently based on the pipeline stages.
Both of my queries use different $match conditions and have various pipeline stages, including $facet, $unwind, $group, $project, and operators like $map, $reduce, $zip, $subtract...
db.collection.aggregate([
{ $unwind...},
{ $match....},
{ $facet...},
...
])
When I use $facet to run my queries as parallel queries, it gives the following error (because I'm already using $facet in my existing queries):
$facet is not allowed to be used within a $facet stage
Expected Output:
I need to find the average value for each of the groups.
For that, I need to combine the results of both queries and perform queries on the combined result.
My combined stage should look like this:
{ "group" : "A", "count" : 10, "total" : 100 },
{ "group" : "B", "count" : 2, "total" : 50 },
{ "group" : "C", "count" : 12, "total" : 60 }
Expected final result with average value for each group:
{ "group" : "A", "avg" : 10 },
{ "group" : "B", "avg" : 25 },
{ "group" : "C", "avg" : 5 }
Are there any operators available in the MongoDB aggregation pipeline to achieve this without modifying my existing queries?
How to achieve this use case?
Thanks!
You can run your queries separately using $facet and then use the transformation below to $group the combined results by group and calculate the average:
db.collection.aggregate([
  {
    $facet: {
      first: [ { $match: { "_": true } } ],   // your first query
      second: [ { $match: { "_": false } } ]  // your second query
    }
  },
  {
    $project: {
      all: { $concatArrays: [ "$first", "$second" ] }
    }
  },
  { $unwind: "$all" },
  {
    $group: {
      _id: "$all.group",
      count: { $sum: "$all.count" },
      total: { $sum: "$all.total" }
    }
  },
  {
    $project: {
      _id: 0,
      group: "$_id",
      count: 1,
      total: 1,
      avg: { $divide: [ "$total", "$count" ] }
    }
  }
])
Note: I'm using the _ character to indicate which query the document comes from. Obviously you don't need it, and you can replace the $facet queries with your own.
There are different approaches to combine results, including $merge, which was introduced in 4.2.
But I used the following approach since I use version 4.0.
Save both of the aggregation query results in a variable and then insert them into a new temporary collection:
var result = db.collection.aggregate(...); //query1 here
db.temp.insert(result.toArray());
var result = db.collection.aggregate(...); //query2 here
db.temp.insert(result.toArray());
// Find out average
db.temp.aggregate([
  {
    $group: {
      _id: "$group",
      count: { $sum: "$count" },
      total: { $sum: "$total" }
    }
  },
  {
    $project: {
      _id: 0,
      group: "$_id",
      avg: { $divide: [ "$total", "$count" ] }
    }
  }
]).pretty()
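For reference, on MongoDB 4.2+ the temporary inserts could instead be done by ending each pipeline with a $merge stage that writes into a shared collection (a sketch, assuming each pipeline's results carry a unique _id for $merge to match on; the temp collection name is just an example):
db.collection.aggregate([
  // ... query1 (or query2) stages here ...
  { $merge: { into: "temp", whenNotMatched: "insert" } }
])
// afterwards, run the same $group / $project on db.temp as shown above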

Query multiple properties at the same time, getting an overall average and an array

Given the following data, I'm trying to get an average of all their ages; at the same time, I want to return an array of their names. Ideally, I want to do this in just one query but can't seem to figure it out.
Data:
users: [
  { user: { id: 1, name: "Bob", age: 23 } },
  { user: { id: 1, name: "Susan", age: 32 } },
  { user: { id: 2, name: "Jeff", age: 45 } }
]
Query:
var dbmatch = db.users.aggregate([
  { $match: { "id": 1 } },
  { $group: { _id: null, avg_age: { $avg: "$age" } } },
  { $group: { _id: { name: "$name" } } }
])
Running the above groups one at a time outputs the results I expect, either an _id of null and an average of 27.5, or an array of the names.
When I combine them as you see above using a comma, I get:
Actual output:
[ { _id: {name: null } } ]
Expected output:
[
{name:"Bob"},
{name:"Susan"},
avg_age: 27.5
]
Any help would be greatly appreciated!
Not sure if this is exactly what you want, but this query
db.users.aggregate([
  {
    $match: { id: 1 }
  },
  {
    $group: {
      _id: "$id",
      avg_age: { $avg: "$age" },
      names: {
        $push: { name: "$name" }
      }
    }
  },
  {
    $project: { _id: 0 }
  }
])
returns this result:
[
  {
    "avg_age": 27.5,
    "names": [
      { "name": "Bob" },
      { "name": "Susan" }
    ]
  }
]
This will duplicate names, so if there are two documents with the name Bob, it will appear twice in the array. If you don't want duplicates, change $push to $addToSet, as shown below.
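The deduplicated variant would then look like this (only the accumulator changes):
names: {
  $addToSet: {
    name: "$name"
  }
}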
Also, if you want names to be just an array of names instead of objects, change the names accumulator to
names: {
$push: "$name"
}
This will result in
[
  {
    "avg_age": 27.5,
    "names": ["Bob", "Susan"]
  }
]
Hope it helps,
Tomas :)
You can use a $facet aggregation to run multiple queries at once:
db.collection.aggregate([
{ "$facet": {
"firstQuery": [
{ "$match": { "id": 1 }},
{ "$group": {
"_id": null,
"avg_age": { "$avg": "$age" }
}}
],
"secondQuery": [
{ "$match": { "id": 1 }},
{ "$group": { "_id": "$name" }}
]
}}
])
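With the sample data (id: 1 matches Bob and Susan), this should return a single document containing both result sets, roughly like the following (group order is not guaranteed):
[
  {
    "firstQuery": [ { "_id": null, "avg_age": 27.5 } ],
    "secondQuery": [ { "_id": "Bob" }, { "_id": "Susan" } ]
  }
]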

Customize existing document and add new fields in mongo aggregation

I have two documents with the following structure:
{
  "CollegeName" : "Hi-Tech College",
  "StudentName" : "John",
  "Age" : 25
},
{
  "CollegeName" : "Hi-Tech College",
  "StudentName" : "Tom",
  "Age" : 24
}
In those two documents, CollegeName is the common field; by using it, I want to generate a single document in the following format:
{
  "CollegeName" : "Hi-Tech College",
  "JohnAge" : 25,
  "TomAge" : 24
}
You can try the below aggregation:
db.col.aggregate([
  {
    $group: {
      _id: null,
      CollegeName: { $first: "$CollegeName" },
      Students: { $push: { k: { $concat: [ "$StudentName", "Age" ] }, v: "$Age" } }
    }
  },
  {
    $replaceRoot: {
      newRoot: { $mergeObjects: [ { CollegeName: "$CollegeName" }, { $arrayToObject: "$Students" } ] }
    }
  }
])
Basically, to create key names dynamically you can use the $arrayToObject operator, which takes an array of key-value pairs (k and v properties) and returns an object. To create your custom keys you can use $concat. Then you have to "merge" the newly created object with CollegeName, which is what the $mergeObjects and $replaceRoot operators are for.
Since it's grouping by null, which returns one document for the entire collection, keep in mind that MongoDB has a BSON document size limit, so your result can't exceed 16MB.
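To illustrate, for the sample documents the Students array built by $group looks like this, and $arrayToObject turns it into the keyed object that then gets merged with CollegeName:
// Students after the $group stage
[ { k: "JohnAge", v: 25 }, { k: "TomAge", v: 24 } ]
// { $arrayToObject: "$Students" } then produces
{ "JohnAge": 25, "TomAge": 24 }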
db.temp.aggregate([
  { "$group": {
      "_id": { "CollegeName": "$CollegeName" },
      "Students": { "$push": { "StudentName": "$StudentName", "Age": "$Age" } }
  }},
  { "$unwind": "$Students" },
  { "$group": {
      "_id": "$_id",
      "JohnAge": { $max: { $cond: [ { $or: [ { $eq: [ "$Students.StudentName", "John" ] } ] }, "$Students.Age", null ] } },
      "TomAge": { $max: { $cond: [ { $or: [ { $eq: [ "$Students.StudentName", "Tom" ] } ] }, "$Students.Age", null ] } }
  }}
])

Empty array prevents document from appearing in query

I have documents that have a few fields, and in particular they have a field called attrs that is an array. I am using the aggregation pipeline.
In my query I am interested in the attrs (attributes) field if there are any elements in it. Otherwise I still want to get the result. In this case I am after the field type of the document.
The problem is that if a document does not contain any element in the attrs field it will be filtered away and I won't get its _id.type field, which is what I really want from this query.
{
  aggregate: "entities",
  pipeline: [
    {
      $match: {
        _id.servicePath: {
          $in: [ /^/.*/, null ]
        }
      }
    },
    {
      $project: {
        _id: 1,
        "attrs.name": 1,
        "attrs.type": 1
      }
    },
    { $unwind: "$attrs" },
    {
      $group: {
        _id: "$_id.type",
        attrs: { $addToSet: "$attrs" }
      }
    },
    { $sort: { _id: 1 } }
  ]
}
So the question is: how can I get a result containing all documents types regardless of their having attrs, but including the attributes in case they have them?
I hope it makes sense.
You can use the $cond operator in a $project stage to replace the empty attr array with one that contains a placeholder like null that can be used as a marker to indicate that this doc doesn't contain any attr elements.
So you'd insert an additional $project stage like this right before the $unwind:
{
  $project: {
    attrs: {
      $cond: {
        if: { $eq: [ '$attrs', [] ] },
        then: [ null ],
        else: '$attrs'
      }
    }
  }
},
The only caveat is that you'll end up with a null value in the final attrs array for those groups that contain at least one doc without any attrs elements, so you need to ignore those client-side.
Example
The example uses an altered $match stage because the one in your example isn't valid.
Input Docs
[
{_id: {type: 1, id: 2}, attrs: []},
{_id: {type: 2, id: 1}, attrs: []},
{_id: {type: 2, id: 2}, attrs: [{name: 'john', type: 22}, {name: 'bob', type: 44}]}
]
Output
{
  "result" : [
    {
      "_id" : 1,
      "attrs" : [ null ]
    },
    {
      "_id" : 2,
      "attrs" : [
        { "name" : "bob", "type" : 44 },
        { "name" : "john", "type" : 22 },
        null
      ]
    }
  ],
  "ok" : 1
}
Aggregate Command
db.test.aggregate([
  {
    $match: {
      '_id.servicePath': { $in: [ null ] }
    }
  },
  {
    $project: {
      _id: 1,
      "attrs.name": 1,
      "attrs.type": 1
    }
  },
  {
    $project: {
      attrs: {
        $cond: {
          if: { $eq: [ '$attrs', [] ] },
          then: [ null ],
          else: '$attrs'
        }
      }
    }
  },
  { $unwind: "$attrs" },
  {
    $group: {
      _id: "$_id.type",
      attrs: { $addToSet: "$attrs" }
    }
  },
  { $sort: { _id: 1 } }
])
Use some if statements and loops.
First and foremost, your query should select all documents.
Loop through all of them; then, if the number of attributes is greater than 0, loop through the attributes into whatever array or output you find useful.
Use if statements to sanitize your results if you like.
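A minimal client-side sketch of that idea in the mongo shell, assuming the collection is called entities as in the question's aggregate target:
// Fetch every document, then branch on whether attrs has any elements
db.entities.find().forEach(function (doc) {
  var out = { type: doc._id.type, attrs: [] };
  if (doc.attrs && doc.attrs.length > 0) {
    doc.attrs.forEach(function (attr) {
      out.attrs.push({ name: attr.name, type: attr.type });
    });
  }
  printjson(out);
});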
You should use the $or operator and two separate queries: one to select the documents where the attr value equals the required value, and another to match documents where attr is null or the attr key does not exist (using the $exists operator).
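A sketch of such a $match, with a $size check added for the empty-array case from the question (the attribute value to select on is hypothetical):
{
  $match: {
    $or: [
      { "attrs.name": "someAttributeName" },   // attrs contains the required value (hypothetical value)
      { attrs: { $exists: false } },           // attrs key missing
      { attrs: null },                         // attrs is null
      { attrs: { $size: 0 } }                  // attrs is an empty array
    ]
  }
}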

Can the MongoDB aggregation framework $group return an array of values?

How flexible is the aggregate function for output formatting in MongoDB?
Data format:
{
  "_id" : ObjectId("506ddd1900a47d802702a904"),
  "port_name" : "CL1-A",
  "metric" : "772.0",
  "port_number" : "0",
  "datetime" : ISODate("2012-10-03T14:03:00Z"),
  "array_serial" : "12345"
}
Right now I'm using this aggregate function to return an array of DateTime, an array of metrics, and a count:
{ $match: {
    'array_serial': array,
    'port_name': { $in: ports },
    'datetime': { $gte: from, $lte: to }
}},
{ $project: { port_name: 1, metric: 1, datetime: 1 } },
{ $group: {
    _id: "$port_name",
    datetime: { $push: "$datetime" },
    metric: { $push: "$metric" },
    count: { $sum: 1 }
}}
Which is nice and very fast, but is there a way to format the output so there's one array per datetime/metric pair? Like this:
[
  {
    "_id" : "portname",
    "data" : [
      ["2012-10-01T00:00:00.000Z", 1421.01],
      ["2012-10-01T00:01:00.000Z", 1361.01],
      ["2012-10-01T00:02:00.000Z", 1221.01]
    ]
  }
]
This would greatly simplify the front-end as that's the format the chart code expects.
Combining two fields into an array of values with the Aggregation Framework is possible, but definitely isn't as straightforward as it could be (at least as of MongoDB 2.2.0).
Here is an example:
db.metrics.aggregate(
  // Find matching documents first (can take advantage of index)
  { $match: {
      'array_serial': array,
      'port_name': { $in: ports },
      'datetime': { $gte: from, $lte: to }
  }},
  // Project desired fields and add an extra $index for # of array elements
  { $project: {
      port_name: 1,
      datetime: 1,
      metric: 1,
      index: { $const: [0, 1] }
  }},
  // Split into document stream based on $index
  { $unwind: '$index' },
  // Re-group data using conditional to create array [$datetime, $metric]
  { $group: {
      _id: { id: '$_id', port_name: '$port_name' },
      data: {
        $push: { $cond: [ { $eq: ['$index', 0] }, '$datetime', '$metric' ] }
      }
  }},
  // Sort results
  { $sort: { _id: 1 } },
  // Final group by port_name with data array and count
  { $group: {
      _id: '$_id.port_name',
      data: { $push: '$data' },
      count: { $sum: 1 }
  }}
)
MongoDB 2.6 made this a lot easier by introducing $map, which allows a simpler form of array transposition:
db.metrics.aggregate([
  { "$match": {
      "array_serial": array,
      "port_name": { "$in": ports },
      "datetime": { "$gte": from, "$lte": to }
  }},
  { "$group": {
      "_id": "$port_name",
      "data": {
        "$push": {
          "$map": {
            "input": [0, 1],
            "as": "index",
            "in": {
              "$cond": [
                { "$eq": [ "$$index", 0 ] },
                "$datetime",
                "$metric"
              ]
            }
          }
        }
      },
      "count": { "$sum": 1 }
  }}
])
Much like the approach with $unwind, you supply a two-value array as the "input" to the map operation and then essentially replace those values with the field values you want via the $cond operation.
This removes all the pipeline juggling that previous releases required to transform the document and leaves just the aggregation at hand, which is basically accumulating per "port_name" value; the transformation to an array is no longer a problem area.
Building arrays in the aggregation framework without $push and $addToSet is something that seems to be lacking. I've tried to get this to work before and failed. It would be awesome if you could just do:
data: { $push: [ "$datetime", "$metric" ] }
in the $group, but that doesn't work.
Also, building "literal" objects like this doesn't work:
data: { $push: { literal: [ "$datetime", "$metric" ] } }
or even data: { $push: { literal: "$datetime" } }
I hope they eventually come up with some better ways of massaging this sort of data.
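For what it's worth, more recent MongoDB releases do accept array literals inside aggregation expressions, so on a modern server something like the following should work directly (a sketch; check against your server version):
db.metrics.aggregate([
  { $group: {
      _id: "$port_name",
      data: { $push: [ "$datetime", "$metric" ] },   // array literal as the accumulator expression
      count: { $sum: 1 }
  }}
])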