Given a collection of documents that each has an array property ks:
{
_id: ObjectId('...'),
ks: [4, 3, 2, 1, 3],
v: 45
},
{
_id: ObjectId('...'),
ks: [3, 3, 5],
v: 21
},
{
_id: ObjectId('...'),
ks: [1, 5, 2, 8, 9, 7],
v: 12
}
How can I aggregate this collection to a list using key = min ks or other fold functions?
[
{
_id: 1,
v: 28.5 // = mean [45, 12]
},
{
_id: 3,
v: 21 // = mean [21]
}
]
Grouping using the keyf function works:
keyf: function(d) { return d.ks.reduce(function(acc, a) { return acc < a ? acc : a; }); }
But is there a way to do this with aggregation pipeline?
It seems that you want the minimum ($min) of ks as the aggregation key and the average ($avg) of "v" for each such key. To get there, you need to $unwind "ks" first.
You also need to $group your data twice: once to find the minimum of ks per document, and once to calculate the average of v per minimum.
db.collection.aggregate([
// Unwind the array
{ "$unwind": "$ks" },
// Find the minimal key per document
{ "$group": {
"_id": "$_id",
"ks": { "$min": "$ks" },
"v": { "$first": "$v" }
}},
// Group with the average value
{ "$group": {
"_id": "$ks",
"v": { "$avg": "$v" }
}},
// Group does not sort results
{ "$sort": { "_id": 1 } }
])
Results in:
[
{
"_id" : 1,
"v" : 28.5
},
{
"_id" : 3,
"v" : 21
}
]
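To check the grouping logic without a database, the same fold can be sketched in plain JavaScript. This is only an in-memory illustration, assuming every ks array is non-empty; groupByMinKey and sample are hypothetical names and not part of MongoDB.

```javascript
// In-memory sketch of "group by min(ks), then average v" (no MongoDB involved).
function groupByMinKey(docs) {
  const groups = new Map(); // min(ks) -> { sum, count }
  for (const doc of docs) {
    const key = Math.min(...doc.ks); // assumes ks is a non-empty array of numbers
    const g = groups.get(key) || { sum: 0, count: 0 };
    g.sum += doc.v;
    g.count += 1;
    groups.set(key, g);
  }
  // Sort by key ascending, like the final $sort stage.
  return [...groups.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([key, g]) => ({ _id: key, v: g.sum / g.count }));
}

const sample = [
  { ks: [4, 3, 2, 1, 3], v: 45 },
  { ks: [3, 3, 5], v: 21 },
  { ks: [1, 5, 2, 8, 9, 7], v: 12 },
];
console.log(groupByMinKey(sample));
```

Swapping Math.min for another fold (for example Math.max) changes the grouping key without touching the rest.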
Will try to keep this concise with the input, result and desired/expected result. Need to find the minimum, maximum number of rows/records between the same "winCode" and the last time it occurred in the ordered data. So it makes me want to first group them by "winCode" which works perfectly, but I am not able to come up with something that would display how many records it took for the same "winCode" to appear last time, the minimum and maximum. Check desired output for more details. Below is the paste from: https://mongoplayground.net/p/bCzTO8ZLxNi
Input/collection
[
{
code: "1",
results: {
winCode: 3
}
},
{
code: "10",
results: {
winCode: 3
}
},
{
code: "8",
results: {
winCode: 2
}
},
{
code: "5",
results: {
winCode: 5
}
},
{
code: "5",
results: {
winCode: 4
}
},
{
code: "6",
results: {
winCode: 4
}
},
{
code: "7",
results: {
winCode: 5
}
},
{
code: "3",
results: {
winCode: 3
}
},
{
code: "9",
results: {
winCode: 2
}
},
{
code: "2",
results: {
winCode: 2
}
}
]
Current query
db.collection.aggregate([
{
$sort: {
code: -1
}
},
{
$group: {
_id: "$results.winCode",
count: {
$sum: 1
},
lastTimeOccurredCode: {
$first: "$code" // Any way to get it to display a count from the start to this point on how many records it went through to get the $first result?
},
}
},
{
$sort: {
_id: -1
}
},
])
Current output
[
{
"_id": 5,
"count": 2,
"lastTimeOccurredCode": "5"
},
{
"_id": 4,
"count": 2,
"lastTimeOccurredCode": "5"
},
{
"_id": 3,
"count": 3,
"lastTimeOccurredCode": "1"
},
{
"_id": 2,
"count": 3,
"lastTimeOccurredCode": "2"
}
]
Desired output
[
{
"_id": 5,
"count": 2,
"lastTimeOccurredRecordsCount": 4,
"minRecordsBetween": 3,
"maxRecordsBetween": 3
},
{
"_id": 4,
"count": 2,
"lastTimeOccurredRecordsCount": 5,
"minRecordsBetween": 1,
"maxRecordsBetween": 1
},
{
"_id": 3,
"count": 3,
"lastTimeOccurredRecordsCount": 1,
"minRecordsBetween": 1,
"maxRecordsBetween": 6
},
{
"_id": 2,
"count": 3,
"lastTimeOccurredRecordsCount": 3,
"minRecordsBetween": 1,
"maxRecordsBetween": 6
}
]
I have tried to add an $accumulator function, but I would need the $first functions result in it, but it's not available at the same $group stage. Feel like I am missing something here.
You can use $setWindowFields to assign each document an index and $reduce to compute the differences between consecutive indexes. If you want the index to follow {$sort: {code: -1}}, keep the $setWindowFields sortBy as in this example and remove the redundant {$sort: {code: -1}} step. If you want the index to follow a different sort order, only update the $setWindowFields sortBy.
Use $setWindowFields to define the index.
$sort according to what you need (if it differs from the previous sort).
$group by $results.winCode and keep all the index data.
Calculate the differences.
Format the output.
db.collection.aggregate([
{$setWindowFields: {
sortBy: {code: -1},
output: {index: {$sum: 1, window: {documents: ["unbounded", "current"]}}}
}},
{$sort: {code: -1}},
{$group: {
_id: "$results.winCode",
count: {$sum: 1},
lastTimeOccurredCode: {$first: "$code"},
index: {$push: "$index"}
}},
{$project: {
count: 1,
lastTimeOccurredCode: 1,
diff: {
$reduce: {
input: {$range: [1, {$size: "$index"}]},
initialValue: [],
in: {$concatArrays: [
"$$value",
[{$subtract: [
{$arrayElemAt: ["$index", "$$this"]},
{$arrayElemAt: ["$index", {$subtract: ["$$this", 1]}]}
]}]
]
}
}
}
}},
{$set: {
minRecordsBetween: {$min: "$diff"},
maxRecordsBetween: {$max: "$diff"},
diff: "$$REMOVE"
}},
{$sort: {_id: -1}}
])
See how it works on the playground example
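The index-and-difference idea can also be traced in plain JavaScript. This is a rough in-memory sketch, not the pipeline itself: it assumes the winCode values are already in the order they should be indexed (here, the order of the input collection), and it reads lastTimeOccurredRecordsCount as the first 1-based position of each winCode, which is my interpretation of the desired output. gapStats is a hypothetical name.

```javascript
// In-memory sketch: positions per winCode, then gaps between consecutive positions.
function gapStats(winCodes) {
  const positions = new Map(); // winCode -> list of 1-based positions
  winCodes.forEach((code, i) => {
    if (!positions.has(code)) positions.set(code, []);
    positions.get(code).push(i + 1);
  });
  return [...positions.entries()]
    .sort((a, b) => b[0] - a[0]) // winCode descending, like {$sort: {_id: -1}}
    .map(([code, idx]) => {
      // Difference between each position and the one before it.
      const diffs = idx.slice(1).map((p, k) => p - idx[k]);
      return {
        _id: code,
        count: idx.length,
        lastTimeOccurredRecordsCount: idx[0], // assumption: first position seen
        minRecordsBetween: diffs.length ? Math.min(...diffs) : null,
        maxRecordsBetween: diffs.length ? Math.max(...diffs) : null,
      };
    });
}

// winCode values in the order of the input collection above.
console.log(gapStats([3, 3, 2, 5, 4, 4, 5, 3, 2, 2]));
```

A winCode that occurs only once has no gaps, so the sketch reports null for both bounds.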
I have the following documents in my db:
{uid: 1, score: 10}
{uid: 2, score: 11}
{uid: 3, score: 1}
{uid: 4, score: 6}
{uid: 5, score: 2}
{uid: 6, score: 3}
{uid: 7, score: 8}
{uid: 8, score: 10}
I want to split them into buckets by score - i.e.:
score     uids      bucket name in aggregation
[0,4)     3,5,6     0
[4,7)     4         4
[7,inf)   1,2,7,8   7
For this, I created the following aggregation which works just fine:
db.scores.aggregate(
[
{
$bucket:
{
groupBy: "$score",
boundaries: [0, 4, 7],
default: 7,
output:
{
"total": {$sum: 1},
"top_frustrated":
{
$push: {
"uid": "$uid", "score": "$score"
}
},
},
}
},
]
)
However, I would like to return only the top 3 of every bucket - i.e, buckets 0, 4 should be the same, but bucket 7 should have only uids 1,2,8 returned (as uid 7 has the lowest score) - but to include the total count of documents as well, i.e. output of bucket "7" should look like:
{ "total" : 4, "top_scores" :
[
{"uid" : 2, "score" : 11},
{"uid" : 1, "score" : 10},
{"uid" : 8, "score" : 10},
]
}
I tried using $addFields with $sortArray and $slice, but it either doesn't work or returns errors.
I can of course use $project but I was wondering if there is a more efficient way.
I am using Amazon DocumentDB.
You can use the $topN accumulator, instead of $push, like this:
db.collection.aggregate([
{
"$bucket": {
"groupBy": "$score",
"boundaries": [
0,
4,
7
],
"default": 7,
"output": {
"total": {
"$sum": 1
},
"top_frustrated": {
"$topN": {
"n": 3,
"sortBy": {
"score": -1
},
"output": {
"uid": "$uid",
"score": "$score"
}
}
}
},
}
},
])
Playground link.
The only catch is that the $topN operator is only available in MongoDB 5.2 and above.
For older versions, this will work:
db.collection.aggregate([
{
"$sort": {
score: -1
}
},
{
$bucket: {
groupBy: "$score",
boundaries: [
0,
4,
7
],
default: 7,
output: {
"total": {
$sum: 1
},
"top_frustrated": {
$push: {
"uid": "$uid",
"score": "$score"
}
},
},
}
},
{
"$project": {
total: 1,
top_frustrated: {
"$slice": [
"$top_frustrated",
3
]
}
}
}
])
Playground link.
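Both variants boil down to "bucket by score, count everything, keep the n highest". A plain JavaScript sketch of that logic follows, useful for checking expected output by hand. It runs in memory only; topPerBucket is a hypothetical helper, and ties are kept in input order because Array.prototype.sort is stable.

```javascript
// In-memory sketch of bucketing plus top-n per bucket (no MongoDB involved).
function topPerBucket(docs, boundaries, defaultKey, n) {
  const buckets = new Map(); // bucket key -> list of { uid, score }
  for (const doc of docs) {
    // A score belongs to bucket boundaries[i] when
    // boundaries[i] <= score < boundaries[i + 1]; otherwise it goes to the default.
    let key = defaultKey;
    for (let i = 0; i + 1 < boundaries.length; i++) {
      if (doc.score >= boundaries[i] && doc.score < boundaries[i + 1]) {
        key = boundaries[i];
        break;
      }
    }
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push({ uid: doc.uid, score: doc.score });
  }
  return [...buckets.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([key, items]) => ({
      _id: key,
      total: items.length, // counts every document, not just the top n
      top_frustrated: items.sort((a, b) => b.score - a.score).slice(0, n),
    }));
}

const scores = [
  { uid: 1, score: 10 }, { uid: 2, score: 11 }, { uid: 3, score: 1 },
  { uid: 4, score: 6 }, { uid: 5, score: 2 }, { uid: 6, score: 3 },
  { uid: 7, score: 8 }, { uid: 8, score: 10 },
];
console.log(JSON.stringify(topPerBucket(scores, [0, 4, 7], 7, 3), null, 2));
```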
Using mongodb, I have a collection of documents where each document has a fixed length vector of floating point values such as below:
items = [
{"id": "1", "vec": [1, 2, 0]},
{"id": "2", "vec": [6, 4, 1]},
{"id": "3", "vec": [3, 2, 2]},
]
I would like to take the row wise average of these vectors. In this example I would expect the result to return
[ (1 + 6 + 3) / 3, (2 + 4 + 2) / 3, (0 + 1 + 2) / 3 ]
This answer ("mongoDB - average on array values") is very close to what I am looking for, but as far as I can tell it will only work on vectors of size 2.
An answer has been provided that is not very performant for large arrays. For context I am using ~700 dimension vectors.
This should work: https://mongoplayground.net/p/PKXqmmW31nW
[
{
$group: {
_id: null,
a: {
$push: {
$arrayElemAt: ["$vec", 0]
}
},
b: {
$push: {
$arrayElemAt: ["$vec", 1]
}
},
c: {
$push: {
$arrayElemAt: ["$vec", 2]
}
}
}
},
{
$project: {
a: {
$avg: "$a"
},
b: {
$avg: "$b"
},
c: {
$avg: "$c"
}
}
}
]
Which outputs:
[
{
"_id": null,
"a": 3.3333333333333335,
"b": 2.6666666666666665,
"c": 1
}
]
Here's a more efficient version without the $avg operator. I'll leave the other answer up for reference.
https://mongoplayground.net/p/rVERc8YjKZv
db.collection.aggregate([
{
$group: {
_id: null,
a: {
$sum: {
$arrayElemAt: ["$vec", 0]
}
},
b: {
$sum: {
$arrayElemAt: ["$vec", 1]
}
},
c: {
$sum: {
$arrayElemAt: ["$vec", 2]
}
},
totalDocuments: {
$sum: 1
}
}
},
{
$project: {
a: {
$divide: ["$a", "$totalDocuments"]
},
b: {
$divide: ["$b", "$totalDocuments"]
},
c: {
$divide: ["$c", "$totalDocuments"]
}
}
}
])
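With ~700 dimensions, writing one field per element by hand is impractical, so the two stages could be generated instead. The sketch below only builds the pipeline object; it does not talk to a database. The field names s0, s1, ... and n, and the helper name buildAveragePipeline, are hypothetical, and the final stage returns the averages as a single array rather than as separate fields.

```javascript
// Builds a $group + $project pipeline for element-wise averages of "vec".
function buildAveragePipeline(dim) {
  const group = { _id: null, n: { $sum: 1 } }; // n = number of documents
  const avg = [];
  for (let i = 0; i < dim; i++) {
    // Sum of the i-th element across all documents.
    group["s" + i] = { $sum: { $arrayElemAt: ["$vec", i] } };
    // Average of the i-th element = sum / document count.
    avg.push({ $divide: ["$s" + i, "$n"] });
  }
  return [{ $group: group }, { $project: { _id: 0, avg: avg } }];
}

console.log(JSON.stringify(buildAveragePipeline(3), null, 2));
```

The result would be passed to db.collection.aggregate(...) in the same way as the hand-written pipeline above.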
You can use $unwind to get values into separate documents, the key is to keep the index of the values. Then you can use $group by the index and calculate the average using the $avg operator.
db.collection.aggregate([
{
$unwind: {
path: "$vec",
includeArrayIndex: "i" // unwind and keep index
}
},
{
$group: {
_id: "$i", // group by index
avg: { $avg: "$vec" }
}
}, // at this stage, you already get all the values you need, in separate documents. The following stages will put all the values in an array
{
$sort: { _id: 1 }
},
{
$group: {
_id: null,
avg: { $push: "$avg" }
}
}
])
Mongo Playground
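Grouping by index works for any vector length, which is easy to confirm with a small in-memory equivalent. This is a sketch in plain JavaScript, not MongoDB code; averageByIndex is a hypothetical name.

```javascript
// In-memory sketch of "unwind with index, group by index, average".
function averageByIndex(items) {
  const sums = [];   // index -> running sum
  const counts = []; // index -> number of values seen at that index
  for (const item of items) {
    item.vec.forEach((x, i) => {
      sums[i] = (sums[i] || 0) + x;
      counts[i] = (counts[i] || 0) + 1;
    });
  }
  return sums.map((s, i) => s / counts[i]);
}

const items = [
  { id: "1", vec: [1, 2, 0] },
  { id: "2", vec: [6, 4, 1] },
  { id: "3", vec: [3, 2, 2] },
];
console.log(averageByIndex(items)); // averages of 10/3, 8/3 and 3/3
```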
Using MongoChef aggregation, if you have data such as:
{_id: 1, Mnt: 2016-05-01, Score: 85}
{_id: 2, Mnt: 2016-05-01, Score: 85}
{_id: 3, Mnt: 2016-03-01, Score: 80}
{_id: 4, Mnt: 2016-03-01, Score: 80}
{_id: 5, Mnt: 2016-03-01, Score: 80}
{_id: 6, Mnt: 2016-01-01, Score: 75}
and want to:
Calculate max month in the collection (i.e. M1 : May 2016),
Group by "Mnt" - which might not be sequential latest months, e.g. collection above latest/largest 3 months being: 2016-May, 2016-March, 2016-January,
Find the latest X month totals,
Calculate the Average of each,
e.g.
{M1 : 85, M2 : 82, M3 : 80.8}
I.e.
M1 is average of max month in collection,
M2 is average of max 2 project months in collection
M3 is average of max 3 project months in collection etc.
This is a dirty solution, but it will give you an overview of how to start:
var i = 1;
var elemSum = 0;
var elemCount = 0;
db.a.aggregate([{
$group : {
_id : {
year : {
$year : "$Mnt"
},
month : {
$month : "$Mnt"
}
},
avg : {
$avg : "$Score"
},
elemCount : {
$sum : 1
},
elemSum : {
$sum : "$Score"
}
}
}, {
$sort : {
"_id.year" : -1,
"_id.month" : -1
}
},
{
$limit : 3
}, // first 3 records
]).forEach(function (doc) {
elemSum += doc.elemSum;
elemCount += doc.elemCount;
var result = elemSum / elemCount;
var x = "M" + i.toString() + ": ";
print(x + result.toString());
i++;
})
I also converted the Mnt field to ISO dates:
db.a.insert([
{_id: 1, Mnt: new ISODate("2016-05-01T15:44:00.255Z"), Score: 85},
{_id: 2, Mnt: new ISODate("2016-05-01T15:44:00.255Z"), Score: 85},
{_id: 3, Mnt: new ISODate("2016-03-01T15:44:00.255Z"), Score: 80},
{_id: 4, Mnt: new ISODate("2016-03-01T15:44:00.255Z"), Score: 80},
{_id: 5, Mnt: new ISODate("2016-03-01T15:44:00.255Z"), Score: 80},
{_id: 6, Mnt: new ISODate("2016-01-01T15:44:00.255Z"), Score: 75}
])
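The running totals in the forEach callback can be checked against the expected {M1, M2, M3} values with a small in-memory version. This sketch assumes Mnt is stored as an ISO date string such as "2016-05-01" (a simplification of the ISODate values above); cumulativeMonthAverages is a hypothetical name.

```javascript
// In-memory sketch: cumulative averages over the latest N months.
function cumulativeMonthAverages(docs, months) {
  const byMonth = new Map(); // "YYYY-MM" -> { sum, count }
  for (const doc of docs) {
    const key = doc.Mnt.slice(0, 7); // assumes an ISO date string
    const m = byMonth.get(key) || { sum: 0, count: 0 };
    m.sum += doc.Score;
    m.count += 1;
    byMonth.set(key, m);
  }
  // "YYYY-MM" keys sort chronologically as strings; take the latest ones first.
  const latest = [...byMonth.keys()].sort().reverse().slice(0, months);
  const result = {};
  let sum = 0;
  let count = 0;
  latest.forEach((key, i) => {
    sum += byMonth.get(key).sum;
    count += byMonth.get(key).count;
    result["M" + (i + 1)] = sum / count; // average over the latest i + 1 months
  });
  return result;
}

const monthly = [
  { _id: 1, Mnt: "2016-05-01", Score: 85 },
  { _id: 2, Mnt: "2016-05-01", Score: 85 },
  { _id: 3, Mnt: "2016-03-01", Score: 80 },
  { _id: 4, Mnt: "2016-03-01", Score: 80 },
  { _id: 5, Mnt: "2016-03-01", Score: 80 },
  { _id: 6, Mnt: "2016-01-01", Score: 75 },
];
console.log(cumulativeMonthAverages(monthly, 3));
```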
Code that works - calculate a running 12-month and current month Net Promoter Scores:
db.Collection.aggregate(
// Pipeline
// Stage 1
{
$project: {
ID: "$ID",
Mnt: "$Mnt",
CntryReg: "$CntryReg",
Prom: "$Prom",
}
},
// Stage 2
{
$group: {
_id: '$Mnt',
docs: {
$push: {
Mnt: "$Mnt",
CntryReg: "$CntryReg",
Prom: "$Prom"
}}
}
},
// Stage 3
{
$sort: {
_id: -1
}
},
// Stage 4
{
$limit: 12
},
// Stage 5
{
$group: {
"_id": null,
"values": { "$push": "$docs" }
}
},
// Stage 6
{
$unwind: {
"path": "$values", "includeArrayIndex": "rank"
}
},
// Stage 7
{
$unwind: "$values"
},
// Stage 8
{
$project: {
_id: 0,
Mnt: "$values.Mnt",
CntryReg: "$values.CntryReg",
Prom: "$values.Prom",
rank: "$rank"
}
},
// Stage 9
{
$group: {
_id: {CntryReg:"$CntryReg"} ,
AR12: { $sum: { $cond : [{ $eq : ["$Prom", "D"]}, 1, 0]} },
Ind12: { $sum: { $cond : [{ $eq : ["$Prom", "I"]}, 1, 0]} },
Loy12: { $sum: { $cond : [{ $eq : ["$Prom", "P"]}, 1, 0]} },
Sum12: {$sum: 1 },
AR1: { $sum: { $cond : [{ $and : [{ $eq : ["$Prom", "D"]} , {$eq : ["$rank", 0]} ]}, 1, 0]} },
Loy1: { $sum: { $cond : [{ $and : [{ $eq : ["$Prom", "P"]} , {$eq : ["$rank", 0]} ]}, 1, 0]} },
Ind1: { $sum: { $cond : [{ $and : [{ $eq : ["$Prom", "I"]} , {$eq : ["$rank", 0]} ]}, 1, 0]} },
Sum1: { $sum: { $cond : [ { $eq : ["$rank", 0]}, 1, 0]} },
I've got documents with this simplified schema:
{
position: 10,
value: 5,
count: 3
}
What I'd like to compute, is to group those documents by position and find the maximum value where the count is greater than 4 but with value less than the minimum value where the count is less than 4.
Here is what I've done, but it does not work:
{ $group: {
_id: {
position: "$position",
},
result: {$max: { $cond: [ {$and: [ {$gte: ["$count", 4]},
{$lt: ["$value", {$min: { $cond: [ {$lt: ["$count", 4]},
{ value: "$value" },
10]
}
}]
}]},
{ value: "$value", nb: "$count"},
0]
}
}
}
}
I am told that $min is an invalid operator and I can't figure out how to write the right aggregation function. Would it be better to run a mapReduce?
If, for example, I have these documents
{Position: 10, value: 1, count: 5}
{Position: 10, value: 3, count: 3}
{Position: 10, value: 4, count: 5}
{Position: 10, value: 7, count: 4}
I'd like the result to be
{Position: 10, value: 1, count: 4}
This is the maximum 'value' where count is greater than 4 that is still below 3, the value that has a count of only 3, so the value 4 is not what I'm looking for.
That is a bit of a mouthful to say the least but I'll have another crack at explaining it:
You want:
For each "Position" value, find the document whose "count" is greater than 4 and whose "value" is less than the smallest "value" among the documents with a "count" of less than four, taking the largest such "value".
Which reads like a math exam problem designed to confuse you with the logic. But catching that meaning then you perform the aggregation with the following steps:
db.positions.aggregate([
// Separate the values greater than and less than 4 by "Position"
{ "$group": {
"_id": "$Position",
"high": { "$push": {
"$cond": [
{ "$gt": ["$count", 4] },
{ "value": "$value", "count": "$count" },
null
]
}},
"low": { "$push": {
"$cond": [
{ "$lt": ["$count", 4] },
{ "value": "$value", "count": "$count" },
null
]
}}
}},
// Unwind the "low" counts array
{ "$unwind": "$low" },
// Find the "$min" value from the low counts
{ "$group": {
"_id": "$_id",
"high": { "$first": "$high" },
"low": { "$min": "$low.value" }
}},
// Unwind the "high" counts array
{ "$unwind": "$high" },
// Compare the value to the "low" value to see if it is less than
{ "$project": {
"high": 1,
"lower": { "$lt": [ "$high.value", "$low" ] }
}},
// Sorting, $max won't work over multiple values. Want the document.
{ "$sort": { "lower": -1, "high.value": -1 } },
// Group, get the highest order document which was on top
{ "$group": {
"_id": "$_id",
"value": { "$first": "$high.value" },
"count": { "$first": "$high.count" }
}}
])
So from the set of documents:
{ "Position" : 10, "value" : 1, "count" : 5 }
{ "Position" : 10, "value" : 3, "count" : 3 }
{ "Position" : 10, "value" : 4, "count" : 5 }
{ "Position" : 10, "value" : 7, "count" : 4 }
Only the first is returned in this case, as its value is less than that of the "count of three" document and its own count is greater than 4.
{ "_id" : 10, "value" : 1, "count" : 5 }
Which I am sure is what you actually meant.
So the application of $min and $max really only applies when getting discrete values from documents out of a grouping range. If you are interested in more than one value from the document or indeed the whole document, then you are sorting and getting the $first or $last entries on the grouping boundary.
And aggregate is much faster than mapReduce as it uses native code without invoking a JavaScript interpreter.
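For comparison, the selection rule can be written out in plain JavaScript to sanity-check results on small samples. This is an in-memory sketch and not a replacement for the pipeline. Two choices here are my assumptions: positions with no qualifying document are skipped, and a position with no "count" below 4 imposes no upper limit. maxBelowLowCount is a hypothetical name.

```javascript
// In-memory sketch: per Position, the largest value with count > 4 that is
// below the smallest value whose count is < 4.
function maxBelowLowCount(docs) {
  const byPos = new Map(); // Position -> documents
  for (const d of docs) {
    if (!byPos.has(d.Position)) byPos.set(d.Position, []);
    byPos.get(d.Position).push(d);
  }
  const out = [];
  for (const [pos, group] of byPos) {
    const lowValues = group.filter((d) => d.count < 4).map((d) => d.value);
    // Assumption: with no low-count documents there is no upper limit.
    const limit = lowValues.length ? Math.min(...lowValues) : Infinity;
    const candidates = group.filter((d) => d.count > 4 && d.value < limit);
    if (candidates.length === 0) continue; // assumption: skip empty positions
    const best = candidates.reduce((a, b) => (b.value > a.value ? b : a));
    out.push({ _id: pos, value: best.value, count: best.count });
  }
  return out;
}

const positions = [
  { Position: 10, value: 1, count: 5 },
  { Position: 10, value: 3, count: 3 },
  { Position: 10, value: 4, count: 5 },
  { Position: 10, value: 7, count: 4 },
];
console.log(maxBelowLowCount(positions));
```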