MongoDB: average arrays across many documents

Using mongodb, I have a collection of documents where each document has a fixed length vector of floating point values such as below:
items = [
{"id": "1", "vec": [1, 2, 0]},
{"id": "2", "vec": [6, 4, 1]},
{"id": "3", "vec": [3, 2, 2]},
]
I would like to take the element-wise average of these vectors, that is, average each position across all documents. In this example I would expect the result to be
[ (1 + 6 + 3) / 3, (2 + 4 + 2) / 3, (0 + 1 + 2) / 3 ]
The answer to "mongoDB - average on array values" is very close to what I am looking for, but as far as I can tell it only works on vectors of size 2.
An answer has been provided that is not very performant for large arrays. For context, I am using vectors with ~700 dimensions.

This should work: https://mongoplayground.net/p/PKXqmmW31nW
[
{
$group: {
_id: null,
a: {
$push: {
$arrayElemAt: ["$vec", 0]
}
},
b: {
$push: {
$arrayElemAt: ["$vec", 1]
}
},
c: {
$push: {
$arrayElemAt: ["$vec", 2]
}
}
}
},
{
$project: {
a: {
$avg: "$a"
},
b: {
$avg: "$b"
},
c: {
$avg: "$c"
}
}
}
]
Which outputs:
[
{
"_id": null,
"a": 3.3333333333333335,
"b": 2.6666666666666665,
"c": 1
}
]
Here's a more efficient version that does not use the $avg operator. I'll leave the other answer up for reference.
https://mongoplayground.net/p/rVERc8YjKZv
db.collection.aggregate([
{
$group: {
_id: null,
a: {
$sum: {
$arrayElemAt: ["$vec", 0]
}
},
b: {
$sum: {
$arrayElemAt: ["$vec", 1]
}
},
c: {
$sum: {
$arrayElemAt: ["$vec", 2]
}
},
totalDocuments: {
$sum: 1
}
}
},
{
$project: {
a: {
$divide: ["$a", "$totalDocuments"]
},
b: {
$divide: ["$b", "$totalDocuments"]
},
c: {
$divide: ["$c", "$totalDocuments"]
}
}
}
])
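For vectors with hundreds of dimensions, writing one accumulator per index by hand is impractical, so the stages can be generated in code instead. The following is only a sketch: the helper name buildAveragePipeline and the intermediate field names v0, v1, ... are made up for illustration, and it assumes a server version that accepts an array literal of field paths in $project.

```javascript
// Hypothetical helper (not part of MongoDB): builds the $group/$project
// stages for vectors of length `dims` stored in the field `field`.
function buildAveragePipeline(dims, field = "vec") {
  const group = { _id: null };
  const avg = [];
  for (let i = 0; i < dims; i++) {
    // One $avg accumulator per vector position: v0, v1, ...
    group[`v${i}`] = { $avg: { $arrayElemAt: [`$${field}`, i] } };
    avg.push(`$v${i}`);
  }
  return [
    { $group: group },
    // Collect the per-position averages back into a single array.
    { $project: { _id: 0, avg: avg } }
  ];
}

const pipeline = buildAveragePipeline(3);
console.log(JSON.stringify(pipeline, null, 2));
```

In mongosh this would be used as db.collection.aggregate(buildAveragePipeline(700)). It keeps everything in a single $group pass, at the cost of a very wide intermediate document.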

You can use $unwind to split the values into separate documents; the key is to keep the index of each value. Then you can $group by the index and calculate the average using the $avg operator.
db.collection.aggregate([
{
$unwind: {
path: "$vec",
includeArrayIndex: "i" // unwind and keep index
}
},
{
$group: {
_id: "$i", // group by index
avg: { $avg: "$vec" }
}
}, // at this stage, you already get all the values you need, in separate documents. The following stages will put all the values in an array
{
$sort: { _id: 1 }
},
{
$group: {
_id: null,
avg: { $push: "$avg" }
}
}
])
Mongo Playground
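To make the data flow of the $unwind approach easier to follow, here is a plain JavaScript sketch of the same steps, run on the sample documents from the question. It does not talk to MongoDB; it only mirrors what each stage does.

```javascript
const items = [
  { id: "1", vec: [1, 2, 0] },
  { id: "2", vec: [6, 4, 1] },
  { id: "3", vec: [3, 2, 2] }
];

// $unwind with includeArrayIndex: one record per (index, value) pair.
const unwound = items.flatMap(doc => doc.vec.map((value, i) => ({ i, value })));

// $group by index: accumulate a sum and a count per position.
const groups = new Map();
for (const { i, value } of unwound) {
  const g = groups.get(i) || { sum: 0, count: 0 };
  g.sum += value;
  g.count += 1;
  groups.set(i, g);
}

// $sort by index, then $push the averages into one array.
const avg = [...groups.entries()]
  .sort(([a], [b]) => a - b)
  .map(([, g]) => g.sum / g.count);

console.log(avg); // approximately [3.33, 2.67, 1]
```

Note that $unwind produces one intermediate document per vector element, so for ~700-dimension vectors the number of intermediate documents is 700 times the collection size.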

Related

How can I exclude results that contain a specific element from grouped results?

A: The number of documents (_ids) per date, grouped by date.
B: Of the documents counted in A, the number whose details array has at least one element (count 1 if it has elements, 0 otherwise).
A document like the following counts toward A but not toward B:
{
_id: ObjectId
details: array //no elements
createdAt: Date
}
C: The count of B, except for documents in B that have specific details.salesManagerIds.
details.salesManagerIds is provided as an array.
For example,
[ObjectId("612f57184205db63a3396a9e"), ObjectId("612cb021278f621a222087d7")]
I wrote the following query:
https://mongoplayground.net/p/6sBxAmO_31y
It works up to B. How can I write a query to get C?
A query that correctly computes C for the data in the link above should return the following result.
[
{
"A": 2,
"B": 1,
"C": 1,
"_id": "2018-05-19"
},
{
"A": 3,
"B": 3,
"C": 1,
"_id": "2018-05-18"
}
]
Use $filter:
db.collection.aggregate([
{
$group: {
_id: {
$dateToString: {
format: "%Y-%m-%d",
date: "$createdAt"
}
},
A: {
$sum: 1
},
B: {
$sum: {
$cond: [
{
$and: [
{
$isArray: "$details"
},
{
$gt: [
{
$size: "$details"
},
0
]
}
]
},
1,
0
]
}
},
C: {
$sum: {
$cond: [
{
$and: [
{
$isArray: "$details"
},
{
$gt: [
{
$size: "$details"
},
0
]
},
{
$gt: [
{
$size: {
$filter: {
input: "$details",
as: "d",
cond: {
$and: [
{
$not: [
{
$in: [
"$$d.salesManagerId",
[
ObjectId("612f57184205db63a3396a9e"),
ObjectId("612cb021278f621a222087d7")
]
]
}
]
}
]
}
}
}
},
0
]
}
]
},
1,
0
]
}
}
}
},
{
$sort: {
_id: -1
}
}
])
mongoplayground
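The counting logic of that pipeline can be sketched in plain JavaScript. The documents below are hypothetical (the real data lives in the linked playground), ids are plain strings instead of ObjectId values, and each details element is assumed to carry a salesManagerId field, as in the pipeline above.

```javascript
// Ids to exclude when computing C (plain strings here, ObjectId in MongoDB).
const excluded = new Set(["612f57184205db63a3396a9e", "612cb021278f621a222087d7"]);

// Hypothetical sample documents, for illustration only.
const docs = [
  { createdAt: new Date("2018-05-18T10:00:00Z"), details: [{ salesManagerId: "612f57184205db63a3396a9e" }] },
  { createdAt: new Date("2018-05-18T11:00:00Z"), details: [{ salesManagerId: "aaaaaaaaaaaaaaaaaaaaaaaa" }] },
  { createdAt: new Date("2018-05-19T09:00:00Z"), details: [] }
];

const byDate = {};
for (const doc of docs) {
  // Like $dateToString with format "%Y-%m-%d" (UTC).
  const day = doc.createdAt.toISOString().slice(0, 10);
  const row = byDate[day] || (byDate[day] = { A: 0, B: 0, C: 0 });
  const details = Array.isArray(doc.details) ? doc.details : [];
  // Like $filter: keep only details whose salesManagerId is not excluded.
  const kept = details.filter(d => !excluded.has(d.salesManagerId));
  row.A += 1;                           // every document
  row.B += details.length > 0 ? 1 : 0;  // documents with a non-empty details array
  row.C += kept.length > 0 ? 1 : 0;     // documents with at least one non-excluded detail
}

console.log(byDate);
```

With this reading, a document counts toward C as long as at least one of its details survives the filter; if a single excluded id should disqualify the whole document, the last condition would need to compare kept.length with details.length instead.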

Convert array to new field, using keys as the values of this array and values as frequency of these items (aggregation framework)

I have this problem, but I can't solve it.
I have to transform the array s into a new field called shares.
This new field has the distinct values of s as keys and their frequencies as values.
Suppose I have these documents:
{
'name': 'igor',
's': ['a', 'a', 'a', 'b', 'b']
},
{
'name': 'jones',
's': ['c', 'b']
}
Expected output:
{
'name': 'igor',
'shares': {
'a': 3,
'b': 2
}
},
{
'name': 'jones',
'shares': {
'c': 1,
'b': 1
}
}
You can try below aggregation query :
db.collection.aggregate([
/** unwind `s` array */
{
$unwind: "$s"
},
/** group on unique pairs of `_id + s` & retain name field, count sum of matching docs */
{
$group: { _id: { k: "$s", _id: "$_id" }, name: { $first: "$name" }, v: { $sum: 1 } }
},
/** group on unique pairs of just `_id` & retain name field, push docs into shares array `[{k :..., v:...}]` */
{
$group: { _id: "$_id._id", name: { $first: "$name" }, shares: { $push: { k: "$_id.k", v: "$v" } } }
},
/** Re-create shares field from array to object */
{
$addFields: { shares: { $arrayToObject: "$shares" } }
}
])
Test : mongoplayground
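As a cross-check of what the pipeline is expected to produce, the same transformation can be sketched in plain JavaScript on the two sample documents from the question. This is only an illustration of the logic, not a replacement for the aggregation.

```javascript
const people = [
  { name: "igor", s: ["a", "a", "a", "b", "b"] },
  { name: "jones", s: ["c", "b"] }
];

const result = people.map(({ name, s }) => {
  const shares = {};
  for (const key of s) {
    // Like $unwind + $group with $sum: 1, counting each distinct value.
    shares[key] = (shares[key] || 0) + 1;
  }
  // Like $arrayToObject: the counts end up as an object keyed by value.
  return { name, shares };
});

console.log(JSON.stringify(result));
```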
It's bad practice to use data values as field names (in your case 'a': 3, 'b': 2), so I changed shares to an array of objects like:
{
key: "$_id.shares",
count: "$count"
}
You need to do the following, in order:
Unwind the array s.
Group by a composite _id made of name and s.
Group again by _id.name and push objects with key and count fields into the shares array.
You can try the below query:
db.collection.aggregate([
{
$unwind: "$s"
},
{
$group: {
_id: {
name: "$name",
shares: "$s"
},
count: {
$sum: 1
}
}
},
{
$group: {
_id: "$_id.name",
shares: {
$push: {
key: "$_id.shares",
count: "$count"
}
}
}
}
])
Output
[
{
"_id": "jones",
"shares": [
{
"count": 1,
"key": "c"
},
{
"count": 1,
"key": "b"
}
]
},
{
"_id": "igor",
"shares": [
{
"count": 3,
"key": "a"
},
{
"count": 2,
"key": "b"
}
]
}
]
MongoPlayGroundLink

MongoDB aggregations. Get value differences

I've been struggling with mongo trying to find a solution to show the differences between values.
I have values like this:
[
{val: 1},
{val: 4},
{val: 7},
{val: 8},
{val: 11}
]
And I want to receive something like this:
[
{diff: 3},
{diff: 3},
{diff: 1},
{diff: 3}
]
Each diff is the next value (4) minus the previous one (1). Here that gives 3, which is the first item of the second list.
Is it possible to achieve it using MongoDB aggregations?
You need to group them into an array, calculate the diffs, and flatten the result again.
Pseudocode
// $group: collect all values into one array
var _data = [{val: 1}, {val: 4}, ..., {val: 11}];
// $range: build the list of indexes, where n is the number of items.
// The last item has no successor, so the range must stop before it;
// otherwise the last diff would be null
var idx = [0, 1, 2, ..., n - 2];
// $map: store one diff per index
var data = [];
for (var i of idx) {
data[i] = {diff: _data[i + 1].val - _data[i].val};
}
// $unwind: one document per diff
{data: data[0]}, {data: data[1]}, {data: data[2]}, ...
// $replaceRoot: promote the nested diff document to the top level
{data: {diff: 3}} --> {diff: 3}
Add these steps into your pipeline:
db.collection.aggregate([
{
$group: {
_id: null,
data: { $push: "$$ROOT" }
}
},
{
$addFields: {
data: {
$map: {
input: {
$range: [
0,
{
$subtract: [
{ $size: "$data" },
1 // n values yield n - 1 differences
]
},
1
]
},
as: "idx",
in: {
diff: {
$subtract: [
{
$arrayElemAt: [
"$data.val",
{
$add: [ "$$idx", 1 ]
}
]
},
{
$arrayElemAt: [ "$data.val", "$$idx" ]
}
]
}
}
}
}
}
},
{
$unwind: "$data"
},
{
$replaceRoot: {
newRoot: "$data"
}
}
])
MongoPlayground
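The same computation in plain JavaScript, as a quick way to check the expected output for the sample values from the question. It is a sketch of the logic only and does not run against MongoDB.

```javascript
const docs = [{ val: 1 }, { val: 4 }, { val: 7 }, { val: 8 }, { val: 11 }];

// Like "$data.val" after the $group/$push stage: a flat array of values.
const vals = docs.map(d => d.val);

const diffs = [];
// Indexes 0 .. n-2: the last value has no successor.
for (let i = 0; i < vals.length - 1; i++) {
  diffs.push({ diff: vals[i + 1] - vals[i] });
}

console.log(JSON.stringify(diffs));
```

Because the aggregation has no inherent ordering guarantee without a $sort, it is worth sorting on a suitable field before the $group if the order of the values matters.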

Mongodb aggregation - count arrays with elements having integer value greater than

I need to write a MongoDB aggregation pipeline to count the documents whose arrays contain values of two kinds:
>=10
>=20
This is my dataset:
[
{ values: [ 1, 2, 3] },
{ values: [12, 1, 3] },
{ values: [1, 21, 3] },
{ values: [1, 2, 29] },
{ values: [22, 9, 2] }
]
This would be the expected output
{
has10s: 4,
has20s: 3
}
Mongo's $in (aggregation) seems to be the tool for the job, except I can't get it to work.
This is my (non working) pipeline:
db.mytable.aggregate([
{
$project: {
"has10s" : {
"$in": [ { "$gte" : [10, "$$CURRENT"]}, "$values"]}
},
"has20s" : {
"$in": [ { "$gte" : [20, "$$CURRENT"]}, "$values"]}
}
},
{ $group: { ... sum ... } }
])
The output of $in seems to be always true. Can anyone help?
You can try something like this:
db.collection.aggregate([{
$project: {
_id: 0,
has10: {
$size: {
$filter: {
input: "$values",
as: "item",
cond: { $gte: [ "$$item", 10 ] }
}
}
},
has20: {
$size: {
$filter: {
input: "$values",
as: "item",
cond: { $gte: [ "$$item", 20 ] }
}
}
}
}
},
{
$group: {
_id: 1,
has10: { $sum: "$has10" },
has20: { $sum: "$has20" }
}
}
])
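This uses $project with $filter to select the matching elements, then $size to count them. Note that it sums matching elements rather than documents; the two coincide for this dataset because no array has more than one matching value.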
See it working here
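If an array can hold several values above a threshold, counting documents needs a 0/1 flag per document instead of the filtered size. The sketch below shows one way to express that with $cond, next to a plain JavaScript check of both counts. The helper name hasAtLeast and the sample documents are made up for illustration.

```javascript
// Hypothetical data: the second document has two values >= 10.
const docs = [
  { values: [1, 2, 3] },
  { values: [12, 15, 3] },
  { values: [1, 21, 3] }
];

// Expression for mongosh: 1 if any element reaches the threshold, else 0.
const hasAtLeast = threshold => ({
  $cond: [
    { $gt: [{ $size: { $filter: { input: "$values", as: "item", cond: { $gte: ["$$item", threshold] } } } }, 0] },
    1,
    0
  ]
});

const pipeline = [
  { $project: { _id: 0, has10: hasAtLeast(10), has20: hasAtLeast(20) } },
  { $group: { _id: null, has10: { $sum: "$has10" }, has20: { $sum: "$has20" } } }
];

// Plain JavaScript equivalents, to compare the two ways of counting.
const countDocs = t => docs.filter(d => d.values.some(v => v >= t)).length;
const countElems = t => docs.reduce((n, d) => n + d.values.filter(v => v >= t).length, 0);

console.log(countDocs(10), countElems(10)); // documents vs. elements
```

For the hypothetical data, two documents contain a value of at least 10, while three elements do, which is the difference between the two approaches.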

Get max from unwound arrays

I have a collection of documents where I want to find the maximum values of each of the ratios of every possible pair of fields in the data object. For example:
Documents:
[
{ data: { a: 1, b: 5, c: 2 } },
{ data: { a: 4, b: 1, c: 1 } },
{ data: { a: 2, b: 4, c: 3 } }
]
Desired output:
{
a: { a: 1, b: 4, c: 4 },
b: { a: 5, b: 1, c: 2.5 },
c: { a: 2, b: 1, c: 1 }
}
So the output a.b is the largest of the a:b ratios 1/5, 4/1, and 2/4.
So I figure I first use $objectToArray to convert data, then $unwind on the result, but I'm having a hard time figuring out how to group everything together. The number of documents I have won't be too large, but the number of keys in data can be in the low thousands, so I'm not sure how well Mongo will be able to handle doing a bunch of $lookup's and comparing the values like that.
You can try following aggregation:
db.col.aggregate([
{
$addFields: { data: { $objectToArray: "$data" } }
},
{
$project: {
pairs: {
$map: {
input: { $range: [ 0, { $multiply: [ { $size: "$data" }, { $size: "$data" } ] } ] },
as: "index",
in: {
$let: {
vars: {
leftIndex: { $floor: { $divide: [ "$$index", { $size: "$data" } ] } },
rightIndex: { $mod: [ "$$index", { $size: "$data" } ] }
},
in: {
l: { $arrayElemAt: [ "$data", "$$leftIndex" ] },
r: { $arrayElemAt: [ "$data", "$$rightIndex" ] }
}
}
}
}
}
}
},
{ $unwind: "$pairs" },
{
$group: {
_id: { l: "$pairs.l.k", r: "$pairs.r.k" },
value: { $max: { $divide: [ "$pairs.l.v", "$pairs.r.v" ] } }
}
},
{
$sort: {
"_id.l": 1, "_id.r": 1
}
},
{
$group: {
_id: "$_id.l",
values: { $push: { k: "$_id.r", v: "$value" } }
}
},
{
$addFields: { values: { $arrayToObject: "$values" } }
},
{
$project: {
root: [ { k: "$_id", v: "$values" } ]
}
},
{
$sort: { "root.k": 1 }
},
{
$replaceRoot: {
newRoot: {
$arrayToObject: "$root"
}
}
}
])
You need $objectToArray and $arrayToObject to transform between objects and arrays. The key point is that for each document you need to generate n×n pairs (3×3 = 9 in this case). You can perform this iteration with the $range operator: using $divide with $floor for the left index and $mod for the right index, each value in the range maps to an index pair from (0,0) to (2,2). Then $group with $max gives the maximum ratio for each pair of keys (a with b, and so on). Finally, $replaceRoot produces the required shape.
Outputs:
{ "a" : { "a" : 1, "b" : 4, "c" : 4 } }
{ "b" : { "a" : 5, "b" : 1, "c" : 2.5 } }
{ "c" : { "a" : 2, "b" : 1, "c" : 1 } }
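For comparison, here is a plain JavaScript sketch of the same pairwise maximum on the sample documents from the question. It only illustrates the logic; with thousands of keys both this and the pipeline do work proportional to the square of the number of keys per document.

```javascript
const docs = [
  { data: { a: 1, b: 5, c: 2 } },
  { data: { a: 4, b: 1, c: 1 } },
  { data: { a: 2, b: 4, c: 3 } }
];

const result = {};
for (const { data } of docs) {
  const entries = Object.entries(data); // like $objectToArray
  for (const [l, lv] of entries) {
    for (const [r, rv] of entries) {    // all n x n key pairs
      if (result[l] === undefined) result[l] = {};
      const ratio = lv / rv;
      // Like $group with $max on each (l, r) pair.
      result[l][r] = result[l][r] === undefined ? ratio : Math.max(result[l][r], ratio);
    }
  }
}

console.log(JSON.stringify(result));
```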