How do I produce the union of embedded arrays in a MongoDB aggregate?

I have a set of documents of the form:
{
skill_id: 2,
skill_recs: [
{
_id: 4,
member_ids: [1, 4, 5]
}
]
},
{
skill_id: 5,
skill_recs: [
{
_id: 4,
member_ids: [1, 7, 9]
}
]
}
Now I want to aggregate a set of these documents such that skill_recs are combined by _id and the member_ids of all combined docs are merged into a single union of values...
{ _id: 4,
member_ids: [1, 4, 5, 7, 9]
}
I get most of the way with:
db.collection.aggregate([
{
$unwind: '$skill_recs'
},
{
$group: {
_id: '$skill_recs._id',
all_member_ids: {$push: '$skill_recs.member_ids'}
}
},
{
$addFields: {
member_ids: {$setUnion: '$all_member_ids'}
}
}
])
but the $setUnion doesn't do a union of the array of arrays that it is passed.
Instead it produces:
{ _id: 4,
member_ids: [[1, 4, 5], [1, 7, 9]]
}
Any way to produce the union of these arrays?

You're quite close. Here's a quick example of how to achieve this using $reduce, which folds each inner array into a running $setUnion:
db.collection.aggregate([
{
$unwind: "$skill_recs"
},
{
$group: {
_id: "$skill_recs._id",
all_member_ids: {
$push: "$skill_recs.member_ids"
}
}
},
{
$addFields: {
member_ids: {
$reduce: {
input: "$all_member_ids",
initialValue: [],
in: {
$setUnion: [
"$$this",
"$$value"
]
}
}
}
}
}
])
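To see what the $reduce stage does, it may help to mirror it in plain JavaScript. This is only an illustrative sketch, not MongoDB code: setUnion is a hypothetical helper standing in for $setUnion, and the result is sorted purely to make the output predictable, since MongoDB does not guarantee the element order of a $setUnion result.

```javascript
// Plain-JavaScript mirror of the $reduce + $setUnion stage (illustration only).
// allMemberIds is what the $group stage builds via $push: an array of arrays.
const allMemberIds = [[1, 4, 5], [1, 7, 9]];

// Hypothetical stand-in for $setUnion with two array arguments.
function setUnion(a, b) {
  return [...new Set([...a, ...b])];
}

// $reduce: start from initialValue [] and fold each inner array ($$this)
// into the accumulated result ($$value).
const memberIds = allMemberIds
  .reduce((value, current) => setUnion(current, value), [])
  .sort((x, y) => x - y); // sorted only for a stable, readable result

console.log(memberIds); // [ 1, 4, 5, 7, 9 ]
```

Passing the array of arrays straight to $setUnion, as in the question, only de-duplicates the inner arrays themselves, which is why the fold is needed.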

Related

How to use a value from another key in field path in MongoDB Aggregation?

Documents
{ color: 'red',
value: {
red: {
level1: {
level2: 5
}}}}
{ color: 'blue',
value: {
blue: {
level1: {
level2: 8
}}}}
How to aggregate the values of value.red.level1.level2 and value.blue.level1.level2?
The keys red and blue come from the key color.
@turivishal requested more info:
I want to use $bucket.
{ '$bucket': {
groupBy: '$value.*red*.level1.level2',
boundaries: [1,2,3,4,5,6,7,8,9],
output: {
count: { '$sum': 1 }}}}
The expected result would be
[{ id: 5, count: 1}, { id: 8, count: 1 }]
You can access it by converting the object to an array of key/value objects:
$objectToArray converts an object into an array of objects in { k: <key>, v: <value> } format.
$arrayElemAt returns the element at a given index of an array (here the first one); you can use it directly in $bucket's groupBy property.
db.collection.aggregate([
{
$addFields: {
value: { $objectToArray: "$value" }
}
},
{
$bucket: {
groupBy: { $arrayElemAt: ["$value.v.level1.level2", 0] },
boundaries: [1, 2, 3, 4, 5, 6, 7, 8, 9],
output: {
count: { $sum: 1 }
}
}
}
])
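As a rough illustration of what these two operators produce for one of the sample documents, here is a plain-JavaScript mirror. It is not MongoDB code, and the variable names are made up for the example. It also shows the assumption the pipeline relies on: value holds exactly one key per document.

```javascript
// Plain-JavaScript mirror of $objectToArray and $arrayElemAt (illustration only).
const doc = { color: "red", value: { red: { level1: { level2: 5 } } } };

// $objectToArray: turn { red: {...} } into [{ k: "red", v: {...} }].
const asArray = Object.entries(doc.value).map(([k, v]) => ({ k, v }));

// The path "$value.v.level1.level2" applied to an array yields one entry per element.
const levels = asArray.map((entry) => entry.v.level1.level2);

// $arrayElemAt with index 0 picks the first (here: only) entry.
const groupByValue = levels[0];

console.log(asArray[0].k, groupByValue); // red 5
```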
A second approach performs all of the operations directly in $bucket's groupBy property using the $let operator.
If you have no projection stages before $bucket, this avoids adding an extra stage:
db.collection.aggregate([
{
$bucket: {
groupBy: {
$let: {
vars: { value: { $objectToArray: "$value" } },
in: { $arrayElemAt: ["$$value.v.level1.level2", 0] }
}
},
boundaries: [1, 2, 3, 4, 5, 6, 7, 8, 9],
output: {
count: { $sum: 1 }
}
}
}
])
I ended up using $addFields and $ifNull.
$ifNull takes an array of expressions and returns the first one that does not evaluate to null.
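The exact pipeline was not posted. A minimal sketch of what such an approach might look like, assuming the set of colors is known in advance (red and blue here) and using a hypothetical helper field named level:

```javascript
// Sketch only: `level` is a hypothetical field name, and the list of colors
// (red, blue) is assumed to be known up front.
const pipeline = [
  {
    $addFields: {
      // Use the red path if it is present, otherwise fall back to the blue path.
      level: {
        $ifNull: ["$value.red.level1.level2", "$value.blue.level1.level2"]
      }
    }
  },
  {
    $bucket: {
      groupBy: "$level",
      boundaries: [1, 2, 3, 4, 5, 6, 7, 8, 9],
      output: { count: { $sum: 1 } }
    }
  }
];

// In the mongo shell: db.collection.aggregate(pipeline)
```

Unlike the $objectToArray approach, this one does not adapt to new color keys; each additional color needs its own fallback path.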

mongodb average arrays across many documents

Using mongodb, I have a collection of documents where each document has a fixed length vector of floating point values such as below:
items = [
{"id": "1", "vec": [1, 2, 0]},
{"id": "2", "vec": [6, 4, 1]},
{"id": "3", "vec": [3, 2, 2]},
]
I would like to take the row wise average of these vectors. In this example I would expect the result to return
[ (1 + 6 + 3) / 3, (2 + 4 + 2) / 3, (0 + 1 + 2) / 3 ]
This answer (mongoDB - average on array values) is very close to what I am looking for, but as far as I can tell it will only work on vectors of size 2.
An answer has been provided that is not very performant for large arrays. For context, I am using ~700-dimensional vectors.
This should work: https://mongoplayground.net/p/PKXqmmW31nW
[
{
$group: {
_id: null,
a: {
$push: {
$arrayElemAt: ["$vec", 0]
}
},
b: {
$push: {
$arrayElemAt: ["$vec", 1]
}
},
c: {
$push: {
$arrayElemAt: ["$vec", 2]
}
}
}
},
{
$project: {
a: {
$avg: "$a"
},
b: {
$avg: "$b"
},
c: {
$avg: "$c"
}
}
}
]
Which outputs:
[
{
"_id": null,
"a": 3.3333333333333335,
"b": 2.6666666666666665,
"c": 1
}
]
Here's a more efficient version that doesn't use the $avg operator. I'll leave the other answer up for reference.
https://mongoplayground.net/p/rVERc8YjKZv
db.collection.aggregate([
{
$group: {
_id: null,
a: {
$sum: {
$arrayElemAt: ["$vec", 0]
}
},
b: {
$sum: {
$arrayElemAt: ["$vec", 1]
}
},
c: {
$sum: {
$arrayElemAt: ["$vec", 2]
}
},
totalDocuments: {
$sum: 1
}
}
},
{
$project: {
a: {
$divide: ["$a", "$totalDocuments"]
},
b: {
$divide: ["$b", "$totalDocuments"]
},
c: {
$divide: ["$c", "$totalDocuments"]
}
}
}
])
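Writing one field per element by hand is impractical for vectors with hundreds of dimensions. One option is to generate the same two stages in JavaScript before running the pipeline. The sketch below assumes every document has a vec of the same length; buildAveragePipeline and the field names v0, v1, ... are hypothetical names chosen for the example.

```javascript
// Sketch: build the $group/$project stages for vectors of any fixed length.
// The function name and the field names v0, v1, ... are hypothetical.
function buildAveragePipeline(dimensions) {
  const group = { _id: null, totalDocuments: { $sum: 1 } };
  const project = { _id: 0 };
  for (let i = 0; i < dimensions; i++) {
    const field = `v${i}`;
    // Sum the i-th element of every document's vector.
    group[field] = { $sum: { $arrayElemAt: ["$vec", i] } };
    // Divide the sum by the document count to get the average.
    project[field] = { $divide: [`$${field}`, "$totalDocuments"] };
  }
  return [{ $group: group }, { $project: project }];
}

const pipeline = buildAveragePipeline(3);
// In the mongo shell: db.collection.aggregate(buildAveragePipeline(700))
```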
You can use $unwind to get the values into separate documents; the key is to keep the index of each value. Then you can $group by that index and calculate the average using the $avg operator.
db.collection.aggregate([
{
$unwind: {
path: "$vec",
includeArrayIndex: "i" // unwind and keep index
}
},
{
$group: {
_id: "$i", // group by index
avg: { $avg: "$vec" }
}
},
// At this stage you already have all the values you need, in separate documents.
// The following stages collect them into a single array.
{
$sort: { _id: 1 }
},
{
$group: {
_id: null,
avg: { $push: "$avg" }
}
}
])
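The stages above can be mirrored in plain JavaScript to check the expected result on the sample data. This is an illustration of the logic only, not MongoDB code, and the variable names are made up for the example.

```javascript
// Plain-JavaScript mirror of the $unwind / $group / $sort / $group stages.
const items = [
  { id: "1", vec: [1, 2, 0] },
  { id: "2", vec: [6, 4, 1] },
  { id: "3", vec: [3, 2, 2] }
];

// $unwind with includeArrayIndex: one { i, value } row per vector element.
const rows = items.flatMap((doc) => doc.vec.map((value, i) => ({ i, value })));

// $group by index: accumulate a sum and a count per index (what $avg needs).
const byIndex = new Map();
for (const { i, value } of rows) {
  const entry = byIndex.get(i) ?? { sum: 0, count: 0 };
  entry.sum += value;
  entry.count += 1;
  byIndex.set(i, entry);
}

// $sort by index, then $push the averages into one array.
const avg = [...byIndex.entries()]
  .sort(([a], [b]) => a - b)
  .map(([, entry]) => entry.sum / entry.count);

console.log(avg); // [ 3.3333333333333335, 2.6666666666666665, 1 ]
```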

Mongodb aggregation - count arrays with elements having integer value greater than

I need to write a MongoDB aggregation pipeline to count the documents whose arrays contain two types of values:
>=10
>=20
This is my dataset:
[
{ values: [ 1, 2, 3] },
{ values: [12, 1, 3] },
{ values: [1, 21, 3] },
{ values: [1, 2, 29] },
{ values: [22, 9, 2] }
]
This would be the expected output
{
has10s: 4,
has20s: 3
}
Mongo's $in (aggregation) seems to be the tool for the job, except I can't get it to work.
This is my (non working) pipeline:
db.mytable.aggregate([
{
$project: {
"has10s" : {
"$in": [ { "$gte" : [10, "$$CURRENT"]}, "$values"]}
},
"has20s" : {
"$in": [ { "$gte" : [20, "$$CURRENT"]}, "$values"]}
}
},
{ $group: { ... sum ... } }
])
The output of $in seems to be always true. Can anyone help?
You can try something like this:
db.collection.aggregate([{
$project: {
_id: 0,
has10: {
$size: {
$filter: {
input: "$values",
as: "item",
cond: { $gte: [ "$$item", 10 ] }
}
}
},
has20: {
$size: {
$filter: {
input: "$values",
as: "item",
cond: { $gte: [ "$$item", 20 ] }
}
}
}
}
},
{
$group: {
_id: 1,
has10: { $sum: "$has10" },
has20: { $sum: "$has20" }
}
}
])
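This uses $project with $filter to keep only the matching elements and $size to get the length of the filtered array; $group then sums those lengths. Note that this counts matching elements rather than documents, so it equals the number of matching documents only when each array holds at most one matching value, as in the sample data.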
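If arrays can hold more than one qualifying value, a variant that counts documents instead of elements might look like the sketch below. It reuses the same $filter and $size operators, turns the size into a boolean with $gt, and adds 1 per document with $cond. The helper hasAtLeast and the output names has10s and has20s (taken from the question's expected output) are assumptions of this sketch.

```javascript
// Sketch: count documents (not elements) that contain a qualifying value.
// hasAtLeast is a hypothetical helper that builds the per-document boolean.
const hasAtLeast = (threshold) => ({
  $gt: [
    {
      $size: {
        $filter: {
          input: "$values",
          as: "item",
          cond: { $gte: ["$$item", threshold] }
        }
      }
    },
    0
  ]
});

const pipeline = [
  { $project: { _id: 0, has10: hasAtLeast(10), has20: hasAtLeast(20) } },
  {
    $group: {
      _id: null,
      // Add 1 for every document whose flag is true.
      has10s: { $sum: { $cond: ["$has10", 1, 0] } },
      has20s: { $sum: { $cond: ["$has20", 1, 0] } }
    }
  }
];
// In the mongo shell: db.collection.aggregate(pipeline)

// Plain-JavaScript check of the intended counts on the sample data.
const docs = [
  { values: [1, 2, 3] }, { values: [12, 1, 3] }, { values: [1, 21, 3] },
  { values: [1, 2, 29] }, { values: [22, 9, 2] }
];
const countDocs = (t) => docs.filter((d) => d.values.some((v) => v >= t)).length;
console.log(countDocs(10), countDocs(20)); // 4 3
```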

Nested array aggregation

How can I obtain the sum of all the first elements in each of the arrays in this document using the MongoDB aggregation framework?
{
items: {
item1: [5, 8, 2],
item2: [4, 3, 1],
...
}
}
Here is the partial pipeline I've tried:
[{
$addFields: {
itemsAsArray: {
$objectToArray: '$items'
}
}
}, {
$project: {
_id: 0,
itemsAsArray: 1
}
}, {
$unwind: '$itemsAsArray'
}]
So I take this items value from the document that was matched, convert it to an array, and unwind it. Now I have an array of objects like this:
[{
itemsAsArray: {
k: 'item1',
v: [5, 8, 2]
}
},
{
itemsAsArray: {
k: 'item2',
v: [4, 3, 1]
}
}, ... ]
With this array, how can I group them to yield the sum of the first element in each of the arrays now denoted as v? Am I on the right track?
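One possible way to finish the pipeline, sketched under the assumption that every v array is non-empty, is a $group stage that sums element 0 of each unwound array. The output field name total is a hypothetical choice. The plain-JavaScript lines after the stage only check the arithmetic for the two arrays shown in the question.

```javascript
// Sketch: a final stage that could follow the $unwind above.
// `total` is a hypothetical output field name.
const sumFirstElements = {
  $group: {
    _id: null,
    // "$itemsAsArray.v" is the unwound array; take element 0 and sum it.
    total: { $sum: { $arrayElemAt: ["$itemsAsArray.v", 0] } }
  }
};

// Plain-JavaScript check of the same computation on the two arrays shown.
const items = { item1: [5, 8, 2], item2: [4, 3, 1] };
const total = Object.values(items).reduce((sum, arr) => sum + arr[0], 0);
console.log(total); // 9
```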

Mongodb merging multiple rows based on computed condition on row value

I have sample data like this:
[
{ objectId: 1, user: 1, phones: [1, 2], emails: ['a'] },
{ objectId: 2, user: 1, phones: [1, 5], emails: ['a', 'f'] },
{ objectId: 3, user: 1, phones: [8, 9], emails: ['f', 'g'] },
{ objectId: 4, user: 1, phones: [10], emails: ['h'] },
{ objectId: 5, user: 2, phones: [1, 2, 3], emails: ['aa', 'bb', 'cc'] },
]
Now I need to merge all related rows into one under these conditions:
They have the same user
They have at least one phone or email in common
So the output would be something like this:
[
{ objectId: 1, user: 1, phones: [1, 2, 5, 8, 9], emails: ['a', 'f', 'g'] },
{ objectId: 4, user: 1, phones: [10], emails: ['h'] },
{ objectId: 5, user: 2, phones: [1, 2, 3], emails: ['aa', 'bb', 'cc'] },
]
This is what I have come up with so far:
[
{
$unwind: {
path: "$phones",
preserveNullAndEmptyArrays: true
}
},
{
$group: {
_id: {
user: "$user",
phone: "$phones"
},
objectIds: {
$addToSet: "$_id"
},
emailsList: {
$push: "$emails"
},
user: { $first: "$user" },
phones: {
$first: "$phones"
}
}
},
{
"$addFields": {
"emails": {
"$reduce": {
"input": "$emailsList",
"initialValue": [],
"in": { "$setUnion": ["$$value", "$$this"] }
}
}
}
},
{
"$project": {
"emailsList": 0
}
},
{
$unwind: {
path: "$emails",
preserveNullAndEmptyArrays: true
}
},
{
$group: {
_id: {
user: "$user",
phone: "$emails"
},
objectIdsList: {
$push: "$objectIds"
}
}
},
{
"$project": {
"mergedObjectIds": {
"$reduce": {
"input": "$objectIdsList",
"initialValue": [],
"in": { "$setUnion": ["$$value", "$$this"] }
}
}
}
}
]
This gives a list of objectIds that need to be merged, which I then merge in application code. Is there any way to do that in the aggregation framework alone, or to pipe the result of this aggregation into the next one?
Unless I'm missing something, these are just the "sets" for each user. So simply unwind both arrays and accumulate via $addToSet for each of "phones" and "emails":
db.collection.aggregate([
{ "$unwind": "$phones" },
{ "$unwind": "$emails" },
{ "$group": {
"_id": "$user",
"phones": { "$addToSet": "$phones" },
"emails": { "$addToSet": "$emails" }
}}
])
Which returns:
{ "_id" : 2, "phones" : [ 3, 2, 1 ], "emails" : [ "cc", "bb", "aa" ] }
{ "_id" : 1, "phones" : [ 9, 1, 2, 5, 8 ], "emails" : [ "g", "f", "a" ] }
A "set" is not really considered to be "ordered", so if you expect a certain order then you need to sort elsewhere, and probably best in the client.
Any "unique" id's don't really apply here. If anything you would use a different accumulator like $min or $max, or maybe $first depending on what you want, however the only relevant details I see here is the "user" for grouping and the other accumulated "set" values.
Even though unwinding multiple arrays produces a "cartesian product" of the other values, it really does not matter when everything being pulled out is reduced to "distinct" values anyway. This typically only matters where you need to "count" elements, and that is something your output is not looking for in the question.
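If rows of the same user should only be merged when they share at least one phone or email, as the conditions in the question state, the application-side merge the asker mentions could be sketched as below. This is plain JavaScript rather than an aggregation pipeline; mergeRelated is a hypothetical helper, and the union-find grouping is one possible technique, not something taken from the original posts.

```javascript
// Sketch of an application-side merge (not an aggregation pipeline).
// Rows are merged when they share a user and at least one phone or email.
const rows = [
  { objectId: 1, user: 1, phones: [1, 2], emails: ["a"] },
  { objectId: 2, user: 1, phones: [1, 5], emails: ["a", "f"] },
  { objectId: 3, user: 1, phones: [8, 9], emails: ["f", "g"] },
  { objectId: 4, user: 1, phones: [10], emails: ["h"] },
  { objectId: 5, user: 2, phones: [1, 2, 3], emails: ["aa", "bb", "cc"] }
];

function mergeRelated(rows) {
  // Union-find over row indexes.
  const parent = rows.map((_, i) => i);
  const find = (i) => (parent[i] === i ? i : (parent[i] = find(parent[i])));
  const union = (a, b) => {
    const ra = find(a);
    const rb = find(b);
    // Keep the smaller index as root so the earliest row represents the group.
    if (ra !== rb) parent[Math.max(ra, rb)] = Math.min(ra, rb);
  };

  // Remember the first row seen for each (user, phone) or (user, email) key.
  const seen = new Map();
  rows.forEach((row, i) => {
    const keys = [
      ...row.phones.map((p) => `${row.user}|phone|${p}`),
      ...row.emails.map((e) => `${row.user}|email|${e}`)
    ];
    for (const key of keys) {
      if (seen.has(key)) union(i, seen.get(key));
      else seen.set(key, i);
    }
  });

  // Collect the union of phones and emails per group.
  const groups = new Map();
  rows.forEach((row, i) => {
    const root = find(i);
    if (!groups.has(root)) {
      groups.set(root, {
        objectId: rows[root].objectId,
        user: row.user,
        phones: new Set(),
        emails: new Set()
      });
    }
    const group = groups.get(root);
    row.phones.forEach((p) => group.phones.add(p));
    row.emails.forEach((e) => group.emails.add(e));
  });

  return [...groups.values()].map((g) => ({
    ...g,
    phones: [...g.phones],
    emails: [...g.emails]
  }));
}

const merged = mergeRelated(rows);
console.log(merged.map((g) => g.objectId)); // [ 1, 4, 5 ]
```

On the sample data this keeps objectId 4 separate from the other rows of user 1, matching the expected output in the question.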