MongoDB Aggregation SUM Array of Arrays by object key

Okay, so I've been searching for a while but couldn't find an answer to this, and I am desperate :P
I have some documents with this syntax
{
"period": ISODate("2018-05-29T22:00:00.000+0000"),
"totalHits": 13982,
"hits": [
{
// some fields...
users: [
{
// some fields...
userId: 1,
products: [
{ productId: 1, price: 30 },
{ productId: 2, price: 30 },
{ productId: 3, price: 30 },
{ productId: 4, price: 30 },
]
},
]
}
]
}
And I want to retrieve a count of how many products we have in each period, regardless of which user has them. An example output would look like this:
[
{
"period": ISODate("2018-05-27T22:00:00.000+0000"),
"count": 432
},
{
"period": ISODate("2018-05-28T22:00:00.000+0000"),
"count": 442
},
{
"period": ISODate("2018-05-29T22:00:00.000+0000"),
"count": 519
}
]
What is driving me crazy is the "object inside an array inside an array". I've done many aggregations, but I think they were simpler than this one, so I am a bit lost.
I am thinking about changing our document structure to a better one, but we have ~6M documents that we would need to transform, and that's just a mess... but maybe it's the only solution.
We are using MongoDB 3.2 and can't upgrade our systems atm (I wish, but it's not possible).

You can use $unwind to expand your array, then use $group to sum:
db.test.aggregate([
{$match: {}},
{$unwind: "$hits"},
{$project: {_id: "$_id", period: "$period", users: "$hits.users"}},
{$unwind: "$users"},
{$project: {_id: "$_id", period: "$period", subCount: {$size: "$users.products"}}},
{$group: {"_id": "$period", "count": {$sum: "$subCount"}}}
])
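As a cross-check on the pipeline above, the same per-period count can be computed in plain JavaScript over documents already loaded in memory. This is only a sketch: countProductsByPeriod and the sample documents are made up for illustration, periods are keyed by their ISO string, and (unlike $unwind) a document with empty arrays still yields a period with count 0.

```javascript
// Hypothetical helper: counts products per period across all hits and users.
function countProductsByPeriod(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc.period.toISOString();
    let n = 0;
    for (const hit of doc.hits || []) {
      for (const user of hit.users || []) {
        n += (user.products || []).length; // plays the role of $size
      }
    }
    counts.set(key, (counts.get(key) || 0) + n); // plays the role of $group + $sum
  }
  return [...counts].map(([period, count]) => ({ period, count }));
}

// Made-up sample data following the document shape from the question.
const sampleDocs = [
  {
    period: new Date("2018-05-29T22:00:00.000Z"),
    hits: [{ users: [{ userId: 1, products: [{ productId: 1, price: 30 }, { productId: 2, price: 30 }] }] }]
  },
  {
    period: new Date("2018-05-29T22:00:00.000Z"),
    hits: [{ users: [{ userId: 2, products: [{ productId: 3, price: 30 }] }, { userId: 3, products: [] }] }]
  },
  {
    period: new Date("2018-05-28T22:00:00.000Z"),
    hits: [{ users: [{ userId: 1, products: [{ productId: 4, price: 30 }] }] }]
  }
];

const productCounts = countProductsByPeriod(sampleDocs);
console.log(productCounts);
```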

Related

Filter nested objects

I have a collection of docs like
{'id':1, 'score': 1, created_at: ISODate(...)}
{'id':1, 'score': 2, created_at: ISODate(...)}
{'id':2, 'score': 1, created_at: ISODate(...)}
{'id':2, 'score': 20, created_at: ISODate(...)}
etc.
Does anyone know how to find docs that were created within the past 24hrs where the difference of the score value between the two most recent docs of the same id is less than 5?
So far I can only find all docs created within the past 24hrs:
[{
$project: {
_id: 0,
score: 1,
created_at: 1
}
}, {
$match: {
$expr: {
$gte: [
'$created_at',
{
$subtract: [
'$$NOW',
86400000
]
}
]
}
}
}]
Any advice appreciated.
Edit: By the two most recent docs, the oldest of the two can be created more than 24hrs ago. So the most recent doc would be created within the past 24hrs, but the oldest doc could be created over 24hrs ago.
If I understand you correctly, you want something like:
db.collection.aggregate([
{$match: {$expr: {$gte: ["$created_at", {$subtract: ["$$NOW", 86400000]}]}}},
{$sort: {created_at: -1}},
{$group: {_id: "$id", data: {$push: "$$ROOT"}}},
{$project: {pair: {$slice: ["$data", 0, 2]}, scores: {$slice: ["$data.score", 0, 2]}}},
{$match: {$expr: {
$lte: [{$abs: {$subtract: [{$first: "$scores"}, {$last: "$scores"}]}}, 5]
}}},
{$unset: "scores"}
])
See how it works on the playground example
EDIT:
According to your comment, one option is:
db.collection.aggregate([
{$setWindowFields: {
partitionBy: "$id",
sortBy: {created_at: -1},
output: {data: {$push: "$$ROOT", window: {documents: ["current", 1]}}}
}},
{$group: {
_id: "$id",
created_at: {$first: "$created_at"},
pair: {$first: "$data"}
}},
{$match: {$expr: {$and: [
{$gte: ["$created_at", {$dateAdd: {startDate: "$$NOW", unit: "day", amount: -1}}]},
{$eq: [{$size: "$pair"}, 2]},
{$lte: [{$abs: {$subtract: [{$first: "$pair.score"},
{$last: "$pair.score"}]}}, 5]}
]}}},
{$project: {_id: 0, pair: 1}}
])
See how it works on the playground example
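To make the intended logic of the edited requirement easier to check, here is a plain JavaScript sketch of the same rule applied to documents in memory: take the two most recent documents per id, require the newest one to be at most 24 hours old, and keep the pair when the absolute score difference is at most 5. The function name, the maxDiff parameter, and the sample data are assumptions for illustration only.

```javascript
// Hypothetical in-memory version; docs is an array of {id, score, created_at: Date}.
function recentPairsWithSmallDiff(docs, now, maxDiff = 5) {
  const dayMs = 24 * 60 * 60 * 1000;
  const byId = new Map();
  for (const doc of docs) {
    if (!byId.has(doc.id)) byId.set(doc.id, []);
    byId.get(doc.id).push(doc);
  }
  const result = [];
  for (const [id, group] of byId) {
    // Newest first, then keep the two most recent documents.
    const pair = [...group].sort((a, b) => b.created_at - a.created_at).slice(0, 2);
    if (pair.length < 2) continue;
    const newestIsRecent = now - pair[0].created_at <= dayMs;
    const diff = Math.abs(pair[0].score - pair[1].score);
    if (newestIsRecent && diff <= maxDiff) result.push({ id, pair });
  }
  return result;
}

// Made-up sample data: only id 1 has a recent newest doc and a small difference.
const now = new Date("2024-01-02T00:00:00Z");
const scoreDocs = [
  { id: 1, score: 1, created_at: new Date("2023-12-31T12:00:00Z") },
  { id: 1, score: 2, created_at: new Date("2024-01-01T12:00:00Z") },
  { id: 2, score: 1, created_at: new Date("2024-01-01T06:00:00Z") },
  { id: 2, score: 20, created_at: new Date("2024-01-01T18:00:00Z") },
  { id: 3, score: 7, created_at: new Date("2023-12-30T06:00:00Z") },
  { id: 3, score: 7, created_at: new Date("2023-12-30T18:00:00Z") }
];

const matches = recentPairsWithSmallDiff(scoreDocs, now);
console.log(matches.map((m) => m.id));
```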
If I've understood correctly you can try this query:
First the $match as you have to get documents since a day ago.
Then $sort by the date to ensure the most recent are on top.
$group by the id; since the most recent documents were on top, the first two elements of the array built with $push are the two most recent scores.
So now you only need to $sum these two values.
Finally, filter again to keep only the results that are less than ($lt) 5.
db.collection.aggregate([
{
$match: {
$expr: {
$gte: [
"$created_at",
{
$subtract: [
"$$NOW",
86400000
]
}
]
}
}
},
{
"$sort": {
"created_at": -1
}
},
{
"$group": {
"_id": "$id",
"score": {
"$push": "$score"
}
}
},
{
"$project": {
"score": {
"$sum": {
"$firstN": {
"n": 2,
"input": "$score"
}
}
}
}
},
{
"$match": {
"score": {
"$lt": 5
}
}
}
])
Example here
Edit: $firstN is new in version 5.2. On earlier versions you can use $slice instead.
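The $slice variant is not spelled out in the answer, so the following is only a guess at what was meant: the same $project stage as in the pipeline above, with $firstN swapped for the two-argument form of $slice. The constant name projectWithSlice is made up for illustration.

```javascript
// Possible replacement for the $project stage on servers without $firstN.
// {$slice: [<array>, 2]} keeps the first two elements of the pushed scores,
// and $sum then adds them up, as in the stage it replaces.
const projectWithSlice = {
  $project: {
    score: { $sum: { $slice: ["$score", 2] } }
  }
};

console.log(JSON.stringify(projectWithSlice));
```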

Mongo: Average on each position of a nested array for multiple documents

I'm receiving an array of documents; each document has the data of some participants of a study.
"a" has some anatomic metrics, here represented as "foo" and "bar" (i.e. height, weight, etc.).
"b" has the performance per second on other tests:
"t" is the time in seconds and
"e" are the test results measured at that specific time (i.e. cardiac rhythm, blood pressure, temperature, etc.).
Example of data:
[
{
"a": { "foo":1, "bar": 100 },
"b": [
{ "t":1, "e":[3,4,5] },
{ "t":2, "e":[4,4,4] },
{ "t":3, "e":[7,4,7] }
],
},
{
"a": { "foo":2, "bar": 111 },
"b": [
{ "t":1, "e":[9,4,0] },
{ "t":2, "e":[1,4,2] },
{ "t":3, "e":[3,4,5] }
],
},
{
"a": { "foo":4, "bar": 200 },
"b": [
{ "t":1, "e":[1,4,2] },
{ "t":2, "e":[3,1,3] },
{ "t":3, "e":[2,4,1] }
],
}
]
I'm trying to get some averages of the participants.
I already managed to get the averages of the anatomic values stored in "a".
I used:
db.collection.aggregate([
{
$group: {
_id: null,
barAvg: {
$avg: {
$avg: "$a.bar"
}
}
}
}
])
However, I'm failing to get the average of every test per second. So that would be the average on every "t" of every individual element of "e".
Expected result:
"average": [
{ "t":1, "e":[4.33, 4.00, 2.33] },
{ "t":2, "e":[2.66, 3.00, 3.00] },
{ "t":3, "e":[4.00, 4.00, 4.33] }
]
Here, 4.33 is the average of every first test ( e[0] ), but just of the first second ( t=1 ), of every person.
One option is to $unwind to separate the documents according to their t value and use $zip to transpose the data before calculating the average:
db.collection.aggregate([
{$unwind: "$b"},
{$group: {_id: "$b.t", data: {$push: "$b.e"}}},
{$set: {data: {$zip: {inputs: [
{$arrayElemAt: ["$data", 0]},
{$arrayElemAt: ["$data", 1]},
{$arrayElemAt: ["$data", 2]}
]
}
}
}
},
{$project: {
t: "$_id",
e: {$map: {input: "$data", in: {$trunc: [{$avg: "$$this"}, 2]}}}
}
},
{$sort: {t: 1}},
{$group: {_id: 0, average: {$push: {t: "$t", e: "$e"}}}},
{$unset: "_id"}
])
See how it works on the playground example - zip
Another option is to $unwind twice and build the entire calculation from pieces; the advantage is that you don't need to hard-code, through $arrayElemAt, how many arrays are passed to $zip:
db.collection.aggregate([
{$project: {b: 1, t: 1, _id: 0}},
{$unwind: "$b"},
{$unwind: {path: "$b.e", includeArrayIndex: "index"}},
{$group: {_id: {t: "$b.t", index: "$index"}, data: {$push: "$b.e"}}},
{$sort: {"_id.index": 1}},
{$group: {_id: "$_id.t", average: {$push: {$avg: "$data"}}}},
{$sort: {_id: 1}},
{$group: {_id: 0, average: {$push: {t: "$_id", e: "$average"}}}},
{$unset: "_id"}
])
See how it works on the playground example - unwind twice
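For comparison, the per-position averaging that both pipelines perform can be sketched in plain JavaScript on data already in memory. This is an illustration under assumptions, not part of either answer: averageByTime is a made-up name, the values are left unrounded, and the input is the sample data from the question.

```javascript
// Hypothetical in-memory equivalent: average e[i] across participants for each t.
function averageByTime(participants) {
  const sums = new Map(); // t -> { total: number[], count: number[] }
  for (const p of participants) {
    for (const { t, e } of p.b) {
      if (!sums.has(t)) sums.set(t, { total: [], count: [] });
      const acc = sums.get(t);
      e.forEach((value, i) => {
        acc.total[i] = (acc.total[i] || 0) + value;
        acc.count[i] = (acc.count[i] || 0) + 1;
      });
    }
  }
  return [...sums.entries()]
    .sort(([t1], [t2]) => t1 - t2)
    .map(([t, { total, count }]) => ({ t, e: total.map((s, i) => s / count[i]) }));
}

// Sample data copied from the question.
const participants = [
  { a: { foo: 1, bar: 100 }, b: [{ t: 1, e: [3, 4, 5] }, { t: 2, e: [4, 4, 4] }, { t: 3, e: [7, 4, 7] }] },
  { a: { foo: 2, bar: 111 }, b: [{ t: 1, e: [9, 4, 0] }, { t: 2, e: [1, 4, 2] }, { t: 3, e: [3, 4, 5] }] },
  { a: { foo: 4, bar: 200 }, b: [{ t: 1, e: [1, 4, 2] }, { t: 2, e: [3, 1, 3] }, { t: 3, e: [2, 4, 1] }] }
];

const averages = averageByTime(participants);
console.log(JSON.stringify(averages));
```

Because it tracks a count per position, this version also tolerates "e" arrays of different lengths, which neither pipeline was asked to handle.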

MongoDB collect / aggregate time series into an array

Following the examples I have two types of data in the same time series
db.weather.insertMany( [
{
"metadata": { "sensorId": 5578, "type": "temperature" },
"timestamp": ISODate("2021-05-18T00:00:00.000Z"),
"temp": 72
},//....
and..
db.weather.insertMany([
{
"metadata": {"sensorId": 5578, "type": "humidity" },
"timestamp": ISODate("2021-05-18T00:00:00.000Z"),
"humpercent": 78
},//...
and I want to be able to serve simple requests by aggregating the data as:
{
sensorId: 5578,
humidityData: [78, 77, 75 ...],
tempData: [72, 72, 71...]
}
which seems like the obvious use case, but the
db.foo.aggregate([{$group: {_id: "$sensorId"}}])
function on sensorId only returns the ids with no other fields. Am I missing a simple identity aggregation function or a way to collect into an array?
What you are looking for is the $addToSet Operator:
db.foo.aggregate([{
$group: {
_id: "$metadata.sensorId",
temp: {
$addToSet: "$temp"
},
humidity: {
$addToSet: "$humpercent"
}
}
}])
Note that the order of elements in the returned array is not specified.
If all you have is two categories, you can simply $push them:
db.collection.aggregate([
{$sort: {timestamp: 1}},
{$group: {
_id: {sensorId: "$metadata.sensorId"},
temp: {$push: "$temp"},
humidity: {$push: "$humpercent"}
}
}
])
See how it works on the playground example - small
But if you want the generic solution for multiple measurements you need something like:
db.collection.aggregate([
{$sort: {timestamp: 1}},
{$set: {m: "$$ROOT"}},
{$unset: ["m.metadata", "m.timestamp", "m._id"]},
{$set: {m: {$first: {$objectToArray: "$m"}}}},
{$group: {
_id: {type: "$metadata.type", sensorId: "$metadata.sensorId"},
data: {$push: "$m.v"}}
},
{$project: {_id: 0, data: 1, type: {k: "type", v: "$_id.type"}, sensorId: "$_id.sensorId"}},
{$group: {
_id: "$sensorId",
data: {$push: {k: "$type.v", v: "$data"}}
}},
{$project: {_id: 0, data: {"$mergeObjects": [{$arrayToObject: "$data"}, {sensorId: "$_id"}]}
}},
{$replaceRoot: {newRoot: "$data"}}
])
See how it works on the playground example - generic
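The shape produced by the generic pipeline can also be illustrated with a plain JavaScript sketch over readings held in memory. Everything named here (collectSeries, the sample readings) is hypothetical; like the pipeline, it assumes each document carries exactly one measurement field besides metadata, timestamp, and _id, and it names each output array after metadata.type.

```javascript
// Hypothetical in-memory version: one output object per sensor, with one
// array per measurement type, ordered by timestamp.
function collectSeries(readings) {
  const sorted = [...readings].sort((a, b) => a.timestamp - b.timestamp);
  const bySensor = new Map();
  for (const { metadata, timestamp, _id, ...measurement } of sorted) {
    // Take the first remaining field as the measured value.
    const [value] = Object.values(measurement);
    if (!bySensor.has(metadata.sensorId)) {
      bySensor.set(metadata.sensorId, { sensorId: metadata.sensorId });
    }
    const out = bySensor.get(metadata.sensorId);
    (out[metadata.type] = out[metadata.type] || []).push(value);
  }
  return [...bySensor.values()];
}

// Made-up readings in the same shape as the question's documents.
const readings = [
  { metadata: { sensorId: 5578, type: "temperature" }, timestamp: new Date("2021-05-18T04:00:00.000Z"), temp: 71 },
  { metadata: { sensorId: 5578, type: "humidity" }, timestamp: new Date("2021-05-18T00:00:00.000Z"), humpercent: 78 },
  { metadata: { sensorId: 5578, type: "temperature" }, timestamp: new Date("2021-05-18T00:00:00.000Z"), temp: 72 },
  { metadata: { sensorId: 5578, type: "humidity" }, timestamp: new Date("2021-05-18T04:00:00.000Z"), humpercent: 77 }
];

const series = collectSeries(readings);
console.log(JSON.stringify(series));
```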

Return object of multiple counts

I'm trying to query a document and return a single object that lists three counts of 3 different scenarios in a single key. Here's how the collection is organized:
{
response_main:{
type: Schema.ObjectId,
ref: "topics"
},
response_in: {
type: Number
}
}
The response_in key needs to be sorted as to whether it is a 0,1 or 2. The way I am currently solving this problem is:
Collection.aggregate([
{$match: {response_main: mainTopic._id}},
{$group: {
_id: {
$cond: {if: {$eq: ["$response_in", 0]}, then: "agree", else: {
$cond: {if: {$eq: ["$response_in", 1]}, then: "neutral", else: {
$cond: {if: {$eq: ["$response_in", 2]}, then: "disagree", else: false}
}}
}}
},
count: {$sum: 1}
}}
], callback);
The format the data is returned in is an array of objects that looks like this:
[
{
"_id": "agree",
"count": 14
},
{
"_id": "neutral",
"count": 12
},
{
"_id": "disagree",
"count": 16
}
]
However, I'd prefer the returned object to look like this:
{
"agree": 14,
"neutral": 12,
"disagree": 16,
}
Is there another way I could structure my query to achieve this more succinct result?
Reformatted the data prior to sending the JSON response, all is well now.
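The reformatting step is not shown in the post; one way it might look, as a sketch in plain JavaScript applied to the aggregation result, is below. The function name countsToObject is made up, and defaulting absent categories to 0 is an assumption rather than something the post states.

```javascript
// Turns [{_id: "agree", count: 14}, ...] into {agree: 14, neutral: 12, disagree: 16}.
function countsToObject(groups) {
  const result = { agree: 0, neutral: 0, disagree: 0 }; // assumed defaults for absent groups
  for (const { _id, count } of groups) {
    if (_id in result) result[_id] = count; // ignores the `false` bucket from the last else
  }
  return result;
}

// Aggregation output from the question, plus a made-up `false` bucket.
const groups = [
  { _id: "agree", count: 14 },
  { _id: "neutral", count: 12 },
  { _id: "disagree", count: 16 },
  { _id: false, count: 3 }
];

const summary = countsToObject(groups);
console.log(summary);
```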

Aggregate with a Composite Key

I am trying to aggregate some data and group it by Time Intervals as well as maintaining a sub-category, if you will. I want to be able to chart this data out so that I will have multiple different Lines corresponding to each Office that was called. The X axis will be the Time Intervals and the Y axis would be the Average Ring Time.
My data looks like this:
Calls: [{
created: ISODate(xyxyx),
officeCalled: 'ABC Office',
answeredAt: ISODate(xyxyx)
},
{
created: ISODate(xyxyx),
officeCalled: 'Office 2',
answeredAt: ISODate(xyxyx)
},
{
created: ISODate(xyxyx),
officeCalled: 'Office 3',
answeredAt: ISODate(xyxyx)
}];
My goal is to get my calls grouped by Time Intervals (30 Minutes/1 Hour/1 Day) AND by the Office Called. So when my aggregate completes, I'm looking for data like this:
[{"_id":TimeInterval1,"calls":[{"office":"ABC Office","ringTime":30720},
{"office":"Office2","ringTime":3070}]},
{"_id":TimeInterval2,"calls":[{"office":"Office1","ringTime":1125},
{"office":"ABC Office","ringTime":15856}]}]
I have been poking around for the past few hours and I was able to aggregate my data, but I haven't figured out how to group it properly so that I have each time interval along with the office data. Here is my latest code:
Call.aggregate([
{$match: {
$and: [
{created: {$exists: 1}},
{answeredAt: {$exists: 1}}]}},
{$project: { created: 1,
officeCalled: 1,
answeredAt: 1,
timeToAns: {$subtract: ["$answeredAt", "$created"]}}},
{$group: {_id: {"day": {"$dayOfYear": "$created"},
"hour": {
"$subtract": [
{"$hour" : "$created"},
{"$mod": [ {"$hour": "$created"}, 2]}
]
},
"officeCalled": "$officeCalled"
},
avgRingTime: {$avg: '$timeToAns'},
total: {$sum: 1}}},
{"$group": {
"_id": "$_id.day",
"calls": {
"$push": {
"office": "$_id.officeCalled",
"ringTime": "$avgRingTime"
},
}
}},
{$sort: {_id: 1}}
]).exec(function(err, results) {
//My results look like this
[{"_id":118,"calls":[{"office":"ABC Office","ringTime":30720},
{"office":"Office 2","ringTime":31384.5},
{"office":"Office 3","ringTime":7686.066666666667},...];
});
This just doesn't quite get it...I get my data but it's broken down by Day only. Not my 2 hour time interval that I was shooting for. Let me know if I'm doing this all wrong, please --- I am VERY NEW to aggregation so your help is very much appreciated.
Thank you!!
All you really need to do is include both parts of the _id value you want in the final group. No idea why you thought to only reference a single field.
Also "lose the $project", as it is just wasted cycles and processing when you can do the calculation directly in $group on the first try:
Call.aggregate(
[
{ "$match": {
"created": { "$exists": 1 },
"answeredAt": { "$exists": 1 }
}},
{ "$group": {
"_id": {
"day": {"$dayOfYear": "$created"},
"hour": {
"$subtract": [
{"$hour" : "$created"},
{"$mod": [ {"$hour": "$created"}, 2]}
]
},
"officeCalled": "$officeCalled"
},
"avgRingTime": {
"$avg": { "$subtract": [ "$answeredAt", "$created" ] }
},
"total": { "$sum": 1 }
}},
{ "$group": {
"_id": {
"day": "$_id.day",
"hour": "$_id.hour"
},
"calls": {
"$push": {
"office": "$_id.officeCalled",
"ringTime": "$avgRingTime"
},
},
"total": { "$sum": "$total" }
}},
{ "$sort": { "_id": 1 } }
]
).exec(function(err, results) {
});
Also note the complete omission of $and. This is not needed as all MongoDB query arguments are already "AND" conditions anyway, unless specifically stated otherwise. Just stick to what is simple. It's meant to be simple.
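To see what the composite _id looks like for a given call, the bucket arithmetic can be mirrored in plain JavaScript. This is a sketch under assumptions: twoHourBucket is a hypothetical helper, and it works in UTC, which is what $dayOfYear and $hour use when no timezone is given.

```javascript
// Hypothetical helper mirroring the grouping key: day of year plus the start
// hour of a two-hour bucket (hour - hour % 2), both computed in UTC.
function twoHourBucket(date) {
  const startOfYear = Date.UTC(date.getUTCFullYear(), 0, 1);
  const day = Math.floor((date.getTime() - startOfYear) / 86400000) + 1;
  const hour = date.getUTCHours();
  return { day, hour: hour - (hour % 2) };
}

// Made-up timestamps for illustration.
const bucketA = twoHourBucket(new Date("2015-04-28T13:45:00Z"));
const bucketB = twoHourBucket(new Date("2016-12-31T23:59:59Z"));
console.log(bucketA, bucketB);
```

Calls at 12:00 through 13:59 UTC land in the same bucket, so each element of the final result corresponds to one two-hour window of one day.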