Following the examples, I have two types of data in the same time series collection:
db.weather.insertMany( [
{
"metadata": { "sensorId": 5578, "type": "temperature" },
"timestamp": ISODate("2021-05-18T00:00:00.000Z"),
"temp": 72
},//....
and:
db.weather.insertMany([
{
"metadata": {"sensorId": 5578, "type": "humidity" },
"timestamp": ISODate("2021-05018T00:00:001Z"),
"humpercent": 78
},//...
and I want to be able to serve simple requests by aggregating the data as:
{
sensorId: 5578,
humidityData: [78, 77, 75 ...],
tempData: [72, 72, 71...]
}
which seems like the obvious use case, but grouping on the sensor id with
db.foo.aggregate([{$group: {_id: "$metadata.sensorId"}}])
only returns the ids with no other fields. Am I missing a simple identity aggregation function, or a way to collect the values into an array?
What you are looking for is the $addToSet Operator:
db.foo.aggregate([{
$group: {
_id: "$metadata.sensorId",
temp: {
$addToSet: "$temp"
},
humidity: {
$addToSet: "$humpercent"
}
}
}])
Note that the order of elements in the returned array is not specified.
If all you have is two categories, you can simply $push them:
db.collection.aggregate([
{$sort: {timestamp: 1}},
{$group: {
_id: {sensorId: "$metadata.sensorId"},
temp: {$push: "$temp"},
humidity: {$push: "$humpercent"}
}
}
])
See how it works on the playground example - small
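With the sample inserts, that should return one document per sensor, roughly like this (values illustrative; $push skips documents where the field is missing, so each array only collects values from the matching measurement type):
{
  "_id": { "sensorId": 5578 },
  "temp": [72, 72, 71],
  "humidity": [78, 77, 75]
}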
But if you want the generic solution for multiple measurements you need something like:
db.collection.aggregate([
{$sort: {timestamp: 1}},
{$set: {m: "$$ROOT"}},
{$unset: ["m.metadata", "m.timestamp", "m._id"]},
{$set: {m: {$first: {$objectToArray: "$m"}}}},
{$group: {
_id: {type: "$metadata.type", sensorId: "$metadata.sensorId"},
data: {$push: "$m.v"}}
},
{$project: {_id: 0, data: 1, type: {k: "type", v: "$_id.type"}, sensorId: "$_id.sensorId"}},
{$group: {
_id: "$sensorId",
data: {$push: {k: "$type.v", v: "$data"}}
}},
{$project: {_id: 0, data: {"$mergeObjects": [{$arrayToObject: "$data"}, {sensorId: "$_id"}]}
}},
{$replaceRoot: {newRoot: "$data"}}
])
See how it works on the playground example - generic
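For reference, with the two sample inserts the generic pipeline ends up with one document per sensor, keyed by the metadata.type values, roughly like this (array contents illustrative):
{
  "temperature": [72, 72, 71],
  "humidity": [78, 77, 75],
  "sensorId": 5578
}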
I have a collection of docs like
{'id':1, 'score': 1, created_at: ISODate(...)}
{'id':1, 'score': 2, created_at: ISODate(...)}
{'id':2, 'score': 1, created_at: ISODate(...)}
{'id':2, 'score': 20, created_at: ISODate(...)}
etc.
Does anyone know how to find docs that were created within the past 24hrs where the difference of the score value between the two most recent docs of the same id is less than 5?
So far I can only find all docs created within the past 24hrs:
[{
$project: {
_id: 0,
score: 1,
created_at: 1
}
}, {
$match: {
$expr: {
$gte: [
'$created_at',
{
$subtract: [
'$$NOW',
86400000
]
}
]
}
}
}]
Any advice appreciated.
Edit: By the two most recent docs, the oldest of the two can be created more than 24hrs ago. So the most recent doc would be created within the past 24hrs, but the oldest doc could be created over 24hrs ago.
If I understand you correctly, you want something like:
db.collection.aggregate([
{$match: {$expr: {$gte: ["$created_at", {$subtract: ["$$NOW", 86400000]}]}}},
{$sort: {created_at: -1}},
{$group: {_id: "$id", data: {$push: "$$ROOT"}}},
{$project: {pair: {$slice: ["$data", 0, 2]}, scores: {$slice: ["$data.score", 0, 2]}}},
{$match: {$expr: {
$lte: [{$abs: {$subtract: [{$first: "$scores"}, {$last: "$scores"}]}}, 5]
}}},
{$unset: "scores"}
])
See how it works on the playground example
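For example, with the four sample docs (assuming they all fall within the past 24 hours), only id 1 comes out, with its two most recent docs in pair:
// id 1: the two most recent scores are 1 and 2  -> |2 - 1| = 1  <= 5 -> kept
// id 2: the two most recent scores are 1 and 20 -> |20 - 1| = 19 > 5 -> filtered out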
EDIT:
According to your comment, one option is:
db.collection.aggregate([
{$setWindowFields: {
partitionBy: "$id",
sortBy: {created_at: -1},
output: {data: {$push: "$$ROOT", window: {documents: ["current", 1]}}}
}},
{$group: {
_id: "$id",
created_at: {$first: "$created_at"},
pair: {$first: "$data"}
}},
{$match: {$expr: {$and: [
{$gte: ["$created_at", {$dateAdd: {startDate: "$$NOW", unit: "day", amount: -1}}]},
{$eq: [{$size: "$pair"}, 2]},
{$lte: [{$abs: {$subtract: [{$first: "$pair.score"},
{$last: "$pair.score"}]}}, 5]}
]}}},
{$project: {_id: 0, pair: 1}}
])
See how it works on the playground example
If I've understood correctly, you can try this query:
First the $match, as you have to get the documents from the past day.
Then $sort by the date to ensure the most recent are on top.
$group by the id; since the most recent are on top, the first two elements $push adds to the array are the two most recent docs.
So now you only need to $sum these two values.
And filter again, keeping those that are less than ($lt) 5.
db.collection.aggregate([
{
$match: {
$expr: {
$gte: [
"$created_at",
{
$subtract: [
"$$NOW",
86400000
]
}
]
}
}
},
{
"$sort": {
"created_at": -1
}
},
{
"$group": {
"_id": "$id",
"score": {
"$push": "$score"
}
}
},
{
"$project": {
"score": {
"$sum": {
"$firstN": {
"n": 2,
"input": "$score"
}
}
}
}
},
{
"$match": {
"score": {
"$lt": 5
}
}
}
])
Example here
Edit: $firstN is new in MongoDB 5.2. On older versions you can use $slice instead.
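A minimal sketch of that $slice variant (only the $project stage changes; the rest of the pipeline stays the same):
{
  "$project": {
    "score": {
      "$sum": {
        "$slice": ["$score", 2]  // the first two pushed scores, i.e. the two most recent
      }
    }
  }
}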
After doing a $facet I receive this output:
[
{
"confirmed": [
{
"confirmed": 100
}
],
"denied": [
{
"denied": 50
}
],
"pending": [
{
"pending": 20
}
]
}
]
How can I project it into something like this?
[
{
category: "confirmed", count: 100,
category: "denied", count: 50,
category: "pending", count: 20
}
]
I need the $facet part because, to extract those numbers, I have to do several $match stages on the same data. Don't know if there is a better option.
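For reference, a $facet stage of roughly this shape (the status field and its values are assumptions, not my actual query) produces the output above:
{
  "$facet": {
    "confirmed": [{ "$match": { "status": "confirmed" } }, { "$count": "confirmed" }],
    "denied": [{ "$match": { "status": "denied" } }, { "$count": "denied" }],
    "pending": [{ "$match": { "status": "pending" } }, { "$count": "pending" }]
  }
}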
Thank you!
What you are asking for is not a valid format; it is an object with duplicate keys. You may want:
[{"confirmed": 100, "denied": 50, "pending": 20}]
or
[
{category: "confirmed", count: 100},
{category: "denied", count: 50},
{category: "pending", count: 20}
]
which are both valid options
I guess you want the second option. If you want the generic solution, one option is:
db.collection.aggregate([
{$project: {res: {$objectToArray: "$$ROOT"}}},
{$project: {
res: {$map: {
input: "$res",
in: {category: "$$this.k", count: {$objectToArray: {$first: "$$this.v"}}}
}}
}},
{$project: {
res: {$map: {
input: "$res",
in: {category: "$$this.category", count: {$first: "$$this.count.v"}}
}}
}},
{$unwind: "$res"},
{$replaceRoot: {newRoot: "$res"}}
])
See how it works on the playground example - generic
If you want the literal option, just use:
db.collection.aggregate([
{$project: {
res: [
{category: "confirmed", count: {$first: "$confirmed.confirmed"}},
{category: "denied", count: {$first: "$denied.denied"}},
{category: "pending", count: {$first: "$pending.pending"}}
]
}
},
{$unwind: "$res"},
{$replaceRoot: {newRoot: "$res"}}
])
See how it works on the playground example - literal
I'm receiving an array of documents; each document has the data of one participant of a study.
"a" has some anatomic metrics, here represented as "foo" and "bar" (e.g. height, weight, etc.).
"b" has the performance per second on other tests:
"t" is the time in seconds and
"e" are the test results measured at that specific time (e.g. cardiac rhythm, blood pressure, temperature, etc.).
Example of data:
[
{
"a": { "foo":1, "bar": 100 },
"b": [
{ "t":1, "e":[3,4,5] },
{ "t":2, "e":[4,4,4] },
{ "t":3, "e":[7,4,7] }
],
},
{
"a": { "foo":2, "bar": 111 },
"b": [
{ "t":1, "e":[9,4,0] },
{ "t":2, "e":[1,4,2] },
{ "t":3, "e":[3,4,5] }
],
},
{
"a": { "foo":4, "bar": 200 },
"b": [
{ "t":1, "e":[1,4,2] },
{ "t":2, "e":[3,1,3] },
{ "t":3, "e":[2,4,1] }
],
}
]
I'm trying to get some averages of the participants.
I already managed to get the averages of the anatomic values stored in "a".
I used:
db.collection.aggregate([
{
$group: {
_id: null,
barAvg: {
$avg: {
$avg: "$a.bar"
}
}
}
}
])
However, I'm failing to get the average of every test per second. That would be, for every "t", the average of each individual element of "e".
Expected result:
"average": [
{ "t":1, "e":[4.33, 3.00, 2.33] },
{ "t":2, "e":[2.66, 3.00, 3.00] },
{ "t":3, "e":[4.33, 3.00, 5.00] }
]
Here, 4.33 is the average of every first test (e[0]), but just for the first second (t=1), across every person.
One option is to $unwind so that the documents are separated according to their t value, and then use $zip to transpose the arrays before calculating the average:
db.collection.aggregate([
{$unwind: "$b"},
{$group: {_id: "$b.t", data: {$push: "$b.e"}}},
{$set: {data: {$zip: {inputs: [
{$arrayElemAt: ["$data", 0]},
{$arrayElemAt: ["$data", 1]},
{$arrayElemAt: ["$data", 2]}
]
}
}
}
},
{$project: {
t: "$_id",
e: {$map: {input: "$data", in: {$trunc: [{$avg: "$$this"}, 2]}}}
}
},
{$sort: {t: 1}},
{$group: {_id: 0, average: {$push: {t: "$t", e: "$e"}}}},
{$unset: "_id"}
])
See how it works on the playground example - zip
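For clarity, here is what the $zip stage does for a single t value, using the t=1 arrays from the sample data:
// The three pushed e arrays for t = 1 are transposed:
// inputs: [ [3, 4, 5], [9, 4, 0], [1, 4, 2] ]
// output: [ [3, 9, 1], [4, 4, 4], [5, 0, 2] ]
// Each inner array then goes through $avg, e.g. (3 + 9 + 1) / 3 ≈ 4.33.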
Another option is to $unwind twice and build the entire calculation from pieces; the advantage is that you don't need to hard-code how many e arrays (one per participant) feed the $arrayElemAt:
db.collection.aggregate([
{$project: {b: 1, t: 1, _id: 0}},
{$unwind: "$b"},
{$unwind: {path: "$b.e", includeArrayIndex: "index"}},
{$group: {_id: {t: "$b.t", index: "$index"}, data: {$push: "$b.e"}}},
{$sort: {"_id.index": 1}},
{$group: {_id: "$_id.t", average: {$push: {$avg: "$data"}}}},
{$sort: {_id: 1}},
{$group: {_id: 0, average: {$push: {t: "$_id", e: "$average"}}}},
{$unset: "_id"}
])
See how it works on the playground example - unwind twice
Okay, so I've been searching for a while but couldn't find an answer to this, and I am desperate :P
I have some documents with this syntax
{
"period": ISODate("2018-05-29T22:00:00.000+0000"),
"totalHits": 13982
"hits": [
{
// some fields...
users: [
{
// some fields...
userId: 1,
products: [
{ productId: 1, price: 30 },
{ productId: 2, price: 30 },
{ productId: 3, price: 30 },
{ productId: 4, price: 30 },
]
},
]
}
]
}
And I want to retrieve a count of how many products we have per period (independently of which user has them). An example output would be like this:
[
{
"period": ISODate("2018-05-27T22:00:00.000+0000"),
"count": 432
},
{
"period": ISODate("2018-05-28T22:00:00.000+0000"),
"count": 442
},
{
"period": ISODate("2018-05-29T22:00:00.000+0000"),
"count": 519
}
]
What is driving me crazy is the "object inside an array inside an array" part. I've done many aggregations, but I think they were simpler than this one, so I am a bit lost.
I am thinking about changing our document structure to a better one, but we have ~6M documents that we would need to transform, and that's just a mess... but maybe it's the only solution.
We are using MongoDB 3.2; we can't upgrade our systems at the moment (I wish, but it's not possible).
You can use $unwind to expand your array, then use $group to sum:
db.test.aggregate([
{$match: {}},
{$unwind: "$hits"},
{$project: {_id: "$_id", period: "$period", users: "$hits.users"}},
{$unwind: "$users"},
{$project: {_id: "$_id", period: "$period", subCount: {$size: "$users.products"}}},
{$group: {"_id": "$period", "count": {$sum: "$subCount"}}}
])
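With the example document above (one hit, one user, four products), this should return something like:
{ "_id": ISODate("2018-05-29T22:00:00.000+0000"), "count": 4 }
Here _id holds the period; a final $project could rename it to period if needed.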
I am unwinding an array using the MongoDB aggregation framework. The array has duplicates, and I need to ignore those duplicates when grouping further down the pipeline.
How can I achieve that?
You can use $addToSet to do this:
db.users.aggregate([
{ $unwind: '$data' },
{ $group: { _id: '$_id', data: { $addToSet: '$data' } } }
]);
It's hard to give you a more specific answer without seeing your actual query.
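For example, with a document like this (the structure is assumed, since the actual documents aren't shown):
// Input:
// { _id: 1, data: ["a", "b", "a", "c", "b"] }
// After the pipeline above:
// { _id: 1, data: ["a", "b", "c"] }  (element order inside the set is not guaranteed)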
You have to use $addToSet, but first you have to group by _id, because if you don't, you'll get one result per item in the list.
Imagine a collection posts with documents like this:
{
body: "Lorem Ipsum...",
tags: ["stuff", "lorem", "lorem"],
author: "Enrique Coslado"
}
Imagine you want to calculate the most common tag per author. You'd write an aggregation query like this:
db.posts.aggregate([
{$project: {
author: "$author",
tags: "$tags",
post_id: "$_id"
}},
{$unwind: "$tags"},
{$group: {
_id: "$post_id",
author: {$first: "$author"},
tags: {$addToSet: "$tags"}
}},
{$unwind: "$tags"},
{$group: {
_id: {
author: "$author",
tags: "$tags"
},
count: {$sum: 1}
}}
])
That way you'll get documents like this:
{
_id: {
author: "Enrique Coslado",
tags: "lorem"
},
count: 1
}
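If you then want a single most common tag per author rather than the per-tag counts, one way to finish (a sketch, assuming that is the desired output) is to sort by count and take the first tag per author:
db.posts.aggregate([
  // ...the stages shown above...
  {$sort: {count: -1}},
  {$group: {
    _id: "$_id.author",
    tag: {$first: "$_id.tags"},
    count: {$first: "$count"}
  }}
])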
Previous answers are correct, but the $unwind -> $group -> $unwind procedure can be simplified.
You can use $addFields + $reduce to pass down the pipeline a filtered array that already contains unique entries, and then $unwind only once.
Example document:
{
body: "Lorem Ipsum...",
tags: [{title: 'test1'}, {title: 'test2'}, {title: 'test1'}],
author: "First Last name"
}
Query:
db.posts.aggregate([
{$addFields: {
"uniqueTag": {
$reduce: {
input: "$tags",
initialValue: [],
in: {$setUnion: ["$$value", ["$$this.title"]]}
}
}
}},
{$unwind: "$uniqueTag"},
{$group: {
_id: {
author: "$author",
tags: "$uniqueTag"
},
count: {$sum: 1}
}}
])
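With the example document above, this pipeline should produce one document per (author, unique tag) pair, along the lines of:
{ "_id": { "author": "First Last name", "tags": "test1" }, "count": 1 }
{ "_id": { "author": "First Last name", "tags": "test2" }, "count": 1 }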