Calculate the number of minutes between multiple date ranges, without counting overlapping time, in MongoDB

I am creating a way to generate reports of the amount of time equipment was down during a given time frame. I will potentially have hundreds to thousands of documents to work with. Every document has a start date and an end date, both in BSON date format, and they will generally be within minutes of each other. For simplicity's sake I am also zeroing out the seconds.
The actual aggregation I need to do is to calculate the number of minutes between each pair of dates, but there may be other documents with overlapping dates. Any overlapping time should not be counted if it has already been counted. There are various other aggregations I'll need to do, but this is the only one I'm unsure is even possible.
{
"StartTime": "2020-07-07T18:10:00.000Z",
"StopTime": "2020-07-07T18:13:00.000Z",
"TotalMinutesDown": 3,
"CreatedAt": "2020-07-07T18:13:57.675Z"
}
{
"StartTime": "2020-07-07T18:12:00.000Z",
"StopTime": "2020-07-07T18:14:00.000Z",
"TotalMinutesDown": 2,
"CreatedAt": "2020-07-07T18:13:57.675Z"
}
The two documents above are examples of what I'm working with. Every document stores the total number of minutes between its two dates (this field serves another, unrelated purpose). If I were to aggregate these to get the total minutes down, the output should be 4, since I don't want to count the overlapping minutes twice.

Finding the overlap of time ranges sounds a bit abstract. Let's convert it to a concept databases are routinely used for: discrete values.
If we convert the times to discrete values, we can find the duplicate values, i.e. the "overlapping" minutes, and eliminate them.
I'll illustrate the steps using your sample data. Since you have already zeroed out the seconds, we can start from there.
Because we care about minute increments, we convert each time to the number of minutes elapsed since the Unix epoch.
{
"StartMinutes": 26569090,
"StopMinutes": 26569093
}
{
"StartMinutes": 26569092,
"StopMinutes": 26569094
}
Then we expand each range into discrete minute values (inclusive of the start, exclusive of the stop):
{
"minutes": [26569090, 26569091, 26569092]
}
{
"minutes": [26569092, 26569093]
}
Then we take the set union of all the arrays:
{
"allMinutes": [26569090, 26569091, 26569092, 26569093]
}
This is how we can get to the solution using aggregation. I have simplified the queries and grouped some operations together:
db.collection.aggregate([
  {
    $project: {
      minutes: {
        $range: [
          { $divide: [{ $toLong: "$StartTime" }, 60000] }, // convert to a minutes-since-epoch timestamp
          { $divide: [{ $toLong: "$StopTime" }, 60000] }   // $range excludes this upper bound
        ]
      }
    }
  },
  {
    $group: { // combine into one document
      _id: null,
      _temp: { $push: "$minutes" }
    }
  },
  {
    $project: {
      totalMinutes: {
        $size: { // the size of the union set is the total distinct minutes down
          $reduce: {
            input: "$_temp",
            initialValue: [],
            in: { $setUnion: ["$$value", "$$this"] } // merge the arrays as a set union
          }
        }
      }
    }
  }
])
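The logic of the pipeline can be sanity-checked outside the database. Below is a minimal plain-JavaScript sketch (not MongoDB code) that mirrors the three stages: expand each document into minute values, union them all, and count:

```javascript
// Plain-JS sanity check of the pipeline logic (not MongoDB code).
const docs = [
  { StartTime: "2020-07-07T18:10:00.000Z", StopTime: "2020-07-07T18:13:00.000Z" },
  { StartTime: "2020-07-07T18:12:00.000Z", StopTime: "2020-07-07T18:14:00.000Z" },
];

// $project stage: expand each [start, stop) range into discrete minute values.
const toMinutes = (iso) => Date.parse(iso) / 60000;
const ranges = docs.map((d) => {
  const start = toMinutes(d.StartTime);
  const stop = toMinutes(d.StopTime);
  return Array.from({ length: stop - start }, (_, i) => start + i);
});

// $group + $reduce/$setUnion stages: union all the arrays, then count.
const allMinutes = new Set(ranges.flat());
console.log(allMinutes.size); // → 4 distinct minutes down
```

Running this on the two sample documents prints 4, matching the expected total.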

Related

MongoDB Aggregation to get events in timespan, plus the previous event

I have time-series data as events coming in at random times. They are not ongoing metrics, but discrete events: "This device went online." "This device went offline."
I need to report on the number of actual transitions within a time range. Because there are occasionally same-state events, for example two "went online" events in a row, I need to "seed" the data with the state prior to the time range: if I have events in my time range, I need to compare them to the state before the range to determine whether something actually changed.
I already have aggregation stages that remove same-state events.
Is there a way to add "the latest, previous event" to the data in the pipeline without writing two queries? A $facet stage totally ruins performance.
For "previous", I'm currently trying something like this in a separate query, but it's very slow on the millions of records:
// Get the latest event before a given date
db.devicemetrics.aggregate([
{
$match: {
'device.someMetadata': '70b28808-da2b-4623-ad83-6cba3b20b774',
time: {
$lt: ISODate('2023-01-18T07:00:00.000Z'),
},
someValue: { $ne: null },
},
},
{
$group: {
_id: '$device._id',
lastEvent: { $last: '$$ROOT' },
},
},
{
$replaceRoot: { newRoot: '$lastEvent' },
}
]);
You are looking for something akin to the LAG window function in SQL. MongoDB has $setWindowFields for this, combined with the $shift operator.
I'm not sure about the fields in your collection, but this should give you an idea:
{
$setWindowFields: {
partitionBy: "$device._id", //1. partition the data based on $device._id
sortBy: { time: 1 }, //2. within each partition, sort based on $time
output: {
"shiftedEvent": { //3. add a new field shiftedEvent to each document
$shift: {
output: "$event", //4. whose value is previous $event
by: -1
}
}
}
}
}
Then, you can compare the event and shiftedEvent fields.
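As a sanity check of the idea (outside MongoDB), here is a plain-JavaScript sketch of the same shift-then-compare logic; the event values are made up for illustration:

```javascript
// Plain-JS sketch of the $setWindowFields/$shift idea (not MongoDB code).
// Events for one device, already sorted by time (hypothetical sample data).
const events = ["online", "online", "offline", "online"];

// $shift with by: -1 pairs each event with the previous one in the partition.
const withShifted = events.map((event, i) => ({
  event,
  shiftedEvent: i > 0 ? events[i - 1] : null, // null, like $shift's default
}));

// A real transition is one where the state actually changed.
const transitions = withShifted.filter(
  (e) => e.shiftedEvent !== null && e.event !== e.shiftedEvent
);
console.log(transitions.length); // → 2 (online→offline, offline→online)
```

The duplicate "online" event produces no transition, which is exactly the same-state filtering the question describes.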

MongoDB Aggregation question using summations / matches

I have a collection with the following type of documents:
{
device: integer,
date: string,
time: string,
voltage: double,
amperage: double
}
Data is inserted as time-series data, and a separate process aggregates and averages the results so that this collection has a single document per device every 5 minutes, i.e. time is 00:05:00, 00:10:00, etc.
I need to search for a specific group of devices (usually 5-10 at a time). I need the voltage to be >= 27.0, and I need to search for a single date.
That part is easy, but I need to only find data when all 5-10 systems at a time interval meet the 27.0 requirement. I'm not sure how to handle that requirement.
Once I know that, I then need to find the specific grouping of devices that have the lowest summation of the amperage field, and I need to return the time that this occurred.
So, lets assume I am going to search for 5 devices. I need to find the time when all 5 devices have a voltage >= 27.0 and the summation of the amperage field is the lowest.
I'm not sure how to require that all the devices meet the voltage requirement, and then for that group of devices, to then find the time when the amperage summation is the lowest.
Any suggestions would be great.
Thanks.
You need to use the $all operator.
Note: please provide more information about "the summation of the amperage field is the lowest".
db.collection.aggregate([
  {
    $match: { // keep only readings for the requested devices, date, and voltage
      device: { $in: [1, 2, 3] },
      date: "2022/10/01",
      voltage: { $gte: 27.0 }
    }
  },
  {
    $group: { // one group per time interval
      _id: "$time",
      device: { $addToSet: "$device" }, // distinct devices that passed the voltage filter
      amperage: { $min: "$amperage" },
      root: { $push: "$$ROOT" }
    }
  },
  {
    $match: { // keep only intervals where ALL requested devices qualified
      device: { $all: [1, 2, 3] }
    }
  }
])
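The missing piece, finding the interval with the lowest amperage summation, can be sketched in plain JavaScript on illustrative data (in MongoDB this would correspond to adding a $sum accumulator in the $group stage, then a $sort and $limit after the $all match):

```javascript
// Plain-JS sketch: find the time when all requested devices meet the
// voltage requirement and the summed amperage is lowest (illustrative data).
const readings = [
  { device: 1, time: "00:05:00", voltage: 27.5, amperage: 3.0 },
  { device: 2, time: "00:05:00", voltage: 27.1, amperage: 2.5 },
  { device: 1, time: "00:10:00", voltage: 27.2, amperage: 1.0 },
  { device: 2, time: "00:10:00", voltage: 26.0, amperage: 1.5 }, // fails voltage
];
const wanted = [1, 2];

// Group qualifying readings by time (the $match + $group stages).
const byTime = new Map();
for (const r of readings.filter((r) => r.voltage >= 27.0 && wanted.includes(r.device))) {
  if (!byTime.has(r.time)) byTime.set(r.time, []);
  byTime.get(r.time).push(r);
}

// Keep times where every wanted device qualified (the $all stage),
// then pick the time with the lowest amperage sum.
let best = null;
for (const [time, group] of byTime) {
  const devices = new Set(group.map((r) => r.device));
  if (!wanted.every((d) => devices.has(d))) continue;
  const sum = group.reduce((s, r) => s + r.amperage, 0);
  if (best === null || sum < best.sum) best = { time, sum };
}
console.log(best); // → { time: "00:05:00", sum: 5.5 }
```

At 00:10:00 device 2 fails the voltage requirement, so only 00:05:00 qualifies and is returned with its summed amperage.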

Mongodb selecting every nth of a given sorted aggregation

I want to be able to retrieve every nth item of a given collection, which is quite large (millions of records).
Here is a sample of my collection
{
_id: ObjectId("614965487d5d1c55794ad324"),
hour: ISODate("2021-09-21T17:21:03.259Z"),
searches: [
ObjectId("614965487d5d1c55794ce670")
]
}
My start of aggregation is like so
[
{
$match: {
searches: {
$in: [ObjectId('614965487d5d1c55794ce670')],
},
},
},
{ $sort: { hour: -1 } },
{ $project: { hour: 1 } },
...
]
I have tried many things, including:
$sample, which does not pick the items in the right order
$skip, which becomes very slow as the number given to skip grows
paginating on _id instead of using $skip, but my ids are unfortunately not created in an ordered manner
My goal is thus to retrieve the hour of every 20000th record, so that I can then fetch the data in chunks of approximately 20000 records.
I imagine it would be possible to
sort, number every record, and then keep only the first, the 20000th, the 40000th, ..., and the last.
Thanks for your help, and let me know if you need more information.
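That numbering idea can be checked with a plain-JavaScript sketch (in MongoDB 5.0+ the numbering itself could be done server-side with $setWindowFields and $documentNumber, though the server still has to walk the whole matched set):

```javascript
// Plain-JS sketch: number sorted records and keep every nth one, plus the last.
function everyNth(sortedHours, n) {
  return sortedHours.filter(
    (_, i) => i % n === 0 || i === sortedHours.length - 1
  );
}

// Small illustrative run with n = 3 instead of 20000.
const hours = ["h0", "h1", "h2", "h3", "h4", "h5", "h6", "h7"];
console.log(everyNth(hours, 3)); // → [ 'h0', 'h3', 'h6', 'h7' ]
```

The returned hours are the chunk boundaries: each adjacent pair delimits roughly n records.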

I need to return the average hourly values between two dates in MongoDB

I need to write an aggregation query that returns the average values per hour between two dates in MongoDB.
In my research I found the following code:
db.transactions.aggregate([
{
$match: {
transactionDate: {
$gte: ISODate("2017-01-01T00:00:00.000Z"),
$lt: ISODate("2017-01-31T23:59:59.000Z")
}
}
}, {
$group: {
_id: null,
average_transaction_amount: {
$avg: "$amount"
}
}
}
]);
The previous code returns a single average value between the two dates, but I need the average per hour.
A sample document is:
{
"_id": "5a4e1fa1e4b02d76985c39b1",
"temp1": 4,
"temp2": 3,
"temp3": 2,
"created_on": "2018-01-04T12:35:45.833Z"
}
The expected result is one set of averages per hour, for example:
{ temp1: 5, temp2: 6, temp3: 6 } for one hour, { temp1: 2, temp2: 4, temp3: 6 } for the next hour, and so on.
There are many values within every hour, so I need to calculate the average per hour.
Please help.
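One way to get per-hour averages (a sketch, not a tested answer) is to group on the hour component of created_on; in MongoDB this would be a $group whose _id uses $dateTrunc (unit: "hour") or $dateToString with an hour-level format, with $avg accumulators. The grouping logic itself looks like this in plain JavaScript:

```javascript
// Plain-JS sketch of per-hour averaging (not MongoDB code; illustrative data).
const docs = [
  { temp1: 4, temp2: 3, temp3: 2, created_on: "2018-01-04T12:35:45.833Z" },
  { temp1: 6, temp2: 5, temp3: 4, created_on: "2018-01-04T12:50:00.000Z" },
  { temp1: 8, temp2: 2, temp3: 6, created_on: "2018-01-04T13:05:00.000Z" },
];

// Group by the hour bucket (like a $group _id of $dateTrunc unit: "hour").
const buckets = new Map();
for (const d of docs) {
  const hour = d.created_on.slice(0, 13); // e.g. "2018-01-04T12"
  if (!buckets.has(hour)) buckets.set(hour, []);
  buckets.get(hour).push(d);
}

// Average each temp field within its hour (like $avg accumulators).
const averages = [...buckets].map(([hour, group]) => ({
  hour,
  temp1: group.reduce((s, d) => s + d.temp1, 0) / group.length,
  temp2: group.reduce((s, d) => s + d.temp2, 0) / group.length,
  temp3: group.reduce((s, d) => s + d.temp3, 0) / group.length,
}));
console.log(averages); // one averaged document per hour
```

Each output element corresponds to one hour bucket, which is the shape the question asks for.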

MongoDB query: how to select the longest period of time of a matched value

I have a mongo database with many records in the format of:
{
id: {
$_id
},
date: {
$date: YYYY-MM-DDThh:mm:ssZ
},
reading: X.XX
}
where date is a timestamp in mongo and reading is a float (id is just the unique identifier for the data point).
I would like to be able to find the longest period of time during which the reading was a certain value (let's say 0.00 for ease of use) and return the start and end points of this time period. If there were more than one time period of the same length, I would like them all returned.
Ultimately, for example, I would like to be able to say
"The longest time period the reading is 0.00 is 1.25 hours
between
2000-01-01T00:00:00 - 2000-01-01T01:15:00,
2000-06-01T02:00:00 - 2000-06-01T03:15:00,
2000-11-11T20:00:00 - 2000-11-11T21:15:00 ."
For my mongo aggregation query I am thinking of doing this:
get the timeframe I am interested in (e.g. 2000-01-01 to 2001-01-01)
sort the data by date descending
somehow select the longest run where the reading is 0.00.
This is the query I have so far:
[
{
$match: {
date: { $gte: ISODate("2000-01-01T00:00:00.0Z"), $lt: ISODate("2001-01-01T00:00:00.0Z") }
}
},
{ "$sort": { "date": -1 } },
{
"$group" : {
"_id": null,
"Maximum": { "$max": { "max": "$reading", "date": "$date" } },
"Longest": { XXX: { start_dates: [], end_dates: [] } }
}
},
{
"$project": {
"_id": 0,
"max": "$Maximum",
"longest": "$Longest"
}
}
]
I do not know how to select the longest run. How would you do this?
(You will notice I am also interested in the maximum reading within the time period and the dates on which that maximum occurs. At the moment I am only recording the latest date/time it occurs, but eventually I would like to record all the dates/times the maximum value occurs.)
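There is no single operator for "longest run", but once the readings are sorted by date the run detection itself is a simple scan; here is a plain-JavaScript sketch on hypothetical data (in MongoDB, run boundaries could be built with $setWindowFields/$shift and then grouped):

```javascript
// Plain-JS sketch: find the longest consecutive run(s) of a target reading
// in date-sorted data (hypothetical sample points, 15 minutes apart).
const readings = [
  { date: "2000-01-01T00:00:00Z", reading: 0.0 },
  { date: "2000-01-01T00:15:00Z", reading: 0.0 },
  { date: "2000-01-01T00:30:00Z", reading: 1.5 }, // breaks the run
  { date: "2000-01-01T00:45:00Z", reading: 0.0 },
  { date: "2000-01-01T01:00:00Z", reading: 0.0 },
];

function longestRuns(points, target) {
  const runs = [];
  let current = null;
  for (const p of points) {
    if (p.reading === target) {
      if (!current) current = { start: p.date, end: p.date, length: 0 };
      current.end = p.date;
      current.length += 1;
    } else if (current) {
      runs.push(current);
      current = null;
    }
  }
  if (current) runs.push(current);
  // Keep all runs tied for the maximum length, as the question requires.
  const max = Math.max(...runs.map((r) => r.length));
  return runs.filter((r) => r.length === max);
}

console.log(longestRuns(readings, 0.0)); // both length-2 runs, with start/end dates
```

Because ties are kept, the sample returns both two-point runs, each with its start and end timestamp.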