MongoDB query: how to select the longest period of time of a matched value - mongodb

I have a mongo database with many records in the format of:
{
id: {
$_id
},
date: {
$date: YYYY-MM-DDThh:mm:ssZ
},
reading: X.XX
}
where date is a timestamp in mongo, reading is a float, and id is just the unique identifier for the data point.
I would like to find the longest period of time during which the reading stayed at a certain value (let's say 0.00 for ease of use) and return the start and end points of that period. If more than one period has the same length, I would like them all returned.
Ultimately, for example, I would like to be able to say
"The longest time period the reading is 0.00 is 1.25 hours
between
2000-01-01T00:00:00 - 2000-01-01T01:15:00,
2000-06-01T02:00:00 - 2000-06-01T03:15:00,
2000-11-11T20:00:00 - 2000-11-11T21:15:00 ."
For my mongo aggregation query I am thinking of doing this:
get the timeframe I am interested in (e.g. 2000-01-01 to 2001-01-01)
sort the data by date descending
somehow select the longest run when the reading is 0.00.
This is the query I have so far:
[
{
$match: {
date: { $gte: ISODate("2000-01-01T00:00:00.0Z"), $lt: ISODate("2001-01-01T00:00:00.0Z") }
}
},
{ "$sort": { "date": -1 } },
{
"$group" : {
"_id": null,
"Maximum": { "$max": { "max": "$reading", "date": "$date" } },
"Longest": { XXX: { start_dates: [], end_dates: [] } }
}
},
{
"$project": {
"_id": 0,
"max": "$Maximum",
"longest": "$Longest"
}
}
]
I do not know how to select the longest run. How would you do this?
(You will notice I am also interested in the maximum reading within the time period and the dates on which that maximum reading happens. At the moment I am only recording the latest date/time this occurs but would like it to record all the dates/times the maximum value occurs on eventually.)
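A minimal sketch of the "longest run" step, done client-side in plain JavaScript instead of inside the aggregation pipeline. It assumes the documents for the timeframe have already been fetched and sorted by date ascending (the opposite of the sort in the query above), and that a run's length is the time between its first and last matching reading. The function name longestRuns and the sample data are hypothetical.

```javascript
// Hypothetical helper, not part of any MongoDB API.
// `docs` must be sorted by date ascending; each item is { date: Date, reading: Number }.
function longestRuns(docs, target) {
  const runs = [];
  let start = null;
  let end = null;
  for (const doc of docs) {
    if (doc.reading === target) {
      if (start === null) start = doc.date; // a new run begins
      end = doc.date;                       // extend the current run
    } else if (start !== null) {
      runs.push({ start, end, ms: end - start }); // the run was broken
      start = end = null;
    }
  }
  if (start !== null) runs.push({ start, end, ms: end - start }); // trailing run

  // Keep every run that ties for the maximum duration.
  const longest = runs.reduce((max, r) => Math.max(max, r.ms), 0);
  return runs.filter(r => r.ms === longest);
}

// Made-up sample data with two tied runs of 1.25 hours and one shorter run.
const sample = [
  { date: new Date("2000-01-01T00:00:00Z"), reading: 0.0 },
  { date: new Date("2000-01-01T01:15:00Z"), reading: 0.0 },
  { date: new Date("2000-01-01T02:00:00Z"), reading: 1.5 },
  { date: new Date("2000-06-01T02:00:00Z"), reading: 0.0 },
  { date: new Date("2000-06-01T03:15:00Z"), reading: 0.0 },
  { date: new Date("2000-06-01T04:00:00Z"), reading: 2.0 },
  { date: new Date("2000-06-01T05:00:00Z"), reading: 0.0 },
];
const result = longestRuns(sample, 0);
console.log(result.map(r => `${r.start.toISOString()} - ${r.end.toISOString()}`));
```

Whether a run should instead end at the first non-matching reading is a modelling choice this sketch does not settle.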

Related

How to write a single query to count elements above a certain value in MongoDB

I have the following sample collection of movies:
[
{
"title":"Boots and Saddles",
"year":1909,
"cast":[],
"genres":[]
},
{
"title":"The Wooden Leg",
"year":1909,
"cast":[],
"genres":[]
},
{
"title":"The Sanitarium",
"year":1910,
"cast":["Fatty Arbuckle"],
"genres":["Comedy"]
},
{
"title":"Snow White",
"year":1916,
"cast":["Marguerite Clark"],
"genres":["Fantasy"]
},
{
"title":"Haunted Spooks",
"year":1920,
"cast":["Harold Lloyd"],
"genres":["Comedy"]
},
{
"title":"Very Truly Yours",
"year":1922,
"cast":["Shirley Mason", "lan Forrest"],
"genres":["Romance"]
}
]
I want to count the number of movies that appeared in the last 20 years (counting back from the most recent movie recorded in this collection).
I have the following query to find the year of the most recent movie (the result is 2018):
db.movies.find({},{"_id":0, "year":1}).sort({year:-1}).limit(1)
So to find how many movies appeared in the last 20 years I wrote this:
db.movies.aggregate([{$match:{year:{$gte:1999}}},{$count:"title"}])
However, this is not very robust, because if the database is modified or updated, I will have to modify that query every time.
Is there a more elegant way to find the result?
Thank you in advance!
You can do this in a single aggregation:
db.movies.aggregate([
  // Collapse the collection into one document: the latest year plus every movie's year
  { $group: { _id: null, latestYear: { $max: "$year" }, years: { $push: "$year" } } },
  {
    $project: {
      _id: 0,
      movies: {
        $size: {
          $filter: {
            input: "$years",
            // 20-year window ending at the latest movie, e.g. 2018 - 19 = 1999
            cond: { $gte: [ "$$this", { $subtract: [ "$latestYear", 19 ] } ] }
          }
        }
      }
    }
  }
]);
Use $group with $max to find the year of the latest movie in the collection, and $push to collect every movie's year into an array.
Use $filter to keep only the years greater than or equal to latestYear minus 19 (1999 when the latest year is 2018, matching the cutoff hard-coded in the question).
Use $size to count the years that remain.
This assumes the array of years fits within the 16 MB limit for a single document.

Calculate amount of minutes between multiple date ranges, but don't calculate the overlapping dates in MongoDB

I am creating a way to generate reports of the amount of time equipment was down during a given time frame. I will potentially have hundreds to thousands of documents to work with. Every document will have a start date and an end date, both in BSON format, and they will generally be within minutes of each other. For simplicity's sake I am also zeroing out the seconds.
The aggregation I need to do is to calculate the number of minutes between each pair of dates, but there may be other documents with overlapping dates. Any overlapping time should not be counted if it has been counted already. There are various other aggregations I'll need to do, but this is the only one where I'm unsure whether it's possible at all.
{
"StartTime": "2020-07-07T18:10:00.000Z",
"StopTime": "2020-07-07T18:13:00.000Z",
"TotalMinutesDown": 3,
"CreatedAt": "2020-07-07T18:13:57.675Z"
}
{
"StartTime": "2020-07-07T18:12:00.000Z",
"StopTime": "2020-07-07T18:14:00.000Z",
"TotalMinutesDown": 2,
"CreatedAt": "2020-07-07T18:13:57.675Z"
}
The two documents above are examples of what I'm working with. Every document gets the total amount of minutes between the two dates stored in the document (This field serves another purpose, unrelated). If I were to aggregate this to get total minutes down, the output of total minutes should be 4, as I'm not wanting to calculate the overlapping minutes.
Finding overlap of time ranges sounds to me a bit abstract. Let's try to convert it to a concept that databases are usually used for: discrete values.
If we convert the times to discrete value, we will be able to find the duplicate values, i.e. the "overlapping values" and eliminate them.
I'll illustrate the steps using your sample data. Since you have zeroed out the seconds, for simplicity sake, we can start from there.
Since we care about minute increments we are going to convert times to "minutes" elapsed since the Unix epoch.
{
"StartMinutes": 26569090,
"StopMinutes": 26569093
}
{
"StartMinutes": 26569092,
"StopMinutes": 26569094
}
We expand each range into discrete minute values (the stop minute itself is excluded)
{
"minutes": [26569090, 26569091, 26569092]
}
{
"minutes": [26569092, 26569093]
}
Then we can do a set union on all the arrays
{
"allMinutes": [26569090, 26569091, 26569092, 26569093]
}
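The steps above can be sketched in plain JavaScript to check the arithmetic outside the database. This is only an illustration, assuming the times are ISO strings (or dates) with the seconds zeroed out; the function name totalMinutesDown is hypothetical.

```javascript
// Hypothetical helper: counts the distinct minutes covered by any
// [StartTime, StopTime) range, so overlapping minutes are counted once.
function totalMinutesDown(docs) {
  const minutes = new Set();
  for (const { StartTime, StopTime } of docs) {
    // Minutes elapsed since the Unix epoch.
    const start = Math.floor(new Date(StartTime).getTime() / 60000);
    const stop = Math.floor(new Date(StopTime).getTime() / 60000);
    for (let m = start; m < stop; m++) minutes.add(m); // stop is exclusive
  }
  return minutes.size;
}

// The two sample documents from the question.
const downtime = [
  { StartTime: "2020-07-07T18:10:00.000Z", StopTime: "2020-07-07T18:13:00.000Z" },
  { StartTime: "2020-07-07T18:12:00.000Z", StopTime: "2020-07-07T18:14:00.000Z" },
];
console.log(totalMinutesDown(downtime)); // 4
```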
This is how we can get to the solution using aggregation. I have simplified the queries and grouped some operations together
db.collection.aggregate([
  {
    $project: {
      minutes: {
        $range: [
          {
            $divide: [{ $toLong: "$StartTime" }, 60000] // convert to minutes timestamp
          },
          {
            $divide: [{ $toLong: "$StopTime" }, 60000]
          }
        ]
      }
    }
  },
  {
    $group: { // combine to one document
      _id: null,
      _temp: { $push: "$minutes" }
    }
  },
  {
    $project: {
      totalMinutes: {
        $size: { // get the size of the union set
          $reduce: {
            input: "$_temp",
            initialValue: [],
            in: {
              $setUnion: ["$$value", "$$this"] // combine the values using set union
            }
          }
        }
      }
    }
  }
])

How to return the average hourly rate between two dates in MongoDB

I need to write an aggregate query that returns the average hourly rate between two dates in MongoDB.
I found in my research the following code
db.transactions.aggregate([
{
$match: {
transactionDate: {
$gte: ISODate("2017-01-01T00:00:00.000Z"),
$lt: ISODate("2017-01-31T23:59:59.000Z")
}
}
}, {
$group: {
_id: null,
average_transaction_amount: {
$avg: "$amount"
}
}
}
]);
The previous code returns a single average value between two dates, but I need the average per hour.
A sample document is:
{"_id":"5a4e1fa1e4b02d76985c39b1",
"temp1":4,
"temp2":3,
"temp3":2,
"created_on":"2018-01-04T12:35:45.833Z"}
Example result, with one set of averages per hour:
{temp1:5,temp2:6,temp3:6} for the first hour, {temp1:2,temp2:4,temp3:6} for the next hour, then {temp1:5,temp2:7,temp3:9}, {temp1:8,temp2:4,temp3:7}, {temp1:4,temp2:2,temp3:6}, {temp1:9,temp2:6,temp3:4}
There are many values within every hour, so I need to calculate the average per hour.
Please help
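A rough sketch of the per-hour grouping being asked for, written in plain JavaScript rather than as an aggregation pipeline. It assumes created_on is an ISO 8601 UTC string (or a Date) and buckets documents by the UTC hour they fall in; the function name hourlyAverages and all sample values other than the first document are made up for illustration.

```javascript
// Hypothetical helper: averages the given numeric fields per UTC hour.
function hourlyAverages(docs, fields) {
  const buckets = new Map(); // hour key -> { count, sums }
  for (const doc of docs) {
    // "2018-01-04T12:35:45.833Z" -> "2018-01-04T12"
    const hour = new Date(doc.created_on).toISOString().slice(0, 13);
    if (!buckets.has(hour)) {
      buckets.set(hour, { count: 0, sums: Object.fromEntries(fields.map(f => [f, 0])) });
    }
    const bucket = buckets.get(hour);
    bucket.count += 1;
    for (const f of fields) bucket.sums[f] += doc[f];
  }
  // One result object per hour, in first-seen order.
  return [...buckets].map(([hour, { count, sums }]) => {
    const avg = { hour };
    for (const f of fields) avg[f] = sums[f] / count;
    return avg;
  });
}

const readings = [
  { temp1: 4, temp2: 3, temp3: 2, created_on: "2018-01-04T12:35:45.833Z" },
  { temp1: 6, temp2: 5, temp3: 4, created_on: "2018-01-04T12:50:00.000Z" },
  { temp1: 2, temp2: 4, temp3: 6, created_on: "2018-01-04T13:05:00.000Z" },
];
const averages = hourlyAverages(readings, ["temp1", "temp2", "temp3"]);
console.log(averages);
```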

Use Elasticsearch to find entries with date around a specific date

How can I use elasticsearch to search for people who are around a specific age?
So if I enter 28 as age, I want the people who are 28 to have the highest score, but also want people who are 27 to be shown, but with a lower score.
The birthdate is stored in the following format yyyy-mm-dd, so I will have to convert age to date, but this is no problem.
I have the following so far:
{
"query": {
"fuzzy": {
"birthdate": {
"value": "1985-10-01",
"min_similarity": "1096d"
}
}
}
}
The min_similarity of 1096d matches people who were born on the 1st of October 1985 +/- 3 years.
So all people who are born between 1982 and 1988 are shown - this works great, but they all have the same score of 1.0. How can I get the highest score for the entry with the birthdate nearest to 1985-10-01 ?
You can calculate a custom score using a script. This script uses SimpleDateFormat to parse your date (1985-10-01), then calculates the absolute value of that date (in ms) minus the document's date (in ms). You want the lowest value (closest to target date) first, so sort by score ascending instead of the default descending.
{
"query": {
"custom_score": {
"query": {
"fuzzy": {
"birthdate": {
"value": "1985-10-01",
"min_similarity": "1096d"
}
}
},
"script": "abs(new \
java.text.SimpleDateFormat('yyyy-MM-dd').parse('1985-10-01').getTime() - \
doc['birthdate'].date.getMillis())"
}
},
"sort": [
{ "_score": "asc" }
]
}
More info on custom scoring is at http://www.elasticsearch.org/guide/reference/query-dsl/custom-score-query/
It can also be done with decay functions instead of a custom script:
https://stackoverflow.com/a/33347741/803174
http://nocf-www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-decay
"gauss": {
"date": {
"origin": "2013-09-17",
"scale": "10d"
}
}
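To make the decay idea concrete, here is a small JavaScript sketch of the Gaussian decay formula as described in the Elasticsearch function_score documentation, assuming the default decay of 0.5 and an offset of 0. The function name gaussDecay is hypothetical; the code only mimics the scoring math and does not call Elasticsearch.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper mirroring the documented gauss decay:
//   score = exp(-distance^2 / (2 * sigma^2)),  sigma^2 = -scale^2 / (2 * ln(decay))
// All of value, origin, scale and offset are in milliseconds here.
function gaussDecay(value, origin, scale, decay = 0.5, offset = 0) {
  const distance = Math.max(0, Math.abs(value - origin) - offset);
  const sigmaSquared = -(scale * scale) / (2 * Math.log(decay));
  return Math.exp(-(distance * distance) / (2 * sigmaSquared));
}

const origin = Date.parse("2013-09-17T00:00:00Z");
const scale = 10 * DAY_MS; // "10d"

const atOrigin = gaussDecay(origin, origin, scale);            // exactly at the origin
const atScale = gaussDecay(origin + scale, origin, scale);     // one scale away
const farAway = gaussDecay(origin + 3 * scale, origin, scale); // three scales away
console.log(atOrigin, atScale, farAway);
```

Documents at the origin get the full score, documents one scale away get the decay value, and the score keeps falling smoothly beyond that, which is the "closest date scores highest" behaviour the question asks for.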

MongoDB - Querying between a time range of hours

I have a MongoDB datastore set up with location data stored like this:
{
"_id" : ObjectId("51d3e161ce87bb000792dc8d"),
"datetime_recorded" : ISODate("2013-07-03T05:35:13Z"),
"loc" : {
"coordinates" : [
0.297716,
18.050614
],
"type" : "Point"
},
"vid" : "11111-22222-33333-44444"
}
I'd like to be able to perform a query similar to the date range example, but on a time range instead, i.e. retrieve all points recorded between 12PM and 4PM (can be done with 1200 and 1600 24 hour time as well).
e.g.
With points:
"datetime_recorded" : ISODate("2013-05-01T12:35:13Z"),
"datetime_recorded" : ISODate("2013-06-20T05:35:13Z"),
"datetime_recorded" : ISODate("2013-01-17T07:35:13Z"),
"datetime_recorded" : ISODate("2013-04-03T15:35:13Z"),
a query
db.points.find({'datetime_recorded': {
$gte: Date(1200 hours),
$lt: Date(1600 hours)}
});
would yield only the first and last point.
Is this possible? Or would I have to do it for every day?
Well, the best way to solve this is to store the minutes separately as well. But you can get around this with the aggregation framework, although that is not going to be very fast:
db.so.aggregate( [
{ $project: {
loc: 1,
vid: 1,
datetime_recorded: 1,
minutes: { $add: [
{ $multiply: [ { $hour: '$datetime_recorded' }, 60 ] },
{ $minute: '$datetime_recorded' }
] }
} },
{ $match: { 'minutes' : { $gte : 12 * 60, $lt : 16 * 60 } } }
] );
In the first step $project, we calculate the minutes from hour * 60 + min which we then match against in the second step: $match.
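The hour * 60 + minute calculation can be checked in plain JavaScript against the sample points from the question. This is only a sketch of the arithmetic; like $hour and $minute without a timezone argument, it works in UTC, and the helper name minutesOfDay is hypothetical.

```javascript
// Hypothetical helper: minutes elapsed since midnight UTC.
function minutesOfDay(date) {
  return date.getUTCHours() * 60 + date.getUTCMinutes();
}

// The four sample points from the question.
const points = [
  new Date("2013-05-01T12:35:13Z"),
  new Date("2013-06-20T05:35:13Z"),
  new Date("2013-01-17T07:35:13Z"),
  new Date("2013-04-03T15:35:13Z"),
];

// Same bounds as the $match stage: 12:00 inclusive to 16:00 exclusive.
const matched = points.filter(d => {
  const m = minutesOfDay(d);
  return m >= 12 * 60 && m < 16 * 60;
});
console.log(matched.map(d => d.toISOString()));
```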
Adding an answer since I disagree with the other answers in that even though there are great things you can do with the aggregation framework, this really is not an optimal way to perform this type of query.
If your identified application usage pattern is that you rely on querying for "hours" or other times of the day without wanting to look at the "date" part, then you are far better off storing that as a numeric value in the document. Something like "milliseconds from start of day" would be granular enough for as many purposes as a BSON Date, but of course gives better performance without the need to compute for every document.
Set Up
This does require some set-up in that you need to add the new fields to your existing documents and make sure you add these on all new documents within your code. A simple conversion process might be:
MongoDB 4.2 and upwards
This can actually be done in a single request due to aggregation operations being allowed in "update" statements now.
db.collection.updateMany(
{},
[{ "$set": {
"timeOfDay": {
"$mod": [
{ "$toLong": "$datetime_recorded" },
1000 * 60 * 60 * 24
]
}
}}]
)
Older MongoDB
var batch = [];
db.collection.find({ "timeOfDay": { "$exists": false } }).forEach(doc => {
batch.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$set": {
"timeOfDay": doc.datetime_recorded.valueOf() % (60 * 60 * 24 * 1000)
}
}
}
});
// write once only per reasonable batch size
if ( batch.length >= 1000 ) {
db.collection.bulkWrite(batch);
batch = [];
}
})
if ( batch.length > 0 ) {
db.collection.bulkWrite(batch);
batch = [];
}
If you can afford to write to a new collection, then looping and rewriting would not be required:
db.collection.aggregate([
  { "$addFields": {
    "timeOfDay": {
      "$mod": [
        // subtracting the epoch date yields milliseconds since epoch
        { "$subtract": [ "$datetime_recorded", new Date(0) ] },
        1000 * 60 * 60 * 24
      ]
    }
  }},
  { "$out": "newcollection" }
])
Or with MongoDB 4.0 and upwards:
db.collection.aggregate([
{ "$addFields": {
"timeOfDay": {
"$mod": [
{ "$toLong": "$datetime_recorded" },
1000 * 60 * 60 * 24
]
}
}},
{ "$out": "newcollection" }
])
All using the same basic conversion of:
1000 milliseconds in a second
60 seconds in a minute
60 minutes in an hour
24 hours a day
A BSON date is stored internally as the number of milliseconds since the epoch, so taking that value modulo the milliseconds in a day gives the milliseconds elapsed in the current (UTC) day.
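As a quick check of that conversion, this plain JavaScript sketch applies the same modulo to the sample datetime_recorded value from the question. The constant and function names are made up for illustration.

```javascript
const MS_PER_DAY = 1000 * 60 * 60 * 24;

// Hypothetical helper: milliseconds elapsed since midnight UTC.
function timeOfDay(date) {
  return date.valueOf() % MS_PER_DAY;
}

const recorded = new Date("2013-07-03T05:35:13Z");
const ms = timeOfDay(recorded);

// 05:35:13 UTC is 5 h + 35 min + 13 s after midnight.
const expected = ((5 * 60 + 35) * 60 + 13) * 1000;
console.log(ms === expected); // true
```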
Query
Querying is then really simple, and as per the question example:
db.collection.find({
"timeOfDay": {
"$gte": 12 * 60 * 60 * 1000, "$lt": 16 * 60 * 60 * 1000
}
})
This of course uses the same conversion from hours into milliseconds to match the stored format. As before, you can use whatever scale you actually need.
Most importantly, as real document properties which don't rely on computation at run-time, you can place an index on this:
db.collection.createIndex({ "timeOfDay": 1 })
So not only does this remove the run-time overhead of calculating the value, but with an index you can also avoid collection scans, as outlined on the linked page on indexing for MongoDB.
For optimal performance you never want to calculate such things at query time: at any real-world scale it simply takes an order of magnitude longer to process every document in the collection just to work out which ones you want than to reference an index and fetch only those documents.
The aggregation framework may just be able to help you rewrite the documents here, but it really should not be used as a production system method of returning such data. Store the times separately.