I need returns the average hourly rate between two dates in Mongodb - mongodb

I need to write a query [ aggregate ] that returns the average hourly rate between two dates in Mongodb.
I found in my research the following code
db.transactions.aggregate([
{
$match: {
transactionDate: {
$gte: ISODate("2017-01-01T00:00:00.000Z"),
$lt: ISODate("2017-01-31T23:59:59.000Z")
}
}
}, {
$group: {
_id: null,
average_transaction_amount: {
$avg: "$amount"
}
}
}
]);
The previous code returns the average one value between two dates,but I need Average per hour.
sample document is
{"_id":"5a4e1fa1e4b02d76985c39b1",
"temp1":4,
"temp2":3,
"temp3‌​":2,
"created_on":"20‌​18-01-04T12:35:45.83‌​3Z"}
Result for example :
{ {temp1:5,temp2:6,temp3:6} for Average hour {temp1:2,temp2:4,temp3:6}for Average hour next ,{temp1:5,temp2:7,temp3:9},{temp1:8,temp2:4,temp3:7},{temp1:‌​4,temp2:2,temp3:6},{‌​temp1:9,temp2:6,temp‌​3:4}
}
Between every hour there are a lot of values ,So I need to calculate the Avg per hour
Please help

Related

How to write a single query to count elements above a certain value in MongoDB

I have the following sample collection of movies:
[
{
"title":"Boots and Saddles",
"year":1909,
"cast":[],
"genres":[]
},
{
"title":"The Wooden Leg",
"year":1909,
"cast":[],
"genres":[]
},
{
"title":"The Sanitarium",
"year":1910,
"cast":["Fatty Arbuckle"],
"genres":["Comedy"]
},
{
"title":"Snow White",
"year":1916,
"cast":["Marguerite Clark"],
"genres":["Fantasy"]
},
{
"title":"Haunted Spooks",
"year":1920,
"cast":["Harold Lloyd"],
"genres":["Comedy"]
},
{
"title":"Very Truly Yours",
"year":1922,
"cast":["Shirley Mason", "lan Forrest"],
"genres":["Romance"]
}
]
I want to count number of movies appeared in the last 20 years (from the last movie recorded in this collection).
I have following query to find which year is the most recent movie (result shows 2018):
db.movies.find({},{"_id":0, "year":1}).sort({year:-1}).limit(1)
So to find how many movies appeared in the last 20 years I wrote this:
db.movies.aggregate([{$match:{year:{$gte:1999}}},{$count:"title"}])
However, this is not very optimized, because if the database is modified or updated,I will have to modify that query every time.
Is there a more elegant way to find the result?
Thank you in advance!
You can use mongodb aggregate method.
db.movies.aggregate([
{ $sort: { year: -1 } },
{ $limit: 1 },
{
$project: {
currentYear: { $year: new Date() },
latestMovieYear: "$year",
last20Years: { $subtract: [ "$currentYear", 20 ] }
}
},
{
$match: {
year: { $gte: "$last20Years", $lte: "$latestMovieYear" }
}
},
{ $count: "movies" }
]);
Sort the documents by year in descending order, and limit the number of documents to 1. It will return latest movie present in the collection.
Use the $project operator to create a new field currentYear that returns the current year, latestMovieYear that returns the year of the latest movie, and last20Years that subtracts 20 from the current year.
Use $match operator to filter out the movies that have a year greater than or equal to last20Years and less than or equal to latestMovieYear.
Use the $count operator to count the number of documents that match the above criteria.

Mongodb selecting every nth of a given sorted aggregation

I want to be able to retrieve every nth item of a given collection which is quite large (millions of records)
Here is a sample of my collection
{
_id: ObjectId("614965487d5d1c55794ad324"),
hour: ISODate("2021-09-21T17:21:03.259Z"),
searches: [
ObjectId("614965487d5d1c55794ce670")
]
}
My start of aggregation is like so
[
{
$match: {
searches: {
$in: [ObjectId('614965487d5d1c55794ce670')],
},
},
},
{ $sort: { hour: -1 } },
{ $project: { hour: 1 } },
...
]
I have tried many things including
$sample which does not make the pick in the good order
Using $skip makes it very slow as the number given to skip grows
Using _id instead of $skip but my ids are unfortunately not created in an ordered manner
My goal is thus to retrieve the hour of a record, every 20000 record, so that I can then make a call to retrieve data by chunks of approximately 20000 records.
I imagine it would be possible to
sort, and number every records, then keep only the first, 20000, 40000, ..., and the last
Thanks for your help and let me know if you need more information

MONGODB return min value for a field if falls within last year otherwise return min value for collection

I have a collection that consists of documents in this format:
{"_id":{"date_played":{"$date":"1998-03-28T00:00:00.000Z"},"course_played":4,"player_id":11},"score":[5,6,4,4,5,9,6,6,5,7,6,6,5,7,5,3,9,4],"handicap":30,"cash_won":0,"sort_order":6,"gross_score":102,"gross_sfpts":34,"skins_group":1,"score_differential":28,"pcc_adjustment":0,"player_name":"Dave"}
By _id.player_id I am trying to return the min value of "score_differential" for records within the last year(_id.date_played), if no records in the last year then I want the min "score_differential" for the player in the collection.
I have tried lots of combinations but this is the closest I have got:-
Which returns the correct values but the problem is if a date is found within the year I get two records back, one with _id: false which has lowest in collection and one with _id: true which has lowest for year. My problem is that I only want one record back not two. Any help is much appreciated as I have spent days on this, relatively new to mongodb coming from mysql.
{
'$match': {
'_id.player_id': 11
}
}, {
'$group': {
'_id': {
'$min': [
{
'$gt': [
'$_id.date_played', {
'$dateFromParts': {
'year': {
'$subtract': [
{
'$year': new Date()
}, 1
]
},
'month': {
'$month': new Date()
},
'day': {
'$dayOfMonth': new Date()
}
}
}
]
}
]
},
'minWHI': {
'$min': '$score_differential'
}
}
}
] ```
Thx but I no longer need a solution to this, I actually use the above code in a $lookup stage in another query and as that returns an array I end up with just one document returned where I can test the matched field for the values I need, I was just confused as I was trying to test each part of the main query separately and in doing so getting two documents.

Calculate amount of minutes between multiple date ranges, but don't calculate the overlapping dates in MongoDB

I am creating a way to generate reports of the amount of time equipment was down for, during a given time frame. I will potentially have 100s to thousands of documents to work with. Every document will have a start date and end date, both in BSON format and will generally be within minutes of each other. For simplicity sake I am also zeroing out the seconds.
The actual aggregation I need to do, is I need to calculate the amount of minutes between each given date, but there may be other documents with overlapping dates. Any overlapping time should not be calculated if it's been calculated already. There are various other aggregations I'll need to do, but this is the only one that I'm unsure of, if it's even possible at all.
{
"StartTime": "2020-07-07T18:10:00.000Z",
"StopTime": "2020-07-07T18:13:00.000Z",
"TotalMinutesDown": 3,
"CreatedAt": "2020-07-07T18:13:57.675Z"
}
{
"StartTime": "2020-07-07T18:12:00.000Z",
"StopTime": "2020-07-07T18:14:00.000Z",
"TotalMinutesDown": 2,
"CreatedAt": "2020-07-07T18:13:57.675Z"
}
The two documents above are examples of what I'm working with. Every document gets the total amount of minutes between the two dates stored in the document (This field serves another purpose, unrelated). If I were to aggregate this to get total minutes down, the output of total minutes should be 4, as I'm not wanting to calculate the overlapping minutes.
Finding overlap of time ranges sounds to me a bit abstract. Let's try to convert it to a concept that databases are usually used for: discrete values.
If we convert the times to discrete value, we will be able to find the duplicate values, i.e. the "overlapping values" and eliminate them.
I'll illustrate the steps using your sample data. Since you have zeroed out the seconds, for simplicity sake, we can start from there.
Since we care about minute increments we are going to convert times to "minutes" elapsed since the Unix epoch.
{
"StartMinutes": 26569090,
"StopMinutes": 26569092,
}
{
"StartMinutes": 26569092,
"StopMinutes": 26569092
}
We convert them to discrete values
{
"minutes": [26569090, 26569091, 26569092]
}
{
"minutes": [26569092, 26569093]
}
Then we can do a set union on all the arrays
{
"allMinutes": [26569090, 26569091, 26569092, 26569093]
}
This is how we can get to the solution using aggregation. I have simplified the queries and grouped some operations together
db.collection.aggregate({
$project: {
minutes: {
$range: [
{
$divide: [{ $toLong: "$StartTime" }, 60000] // convert to minutes timestamp
},
{
$divide: [{ $toLong: "$StopTime" }, 60000]
}
]
},
}
},
{
$group: { // combine to one document
_id: null,
_temp: { $push: "$minutes" }
}
},
{
$project: {
totalMinutes: {
$size: { // get the size of the union set
$reduce: {
input: "$_temp",
initialValue: [],
in: {
$setUnion: ["$$value", "$$this"] // combine the values using set union
}
}
}
}
}
})
Mongo Playground

MongoDB query: how to select the longest period of time of a matched value

I have a mongo database with many records in the format of:
{
id: {
$_id
},
date: {
$date: YYYY-MM-DDThh:mm:ssZ
},
reading: X.XX
}
where the date is a timestamp in mongo and reading is a float (id is just the unique identifier for the data point) .
I would like to be able to count the longest period of time when the reading was a certain value (lets say 0.00 for ease of use) and return the start and end points of this time period. If there were more than one time period of the same length I would like them all returned.
Ultimately, for example, I would like to be able to say
"The longest time period the reading is 0.00 and 1.25 hours
between
2000-01-01T00:00:00 - 2000-01-01T01:15:00,
2000-06-01T02:00:00 - 2000-06-01T03:15:00,
2000-11-11T20:00:00 - 2000-11-11T21:15:00 ."
For my mongo aggregation query I am thinking of doing this:
get the timeframe I am interested in (eg 2000-01-01 to
2001-01-01)
sort the data by date descending
somehow select the longest run when the reading is 0.00.
This is the query I have so far:
[
{
$match: {
date: { $gte: ISODate("2000-01-01T00:00:00.0Z"), $lt: ISODate("2001-01-01T00:00:00.0Z") }
}
},
{ "$sort": { "date": -1 } },
{
"$group" : {
"_id": null,
"Maximum": { "$max": { "max": "$reading", "date": "$date" } },
"Longest": { XXX: { start_dates: [], end_dates: [] } }
}
},
{
"$project": {
"_id": 0,
"max": "$Maximum",
"longest": "$Longest"
}
}
]
I do not know how to select the longest run. How would you do this?
(You will notice I am also interested in the maximum reading within the time period and the dates on which that maximum reading happens. At the moment I am only recording the latest date/time this occurs but would like it to record all the dates/times the maximum value occurs on eventually.)