Need to aggregate by hour and $avg not recognized - mongodb

From a MongoDB collection storing data with timestamps, I need to return a single record for each hour.
So far I have selected the set of records between two dates successfully, but I can't figure out how to build the hourly record I need in the $group clause.
var myName = "CollectionName"
//schema for mongoose
var mySchema = new Schema({
dt: Date,
value: Number
});
var myDB = mongoose.createConnection('mongodb://localhost:27017/MYDB');
var myDBObj = myDB.model(myName, mySchema, myName);
The match in this aggregate call works fine, and the $hour creates a record for each hour in the day, but I don't know how to recreate a full date, and I get an error "unknown group operator $avg" ...
myDBObj.aggregate([
{
$match: { "dt": { $gt: new Date("October 13, 2010 12:00:00"), $lt: new Date("November 13, 2010 12:00:00") } }
},{
$group: {
"_id": { "dt": { "$hour": "$dt" } , "price": { "$avg": "$price" }}
}}], function (err, data) { if (err) { return next(err); } res.json(data); });
I think I need to use $dayOfYear so there are different records for each hour of each day, and include a new Date() somewhere ...
Can someone help me do this correctly? Any help is appreciated.

The $group pipeline stage works by "grouping" all data by the "key" specified for _id. Other fields you are actually aggregating are separate from the _id value and are their own field properties.
So your $group becomes this instead:
{ "$group": {
"_id": { "$hour": "$dt" },
"price": { "$avg": "$price" }
}}
Or if you want that broken down by day, then make a compound key:
{ "$group": {
"_id": {
"day": { "$dayOfYear": "$dt" },
"hour": { "$hour": "$dt" }
},
"price": { "$avg": "$price" }
}}
Or just use date math to produce Date objects rounded by hour:
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$dt", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$dt", new Date(0) ] },
1000 * 60 * 60
]}
]},
new Date(0)
]
},
"price": { "$avg": "$price" }
}}
Where subtracting one date object (the epoch date) from another produces a numeric value you can round with the applied math ( 1000 milliseconds, 60 seconds, 60 minutes = 1 hour ), and adding a number back to a date object produces a date corresponding to that value.
So your problem was that you had everything inside the _id, where the $avg accumulator is not recognised. All accumulators need to be specified outside of the grouping key. That is the intent.
If you want to make an accumulator value part of a grouping key ( it does not seem relevant here, though ), you instead follow with another $group stage, referencing the field that was produced by the former.
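Putting it together, here is a minimal sketch of the corrected call for the original question using the compound day/hour key. Note the question averages "$price" while the schema defines value, so adjust the field name to whatever actually holds the number:
myDBObj.aggregate([
  { "$match": {
    "dt": { "$gt": new Date("October 13, 2010 12:00:00"), "$lt": new Date("November 13, 2010 12:00:00") }
  }},
  { "$group": {
    "_id": {
      "day": { "$dayOfYear": "$dt" },
      "hour": { "$hour": "$dt" }
    },
    "price": { "$avg": "$price" }
  }}
], function (err, data) {
  if (err) { return next(err); }
  res.json(data);
});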

Related

$matching with field added with $dateToString doesn't work

In my MongoDB collection I have documents that contain a nested string field, containing a month and year, e.g. '04/2021'. Sample document:
{
"_id": {
"$oid": "608ba45cec43c5b24cda034b"
},
"status": "pass",
"stage": 5,
"priority": 0,
"payload": {
"company_id": "8800",
"company_name": "<MY COMPANY>",
"target_period": "04/2021"
},
"retry_count": 0,
"build_number": "101",
"job_name": "P123",
"createdAt": {
"$date": "2021-04-30T06:31:56.000Z"
},
"updatedAt": {
"$date": "2021-05-10T03:55:44.686Z"
}
}
I am trying to write an aggregation pipeline that will dynamically return documents where said field points to the past month. For example, run this month (May 2021), I would get documents labeled with '04/2021'. From this post I found the one-liner for getting the comparison string: new Date(new Date().getFullYear(), new Date().getMonth(), 1). (I understand that, by virtue of getMonth returning a zero-based month index, getting the previous month works by accident here and would have to be solved properly.)
This pipeline does not work:
[
{
$addFields: {
previous_month: {
$dateToString: {
'date': new Date(new Date().getFullYear(), new Date().getMonth(), 1),
'format': '%m/%G'
}
}
}
},
{
$match: {
"payload.target_period": "$previous_month"
}
}
]
With MongoDB Compass I can see that the field previous_month is populated just fine by the $addFields stage (above sample document gets value 04/2021), but the $match stage returns 0 documents. I'm running MongoDB version 4.2.12.
You should use the $expr operator when trying to self-reference another key in a document inside a $match stage. In a plain $match, a value like "$previous_month" is treated as a literal string rather than a field path, which is why your $match stage returns 0 documents.
[
{
$addFields: {
previous_month: {
$dateToString: {
'date': new Date(new Date().getFullYear(), new Date().getMonth(), 1),
'format': '%m/%G'
}
}
}
},
{
$match: {
$expr: {
$eq: [ "$payload.target_period", "$previous_month" ],
},
}
}
]
Instead of doing the whole process in the query, I think you can prepare the input date in your client language (JS, Node.js) easily: a zeroFill function returns the number with a leading 0 when it is less than 10; get the previous month's date and pick the month from it; then concatenate the month and year.
function zeroFill(i) { return (i < 10 ? '0' : '') + i; }
var date = new Date();
date.setMonth(date.getMonth() - 1);
let searchDate = zeroFill(date.getMonth() + 1) + "/" + date.getFullYear();
console.log(searchDate); // mm/yyyy
Your query would be just:
[{ $match: { "payload.target_period": searchDate } }]
I would suggest the moment.js library; it is much simpler to use:
{ $match: { "payload.target_period": moment().startOf("months").subtract(1, "months").format("MM/YYYY") } }

mongodb find between dates (month, year)

I have a collection with two fields, similar to the one below:
{
year: 2017,
month: 04 }
How can I select documents between 2017/07 and 2018/04?
Solved with:
db.collection.aggregate(
// Pipeline
[
// Stage 1
{
$addFields: {
"date": {
"$dateFromParts": {
"year": "$year",
"month": {"$toInt": "$month"}
}
}
}
},
// Stage 2
{
$match: {
"date": {
"$gte": ISODate("2017-07-01T00:00:00.000Z"),
"$lte": ISODate("2018-04-30T00:00:00.000Z")
}
}
},
]
);
Firstly, you have to make sure your db is storing dates in ISO format (a format that Mongo supports).
You can use the following command to find documents:
model.find({
date:{
$gte:ISODate("2017-04-29T00:00:00.000Z"),
$lte:ISODate("2017-07-29T00:00:00.000Z"),
}
})
Where model is the name of the collection and date is an attribute of the document holding dates in ISO format.
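Note that ISODate is a mongo shell helper; from Node.js or Mongoose you would pass plain Date objects instead. A minimal sketch, reusing the range from the question and assuming a model named model as above:
// Same range query from a Node.js / Mongoose client: new Date replaces ISODate
model.find({
  date: {
    $gte: new Date("2017-07-01T00:00:00.000Z"),
    $lte: new Date("2018-04-30T00:00:00.000Z")
  }
}, function (err, docs) {
  if (err) { /* handle the error */ }
  console.log(docs);
});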

In Mongo, how to write a search query to search documents based on time, on a Date object?

We have a collection named Incident, in which we have one field StartTime (a Date object type).
Every day, whenever the incident condition is met, a new document entry is created and inserted into the collection.
We have to get all the incidents which fall between 10PM and 6AM (i.e. from midnight to early morning).
But I face a problem with how to write the query for this use case.
Since we have a date object, I am able to write a query to search for documents between two dates.
How do I write a search query based on the time portion of a Date object?
Sample Data:
"StartTime" : ISODate("2015-10-16T18:15:14.211Z")
It's just not a good idea. But basically you apply the date aggregation operators:
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$gte": [{ "$hour": "$StartTime" }, 22] },
{ "$lt": [{ "$hour": "$StartTime" }, 6 ] }
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
Using $redact, that will only return or $$KEEP the documents that satisfy the $or condition on the $hour extracted from the Date, and $$PRUNE or "remove" from the results those that do not.
A bit shorter with MongoDB 3.6 and onwards, but really no different:
db.collection.find({
"$expr": {
"$or": [
{ "$gte": [{ "$hour": "$StartTime" }, 22] },
{ "$lt": [{ "$hour": "$StartTime" }, 6 ] }
]
}
})
Overall, not a good idea because the statement needs to scan the whole collection and calculate that logical condition.
A better way is to actually "store" the "time" as a separate field:
var ops = [];
db.collection.find().forEach(doc => {
// Get milliseconds from start of day
let timeMillis = doc.StartTime.valueOf() % (1000 * 60 * 60 * 24);
ops.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": { timeMillis } }
}
});
if ( ops.length > 1000 ) {
db.collection.bulkWrite(ops);
ops = [];
}
})
if ( ops.length > 0 ) {
db.collection.bulkWrite(ops);
ops = [];
}
Then you can simply query with something like:
var start = 22 * ( 1000 * 60 * 60 ), // 10PM
end = 6 * ( 1000 * 60 * 60 ); // 6AM
db.collection.find({
"$or": [
{ "timeMillis": { "$gte": start } },
{ "timeMillis": { "$lt": end } }
]
});
And that field can actually be indexed and so quickly and efficiently return results.
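For instance, a minimal sketch of creating such an index in the shell:
// Index the pre-computed time-of-day field so the $or range query can use it
db.collection.createIndex({ "timeMillis": 1 })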

Mongo $subtract date doesn't work in aggregation $match block

I am creating a Mongo aggregation query which uses a $subtract operator in my $match block, as shown in the code below.
This query doesn't work:
db.coll.aggregate(
[
{
$match: {
timestamp: {
$gte: {
$subtract: [new Date(), 24 * 60 * 60 * 1000]
}
}
}
},
{
$group: {
_id: {
timestamp: "$timestamp"
},
total: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
timestamp: "$_id.timestamp",
total: "$total",
}
},
{
$sort: {
timestamp: -1
}
}
]
)
However, this second query works:
db.coll.aggregate(
[
{
$match: {
timestamp: {
$gte: new Date(new Date() - 24 * 60 * 60 * 1000)
}
}
},
{
$group: {
_id: {
timestamp: "$timestamp"
},
total: {
$sum: 1
}
}
},
{
$project: {
_id: 0,
timestamp: "$_id.timestamp",
total: "$total",
}
},
{
$sort: {
timestamp: -1
}
}
]
)
I need to use $subtract in my $match block, so I can't use the last query.
As of mongodb 3.6 you can use $subtract in the $match stage via the $expr. Here's the docs: https://docs.mongodb.com/manual/reference/operator/query/expr/
I was able to get a query like what you're describing via this $expr and a new system variable in mongodb 4.2 called $$NOW. Here is my query, which gives me orders that have been created within the last 4 hours:
[
{ $match:
{ $expr:
{ $gt: [
"$_created_at",
{ $subtract: [ "$$NOW", 4 * 60 * 60 * 1000] } ]
}
}
}
]
Well, you cannot do that, and you are not meant to either. You also say you "need" to do this, but in reality you really do not.
Pretty much all of the general aggregation operators outside of the pipeline operators are really only valid within a $project or a $group pipeline stage. Mostly within $project but certainly not in others.
A $match pipeline is really the same as a general "query" operation, so the only things valid in there are the query operators.
As for the case for your "need", any "value" that is submitted within an aggregation pipeline and particularly within a $match needs to be evaluated outside of the actual pipeline before the BSON representation is sent to the server.
The only exception is the notation that defines variables in the document, particularly "fieldnames" such as "$fieldname", and then only really in $project or $group. So that means something that "refers" to an existing value of a document, and that is something that cannot be done within any type of "query" document expression.
If you need to work with the value of another field in the document then you work it out with $project first, as in:
db.collection.aggregate([
{ "$project": {
"fieldMath": { "$subtract": [ "$fieldOne", "$fieldTwo" ] }
}},
{ "$match": { "fieldMath": { "$gt": 2 } }}
])
For any other purpose you really want to evaluate the value "outside" the pipeline.
The above answers the question you asked, but this answers the question you didn't ask.
Your pipeline doesn't make much sense, since grouping on the "timestamp" alone is unlikely to group anything: the values have millisecond accuracy, and even for very active systems there are likely to be only a few duplicates at best.
It appears like you are looking for the math to group by "day", which you can do like this:
db.collection.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]
},
"total": { "$sum": "$total" }
}}
])
That "rounds" your timestamp value to a single day and has a much better chance of "aggregating" something than you would otherwise have.
Or you can use the "date aggregation operators" to do much the same thing with a composite key.
So if you want to "query" then it evaluates externally. If you want to work on a value "within the document" then you must do so in either a $project or $group pipeline stage.
The $subtract operator is a projection-operator. It is only available during a $project step. So your options are:
(not recommended) Add a $project step before your $match step that computes the converted timestamp field for every document for the following match step. I would not recommend this because the operation has to be performed on every single document in your database and prevents the database from using an index on the timestamp field, so it could cost you a lot of performance.
(recommended) Generate the Date you want to match against in the shell / in your application: create a new Date() object, store it in a variable, subtract 24 hours from it, and perform your 2nd query using that variable, as in the sketch below.
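A minimal sketch of that recommended approach, assuming the coll collection and timestamp field from the question:
// Evaluate the cutoff outside the pipeline: 24 hours before "now"
var cutoff = new Date(Date.now() - 24 * 60 * 60 * 1000);
db.coll.aggregate([
  { $match: { timestamp: { $gte: cutoff } } },
  { $group: { _id: { timestamp: "$timestamp" }, total: { $sum: 1 } } },
  { $project: { _id: 0, timestamp: "$_id.timestamp", total: "$total" } },
  { $sort: { timestamp: -1 } }
])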

Find last record of each day

I store data about my power consumption; each minute there is a new record. Here is an example:
{"date":1393156826114,"id":"5309d4cae4b0fbd904cc00e1","adco":"O","hchc":7267599,"hchp":10805900,"hhphc":"g","ptec":"c","iinst":13,"papp":3010,"imax":58,"optarif":"s","isousc":60,"motdetat":"Á"}
such that I have around 1440 records a day.
How can I get the last record of each day?
Note: I use MongoDB in Spring (Java), so I need a query like this:
Example to get all measures:
@Query("{ 'date' : { $gt : ?0 }}")
public List<Mesure> findByDateGreaterThan(Date date, Sort sort);
A bit more modern than the original answer:
db.collection.aggregate([
{ "$sort": { "date": 1 } },
{ "$group": {
"_id": {
"$subtract": ["$date",{"$mod": ["$date",86400000]}]
},
"doc": { "$last": "$$ROOT" }
}},
{ "$replaceRoot": { "newDocument": "$doc" } }
])
The same principle applies that you essentially $sort the collection and then $group on the required grouping key picking up the $last data from the grouping boundary.
To make things a bit clearer than the original writing: you can use $$ROOT instead of specifying every document property, and of course the $replaceRoot stage allows you to restore that data fully in the original document form.
But the general solution is still $sort first, then $group on the common key that is required and keep the $last or $first depending on sort order occurrences from the grouping boundary for the properties that are required.
Also for BSON Dates as opposed to a timestamp value as in the question, see Group result by 15 minutes time interval in MongoDb for different approaches on how to accumulate for different time intervals actually using and returning BSON Date values.
Not quite sure what you are going for here but you could do this in aggregate if my understanding is right. So to get the last record for each day:
db.collection.aggregate([
// Sort in date order as ascending
{"$sort": { "date": 1 } },
// Date math converts to whole day
{"$project": {
"adco": 1,
"hchc": 1,
"hchp": 1,
"hhphc": 1,
"ptec": 1,
"iinst": 1,
"papp": 1,
"imax": 1,
"optarif": 1,
"isousc": 1,
"motdetat": 1,
"date": 1,
"wholeDay": {"$subtract": ["$date",{"$mod": ["$date",86400000]}]}
}},
// Group on wholeDay ( _id insertion is monotonic )
{"$group":
"_id": "$wholeDay",
"docId": {"$last": "$_id" },
"adco": {"$last": "$adco" },
"hchc": {"$last": "$hchc" },
"hchp": {"$last": "$hchp" },
"hhphc": {"$last": "$hhphc" },
"ptec": {"$last": "$ptec" },
"iinst": {"$last": "$iinst" },
"papp": {"$last": "$papp" },
"imax": {"$last": "$imax" },
"optarif": {"$last": "$optarif",
"isousc": {"$last": "$isouc" },
"motdetat": {"$last": "$motdetat" },
"date": {"$last": "$date" },
}}
])
So the principle here is that given the timestamp value, do the date math to project that as the midnight time at the beginning of each day. Then as the _id key on the document is already monotonic (always increasing), then simply group on the wholeDay value while pulling the $last document from the grouping boundary.
If you don't need all the fields then only project and group on the ones you want.
And yes you can do this in the spring data framework. I'm sure there is a wrapped command in there. But otherwise, the incantation to get to the native command goes something like this:
mongoOps.getCollection("yourCollection").aggregate( ... )
For the record, if you actually had BSON date types rather than a timestamp as a number, then you can skip the date math:
db.collection.aggregate([
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
"day": { "$dayOfMonth": "$date" }
},
"hchp": { "$last": "$hchp" }
}}
])
It's also possible to format timestamps in the group key as %Y-%m-%d (e.g. 2021-12-05) with dateToString:
// { timestamp: 1638697946000, value: "a" } <= 2021-12-05 9:52:26
// { timestamp: 1638686311000, value: "b" } <= 2021-12-05 6:38:31
// { timestamp: 1638859111000, value: "c" } <= 2021-12-07 6:38:31
db.collection.aggregate([
{ $sort: { timestamp: 1 } },
// { timestamp: 1638686311000, value: "b" }
// { timestamp: 1638697946000, value: "a" }
// { timestamp: 1638859111000, value: "c" }
{ $group: {
_id: { $dateToString: { date: { $toDate: "$timestamp" }, format: "%Y-%m-%d" } },
last: { $last: "$$ROOT" }
}},
// { _id: "2021-12-07", last: { timestamp: 1638859111000, value: "c" } }
// { _id: "2021-12-05", last: { timestamp: 1638697946000, value: "a" } }
{ $replaceWith: "$last" }
])
// { timestamp: 1638697946000, value: "a" } <= 2021-12-05 9:52:26
// { timestamp: 1638859111000, value: "c" } <= 2021-12-07 6:38:31
This:
- first $sorts documents by chronological order of timestamps, so that we can later on pick the newest documents based on their order,
- then $groups documents by their %Y-%m-%d-formatted timestamps:
  - by first converting the timestamp into a datetime: { $toDate: "$timestamp" },
  - and then converting the associated datetime into a string only representing the year, month and day: { $dateToString: { date: ..., format: "%Y-%m-%d" } },
  - such that for each group (i.e. date), we can pick the $last (i.e. newest, since chronologically sorted) matching document,
  - and the pick is the whole document as represented by $$ROOT,
- finally cleans up the group result with a $replaceWith stage (an alias for $replaceRoot).