Find last record of each day - mongodb

I store data about my power consumption, each minute there is a new record, here is an example:
{"date":1393156826114,"id":"5309d4cae4b0fbd904cc00e1","adco":"O","hchc":7267599,"hchp":10805900,"hhphc":"g","ptec":"c","iinst":13,"papp":3010,"imax":58,"optarif":"s","isousc":60,"motdetat":"Á"}
such that I have around 1440 records a day.
How can I get the last record of each day?
Note: I use mongodb in spring java, so I need a query like this:
Example to get all measures :
@Query("{ 'date' : { $gt : ?0 }}")
public List<Mesure> findByDateGreaterThan(Date date, Sort sort);

A bit more modern than the original answer:
db.collection.aggregate([
{ "$sort": { "date": 1 } },
{ "$group": {
"_id": {
"$subtract": ["$date",{"$mod": ["$date",86400000]}]
},
"doc": { "$last": "$$ROOT" }
}},
{ "$replaceRoot": { "newRoot": "$doc" } }
])
The same principle applies: $sort the collection, then $group on the required grouping key and pick the $last document from each grouping boundary.
What has changed since the original answer was written is that you can use $$ROOT instead of listing every document property, and the $replaceRoot stage restores the result to the original document shape.
The general solution is still to $sort first, then $group on the required common key and keep the $last or $first value, depending on the sort order, for the properties you need.
For BSON Dates, as opposed to a numeric timestamp as in the question, see "Group result by 15 minutes time interval in MongoDb" for different approaches to accumulating over time intervals while using and returning BSON Date values.
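To see why the sort direction and the accumulator go together, here is a minimal plain-JavaScript sketch of the sort-then-group idea, run in memory rather than against MongoDB. The helper pickPerDay and the sample records are hypothetical; only the day-bucket arithmetic is taken from the pipeline above.

```javascript
// In-memory sketch of "$sort, then $group keeping $last or $first".
// pickPerDay and the sample data are illustrative, not part of any MongoDB API.
const DAY_MS = 86400000;

function pickPerDay(records, direction, accumulator) {
  // $sort stage: 1 = ascending, -1 = descending by date
  const sorted = [...records].sort((a, b) => direction * (a.date - b.date));
  const groups = new Map();
  for (const doc of sorted) {
    // $group key: timestamp truncated to midnight UTC
    const day = doc.date - (doc.date % DAY_MS);
    if (accumulator === "last" || !groups.has(day)) {
      groups.set(day, doc); // "last" overwrites, "first" keeps the first seen
    }
  }
  // Return the picked documents ordered by day
  return [...groups.entries()].sort((a, b) => a[0] - b[0]).map(([, doc]) => doc);
}

const sample = [
  { date: 1393113600000, papp: 100 }, // 2014-02-23 00:00:00 UTC
  { date: 1393156826114, papp: 3010 }, // 2014-02-23 12:00:26 UTC
  { date: 1393200000000, papp: 250 }, // 2014-02-24 00:00:00 UTC
];

const ascLast = pickPerDay(sample, 1, "last");
const descFirst = pickPerDay(sample, -1, "first");
console.log(ascLast.map((d) => d.papp)); // [ 3010, 250 ]
```

Ascending order with "last" and descending order with "first" pick the same documents here, which is the sense in which the accumulator depends on the sort order.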
Not quite sure what you are going for here but you could do this in aggregate if my understanding is right. So to get the last record for each day:
db.collection.aggregate([
// Sort in date order as ascending
{"$sort": { "date": 1 } },
// Date math converts to whole day
{"$project": {
"adco": 1,
"hchc": 1,
"hchp": 1,
"hhphc": 1,
"ptec": 1,
"iinst": 1,
"papp": 1,
"imax": 1,
"optarif": 1,
"isousc": 1,
"motdetat": 1,
"date": 1,
"wholeDay": {"$subtract": ["$date",{"$mod": ["$date",86400000]}]}
}},
// Group on wholeDay ( _id insertion is monotonic )
{"$group": {
"_id": "$wholeDay",
"docId": {"$last": "$_id" },
"adco": {"$last": "$adco" },
"hchc": {"$last": "$hchc" },
"hchp": {"$last": "$hchp" },
"hhphc": {"$last": "$hhphc" },
"ptec": {"$last": "$ptec" },
"iinst": {"$last": "$iinst" },
"papp": {"$last": "$papp" },
"imax": {"$last": "$imax" },
"optarif": {"$last": "$optarif" },
"isousc": {"$last": "$isousc" },
"motdetat": {"$last": "$motdetat" },
"date": {"$last": "$date" }
}}
])
So the principle here is: given the timestamp value, use date math to project it as midnight at the start of each day. Then, because the documents are already sorted by date (and the _id key is itself monotonic, i.e. always increasing), simply group on the wholeDay value while pulling the $last document from each grouping boundary.
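As a quick check of that date math, here is the same arithmetic in plain JavaScript applied to the date value from the question's sample record. The function name wholeDay is just an illustrative helper mirroring the projected field.

```javascript
// Worked example of the "wholeDay" date math from the $project stage.
const DAY_MS = 86400000; // 24 * 60 * 60 * 1000

function wholeDay(timestampMs) {
  // Same arithmetic as {"$subtract": ["$date", {"$mod": ["$date", 86400000]}]}
  return timestampMs - (timestampMs % DAY_MS);
}

const sampleDate = 1393156826114; // "date" value from the question's record
const midnight = wholeDay(sampleDate);

console.log(new Date(sampleDate).toISOString()); // 2014-02-23T12:00:26.114Z
console.log(midnight); // 1393113600000
console.log(new Date(midnight).toISOString()); // 2014-02-23T00:00:00.000Z
```

Note that the bucket boundary is midnight UTC, since the timestamp is milliseconds since the epoch; if days should follow a local time zone, an offset would have to be applied before the modulo.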
If you don't need all the fields then only project and group on the ones you want.
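If the list of fields changes often, the two stages could be generated instead of typed out. A rough sketch, assuming a hypothetical helper buildLastPerDayPipeline (not part of MongoDB or Spring Data) that only builds the pipeline as a plain array:

```javascript
// Hypothetical helper: builds the $sort/$project/$group pipeline for chosen fields.
function buildLastPerDayPipeline(fields) {
  const project = { date: 1 };
  const group = {
    _id: "$wholeDay",
    docId: { $last: "$_id" },
    date: { $last: "$date" },
  };
  for (const field of fields) {
    project[field] = 1; // keep the field in $project
    group[field] = { $last: "$" + field }; // and take its last value per day
  }
  // Same date math as above: truncate the timestamp to the start of the day
  project.wholeDay = { $subtract: ["$date", { $mod: ["$date", 86400000] }] };
  return [{ $sort: { date: 1 } }, { $project: project }, { $group: group }];
}

const pipeline = buildLastPerDayPipeline(["papp", "iinst"]);
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting array can then be passed to aggregate() in the shell, e.g. db.collection.aggregate(pipeline), where the collection name is a placeholder.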
And yes you can do this in the spring data framework. I'm sure there is a wrapped command in there. But otherwise, the incantation to get to the native command goes something like this:
mongoOps.getCollection("yourCollection").aggregate( ... )
For the record, if you actually had BSON date types rather than a timestamp as a number, then you can skip the date math:
db.collection.aggregate([
{ "$sort": { "date": 1 } },
{ "$group": {
"_id": {
"year": { "$year": "$date" },
"month": { "$month": "$date" },
"day": { "$dayOfMonth": "$date" }
},
"hchp": { "$last": "$hchp" }
}}
])

It's also possible to format timestamps in the group key as %Y-%m-%d (e.g. 2021-12-05) with dateToString:
// { timestamp: 1638697946000, value: "a" } <= 2021-12-05 9:52:26
// { timestamp: 1638686311000, value: "b" } <= 2021-12-05 6:38:31
// { timestamp: 1638859111000, value: "c" } <= 2021-12-07 6:38:31
db.collection.aggregate([
{ $sort: { timestamp: 1 } },
// { timestamp: 1638686311000, value: "b" }
// { timestamp: 1638697946000, value: "a" }
// { timestamp: 1638859111000, value: "c" }
{ $group: {
_id: { $dateToString: { date: { $toDate: "$timestamp" }, format: "%Y-%m-%d" } },
last: { $last: "$$ROOT" }
}},
// { _id: "2021-12-07", last: { timestamp: 1638859111000, value: "c" } }
// { _id: "2021-12-05", last: { timestamp: 1638697946000, value: "a" } }
{ $replaceWith: "$last" }
])
// { timestamp: 1638697946000, value: "a" } <= 2021-12-05 9:52:26
// { timestamp: 1638859111000, value: "c" } <= 2021-12-07 6:38:31
This:
first $sorts documents by chronological order of timestamps such that we can later on pick the newest documents based on their order.
then $groups documents by their %Y-%m-%d-formatted timestamps:
by first converting the timestamp into a datetime: { $toDate: "$timestamp" }
and then converting the associated datetime into a string only representing the year, month and day: { $dateToString: { date: ..., format: "%Y-%m-%d" } }
such that for each group (i.e. date), we can pick the $last (i.e. newest since chronologically sorted) matching document
and the pick is the whole document as represented by $$ROOT
finally cleans up the group result with a $replaceWith stage (alias for $replaceRoot).
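The steps above can be mimicked in plain JavaScript to check the expected result on the three sample documents. This is only an in-memory sketch: dayKey and lastPerDay are illustrative names, and toISOString() is used as a stand-in for $dateToString, which also defaults to UTC.

```javascript
// In-memory sketch of grouping by a "%Y-%m-%d" key and keeping the last document.
function dayKey(timestampMs) {
  // toISOString() is always UTC, e.g. "2021-12-05T09:52:26.000Z"
  return new Date(timestampMs).toISOString().slice(0, 10);
}

function lastPerDay(docs) {
  // $sort: chronological order
  const sorted = [...docs].sort((a, b) => a.timestamp - b.timestamp);
  // $group with $last: later documents overwrite earlier ones for the same day
  const byDay = new Map();
  for (const doc of sorted) byDay.set(dayKey(doc.timestamp), doc);
  // $replaceWith: return the documents themselves
  return [...byDay.values()];
}

const docs = [
  { timestamp: 1638697946000, value: "a" },
  { timestamp: 1638686311000, value: "b" },
  { timestamp: 1638859111000, value: "c" },
];
const result = lastPerDay(docs);
console.log(result.map((d) => d.value)); // [ 'a', 'c' ]
```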

Related

MongoDB - Dates between using $match

I am trying to use MongoDB $match to get data between two dates, but the query returns nothing (an empty result). Here is what it looks like:
db.collection.aggregate([
{
$match: {
date: {
$gte: new Date("2022-10-23"),
$lt: new Date("2022-10-25"),
},
}
},
{
$group: {
_id: "$title",
title: {
$first: "$title"
},
answer: {
$push: {
username: "$username",
date: "$date",
formId: "$formId",
answer: "$answer"
}
}
}
},
])
Here is the data that I try to run on the Mongo playground:
https://mongoplayground.net/p/jKx_5kZnJNz
I think there is no error in my code anymore, so why does it return an empty result?
Migrating my comment to an answer post for a complete explanation.
Issue 1
The documents store the date field as a string, while you are comparing it against a Date, which leads to incorrect output.
Make sure both values being compared have the same type.
Either migrate the stored date values to the Date type, or
convert the date field to a Date in the query via $toDate:
{
$match: {
$expr: {
$and: [
{
$gte: [
{
$toDate: "$date"
},
new Date("2022-10-23")
]
},
{
$lt: [
{
$toDate: "$date"
},
new Date("2022-10-25")
]
}
]
}
}
}
Issue 2
Since you are using $lt ($lt: new Date("2022-10-25")), it won't include the documents with date: new Date("2022-10-25").
For an inclusive end date, use $lte.
Demo @ Mongo Playground
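Both issues can be illustrated outside MongoDB with a few lines of plain JavaScript. This is a sketch under assumptions: the sample documents are made up, and the date field is assumed to hold ISO-style date strings; new Date(doc.date) plays the role of $toDate.

```javascript
// Sketch: convert string dates before comparing, and compare $lt with $lte.
// The sample documents are hypothetical; "date" is stored as an ISO-style string.
const answers = [
  { username: "ann", date: "2022-10-22" },
  { username: "bob", date: "2022-10-23" },
  { username: "cho", date: "2022-10-25" },
];

const start = new Date("2022-10-23");
const end = new Date("2022-10-25");

function inRange(doc, inclusiveEnd) {
  const d = new Date(doc.date); // plays the role of { $toDate: "$date" }
  return d >= start && (inclusiveEnd ? d <= end : d < end);
}

const withLt = answers.filter((doc) => inRange(doc, false)).map((doc) => doc.username);
const withLte = answers.filter((doc) => inRange(doc, true)).map((doc) => doc.username);
console.log(withLt); // [ 'bob' ]
console.log(withLte); // [ 'bob', 'cho' ]
```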

mongodb find between dates (month, year)

I have a collection with two fields, similar to the one below:
{
year: 2017,
month: 04 }
How can I select documents between 2017/07 and 2018/04?
Solved with:
db.collection.aggregate(
// Pipeline
[
// Stage 1
{
$addFields: {
"date": {
"$dateFromParts": {
"year": "$year",
"month": {"$toInt": "$month"}
}
}
}
},
// Stage 2
{
$match: {
"date": {
"$gte": ISODate("2017-07-01T00:00:00.000Z"),
"$lte": ISODate("2018-04-30T00:00:00.000Z")
}
}
},
]
);
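To sanity-check the range logic, here is the same idea in plain JavaScript. It is a sketch with made-up documents, assuming month is stored as a string (which is why the pipeline uses $toInt); Date.UTC stands in for $dateFromParts, which defaults the day to the first of the month.

```javascript
// Sketch: build a date from year/month parts, then filter on a date range.
// Date.UTC stands in for $dateFromParts; the sample documents are hypothetical.
const docs = [
  { year: 2017, month: "04" },
  { year: 2017, month: "07" },
  { year: 2018, month: "04" },
  { year: 2018, month: "05" },
];

function dateFromParts(doc) {
  // Number(...) plays the role of $toInt; Date.UTC months are zero-based
  return new Date(Date.UTC(doc.year, Number(doc.month) - 1, 1));
}

const from = new Date("2017-07-01T00:00:00.000Z");
const to = new Date("2018-04-30T00:00:00.000Z");

const selected = docs.filter((doc) => {
  const date = dateFromParts(doc);
  return date >= from && date <= to;
});
console.log(selected.map((d) => `${d.year}/${d.month}`)); // [ '2017/07', '2018/04' ]
```

Because every generated date falls on the first of its month, any upper bound inside April 2018 includes 2018/04 and excludes 2018/05.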
First, make sure your database stores the date as an ISODate (a date format MongoDB supports).
You can then use the following command to find documents:
model.find({
date:{
$gte:ISODate("2017-04-29T00:00:00.000Z"),
$lte:ISODate("2017-07-29T00:00:00.000Z"),
}
})
Where model is the name of the collection and date is an attribute of the document holding dates in ISO format.

SQL to Mongo Aggregation

Hi, I want to convert my SQL query to a Mongo aggregation.
select c.year, c.minor_category, count(c.minor_category) from Crime as c
group by c.year, c.minor_category having c.minor_category = (
Select cc.minor_category from Crime as cc where cc.year=c.year group by
cc.minor_category order by count(*) desc, cc.minor_category limit 1)
I tried to do something like this:
db.crimes.aggregate({
$group: {
"_id": {
year: "$year",
minor_category :"$minor_category",
count: {$sum: "$minor_category"}
}
},
},
{
$match : {
minor_category: ?
}
})
But I am stuck at $match, which is the equivalent of HAVING; I don't know how to write subqueries in Mongo like the one in my SQL query.
Can anybody help me?
OK, based on the confirmation above, the query below should work.
db.crime.aggregate([
{"$group":{"_id":{"year":"$year","minor":"$minor"},"count":{"$sum":1}}},
{"$project":{"year":"$_id.year","count":"$count","minor":"$_id.minor","document":"$$ROOT"}},
{"$sort":{"year":1,"count":-1}},
{"$group":{"_id":{"year":"$year"},"orig":{"$first":"$document"}}},
{"$project":{"_id":0,"year":"$orig._id.year","minor":"$orig._id.minor","count":"$orig.count"}}
])
This translates into the following MongoDB query:
db.crime.aggregate({
$group: { // group by year and minor_category
_id: {
"year": "$year",
"minor_category": "$minor_category"
},
"count": { $sum: 1 }, // count all documents per group,
}
}, {
$sort: {
"count": -1, // sort descending by count
"_id.minor_category": 1 // and ascending by minor_category (it lives under _id after $group)
}
}, {
$group: { // now we get the highest element per year
_id: "$_id.year", // so group by year
"minor_category": { $first: "$_id.minor_category" }, // and get the first (we've sorted the data) value
"count": { $first: "$count" } // same here
}
}, {
$project: { // remove the _id field and add the others in the right order (if needed)
"_id": 0,
"year": "$_id",
"minor_category": "$minor_category",
"count": "$count"
}
})
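To check what the pipeline is meant to produce, the same steps can be run in memory with plain JavaScript. This is only a sketch with made-up rows, and topCategoryPerYear is an illustrative name; it follows the group, sort, take-first-per-year sequence of the query above.

```javascript
// In-memory sketch of: group by (year, minor_category), sort, take the first per year.
// The sample rows are hypothetical.
const crimes = [
  { year: 2015, minor_category: "Theft" },
  { year: 2015, minor_category: "Theft" },
  { year: 2015, minor_category: "Burglary" },
  { year: 2016, minor_category: "Theft" },
  { year: 2016, minor_category: "Burglary" },
];

function topCategoryPerYear(rows) {
  // First $group: count documents per (year, minor_category)
  const counts = new Map();
  for (const { year, minor_category } of rows) {
    const key = JSON.stringify([year, minor_category]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const grouped = [...counts].map(([key, count]) => {
    const [year, minor_category] = JSON.parse(key);
    return { year, minor_category, count };
  });
  // $sort: count descending, then minor_category ascending as the tie-breaker
  grouped.sort(
    (a, b) => b.count - a.count || a.minor_category.localeCompare(b.minor_category)
  );
  // Second $group: keep the first row seen for each year
  const best = new Map();
  for (const row of grouped) if (!best.has(row.year)) best.set(row.year, row);
  return [...best.values()].sort((a, b) => a.year - b.year);
}

const top = topCategoryPerYear(crimes);
console.log(top);
```

In the sample, 2016 has a tie between the two categories, so the alphabetical tie-breaker decides, just as the ORDER BY in the SQL subquery does.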

How to perform case-insensitive aggregation grouping in MongoDb?

Let's say that I want to aggregate and group documents in MongoDb by the Description field.
Running the following (case-sensitive by default):
db['Products'].aggregate(
{ $group: {
_id: { 'Description': "$Description" },
count: { $sum: 1 },
docs: { $push: "$_id" }
}},
{ $match: {
count: { $gt : 1 }
}}
);
on my sample data gives me 1000 results, which is fine.
But now I expect that running a case-insensitive query (using $toLower) should give me less than or equal to 1000 results:
db['Products'].aggregate(
{ $group: {
_id: { 'Description': {$toLower: "$Description"} },
count: { $sum: 1 },
docs: { $push: "$_id" }
}},
{ $match: {
count: { $gt : 1 }
}}
);
But instead I get more than 1000 results. That can't be right, can it? More common entries should get grouped together to yield fewer groupings in total ... I think.
So then probably my aggregation query is wrong! Which brings me to my question:
How should case-insensitive aggregation grouping in MongoDb be performed?
Your approach to case-insensitive grouping is correct, so perhaps your observation is not? ;)
Try this example:
// insert two documents
db.getCollection('test').insertOne({"name" : "Test"}) // uppercase 'T'
db.getCollection('test').insertOne({"name" : "test"}) // lowercase 't'
// perform the grouping
db.getCollection('test').aggregate({ $group: { "_id": { $toLower: "$name" }, "count": { $sum: 1 } } }) // case insensitive
db.getCollection('test').aggregate({ $group: { "_id": "$name", "count": { $sum: 1 } } }) // case sensitive
You may have a typo somewhere?
The documentation also states that
$toLower only has a well-defined behavior for strings of ASCII characters.
Perhaps that's what's biting you here?
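One more thing worth checking, offered as a possibility rather than a diagnosis: case folding can only reduce the total number of groups, but the number of groups that survive the count > 1 filter can grow, because values that were unique on their own may merge into a duplicated group. A small in-memory sketch with made-up descriptions:

```javascript
// Sketch: count groups with and without case folding, then apply "count > 1".
// The sample descriptions are hypothetical.
const descriptions = ["Widget", "widget", "Gadget", "Gizmo", "Gizmo"];

function groupCounts(values, keyFn) {
  // Equivalent of $group with count: { $sum: 1 }
  const counts = new Map();
  for (const v of values) {
    const key = keyFn(v);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

function duplicatedGroups(counts) {
  // Equivalent of { $match: { count: { $gt: 1 } } }
  return [...counts].filter(([, count]) => count > 1).map(([key]) => key);
}

const sensitive = groupCounts(descriptions, (s) => s);
const insensitive = groupCounts(descriptions, (s) => s.toLowerCase());

console.log(sensitive.size, insensitive.size); // 4 3
console.log(duplicatedGroups(sensitive)); // [ 'Gizmo' ]
console.log(duplicatedGroups(insensitive)); // [ 'widget', 'gizmo' ]
```

Here the case-insensitive grouping has fewer groups overall (3 instead of 4) but more groups with duplicates (2 instead of 1), which would match seeing more results after the $match stage.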

Need to aggregate by hour and $avg not recognized

From a MongoDB collection storing data with time stamps I need to return a single record for each hour.
So far I have successfully selected the set of records between two dates, but I can't figure out how to build the hourly record I need in the $group clause.
var myName = "CollectionName"
//schema for mongoose
var mySchema = new Schema({
dt: Date,
value: Number
});
var myDB = mongoose.createConnection('mongodb://localhost:27017/MYDB');
var myDBObj = myDB.model(myName, mySchema, myName);
The $match in this aggregate call works fine, and the $hour creates a record for each hour in the day, but I don't know how to recreate a full date, and I get the error "unknown group operator $avg":
myDBObj.aggregate([
{
$match: { "dt": { $gt: new Date("October 13, 2010 12:00:00"), $lt: new Date("November 13, 2010 12:00:00") } }
},{
$group: {
"_id": { "dt": { "$hour": "$dt" } , "price": { "$avg": "$price" }}
}}], function (err, data) { if (err) { return next(err); } res.json(data); });
I think I need to use $dayOfYear so that there are different records for each hour of each day, and include a new Date() somewhere ...
Can someone help me do this correctly? Any help is appreciated.
The $group pipeline stage works by "grouping" all data by the "key" specified for _id. Other fields you are actually aggregating are separate from the _id value and are their own field properties.
So your $group becomes this instead:
{ "$group": {
"_id": { "$hour": "$dt" },
"price": { "$avg": "$price" }
}}
Or if you want that broken by day then make a compound key:
{ "$group": {
"_id": {
"day": { "$dayOfYear": "$dt" },
"hour": { "$hour": "$dt" }
},
"price": { "$avg": "$price" }
}}
Or just use date math to produce Date objects rounded by hour:
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$dt", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$dt", new Date(0) ] },
1000 * 60 * 60
]}
]},
new Date(0)
]
},
"price": { "$avg": "$price" }
}}
Subtracting one date object (the epoch date) from another produces a numeric value you can round (1000 milliseconds × 60 seconds × 60 minutes = 1 hour) with the applied math, and adding a number to a date object produces a date corresponding to that value.
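The rounding can be tried out in plain JavaScript before putting it in a pipeline. A minimal sketch with made-up readings, where roundToHour and averagePerHour are illustrative helpers and the averaging stands in for the $avg accumulator:

```javascript
// Sketch of rounding timestamps down to the hour and averaging per hour.
// The sample readings are hypothetical.
const HOUR_MS = 1000 * 60 * 60;

function roundToHour(date) {
  const ms = date.getTime(); // like { $subtract: ["$dt", new Date(0)] }
  return new Date(ms - (ms % HOUR_MS)); // like adding the rounded number to new Date(0)
}

function averagePerHour(readings) {
  const sums = new Map();
  for (const { dt, price } of readings) {
    const key = roundToHour(dt).toISOString(); // grouping key (_id)
    const entry = sums.get(key) || { total: 0, n: 0 };
    entry.total += price;
    entry.n += 1;
    sums.set(key, entry);
  }
  // The accumulator result sits next to _id, not inside it
  return [...sums].map(([hour, { total, n }]) => ({ _id: hour, price: total / n }));
}

const readings = [
  { dt: new Date("2010-10-13T12:05:00Z"), price: 10 },
  { dt: new Date("2010-10-13T12:55:00Z"), price: 20 },
  { dt: new Date("2010-10-13T13:10:00Z"), price: 40 },
];
const hourly = averagePerHour(readings);
console.log(hourly);
```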
So your problem was you had everything in the _id, where the $avg accumulator is not recognised. All accumulators need to be specified outside of the grouping key. That is the intent.
If you want to make an accumulator value part of a grouping key ( does not seem relevant here though ), you instead follow with another group stage, referencing the field that was produced from the former.