MongoDB - aggregate $group by every seven days from a date

Suppose you have any number of documents in a collection with the following structure:
{
"_id" : "1",
"totalUsers" : NumberInt(10000),
"iosUsers" : NumberInt(5000),
"androidUsers" : NumberInt(5000),
"creationTime" : ISODate("2017-12-04T06:14:21.529+0000")
},
{
"_id" : "2",
"totalUsers" : NumberInt(12000),
"iosUsers" : NumberInt(6000),
"androidUsers" : NumberInt(6000),
"creationTime" : ISODate("2017-12-04T06:14:21.529+0000")
},
{
"_id" : "3",
"totalUsers" : NumberInt(14000),
"iosUsers" : NumberInt(7000),
"androidUsers" : NumberInt(7000),
"creationTime" : ISODate("2017-12-04T06:14:21.529+0000")
}
And you want to write a query that returns the results between two given dates (i.e. startDate and endDate) and then groups the results every seven days:
db.collection.aggregate([
  { $match: { creationTime: { $gte: startDate, $lte: endDate } } },
  { $group: { _id: { --- every seven days from endDate --- } } }
])
How can I do this?

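First build the bucket boundaries:
var boundaries = [];
var sd = ISODate("2017-10-18T20:41:33.602+0000"), ed = ISODate("2017-11-22T12:41:36.348+0000");
var weekMs = 7 * 24 * 60 * 60 * 1000;
// push sd, sd + 7 days, sd + 14 days, ... for as long as the boundary is <= ed
for (var t = sd.getTime(); t <= ed.getTime(); t += weekMs) {
  boundaries.push(new Date(t));
}
// also push ed + 1 day, because the upper bound of a bucket is exclusive
boundaries.push(new Date(ed.getTime() + 24 * 60 * 60 * 1000));
// then use the $bucket aggregation stage
db.collection.aggregate([
  { $match: { creationTime: { $gte: sd, $lte: ed } } },
  { $bucket: {
      groupBy: "$creationTime",
      boundaries: boundaries
  } }
])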
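The question asks for seven-day groups counted back from endDate, while the answer above steps forward from the start date. A minimal sketch of the backward variant, written in plain JavaScript so it can be checked outside the shell. weeklyBoundariesFromEnd is a hypothetical helper name, not part of MongoDB, and it assumes start is not later than end:

```javascript
// Hypothetical helper: ascending $bucket boundaries in 7-day steps counted back from `end`.
function weeklyBoundariesFromEnd(start, end) {
  const weekMs = 7 * 24 * 60 * 60 * 1000;
  const boundaries = [];
  // The upper bound of a bucket is exclusive, so the top boundary is 1 ms past `end`.
  for (let t = end.getTime() + 1; t > start.getTime(); t -= weekMs) {
    boundaries.push(new Date(t));
  }
  // The lowest bucket starts at `start` and may be shorter than seven days.
  boundaries.push(new Date(start.getTime()));
  return boundaries.reverse();
}

const bounds = weeklyBoundariesFromEnd(
  new Date("2017-10-18T20:41:33.602Z"),
  new Date("2017-11-22T12:41:36.348Z")
);
console.log(bounds.map(function (d) { return d.toISOString(); }));
```

The returned array is in ascending order, so it can be passed as the boundaries of a $bucket stage in place of the forward-stepping array.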

Related

Query to count number of occurrence in array grouped by day

I have the following document structure:
(trackerEventsCollection) =
{
"_id" : ObjectId("5b26c4fb7c696201040c8ed1"),
"trackerId" : ObjectId("598fc51324h51901043d76de"),
"trackingEvents" : [
{
"type" : "checkin",
"eventSource" : "app",
"timestamp" : ISODate("2017-08-25T06:34:58.964Z")
},
{
"type" : "power",
"eventSource" : "app",
"timestamp" : ISODate("2017-08-25T06:51:23.795Z")
},
{
"type" : "position",
"eventSource" : "app",
"timestamp" : ISODate("2017-08-25T06:51:23.985Z")
}
]
}
I would like to write a query that counts the number of trackingEvents with "type" : "power", grouped by day. This seems quite tricky to me because the parent document does not have a date, so I have to rely on the timestamp field that belongs to the trackingEvents array members.
I'm not a very experienced MongoDB user and so far couldn't work out how this can be achieved.
I would really appreciate any help, thanks.
To process the elements of your nested array as separate documents you need to use $unwind. In the next stage you can use $match to filter by type. Then you can group by day, counting occurrences. The point is that you have to build a grouping key containing the year, month and day, as in the following code:
db.trackerEvents.aggregate([
{ $unwind: "$trackingEvents" },
{ $match: { "trackingEvents.type": "power" } },
{
$group: {
_id: {
year: { $year:"$trackingEvents.timestamp" },
month:{ $month:"$trackingEvents.timestamp" },
day: { $dayOfMonth:"$trackingEvents.timestamp" }
},
count: { $sum: 1 }
}
}
])
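To make the three stages concrete, here is a rough in-memory equivalent in plain JavaScript. It is only a sketch for checking the logic on sample data; countEventsByDay is a hypothetical name, and the day key is taken in UTC:

```javascript
// Hypothetical in-memory equivalent of the $unwind / $match / $group pipeline.
function countEventsByDay(docs, type) {
  const counts = {};
  for (const doc of docs) {
    for (const ev of doc.trackingEvents || []) {             // $unwind
      if (ev.type !== type) continue;                        // $match
      const day = ev.timestamp.toISOString().slice(0, 10);   // UTC year-month-day key
      counts[day] = (counts[day] || 0) + 1;                  // $group with $sum: 1
    }
  }
  return counts;
}

const sampleTrackers = [{
  trackingEvents: [
    { type: "checkin", eventSource: "app", timestamp: new Date("2017-08-25T06:34:58.964Z") },
    { type: "power", eventSource: "app", timestamp: new Date("2017-08-25T06:51:23.795Z") },
    { type: "position", eventSource: "app", timestamp: new Date("2017-08-25T06:51:23.985Z") }
  ]
}];
console.log(countEventsByDay(sampleTrackers, "power"));
```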

Remove redundant data from sensors by using date and value

I'm developing an application that collects data from sensors, and I need to reduce the amount of data stored in a MongoDB database based on a value (temperature) and a date (timestamp).
The documents have the following format:
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:37:50.370Z"),
sensorCode:"SENSOR_A1"
}
The problem is that the sensors send data too frequently, so there are too many documents with redundant data in a short period of time (let's say 10 minutes). In other words, it is not useful to have multiple equal values within a very short period of time.
Example: here is data from a sensor that keeps reporting a temperature of 10:
// collection: datasensors
[
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:37:50.370Z"),
sensorCode:"SENSOR_A1"
},
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:38:50.555Z"),
sensorCode:"SENSOR_A1"
},
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:38:51.654Z"),
sensorCode:"SENSOR_A1"
},
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:50:20.335Z"),
sensorCode:"SENSOR_A1"
}
]
Because minute precision is not required, I would like to remove all documents from 2016-04-29T14:37:50.370Z to 2016-04-29T14:38:51.32Z except one. So the result should be this:
[
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:38:51.654Z"),
sensorCode:"SENSOR_A1"
},
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:50:20.335Z"),
sensorCode:"SENSOR_A1"
}
]
The remove operation I want to perform should "reduce" equal temperatures in time ranges shorter than 10 minutes to one value.
Is there any technique to achieve this?
I simplified my solution and decided to keep every unique measurement received within a 10-minute time window.
MongoDB 3.2 is required for this.
Adding a time mark separates the measurements into 10-minute groups.
Then we keep the first record in each group and store all ids for further processing.
Then we remove the id of the document we want to keep from the array of all ids (what remains are the documents to delete).
Finally, in a forEach loop, we delete the ids that are not needed. This line is commented out :-)
Copy the code below into the mongo console, execute it and verify the ids to delete, then uncomment the line and go!
var addTimeMark = {
$project : {
_id : 1,
temperature : 1,
timestamp : 1,
sensorCode : 1,
yearMonthDay : {
$substr : [{
$dateToString : {
format : "%Y%m%d%H%M",
date : "$timestamp"
}
}, 0, 11]
}
}
}
var getFirstRecordInGroup = {
// take only the first record from each group
$group : {
_id : {
timeMark : "$yearMonthDay",
sensorCode : "$sensorCode",
temperature : "$temperature"
},
id : {
$first : "$_id"
},
allIds : {
$push : "$_id"
},
timestamp : {
$first : "$timestamp"
},
totalEntries : {
$sum : 1
}
}
}
var removeFirstIdFromAllIds = {
$project : {
_id : 1,
id : 1,
timestamp : 1,
totalEntries : 1,
allIds : {
$filter : {
input : "$allIds",
as : "item",
cond : {
$ne : ["$$item", "$id"]
}
}
}
}
}
db.sensor.aggregate([
addTimeMark,
getFirstRecordInGroup,
removeFirstIdFromAllIds,
]).forEach(function (entry) {
printjson(entry.allIds);
// db.sensor.deleteMany({_id:{$in:entry.allIds}})
})
Below is what a document looks like after each step. After step 1 (addTimeMark):
{
"_id" : ObjectId("574b5d8e0ac96f88db507209"),
"temperature" : 10,
"timestamp" : ISODate("2016-04-29T14:37:50.370Z"),
"sensorCode" : "SENSOR_A1",
"yearMonthDay" : "20160429143"
}
After step 2 (getFirstRecordInGroup):
{
"_id" : {
"timeMark" : "20160429143",
"sensorCode" : "SENSOR_A1",
"temperature" : 10
},
"id" : ObjectId("574b5d8e0ac96f88db507209"),
"allIds" : [
ObjectId("574b5d8e0ac96f88db507209"),
ObjectId("574b5d8e0ac96f88db50720a"),
ObjectId("574b5d8e0ac96f88db50720b")
],
"timestamp" : ISODate("2016-04-29T14:37:50.370Z"),
"totalEntries" : 3
}
And after the last step (removeFirstIdFromAllIds):
{
"_id" : {
"timeMark" : "20160429143",
"sensorCode" : "SENSOR_A1",
"temperature" : 10
},
"id" : ObjectId("574b5d8e0ac96f88db507209"),
"allIds" : [
ObjectId("574b5d8e0ac96f88db50720a"),
ObjectId("574b5d8e0ac96f88db50720b")
],
"timestamp" : ISODate("2016-04-29T14:37:50.370Z"),
"totalEntries" : 3
}
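The time mark is the part that does the 10-minute grouping: the first 11 characters of "%Y%m%d%H%M" are the date, the hour and the tens digit of the minute. A small plain-JavaScript sketch of the same computation, useful for checking which group a timestamp falls into (timeMark is a hypothetical helper, and it works in UTC):

```javascript
// Hypothetical helper mirroring the addTimeMark stage.
function timeMark(date) {
  // "2016-04-29T14:37:50.370Z" -> "20160429143750370" -> first 11 digits
  return date.toISOString().replace(/\D/g, "").slice(0, 11);
}

console.log(timeMark(new Date("2016-04-29T14:37:50.370Z")));
console.log(timeMark(new Date("2016-04-29T14:50:20.335Z")));
```

Note that these are fixed windows (14:30 to 14:39, 14:40 to 14:49, and so on), so two readings at 14:39 and 14:41 land in different groups even though they are only two minutes apart.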

Counting the number of records where a date is in a date range?

I have a collection with documents like below:
{startDate: ISODate("2016-01-02T00:00:00Z"), endDate: ISODate("2016-01-05T00:00:00Z")},
{startDate: ISODate("2016-01-02T00:00:00Z"), endDate: ISODate("2016-01-08T00:00:00Z")},
{startDate: ISODate("2016-01-05T00:00:00Z"), endDate: ISODate("2016-01-08T00:00:00Z")},
{startDate: ISODate("2016-01-05T00:00:00Z"), endDate: ISODate("2016-01-10T00:00:00Z")},
{startDate: ISODate("2016-01-07T00:00:00Z"), endDate: ISODate("2016-01-10T00:00:00Z")}
I would like to return a record for every date between the minimum startDate and the maximum endDate. Along with each of these records I would like to return a count of the number of records where the startDate and endDate contain this date.
So for my above example the min startDate is 1/2/2016 and the max endDate is 1/10/2016 so I would like to return all dates between those two along with the counts. See desired output below:
{date: ISODate("2016-01-02T00:00:00Z"), count: 2}
{date: ISODate("2016-01-03T00:00:00Z"), count: 2}
{date: ISODate("2016-01-04T00:00:00Z"), count: 2}
{date: ISODate("2016-01-05T00:00:00Z"), count: 4}
{date: ISODate("2016-01-06T00:00:00Z"), count: 3}
{date: ISODate("2016-01-07T00:00:00Z"), count: 4}
{date: ISODate("2016-01-08T00:00:00Z"), count: 4}
{date: ISODate("2016-01-09T00:00:00Z"), count: 2}
{date: ISODate("2016-01-10T00:00:00Z"), count: 2}
Please let me know if this doesn't make sense and I can try to explain in more detail.
I am able to do this using a loop like below:
var startDate = ISODate("2016-01-02T00:00:00Z")
var endDate = ISODate("2016-02-10T00:00:00Z")
while(startDate < endDate){
var counts = db.data.find(
{
startDate: {$lte: startDate},
endDate: {$gte: startDate}
}
).count()
print(startDate, counts)
startDate.setDate(startDate.getDate() + 1)
}
But I'm wondering if there is a way to do this using the aggregation framework? I come from a mostly SQL background, where looping to get data is often a bad idea. Does the same rule apply to MongoDB? Should I be concerned about looping here and try to use the aggregation framework instead, or is this a valid solution?
Your best bet here is mapReduce. This is because you can loop values in between "startDate" and "endDate" within each document and emit for each day ( or other required interval ) between those values. Then it is just a matter of accumulating per emitted date key from all data:
db.collection.mapReduce(
function() {
for( var d = this.startDate.valueOf(); d <= this.endDate.valueOf(); d += 1000 * 60 * 60 * 24 ) {
emit(new Date(d), 1)
}
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
This produces results like this:
{
"results" : [
{
"_id" : ISODate("2016-01-02T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-03T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-04T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-05T00:00:00Z"),
"value" : 4
},
{
"_id" : ISODate("2016-01-06T00:00:00Z"),
"value" : 3
},
{
"_id" : ISODate("2016-01-07T00:00:00Z"),
"value" : 4
},
{
"_id" : ISODate("2016-01-08T00:00:00Z"),
"value" : 4
},
{
"_id" : ISODate("2016-01-09T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-10T00:00:00Z"),
"value" : 2
}
],
"timeMillis" : 35,
"counts" : {
"input" : 5,
"emit" : 25,
"reduce" : 9,
"output" : 9
},
"ok" : 1
}
Your dates are rounded to a day in the sample, but if they were not in real data then it is just a simple matter of date math to be applied in order to round per interval.
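The emit-per-day idea can be checked without a database. The following plain-JavaScript sketch repeats the map and reduce steps in memory on the five sample documents; countPerDay is a hypothetical name, and it assumes day-aligned UTC dates as in the sample:

```javascript
// Hypothetical in-memory version of the map/reduce above.
function countPerDay(docs) {
  const dayMs = 24 * 60 * 60 * 1000;
  const counts = new Map();
  for (const doc of docs) {
    // "map": emit one key per day from startDate to endDate, inclusive
    for (let d = doc.startDate.getTime(); d <= doc.endDate.getTime(); d += dayMs) {
      const key = new Date(d).toISOString();
      // "reduce": sum the emitted 1s per key
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}

const ranges = [
  ["2016-01-02", "2016-01-05"],
  ["2016-01-02", "2016-01-08"],
  ["2016-01-05", "2016-01-08"],
  ["2016-01-05", "2016-01-10"],
  ["2016-01-07", "2016-01-10"]
].map(function (r) {
  return { startDate: new Date(r[0] + "T00:00:00Z"), endDate: new Date(r[1] + "T00:00:00Z") };
});
console.log(countPerDay(ranges));
```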
In the MongoDB aggregation framework there are stages instead of loops. It is a pipeline: the documents pass through each stage in order until they reach the last stage specified. That is why you see a [] when using the aggregation framework: it holds the list of stages, for example $match, $group and $project. Take a look at the documentation; it is quite simple. Anyway, that was very brief. As for your question, here is my suggestion.
I have not tried this. If you can try it, let me know whether it works.
First you keep only the documents with dates in the range you want, using $match. Then follow that with the $group stage.
Example:
db.collection.aggregate([
  { $match: {
      $and: [
        { startDate: { $gte: ISODate("2016-01-02T00:00:00Z") } },
        { endDate: { $lte: ISODate("2016-02-10T00:00:00Z") } }
      ]
  } },
  { $group: {
      _id: { startDate: "$startDate", endDate: "$endDate" },
      count: { $sum: 1 }
  } }
])
If you want to group by startDate only, as in your example, replace
_id: { startDate: "$startDate", endDate: "$endDate" }
with this:
_id: "$startDate"
I hope that helps.

MongoDb aggregation Group by Date

I'm trying to group by timestamp for the collection named "foo" { _id, TimeStamp }
db.foos.aggregate(
[
{$group : { _id : new Date (Date.UTC({ $year : '$TimeStamp' },{ $month : '$TimeStamp' },{$dayOfMonth : '$TimeStamp'})) }}
])
I expected many dates, but the result is just one date. The data I'm using is correct (it has many foos with different dates, none of them in 1970). There's some problem in the date parsing, but I have not been able to solve it yet.
{
"result" : [
{
"_id" : ISODate("1970-01-01T00:00:00.000Z")
}
],
"ok" : 1
}
I tried this one:
db.foos.aggregate(
[
{$group : { _id : { year : { $year : '$TimeStamp' }, month : { $month : '$TimeStamp' }, day : {$dayOfMonth : '$TimeStamp'} }, count : { $sum : 1 } }},
{$project : { parsedDate : new Date('$_id.year', '$_id.month', '$_id.day') , count : 1, _id : 0} }
])
Result :
uncaught exception: aggregate failed: {
"errmsg" : "exception: disallowed field type Date in object expression (at 'parsedDate')",
"code" : 15992,
"ok" : 0
}
And that one:
db.foos.aggregate(
[
{$group : { _id : { year : { $year : '$TimeStamp' }, month : { $month : '$TimeStamp' }, day : {$dayOfMonth : '$TimeStamp'} }, count : { $sum : 1 } }},
{$project : { parsedDate : Date.UTC('$_id.year', '$_id.month', '$_id.day') , count : 1, _id : 0} }
])
I cannot see any dates in the result:
{
"result" : [
{
"count" : 412
},
{
"count" : 1702
},
{
"count" : 422
}
],
"ok" : 1
}
db.foos.aggregate(
[
{ $project : { day : {$substr: ["$TimeStamp", 0, 10] }}},
{ $group : { _id : "$day", number : { $sum : 1 }}},
{ $sort : { _id : 1 }}
]
)
Grouping by date can be done in two steps in the aggregation framework; an additional third step is needed if the result should be sorted:
1. $project in combination with $substr takes the first 10 characters (YYYY-MM-DD) of the ISODate value of each document (the result is a collection of documents with the fields "_id" and "day");
2. $group groups by day, adding (summing) the number 1 for each matching document;
3. $sort sorts ascending by "_id", which is the day from the previous aggregation step. This is optional and only needed if a sorted result is desired.
This solution cannot take advantage of indexes like db.twitter.ensureIndex( { TimeStamp: 1 } ), because it transforms the ISODate value into a string on the fly. For large collections (millions of documents) this could be a performance bottleneck, and more sophisticated approaches should be used.
It depends on whether you want to have the date as an ISODate type in the final output. If so, then you can do one of two things:
1. Extract $year, $month, $dayOfMonth from your timestamp and then reconstruct a new date out of them (you are already trying to do that, but you're using syntax that doesn't work in the aggregation framework).
2. If the original TimeStamp is of type ISODate() then you can do date arithmetic to subtract the hours, minutes, seconds and milliseconds from your timestamp to get a new date that's "rounded" to the day.
There is an example of 2 here.
Here is how you would do 1. I'm making an assumption that all your dates are this year, but you can easily adjust the math to accommodate your oldest date.
project1={$project:{_id:0,
y:{$subtract:[{$year:"$TimeStamp"}, 2013]},
d:{$subtract:[{$dayOfYear:"$TimeStamp"},1]},
TimeStamp:1,
jan1:{$literal:new ISODate("2013-01-01T00:00:00")}
} };
project2={$project:{tsDate:{$add:[
"$jan1",
{$multiply:["$y", 365*24*60*60*1000]}, // assumes 365-day years; an intervening leap year shifts the result by one day
{$multiply:["$d", 24*60*60*1000]}
] } } };
Sample data:
db.foos.find({},{_id:0,TimeStamp:1})
{ "TimeStamp" : ISODate("2013-11-13T19:15:05.600Z") }
{ "TimeStamp" : ISODate("2014-02-01T10:00:00Z") }
Aggregation result:
> db.foos.aggregate(project1, project2)
{ "tsDate" : ISODate("2013-11-13T00:00:00Z") }
{ "tsDate" : ISODate("2014-02-01T00:00:00Z") }
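For the second option (date arithmetic), the rounding itself is just a subtraction of the remainder of the timestamp modulo one day. A plain-JavaScript sketch of that arithmetic, assuming UTC days and dates on or after 1970-01-01; floorToUtcDay is a hypothetical name:

```javascript
// Hypothetical helper: round a date down to 00:00:00.000 UTC of the same day.
function floorToUtcDay(date) {
  const dayMs = 24 * 60 * 60 * 1000;
  const t = date.getTime();
  // Subtract the milliseconds elapsed since the last UTC midnight.
  return new Date(t - (t % dayMs));
}

console.log(floorToUtcDay(new Date("2013-11-13T19:15:05.600Z")).toISOString());
console.log(floorToUtcDay(new Date("2014-02-01T10:00:00Z")).toISOString());
```

In a pipeline, the same subtraction would presumably be built from $subtract and $mod applied to the date's millisecond value; that expression is not shown in the answer above, so treat it as an assumption to verify.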
This is what I use in one of my projects:
collection.aggregate([
  // group results by date
  {$group : {
    _id : { date : "$date" }
    // do whatever you want here, like $push, $sum...
  }},
  // _id is the date
  {$sort : { _id : -1 }}
])
.toArray()
Here $date is a Date object in mongo. I get results indexed by date.

Find all documents within last n days

My daily collection has documents like:
..
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "ED", "san" : 7046.25, "izm" : 1243.96 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "UA", "san" : 0, "izm" : 0 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "PAL", "san" : 0, "izm" : 169.9 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "PAL", "san" : 0, "izm" : 0 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "CTA_TR", "san" : 0, "izm" : 0 }
{ "date" : ISODate("2013-01-04T00:00:00Z"), "vid" : "CAD", "san" : 0, "izm" : 169.9 }
{ "date" : ISODate("2013-01-04T00:00:00Z"), "vid" : "INT", "san" : 0, "izm" : 169.9 }
...
I left off the _id field to save space here.
My task is to "fetch all documents within the last 15 days". As you can see, I need to somehow:
1. Get 15 unique dates. The newest one should be taken from the newest document in the collection (it isn't necessarily today's date, just the latest one in the collection based on the date field). As for the oldest, maybe it's not necessary to define it strictly in the query; what I need is some kind of top 15 starting from the newest day, i.e. 15 unique days.
2. db.daily.find() all documents that have the date field in that range of 15 days.
In the result, I should see all documents within 15 days, starting from the newest in the collection.
I just tested the following query against your data sample and it worked perfectly:
db.datecol.find(
{
"date":
{
$gte: new Date((new Date().getTime() - (15 * 24 * 60 * 60 * 1000)))
}
}
).sort({ "date": -1 })
Starting in Mongo 5, it's a nice use case for the $dateSubtract operator:
// { date: ISODate("2021-12-05") }
// { date: ISODate("2021-12-02") }
// { date: ISODate("2021-12-02") }
// { date: ISODate("2021-11-28") } <= older than 5 days
db.collection.aggregate([
{ $match: {
$expr: {
$gt: [
"$date",
{ $dateSubtract: { startDate: "$$NOW", unit: "day", amount: 5 } }
]
}
}}
])
// { date: ISODate("2021-12-05") }
// { date: ISODate("2021-12-02") }
// { date: ISODate("2021-12-02") }
With $dateSubtract, we create the oldest date after which we keep documents, by subtracting 5 (amount) "days" (unit) out of the current date $$NOW (startDate).
And you can obviously add a $sort stage to sort documents by date.
You need to run the distinct command to get all the unique dates. Below is an example. The "values" array has all the unique dates of the collection, from which you need to pick the most recent 15 days on the client side:
db.runCommand ( { distinct: 'datecol', key: 'date' } )
{
"values" : [
ISODate("2013-01-03T00:00:00Z"),
ISODate("2013-01-04T00:00:00Z")
],
"stats" : {
"n" : 2,
"nscanned" : 2,
"nscannedObjects" : 2,
"timems" : 0,
"cursor" : "BasicCursor"
},
"ok" : 1
}
You then use the $in operator with the most recent 15 dates from step 1. Below is an example that finds all documents that belong to one of the mentioned two dates.
db.datecol.find({
"date":{
"$in":[
new ISODate("2013-01-03T00:00:00Z"),
new ISODate("2013-01-04T00:00:00Z")
]
}
})
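The client-side step in between (picking the most recent 15 dates out of the "values" array) is not spelled out above. A minimal sketch in plain JavaScript, where mostRecentDates is a hypothetical helper name:

```javascript
// Hypothetical helper: the n newest dates from the distinct "values" array, newest first.
function mostRecentDates(values, n) {
  return values
    .slice()                                                           // do not mutate the input
    .sort(function (a, b) { return b.getTime() - a.getTime(); })       // newest first
    .slice(0, n);
}

const distinctDates = [
  new Date("2013-01-03T00:00:00Z"),
  new Date("2013-01-04T00:00:00Z")
];
console.log(mostRecentDates(distinctDates, 15));
```

The resulting array can then be used as the list for the $in operator in the find() call shown above.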