How to handle partial week data grouping in mongodb - mongodb

I have some docs (daily open price for a stock) like the following:
/* 0 */
{
"_id" : ObjectId("54d65597daf0910dfa8169b0"),
"D" : ISODate("2014-12-29T00:00:00.000Z"),
"O" : 104.98
}
/* 1 */
{
"_id" : ObjectId("54d65597daf0910dfa8169af"),
"D" : ISODate("2014-12-30T00:00:00.000Z"),
"O" : 104.73
}
/* 2 */
{
"_id" : ObjectId("54d65597daf0910dfa8169ae"),
"D" : ISODate("2014-12-31T00:00:00.000Z"),
"O" : 104.51
}
/* 3 */
{
"_id" : ObjectId("54d65597daf0910dfa8169ad"),
"D" : ISODate("2015-01-02T00:00:00.000Z"),
"O" : 103.75
}
/* 4 */
{
"_id" : ObjectId("54d65597daf0910dfa8169ac"),
"D" : ISODate("2015-01-05T00:00:00.000Z"),
"O" : 102.5
}
and I want to aggregate the records by week so I can get the weekly average open price. My first attempt is to use:
db.ohlc.aggregate({
$match: {
D: {
$gte: new ISODate('2014-12-28')
}
}
}, {
$project: {
year: {
$year: '$D'
},
week: {
$week: '$D'
},
O: 1
}
}, {
$group: {
_id: {
year: '$year',
week: '$week'
},
O: {
$avg: '$O'
}
}
}, {
$sort: {
_id: 1
}
})
But I soon realized the result is incorrect, as both the last week of 2014 (week number 52) and the first week of 2015 (week number 0) are partial weeks. With this aggregation I would have an average price for 12/29-12/31/2014 and another one for 01/02/2015 (which is the only trading date in the first week of 2015), but in my application I would need to group the data from 12/29/2014 through 01/02/2015. Any advice?

To answer my own question, the trick is to calculate the number of weeks based on a reference date (1970-01-04) and group by that number. You can check out my new post at http://midnightcodr.github.io/2015/02/07/OHLC-data-grouping-with-mongodb/ for details.
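A minimal sketch of that idea in plain JavaScript, outside the aggregation pipeline: 1970-01-04 was a Sunday, so counting whole weeks since that date gives every Sunday-to-Saturday week a single number, even across a year boundary. The name weekIndex is an illustrative assumption and is not taken from the linked post.

```javascript
// Reference date from the answer above: 1970-01-04 (a Sunday), in UTC.
const REFERENCE_MS = Date.UTC(1970, 0, 4);
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical helper: number of whole weeks between the reference date and `date`.
function weekIndex(date) {
  return Math.floor((date.getTime() - REFERENCE_MS) / WEEK_MS);
}

// The sample dates from the question: the first three share one index,
// the fourth (the following Monday) falls in the next week.
const sampleDays = ['2014-12-29', '2014-12-31', '2015-01-02', '2015-01-05'];
for (const day of sampleDays) {
  console.log(day, weekIndex(new Date(day + 'T00:00:00Z')));
}
```

Grouping by this number instead of the { year, week } pair keeps 12/29/2014 through 01/02/2015 in one bucket.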

I use this for candlestick aggregation; with allowDiskUse, $out and some date filters it works great. Maybe you can adapt the grouping?
db.getCollection('market').aggregate(
[
{ $match: { date: { $exists: true } } },
{ $sort: { date: 1 } },
{ $project: { _id: 0, date: 1, rate: 1, amount: 1, tm15: { $mod: [ "$date", 900 ] } } },
{ $project: { _id: 0, date: 1, rate: 1, amount: 1, candleDate: { $subtract: [ "$date", "$tm15" ] } } },
{ $group: { _id: "$candleDate", open: { $first: '$rate' }, low: { $min: '$rate' }, high: { $max: '$rate' }, close: { $last: '$rate' }, volume: { $sum: '$amount' }, trades: { $sum: 1 } } }
])
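To make the bucketing explicit, here is a rough plain-JavaScript sketch of what the pipeline above computes. It assumes, as the $mod by 900 suggests, that date holds a Unix timestamp in seconds; buildCandles is a hypothetical name used only for illustration.

```javascript
// Assumption: `date` is a Unix timestamp in seconds, as `$mod: ["$date", 900]` implies.
const CANDLE_SECONDS = 900; // 15 minutes

// Hypothetical helper mirroring the $sort/$project/$group stages on an in-memory array.
function buildCandles(trades) {
  const sorted = [...trades].sort((a, b) => a.date - b.date);
  const candles = new Map();
  for (const t of sorted) {
    // Same arithmetic as $subtract: ["$date", { $mod: ["$date", 900] }]
    const candleDate = t.date - (t.date % CANDLE_SECONDS);
    const candle = candles.get(candleDate);
    if (!candle) {
      candles.set(candleDate, {
        _id: candleDate, open: t.rate, low: t.rate, high: t.rate,
        close: t.rate, volume: t.amount, trades: 1,
      });
    } else {
      candle.low = Math.min(candle.low, t.rate);
      candle.high = Math.max(candle.high, t.rate);
      candle.close = t.rate; // latest trade seen so far in this bucket
      candle.volume += t.amount;
      candle.trades += 1;
    }
  }
  return [...candles.values()];
}
```

For weekly grouping, the same subtraction-of-the-remainder trick applies with a larger interval and a suitable reference point.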

From my experience, this is not a really good approach to the problem. Why? It will not scale: the amount of computation needed is considerable, especially for the grouping.
What I would do in your situation is to move part of the application logic to the documents in the DB.
My first approach would be to add a "week" field that will state the previous (or next) Sunday of the date the sample belongs to. This is quite easy to do at the moment of insertion. Then you can simply run the aggregation method grouping by that field. If you want more performance, add an index for { symbol : 1, week : 1 } and do a sort in the aggregate.
My second approach, for when you plan on running a lot of this type of aggregation, is to have documents that group the samples by week. Like this:
{
week : <Day Representing Week>,
prices: [
{ Day Sample }, ...
]
}
Then you can simply work on those documents directly. This will help you reduce your indexes in a significant manner, thus speeding things up.
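As a sketch of the first approach, the "week" value can be computed in the application right before insertion. The helper names weekStartSunday and withWeek are assumptions for illustration, and the code uses the previous Sunday in UTC; D is the date field from the question's documents.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: midnight (UTC) of the Sunday on or before `date`.
function weekStartSunday(date) {
  const midnight = Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate());
  return new Date(midnight - date.getUTCDay() * MS_PER_DAY); // getUTCDay(): 0 = Sunday
}

// Returns a copy of the sample with the precomputed `week` field attached,
// ready to be passed to an insert call.
function withWeek(sample) {
  return Object.assign({}, sample, { week: weekStartSunday(sample.D) });
}

console.log(withWeek({ D: new Date('2015-01-02T00:00:00Z'), O: 103.75 }));
```

With that field stored, the aggregation only needs a $group on "$week".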

Related

MongoDB grouping results based on greater than expression

I have a ton of records in a collection that look like this:
{
"_id" : ObjectId("5a95cf7790bd8fbf1c6a39da"),
"dmb_reviewerID" : "AB9S9279OZ3QO",
"dmb_asin" : "0078764343",
"dmb_reviewerName" : "Alan",
"dmb_helpful" : [
1,
1
],
"dmb_reviewText" : "I haven't gotten around to playing the campaign but the multiplayer is solid and pretty fun. Includes Zero Dark Thirty pack, an Online Pass, and the all powerful Battlefield 4 Beta access.",
"dmb_overall" : 5.0,
"dmb_summary" : "Good game and Beta access!!",
"dmb_unixReviewTime" : 1373155200,
"dmb_reviewTime" : "07 7, 2013"
}
I need to find all of the product IDs (dmb_asin) which have 200 reviews or more.
So far, I've managed to count them and return a sum using an aggregate, but I can't figure out how to only show those that are greater than 200.
My code:
aggregate({
$group: {
_id: "$dmb_asin",
reviews: {
$addToSet: "$dmb_asin"
},
count: {
$sum: 1,},
}
});
Try this code (if I understand you correctly):
aggregate([
{
$group: {
_id: '$dmb_asin',
count: {
$sum: 1
}
}
},
{
$match: {
count: {
$gte: 200
}
}
}
])
Try this query:
db.collection.aggregate([
{$group: {
_id: "$dmb_asin",
reviews: {
$addToSet: "$dmb_asin"
},
count: {
$sum: 1,}
}},
{$match:{"count":{$gte:200}}}
])
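For intuition, the two stages used in these answers ($group to count per dmb_asin, then $match on the count) correspond to the following in-memory JavaScript. This is only an illustration on a small array, not a replacement for the pipeline; asinsWithAtLeast is a hypothetical name.

```javascript
// Hypothetical helper: count reviews per product ID, then keep the IDs
// whose count reaches the threshold (the $group + $match pair).
function asinsWithAtLeast(reviews, minCount) {
  const counts = new Map();
  for (const review of reviews) {
    counts.set(review.dmb_asin, (counts.get(review.dmb_asin) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count >= minCount)
    .map(([asin, count]) => ({ _id: asin, count: count }));
}

const sample = [
  { dmb_asin: '0078764343' }, { dmb_asin: '0078764343' }, { dmb_asin: 'B000000001' },
];
console.log(asinsWithAtLeast(sample, 2));
```

The filter has to run on the computed count, which is why the $match stage must come after $group.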

Aggregation pipeline slow with large collection

I have a single collection with over 200 million documents containing dimensions (things I want to filter on or group by) and metrics (things I want to sum or get averages from). I'm currently running into some performance issues and I'm hoping to get some advice on how I could optimize/scale MongoDB, or suggestions for alternative solutions. I'm running the latest stable MongoDB version using WiredTiger. The documents basically look like the following:
{
"dimensions": {
"account_id": ObjectId("590889944befcf34204dbef2"),
"url": "https://test.com",
"date": ISODate("2018-03-04T23:00:00.000+0000")
},
"metrics": {
"cost": 155,
"likes": 200
}
}
I have three indexes on this collection, as there are various aggregations being run on this collection:
account_id
date
account_id and date
The following aggregation query fetches 3 months of data, summing cost and likes and grouping by week/year:
db.large_collection.aggregate(
[
{
$match: { "dimensions.date": { $gte: new Date(1512082800000), $lte: new Date(1522447200000) } }
},
{
$match: { "dimensions.account_id": { $in: [ "590889944befcf34204dbefc", "590889944befcf34204dbf1f", "590889944befcf34204dbf21" ] }}
},
{
$group: {
cost: { $sum: "$metrics.cost" },
likes: { $sum: "$metrics.likes" },
_id: {
year: { $year: { date: "$dimensions.date", timezone: "Europe/Amsterdam" } },
week: { $isoWeek: { date: "$dimensions.date", timezone: "Europe/Amsterdam" } }
}
}
},
{
$project: {
cost: 1,
likes: 1
}
}
],
{
cursor: {
batchSize: 50
},
allowDiskUse: true
}
);
This query takes about 25-30 seconds to complete and I'm looking to reduce this to at most 5-10 seconds. It's currently a single MongoDB node, no shards or anything. The explain query can be found here: https://pastebin.com/raw/fNnPrZh0 and executionStats here: https://pastebin.com/raw/WA7BNpgA As you can see, MongoDB is using indexes but there are still 1.3 million documents that need to be read. I currently suspect I'm facing some I/O bottlenecks.
Does anyone have an idea how I could improve this aggregation pipeline? Would sharding help at all? Is MongoDB the right tool here?
The following could improve performance if and only if precomputing dimensions within each record is an option.
If this type of query represents an important portion of the queries on this collection, then including additional fields to make these queries faster could be a viable alternative.
This hasn't been benchmarked.
One of the costly parts of this query probably comes from working with dates.
First during the $group stage while computing for each matching record the year and the iso week associated to a specific time zone.
Then, to a lesser extent, during the initial filtering, when keeping dates from the 3 last months.
The idea would be to store in each record the year and the ISO week; for the given example this would be { "year" : 2018, "week" : 10 }. This way the _id key in the $group stage wouldn't need any computation (which would otherwise represent 1.3 million complex date operations).
In a similar fashion, we could also store in each record the associated month, which would be { "month" : "201803" } for the given example. This way the first match could be on a short range of months before applying a more precise and costlier filtering on the exact timestamps. This would replace the initial, costlier Date filtering on 200M records with a simple comparison on a short precomputed value.
Let's create a new collection with these new pre-computed fields (in a real scenario, these fields would be included during the initial insert of the records):
db.large_collection.aggregate([
{ $addFields: {
"prec.year": { $year: { date: "$dimensions.date", timezone: "Europe/Amsterdam" } },
"prec.week": { $isoWeek: { date: "$dimensions.date", timezone: "Europe/Amsterdam" } },
"prec.month": { $dateToString: { format: "%Y%m", date: "$dimensions.date", timezone: "Europe/Amsterdam" } }
}},
{ "$out": "large_collection_precomputed" }
])
which will store these documents:
{
"dimensions" : { "account_id" : ObjectId("590889944befcf34204dbef2"), "url" : "https://test.com", "date" : ISODate("2018-03-04T23:00:00Z") },
"metrics" : { "cost" : 155, "likes" : 200 },
"prec" : { "year" : 2018, "week" : 10, "month" : "201803" }
}
And let's query:
db.large_collection_precomputed.aggregate([
// Initial gross filtering of dates (months) (on 200M documents):
{ $match: { "prec.month": { $gte: "201712", $lte: "201803" } } },
{ $match: {
"dimensions.account_id": { $in: [
ObjectId("590889944befcf34204dbf1f"), ObjectId("590889944befcf34204dbef2")
]}
}},
// Exact filtering of dates (costlier, but only on ~1.5 million documents).
{ $match: { "dimensions.date": { $gte: new Date(1512082800000), $lte: new Date(1522447200000) } } },
{ $group: {
// The _id is now extremely fast to retrieve:
_id: { year: "$prec.year", "week": "$prec.week" },
cost: { $sum: "$metrics.cost" },
likes: { $sum: "$metrics.likes" }
}},
...
])
In this case we would use indexes on account_id and month.
Note: Here, months are stored as String ("201803") since I'm not sure how to cast them to Int within an aggregation query. But it would be best to store them as Int when records are inserted.
As a side effect, this will obviously increase the collection's footprint on disk and in RAM.
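If the fields are computed by the application at insert time instead, a sketch could look like the following. It assumes a JavaScript runtime whose Intl implementation knows the Europe/Amsterdam time zone; the helper names (localYMD, isoWeek, precomputedFields) are hypothetical. Like the pipeline above, it pairs the calendar year with the ISO week.

```javascript
// Assumption: the runtime's Intl data includes the Europe/Amsterdam time zone.
const partsFormatter = new Intl.DateTimeFormat('en-US', {
  timeZone: 'Europe/Amsterdam', year: 'numeric', month: '2-digit', day: '2-digit',
});

// Local calendar date (year, month, day) of `date` in that time zone.
function localYMD(date) {
  const parts = {};
  for (const p of partsFormatter.formatToParts(date)) parts[p.type] = p.value;
  return { y: Number(parts.year), m: Number(parts.month), d: Number(parts.day) };
}

// ISO 8601 week number of a calendar date: the week is numbered by the
// position of its Thursday within that Thursday's year.
function isoWeek(y, m, d) {
  const day = new Date(Date.UTC(y, m - 1, d));
  const mondayBased = (day.getUTCDay() + 6) % 7;      // Monday = 0 ... Sunday = 6
  day.setUTCDate(day.getUTCDate() - mondayBased + 3); // Thursday of the same ISO week
  const yearStart = Date.UTC(day.getUTCFullYear(), 0, 1);
  const ordinal = Math.floor((day.getTime() - yearStart) / 86400000) + 1;
  return Math.floor((ordinal - 1) / 7) + 1;
}

// Hypothetical helper producing the "prec" sub-document for one record.
function precomputedFields(date) {
  const { y, m, d } = localYMD(date);
  return { year: y, week: isoWeek(y, m, d), month: String(y) + String(m).padStart(2, '0') };
}

console.log(precomputedFields(new Date('2018-03-04T23:00:00Z')));
```

The result for the example document can be compared against the "prec" values stored by the $addFields pipeline.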

How do I do a 'group by' for a datetime when I want to group by just the date using $group? [duplicate]

I am working on a project in which I am tracking the number of clicks on a topic.
I am using MongoDB and I have to group the number of clicks by date (I want to group data for 15 days).
My data is stored in the following format in MongoDB:
{
"_id" : ObjectId("4d663451d1e7242c4b68e000"),
"date" : "Mon Dec 27 2010 18:51:22 GMT+0000 (UTC)",
"topic" : "abc",
"time" : "18:51:22"
}
{
"_id" : ObjectId("4d6634514cb5cb2c4b69e000"),
"date" : "Mon Dec 27 2010 18:51:23 GMT+0000 (UTC)",
"topic" : "bce",
"time" : "18:51:23"
}
I want to group the number of clicks on topic:abc by day (for 15 days). I know how to group, but how can I group by the dates stored in my database?
I am looking for a result in the following format:
[
{
"date" : "date in log",
"click" : 9
},
{
"date" : "date in log",
"click" : 19
},
]
I have written code, but it only works if the dates are strings (code is here: http://pastebin.com/2wm1n1ix).
Please guide me on how to group it.
New answer using Mongo aggregation framework
After this question was asked and answered, 10gen released Mongodb version 2.2 with an aggregation framework, which is now the better way to do this sort of query. This query is a little challenging because you want to group by date and the values stored are timestamps, so you have to do something to convert the timestamps to dates that match. For the purposes of example I will just write a query that gets the right counts.
db.col.aggregate(
{ $group: { _id: { $dayOfYear: "$date"},
click: { $sum: 1 } } }
)
This will return something like:
[
{
"_id" : 144,
"click" : 165
},
{
"_id" : 275,
"click" : 12
}
]
You need to use $match to limit the query to the date range you are interested in and $project to rename _id to date. How you convert the day of year back to a date is left as an exercise for the reader. :-)
10gen has a handy SQL to Mongo Aggregation conversion chart worth bookmarking. There is also a specific article on date aggregation operators.
Getting a little fancier, you can use:
db.col.aggregate([
{ $group: {
_id: {
$add: [
{ $dayOfYear: "$date"},
{ $multiply:
[400, {$year: "$date"}]
}
]},
click: { $sum: 1 },
first: {$min: "$date"}
}
},
{ $sort: {_id: -1} },
{ $limit: 15 },
{ $project: { date: "$first", click: 1, _id: 0} }
])
which will get you the latest 15 days and return some datetime within each day in the date field. For example:
[
{
"click" : 431,
"date" : ISODate("2013-05-11T02:33:45.526Z")
},
{
"click" : 702,
"date" : ISODate("2013-05-08T02:11:00.503Z")
},
...
{
"click" : 814,
"date" : ISODate("2013-04-25T00:41:45.046Z")
}
]
There are already many answers to this question, but I wasn't happy with any of them. MongoDB has improved over the years, and there are now easier ways to do it. The answer by Jonas Tomanga gets it right, but is a bit too complex.
If you are using MongoDB 3.0 or later, here's how you can group by date. I start with the $match stage because the author also asked how to limit the results.
db.yourCollection.aggregate([
{ $match: { date: { $gte: ISODate("2019-05-01") } } },
{ $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$date"} }, count: { $sum: 1 } } },
{ $sort: { _id: 1} }
])
To fetch data grouped by date in MongoDB:
db.getCollection('supportIssuesChat').aggregate([
{
$group : {
_id :{ $dateToString: { format: "%Y-%m-%d", date: "$createdAt"} },
list: { $push: "$$ROOT" },
count: { $sum: 1 }
}
}
])
Late answer, but for the record (for anyone else that comes to this page): You'll need to use the 'keyf' argument instead of 'key', since your key is actually going to be a function of the date on the event (i.e. the "day" extracted from the date) and not the date itself. This should do what you're looking for:
db.coll.group(
{
keyf: function(doc) {
var date = new Date(doc.date);
var dateKey = (date.getMonth()+1)+"/"+date.getDate()+"/"+date.getFullYear()+'';
return {'day':dateKey};
},
cond: {topic:"abc"},
initial: {count:0},
reduce: function(obj, prev) {prev.count++;}
});
For more information, take a look at MongoDB's doc page on aggregation and group: http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Group
This can help
return new Promise(function(resolve, reject) {
db.doc.aggregate(
[
{ $match: {} },
{ $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } }, count: { $sum: 1 } } },
{ $sort: { _id: 1 } }
]
).then(doc => {
/* if you need a date object */
doc.forEach(function(value, index) {
doc[index]._id = new Date(value._id);
}, this);
resolve(doc);
}).catch(reject);
});
Haven't worked that much with MongoDB yet, so I am not completely sure. But aren't you able to use full Javascript?
So you could parse your date with Javascript Date class, create your date for the day out of it and set as key into an "out" property. And always add one if the key already exists, otherwise create it new with value = 1 (first click). Below is your code with adapted reduce function (untested code!):
db.coll.group(
{
key:{'date':true},
initial: {retVal: {}},
reduce: function(doc, prev){
var date = new Date(doc.date);
var dateKey = date.getFullYear()+'-'+(date.getMonth()+1)+'-'+date.getDate();
(typeof prev.retVal[dateKey] != 'undefined') ? prev.retVal[dateKey] += 1 : prev.retVal[dateKey] = 1;
},
cond: {topic:"abc"}
}
)
Thanks to @mindthief; your answer helped solve my problem today. The function below makes grouping by day a little easier; hope it can help others.
/**
* group by day
* @param query document {key1:123,key2:456}
*/
var count_by_day = function(query){
return db.action.group(
{
keyf: function(doc) {
var date = new Date(doc.time);
var dateKey = (date.getMonth()+1)+"/"+date.getDate()+"/"+date.getFullYear();
return {'date': dateKey};
},
cond:query,
initial: {count:0},
reduce: function(obj, prev) {
prev.count++;
}
});
}
count_by_day({this:'is',the:'query'})
Another late answer, but still. So if you wanna do it in only one iteration and get the number of clicks grouped by date and topic you can use the following code:
db.coll.group(
{
$keyf : function(doc) {
return { "date" : doc.date.getDate()+"/"+(doc.date.getMonth()+1)+"/"+doc.date.getFullYear(),
"topic": doc.topic };
},
initial: {count:0},
reduce: function(obj, prev) { prev.count++; }
})
Also, if you would like to optimize the query as suggested, you can use an integer value for the date key instead of the string (hint: use valueOf()), though in my examples the speed was the same.
Furthermore, it's always wise to check the MongoDB docs regularly, because they keep adding new features. For example, with the new aggregation framework, which will be released in version 2.2, you can achieve the same results much more easily: http://docs.mongodb.org/manual/applications/aggregation/
If you want a Date object returned directly
Then, instead of applying the Date Aggregation Operators, apply "Date Math" to round the date object. This can often be desirable, as all drivers represent a BSON Date in a form that is commonly used for Date manipulation in all languages where that is possible:
db.datetest.aggregate([
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$date", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$date", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]},
new Date(0)
]
},
"click": { "$sum": 1 }
}}
])
Or if as is implied in the question that the grouping interval required is "buckets" of 15 days, then simply apply that to the numeric value in $mod:
db.datetest.aggregate([
{ "$group": {
"_id": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$date", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$date", new Date(0) ] },
1000 * 60 * 60 * 24 * 15
]}
]},
new Date(0)
]
},
"click": { "$sum": 1 }
}}
])
The basic math applied is that when you $subtract two Date objects, the result returned is the difference in milliseconds as a number. So the epoch is represented by Date(0) as the base for conversion in whatever language constructor you have.
With a numeric value, the "modulo" ( $mod ) is applied to round the date ( subtract the remainder from the division ) to the required interval. Being either:
1000 milliseconds * 60 seconds * 60 minutes * 24 hours = 1 day
Or
1000 milliseconds * 60 seconds * 60 minutes * 24 hours * 15 days = 15 days
So it's flexible to whatever interval you require.
By the same token, an $add operation between a "numeric" value and a Date object will return a Date object equivalent to the milliseconds value of both objects combined ( epoch is 0, therefore 0 plus difference is the converted date ).
Easily represented and reproducible in the following listing:
var now = new Date();
var bulk = db.datetest.initializeOrderedBulkOp();
for ( var x = 0; x < 60; x++ ) {
bulk.insert({ "date": new Date( now.valueOf() + ( 1000 * 60 * 60 * 24 * x ))});
}
bulk.execute();
And running the second example with 15 day intervals:
{ "_id" : ISODate("2016-04-14T00:00:00Z"), "click" : 12 }
{ "_id" : ISODate("2016-03-30T00:00:00Z"), "click" : 15 }
{ "_id" : ISODate("2016-03-15T00:00:00Z"), "click" : 15 }
{ "_id" : ISODate("2016-02-29T00:00:00Z"), "click" : 15 }
{ "_id" : ISODate("2016-02-14T00:00:00Z"), "click" : 3 }
Or similar distribution depending on the current date when the listing is run, and of course the 15 day intervals will be consistent since the epoch date.
Using the "Math" method is a bit easier to tune, especially if you want to adjust time periods for different timezones in aggregation output where you can similarly numerically adjust by adding/subtracting the numeric difference from UTC.
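The same rounding can be checked outside the database. The sketch below applies the subtract-the-remainder arithmetic to a JavaScript Date; floorToInterval is a hypothetical name, and getTime() plays the role of subtracting new Date(0).

```javascript
const ONE_DAY_MS = 1000 * 60 * 60 * 24;

// Hypothetical helper: round `date` down to a multiple of `intervalMs` since the epoch.
function floorToInterval(date, intervalMs) {
  const ms = date.getTime();               // like { $subtract: ["$date", new Date(0)] }
  return new Date(ms - (ms % intervalMs)); // like $add with new Date(0)
}

const sampleDate = new Date('2016-02-20T13:45:00Z');
console.log(floorToInterval(sampleDate, ONE_DAY_MS).toISOString());      // start of that day
console.log(floorToInterval(sampleDate, 15 * ONE_DAY_MS).toISOString()); // start of its 15-day bucket
```

Because the buckets are counted from the epoch, the 15-day boundaries here line up with the _id values in the sample output above.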
Of course, that is a good solution. Aside from that, you can group dates by day as strings (as that answer proposes), or you can get the beginning of each date by projecting the date field (in aggregation) like this:
{'$project': {
'start_of_day': {'$subtract': [
'$date',
{'$add': [
{'$multiply': [{'$hour': '$date'}, 3600000]},
{'$multiply': [{'$minute': '$date'}, 60000]},
{'$multiply': [{'$second': '$date'}, 1000]},
{'$millisecond': '$date'}
]}
]},
}}
It gives you this:
{
"start_of_day" : ISODate("2015-12-03T00:00:00.000Z")
},
{
"start_of_day" : ISODate("2015-12-04T00:00:00.000Z")
}
It has some pluses: you can work with your days as the date type (not number or string), you can use all of the date aggregation operators in subsequent aggregation stages, and you get the date type in the output.

Storing and querying time intervals in MongoDB

Modeling shift planning app:
I came up with the following data structure to describe a shift.
{
"fromHour" : 7,
"fromMinute" : 30,
"toHour" : 9,
"toMinute" : 30,
"week" : 5,
"date" : "2015-01-26",
"user" : {
// ...
},
"_id" : ObjectId("54d0e4a82b9dc26c0c0f36e7")
}
The main thing that I need to store is the information:
1. when shift starts (hour and minute),
2. when shift ends (hour and minute) and
3. what date it's actually happening.
The fields fromHour, fromMinute, toHour, toMinute and date as ISO string worked pretty well for me for storing and querying the shifts by particular date.
The problem occurred when I needed to build reports out of it. Say, I want to get all shifts from "2015-01-01" to "2015-02-01" in range from 07:00, till 23:00.
I can add $and clause to my query, like
[ { fromHour: { '$gte': 7 } },
{ fromMinute: { '$gte': 0 } },
{ toHour: { '$lt': 23 } },
{ toMinute: { '$lt': 0 } } ]
But that doesn't work well, since for a shift where toMinute is 30, the $lt condition will be false.
I'm trying to find an efficient data structure for storing timespans that would be easy to query.
Storing hours and minutes in two different fields separated is too error-prone and makes your job harder. Since Mongo does not have a distinct "Time" data type, only Date, and shifts usually start and end at "easy" times, I would recommend to implement something like converting the time to a real number in your application like this:
00:00 --> 0
01:00 --> 1
...
08:00 --> 8
08:15 --> 8.25
08:30 --> 8.5
...
16:30 --> 16.5
...
It is a bit of extra work in the app because you have to convert while saving or displaying but it's still better than having one single time value in two different fields.
So your data would look like this:
{
"shiftStart" : 7.5,
"shiftEnd" : 9.5,
"week" : 5,
"date" : "2015-01-26",
"user" : {
// ...
},
"_id" : ObjectId("54d0e4a82b9dc26c0c0f36e7")
}
and your query:
[ { shiftStart: { '$gte': 7 } },
{ shiftEnd: { '$lt': 23 } } ]
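The conversion in the app could be as small as the two helpers below. This is a sketch under the assumption that times arrive as "HH:MM" strings; toDecimalHours and toClockString are hypothetical names.

```javascript
// "08:15" -> 8.25 (value to store in shiftStart / shiftEnd)
function toDecimalHours(hhmm) {
  const [hours, minutes] = hhmm.split(':').map(Number);
  return hours + minutes / 60;
}

// 8.25 -> "08:15" (value to display)
function toClockString(decimalHours) {
  const totalMinutes = Math.round(decimalHours * 60);
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return String(hours).padStart(2, '0') + ':' + String(minutes).padStart(2, '0');
}

console.log(toDecimalHours('07:30'), toClockString(9.5));
```

Rounding to whole minutes on the way back guards against small floating-point errors for times that are not exact binary fractions of an hour.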
You can store the data with the date type in ISODate(...) format and then use $project and Date Aggregation Operators to query the data.
For your example:
db.shifts.aggregate([
{ $match: //matches the dates first to filter out before $project step
{ datetimeStart:
{ $gte: ISODate("2015-01-01T07:00:00.000Z"),
$lt: ISODate("2015-02-01T00:00:00.000Z")
},
datetimeEnd:
{ $gte: ISODate("2015-01-01T07:00:00.000Z"),
$lt: ISODate("2015-02-01T00:00:00.000Z")
}
}
},
{ $project: // $project step extracts the hours
{ otherNeededFields: 1, // any other fields you want to see
datetimeStart: 1,
datetimeEnd: 1,
hourStart: { $hour: "$datetimeStart" },
hourEnd: { $hour: "$datetimeEnd" }
}
},
{ $match: // match the shift hours
{ hourStart: { $gte: 7 },
hourEnd: { $lte: 23 }
}
}
])
With this system it would be possible, but complicated, to find something like shifts between 7:30 AM and 10:30 PM:
db.shifts.aggregate([
{ $match: //matches the dates first to filter out before $project step
{ datetimeStart:
{ $gte: ISODate("2015-01-01T07:30:00.000Z"),
$lt: ISODate("2015-02-01T00:00:00.000Z")
},
datetimeEnd:
{ $gte: ISODate("2015-01-01T07:30:00.000Z"),
$lt: ISODate("2015-02-01T00:00:00.000Z")
}
}
},
{ $project: // $project step extracts the hours
{ otherNeededFields: 1, // any other fields you want to see
datetimeStart: 1,
datetimeEnd: 1,
hourStart: { $hour: "$datetimeStart" },
minStart: { $minute: "$datetimeStart" },
hourEnd: { $hour: "$datetimeEnd" },
minEnd: { $minute: "$datetimeEnd" }
}
},
{ $match: // match the shift hours
{ $and: // two $or keys in one object would overwrite each other, so wrap them in $and
[
{ $or:
[
{hourStart: 7, minStart: {$gte: 30}}, // hour is 7, minute >= 30
{hourStart: { $gte: 8 }} // hour is >= 8
]
},
{ $or:
[
{hourEnd: 22, minEnd: {$lte: 30}}, // hour is 22, minute <= 30
{hourEnd: { $lte: 21 }} // hour is <= 21
]
}
]
}
}
])

MongoDB - Query all documents createdAt within last hours, and group by minute?

From reading various articles out there, I believe this should be possible, but I'm not sure where exactly to start.
This is what I'm trying to do:
I want to run a query that finds all documents created within the last hour and groups them by minute; since each document has a tweets value, like 5, 6, or 19, it should add those up for each minute and provide a sum.
Here's a sample of the collection:
{
"createdAt": { "$date": 1385064947832 },
"updatedAt": null,
"tweets": 47,
"id": "06E72EBD-D6F4-42B6-B79B-DB700CCD4E3F",
"_id": "06E72EBD-D6F4-42B6-B79B-DB700CCD4E3F"
}
Is this possible to do in mongodb?
@zero323 - I first tried just grouping the last hour like so:
db.tweetdatas.group( {
key: { tweets: 1, 'createdAt': 1 },
cond: { createdAt: { $gt: new Date("2013-11-20T19:44:58.435Z"), $lt: new Date("2013-11-20T20:44:58.435Z") } },
reduce: function ( curr, result ) { },
initial: { }
} )
But that just returns all the tweets within the timeframe, which technically is what I want, but now I want to group them all by each minute, and add up the sum of tweets for each minute.
@almypal
Here is the query that I'm using, based off your suggestion:
db.tweetdatas.aggregate(
{$match:{ "createdAt":{$gt: "2013-11-22T14:59:18.748Z"}, }},
{$project: { "createdAt":1, "createdAt_Minutes": { $minute : "$createdAt" }, "tweets":1, }},
{$group:{ "_id":"$createdAt_Minutes", "sum_tweets":{$sum:"$tweets"} }}
)
However, it's displaying this response:
{ "result" : [ ], "ok" : 1 }
Update: The response from @almypal works. Apparently, passing the date as a string, as in my example above, does not work; the comparison needs a Date object. I normally run this query from Node, and for testing in the shell I thought it would be easier to convert the date variable to a string.
Use aggregation as below:
var lastHour = new Date();
lastHour.setHours(lastHour.getHours()-1);
db.tweetdatas.aggregate(
{$match:{ "createdAt":{$gt: lastHour}, }},
{$project: { "createdAt":1, "createdAt_Minutes": { $minute : "$createdAt" }, "tweets":1, }},
{$group:{ "_id":"$createdAt_Minutes", "sum_tweets":{$sum:"$tweets"} }}
)
and the result would be like this
{
"result" : [
{
"_id" : 1,
"sum_tweets" : 117
},
{
"_id" : 2,
"sum_tweets" : 40
},
{
"_id" : 3,
"sum_tweets" : 73
}
],
"ok" : 1
}
where _id corresponds to the specific minute and sum_tweets is the total number of tweets in that minute.
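For reference, the same filter-then-sum logic on an in-memory array looks like the sketch below. sumTweetsByMinute is a hypothetical name, and the minute is taken in UTC, which is what $minute does when no timezone is given. Note that the comparison is between Date objects, not strings.

```javascript
// Hypothetical helper: keep documents created within the hour before `now`,
// then sum `tweets` per minute-of-hour (the $match, $project and $group stages).
function sumTweetsByMinute(docs, now) {
  const lastHour = new Date(now.getTime() - 60 * 60 * 1000);
  const sums = new Map();
  for (const doc of docs) {
    if (doc.createdAt > lastHour) {
      const minute = doc.createdAt.getUTCMinutes();
      sums.set(minute, (sums.get(minute) || 0) + doc.tweets);
    }
  }
  return [...sums].map(([minute, total]) => ({ _id: minute, sum_tweets: total }));
}
```

Grouping on the minute number alone is unambiguous only because the match window is at most one hour wide; a longer window would need the hour (or a rounded date) in the group key as well.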