I'm trying to build an aggregation pipeline in Mongo to count the number of documents generated every 10 minutes in a fairly large dataset. Each document contains an ISODate in a field called requestDtsCal. I'm trying the following code (thanks to https://stackoverflow.com/users/3943271/wizard for the base code):
var baseDate = new Date(2017, 01, 11, 00, 00, 0);
var startDate = new Date(2017, 01, 11, 00, 00, 0);
var endDate = new Date(2018, 09, 20, 14, 25, 0);
var divisor = 10 * 60 * 1000; // 10 minutes in milliseconds
db.AUDIT.aggregate([
{
$match : {
requestDtsCal : {
$gte : startDate,
$lt : endDate
}
}
}, {
$group : {
_id : {
$subtract : [ "$requestDtsCal", {
$mod : [ {
$subtract : [ "$requestDtsCal", baseDate ]
}, divisor ]
} ]
},
dates : {
$push : "$requestDtsCal"
}
}
}, {
$count: "$requestDtsCal"
}
]).pretty();
If I run it without the last pipeline stage it returns an array of arrays of all the dates from each document within each range. As soon as I try and count the number of documents in each range with the last pipeline stage it fails with:
assert: command failed: {
"ok" : 0,
"errmsg" : "Unrecognized pipeline stage name: '$count'",
"code" : 16436
} : aggregate failed
_getErrorWithCode#src/mongo/shell/utils.js:25:13
doassert#src/mongo/shell/assert.js:16:14
assert.commandWorked#src/mongo/shell/assert.js:403:5
DB.prototype._runAggregate#src/mongo/shell/db.js:260:9
DBCollection.prototype.aggregate#src/mongo/shell/collection.js:1212:12
#(shell):1:1
2018-01-18T19:54:40.669-0800 E QUERY [thread1] Error: command failed: {
"ok" : 0,
"errmsg" : "Unrecognized pipeline stage name: '$count'",
"code" : 16436
} : aggregate failed :
_getErrorWithCode#src/mongo/shell/utils.js:25:13
doassert#src/mongo/shell/assert.js:16:14
assert.commandWorked#src/mongo/shell/assert.js:403:5
DB.prototype._runAggregate#src/mongo/shell/db.js:260:9
DBCollection.prototype.aggregate#src/mongo/shell/collection.js:1212:12
#(shell):1:1
Any ideas what I'm doing wrong? This is running against Mongo 3.2.11 FWIW.
Thanks,
Ian
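The $count stage was added in MongoDB 3.4, which is why 3.2 reports it as unrecognized. On 3.2 you can get a count with $sum inside a $group stage.
Instead of
{ $count: "$requestDtsCal" }
use
{$group : {_id : null, count : {$sum : 1}}} // replace null with your grouping key to count per group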
I figured out an easy way to do this. I wasn't understanding groups properly.
var baseDate = new Date(2017, 01, 11, 00, 00, 0);
var startDate = new Date(2017, 01, 11, 00, 00, 0);
var endDate = new Date(2018, 09, 20, 14, 25, 0);
var divisor = 10 * 60 * 1000; // 10 minutes in milliseconds
db.AUDIT.aggregate([
{
$match : {
requestDtsCal : {
$gte : startDate,
$lt : endDate
}
}
}, {
$group : {
_id : {
$subtract : [ "$requestDtsCal", {
$mod : [ {
$subtract : [ "$requestDtsCal", baseDate ]
}, divisor ]
} ]
},
count: {$sum: 1}
}
}
]).pretty();
Works properly.
Ian
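The bucketing arithmetic in the $group stage above can be checked outside the database. The following is a minimal sketch in plain JavaScript that mirrors the $subtract / $mod expression and the {$sum: 1} count. The helper names bucketStart and countPerBucket and the sample timestamps are made up for illustration, and the sketch assumes no timestamp is earlier than the base date.

```javascript
// Hypothetical helpers mirroring the $subtract / $mod bucketing in plain JavaScript.
const divisorMs = 10 * 60 * 1000; // 10 minutes in milliseconds

// Start of the bucket containing `date`: date - ((date - baseDate) % divisor).
function bucketStart(date, baseDate, divisor) {
  const offset = (date.getTime() - baseDate.getTime()) % divisor;
  return new Date(date.getTime() - offset);
}

// Plain-JavaScript counterpart of { $group: { _id: <bucket start>, count: { $sum: 1 } } }.
function countPerBucket(dates, baseDate, divisor) {
  const counts = new Map();
  for (const d of dates) {
    const key = bucketStart(d, baseDate, divisor).toISOString();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Made-up sample timestamps (UTC), not taken from the AUDIT collection.
const base = new Date(Date.UTC(2017, 0, 11, 0, 0, 0));
const sampleDates = [
  new Date(Date.UTC(2017, 0, 11, 0, 3, 0)),
  new Date(Date.UTC(2017, 0, 11, 0, 9, 59)),
  new Date(Date.UTC(2017, 0, 11, 0, 10, 0)),
];
const buckets = countPerBucket(sampleDates, base, divisorMs);
for (const [start, count] of buckets) {
  console.log(start, count);
}
```

Note that the month argument of Date.UTC and of new Date(year, month, ...) is zero-based, so 0 is January and 1 is February.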
I am trying to get the average price per month within a financial year. The collection is called test, and the month comes from the CreateDate field. The collection data looks like this:
{
"_id" : ObjectId("5fd289a93f7cf02c36837ca7"),
"ClientName" : "John",
"OrderNumber" : "12345A",
"Price" : 10,
"CreateDate" : ISODate("2020-09-20T06:00:00.000Z"),
}
{
"_id" : ObjectId("5fd289a93f7cf02c36837cc7"),
"ClientName" : "John",
"OrderNumber" : "12345",
"Price" : 20,
"CreateDate" : ISODate("2020-09-12T06:00:00.000Z"),
}
I wrote the following query to get the average price per month within the financial year (Sep to Aug):
db.test.aggregate([
{
$match: {
"CreateDate": {
$lt: ISODate("2021-08-31T00:00:00.000Z"),
$gte: ISODate("2020-09-01T00:00:00.000Z")
}
}
},
{
$group: {
_id: {$month: "$CreateDate"},
"AvgPrice": {
"$avg": "$Price",
}
}
},
{ $project:{ _id : 0 , Month: '$_id' , "AvgPrice ": '$AvgPrice' } }
])
The result I am getting is with the following format:
{
"Month" : 9,
"AvgPrice " : 15.0
}
{
"Month" : 10,
"AvgPrice " : 18.6666666666667
}
How can I display the month as a string instead of a number? For example, the following would be the ideal result:
{
"Month" : Sep,
"AvgPrice" : 15.0
}
{
"Month" : Oct,
"AvgPrice" : 18.6666666666667
}
I also have two more questions:
I am using MongoDB 3.6. Is there any way to round the average price to two digits after the decimal point? For example, the value above would be "18.67" instead of "18.66666". MongoDB 4.2 has $round, but 3.6 doesn't seem to have this function.
If I want to break the results down by client, so that the result looks like the following:
{
"ClientName": "John",
"Month" : Sep,
"AvgPrice" : 15.0
}
{
"ClientName" : "Mary",
"Month" : Oct,
"AvgPrice" : 18.6666666666667
}
How do I add another grouping level to break the results down by client and then by month?
Any help will be appreciated!
If I want to break down by client
You can add the ClientName field to the _id of the $group stage:
{
$group: {
_id: {
ClientName: "$ClientName",
month: { $month: "$CreateDate" }
},
AvgPrice: { $avg: "$Price" }
}
},
How can I display the month as a string instead of a number?
There is no direct way to get the month name in MongoDB, but you can prepare an array of month names and use $arrayElemAt to select the name by month number. The empty string at index 0 is a placeholder, because $month returns values from 1 to 12:
{
$project: {
_id: 0,
ClientName: "$_id.ClientName",
Month: {
$arrayElemAt: [
["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],
"$_id.month"
]
},
AvgPrice: 1
}
}
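The lookup can be tried without a database. This is a minimal sketch in plain JavaScript, assuming rows shaped like the output of the $group stage above; MONTH_NAMES, monthName and the sample rows are illustrative names and values, not part of the original answer.

```javascript
// Index 0 is a placeholder because $month returns 1 (January) through 12 (December).
const MONTH_NAMES = ["", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Plain-JavaScript counterpart of { $arrayElemAt: [MONTH_NAMES, "$_id.month"] }.
function monthName(monthNumber) {
  return MONTH_NAMES[monthNumber];
}

// Made-up rows shaped like the $group output (client and month in _id).
const groupedRows = [
  { _id: { ClientName: "John", month: 9 }, AvgPrice: 15 },
  { _id: { ClientName: "Mary", month: 10 }, AvgPrice: 18.6666666666667 },
];

// Counterpart of the $project stage: flatten _id and replace the month number.
const projectedRows = groupedRows.map((row) => ({
  ClientName: row._id.ClientName,
  Month: monthName(row._id.month),
  AvgPrice: row.AvgPrice,
}));
console.log(projectedRows);
```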
I am using the Mongodb 3.6 version, is there any way to round up the avg price to two digit after the decimal point?
MongoDB 3.6 and earlier have no built-in operator for this; as you already know, $round was introduced in MongoDB 4.2.
You can refer to the question "Rounding to 2 decimal places using MongoDB aggregation framework", which describes several workarounds.
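One workaround on 3.6 is to scale the value, round it and scale it back using operators that already exist in that version. The following is a minimal sketch, assuming non-negative prices: roundTwoStage is a hypothetical $project stage built from $multiply, $add, $floor and $divide, and roundTwo repeats the same arithmetic in plain JavaScript so it can be checked without a database.

```javascript
// Hypothetical $project stage computing floor(AvgPrice * 100 + 0.5) / 100.
const roundTwoStage = {
  $project: {
    AvgPrice: {
      $divide: [
        { $floor: { $add: [{ $multiply: ["$AvgPrice", 100] }, 0.5] } },
        100,
      ],
    },
  },
};

// The same arithmetic in plain JavaScript (assumes value >= 0).
function roundTwo(value) {
  return Math.floor(value * 100 + 0.5) / 100;
}

console.log(roundTwo(18.6666666666667));
console.log(roundTwo(15));
```

In a real pipeline the AvgPrice expression would be merged into the existing $project stage rather than added as a separate one.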
I have a collection with documents like below:
{startDate: ISODate("2016-01-02T00:00:00Z"), endDate: ISODate("2016-01-05T00:00:00Z")},
{startDate: ISODate("2016-01-02T00:00:00Z"), endDate: ISODate("2016-01-08T00:00:00Z")},
{startDate: ISODate("2016-01-05T00:00:00Z"), endDate: ISODate("2016-01-08T00:00:00Z")},
{startDate: ISODate("2016-01-05T00:00:00Z"), endDate: ISODate("2016-01-10T00:00:00Z")},
{startDate: ISODate("2016-01-07T00:00:00Z"), endDate: ISODate("2016-01-10T00:00:00Z")}
I would like to return a record for every date between the minimum startDate and the maximum endDate. Along with each of these records I would like to return a count of the number of records where the startDate and endDate contain this date.
So for my above example the min startDate is 1/2/2016 and the max endDate is 1/10/2016 so I would like to return all dates between those two along with the counts. See desired output below:
{date: ISODate("2016-01-02T00:00:00Z"), count: 2}
{date: ISODate("2016-01-03T00:00:00Z"), count: 2}
{date: ISODate("2016-01-04T00:00:00Z"), count: 2}
{date: ISODate("2016-01-05T00:00:00Z"), count: 4}
{date: ISODate("2016-01-06T00:00:00Z"), count: 3}
{date: ISODate("2016-01-07T00:00:00Z"), count: 4}
{date: ISODate("2016-01-08T00:00:00Z"), count: 4}
{date: ISODate("2016-01-09T00:00:00Z"), count: 2}
{date: ISODate("2016-01-10T00:00:00Z"), count: 2}
Please let me know if this doesn't make sense and I can try to explain in more detail.
I am able to do this using a loop like below:
var startDate = ISODate("2016-01-02T00:00:00Z")
var endDate = ISODate("2016-02-10T00:00:00Z")
while(startDate < endDate){
var counts = db.data.find(
{
startDate: {$lte: startDate},
endDate: {$gte: startDate}
}
).count()
print(startDate, counts)
startDate.setDate(startDate.getDate() + 1)
}
But I'm wondering if there is a way to do this using the aggregation framework. I come from a mostly SQL background, where looping to get data is often a bad idea. Does the same rule apply to MongoDB? Should I be concerned about looping here and try to use the aggregation framework instead, or is this a valid solution?
Your best bet here is mapReduce. This is because you can loop values in between "startDate" and "endDate" within each document and emit for each day ( or other required interval ) between those values. Then it is just a matter of accumulating per emitted date key from all data:
db.collection.mapReduce(
function() {
for( var d = this.startDate.valueOf(); d <= this.endDate.valueOf(); d += 1000 * 60 * 60 * 24 ) {
emit(new Date(d), 1)
}
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
This produces results like this:
{
"results" : [
{
"_id" : ISODate("2016-01-02T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-03T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-04T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-05T00:00:00Z"),
"value" : 4
},
{
"_id" : ISODate("2016-01-06T00:00:00Z"),
"value" : 3
},
{
"_id" : ISODate("2016-01-07T00:00:00Z"),
"value" : 4
},
{
"_id" : ISODate("2016-01-08T00:00:00Z"),
"value" : 4
},
{
"_id" : ISODate("2016-01-09T00:00:00Z"),
"value" : 2
},
{
"_id" : ISODate("2016-01-10T00:00:00Z"),
"value" : 2
}
],
"timeMillis" : 35,
"counts" : {
"input" : 5,
"emit" : 25,
"reduce" : 9,
"output" : 9
},
"ok" : 1
}
Your dates are rounded to a day in the sample, but if they were not in real data then it is just a simple matter of date math to be applied in order to round per interval.
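To see what the map and reduce functions compute, the same logic can be run in plain JavaScript. This is a minimal sketch, not the mapReduce call itself: countPerDay is a hypothetical helper that emits one count per day in each document's range and sums the counts per day, using the five sample documents from the question.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Emits one count per day between startDate and endDate (inclusive)
// and sums the counts per day, as the map and reduce functions do.
function countPerDay(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (let d = doc.startDate.getTime(); d <= doc.endDate.getTime(); d += DAY_MS) {
      counts.set(d, (counts.get(d) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([ms, value]) => ({ _id: new Date(ms), value: value }));
}

// The five sample documents from the question.
const sampleRanges = [
  ["2016-01-02", "2016-01-05"],
  ["2016-01-02", "2016-01-08"],
  ["2016-01-05", "2016-01-08"],
  ["2016-01-05", "2016-01-10"],
  ["2016-01-07", "2016-01-10"],
].map(([start, end]) => ({
  startDate: new Date(start + "T00:00:00Z"),
  endDate: new Date(end + "T00:00:00Z"),
}));

const perDay = countPerDay(sampleRanges);
perDay.forEach((row) => console.log(row._id.toISOString(), row.value));
```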
The MongoDB aggregation framework uses stages instead of loops. It is a pipeline: documents pass through each stage in order until the last one, which is why the stages are written as an array ([]). Common stages include $match, $group and $project; the documentation covers them well. That was a very brief overview. As for your question, here is my suggestion:
I have not tried this, so please let me know whether it works.
First, use $match to keep only the documents whose dates fall in the range you want, then follow it with a $group stage.
Example:
db.collection.aggregate([
{$match: {
$and : [
{startDate: {$gte: ISODate("2016-01-02T00:00:00Z")}},
{endDate: {$lte: ISODate("2016-02-10T00:00:00Z")}}
]
}},
{$group:
{_id: {startDate:"$startDate",endDate:"$endDate"},
count:{$sum:1}
}
}
])
If you want to group by startDate only, as in your example, replace
_id: {startDate:"$startDate",endDate:"$endDate"}
with this:
_id: "$startDate"
I hope that helps
I'm trying to group by timestamp for the collection named "foo" { _id, TimeStamp }
db.foos.aggregate(
[
{$group : { _id : new Date (Date.UTC({ $year : '$TimeStamp' },{ $month : '$TimeStamp' },{$dayOfMonth : '$TimeStamp'})) }}
])
I expect many dates, but the result is just one date. The data I'm using is correct (it has many foos with different dates, none of them in 1970). There is some problem in the date parsing that I have not been able to solve yet.
{
"result" : [
{
"_id" : ISODate("1970-01-01T00:00:00.000Z")
}
],
"ok" : 1
}
Tried this One:
db.foos.aggregate(
[
{$group : { _id : { year : { $year : '$TimeStamp' }, month : { $month : '$TimeStamp' }, day : {$dayOfMonth : '$TimeStamp'} }, count : { $sum : 1 } }},
{$project : { parsedDate : new Date('$_id.year', '$_id.month', '$_id.day') , count : 1, _id : 0} }
])
Result :
uncaught exception: aggregate failed: {
"errmsg" : "exception: disallowed field type Date in object expression (at 'parsedDate')",
"code" : 15992,
"ok" : 0
}
And that one:
db.foos.aggregate(
[
{$group : { _id : { year : { $year : '$TimeStamp' }, month : { $month : '$TimeStamp' }, day : {$dayOfMonth : '$TimeStamp'} }, count : { $sum : 1 } }},
{$project : { parsedDate : Date.UTC('$_id.year', '$_id.month', '$_id.day') , count : 1, _id : 0} }
])
There are no dates in the result:
{
"result" : [
{
"count" : 412
},
{
"count" : 1702
},
{
"count" : 422
}
],
"ok" : 1
}
db.foos.aggregate(
[
{ $project : { day : {$substr: ["$TimeStamp", 0, 10] }}},
{ $group : { _id : "$day", number : { $sum : 1 }}},
{ $sort : { _id : 1 }}
]
)
Group by date can be done in two steps in the aggregation framework, an additional third step is needed for sorting the result, if sorting is desired:
$project in combination with $substr takes the first 10 characters (YYYY-MM-DD) of the ISODate object from each document (the result is a collection of documents with the fields "_id" and "day");
$group groups by day, adding (summing) the number 1 for each matching document;
$sort ascending by "_id", which is the day from the previous aggregation step - this is optional if sorted result is desired.
This solution can not take advantage of indexes like db.twitter.ensureIndex( { TimeStamp: 1 } ), because it transforms the ISODate object to a string object on the fly. For large collections (millions of documents) this could be a performance bottleneck and more sophisticated approaches should be used.
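The three steps can be mirrored in plain JavaScript to see what each one contributes. This is a minimal sketch, assuming the string form of a date is its ISO representation as produced by toISOString(); countByDay and the sample documents are made up for illustration.

```javascript
// Mirrors the pipeline: take the first 10 characters of the ISO string
// ($project + $substr), count per day ($group + $sum), then sort ($sort).
function countByDay(docs) {
  const counts = {};
  for (const doc of docs) {
    const day = doc.TimeStamp.toISOString().slice(0, 10); // "YYYY-MM-DD"
    counts[day] = (counts[day] || 0) + 1;
  }
  return Object.keys(counts)
    .sort()
    .map((day) => ({ _id: day, number: counts[day] }));
}

// Made-up sample documents.
const sampleFoos = [
  { TimeStamp: new Date("2014-02-01T10:00:00Z") },
  { TimeStamp: new Date("2013-11-13T19:15:05.600Z") },
  { TimeStamp: new Date("2013-11-13T08:00:00Z") },
];
const byDay = countByDay(sampleFoos);
console.log(byDay);
```

Because the keys are YYYY-MM-DD strings, sorting them alphabetically also sorts them chronologically.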
It depends on whether you want to have the date as ISODate type in the final output. If so, then you can do one of two things:
Extract $year, $month, $dayOfMonth from your timestamp and then reconstruct a new date out of them (you are already trying to do that, but you're using syntax that doesn't work in aggregation framework).
If the original Timestamp is of type ISODate() then you can do date arithmetic to subtract the hours, minutes, seconds and milliseconds from your timestamp to get a new date that's "rounded" to the day.
There is an example of 2 here.
Here is how you would do 1. I'm making an assumption that all your dates are this year, but you can easily adjust the math to accommodate your oldest date.
project1={$project:{_id:0,
y:{$subtract:[{$year:"$TimeStamp"}, 2013]},
d:{$subtract:[{$dayOfYear:"$TimeStamp"},1]},
TimeStamp:1,
jan1:{$literal:new ISODate("2013-01-01T00:00:00")}
} };
project2={$project:{tsDate:{$add:[
"$jan1",
{$multiply:["$y", 365*24*60*60*1000]},
{$multiply:["$d", 24*60*60*1000]}
] } } };
Sample data:
db.foos.find({},{_id:0,TimeStamp:1})
{ "TimeStamp" : ISODate("2013-11-13T19:15:05.600Z") }
{ "TimeStamp" : ISODate("2014-02-01T10:00:00Z") }
Aggregation result:
> db.foos.aggregate(project1, project2)
{ "tsDate" : ISODate("2013-11-13T00:00:00Z") }
{ "tsDate" : ISODate("2014-02-01T00:00:00Z") }
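Option 2 (date arithmetic) can be sketched in a similar way. This is an illustration under stated assumptions, not the linked example: truncateStage is a hypothetical $project stage that subtracts the hour, minute, second and millisecond parts (converted to milliseconds) from TimeStamp, and truncateToUtcDay does the same in plain JavaScript so the arithmetic can be checked without a database.

```javascript
// Hypothetical $project stage: TimeStamp minus its time-of-day part in milliseconds.
const truncateStage = {
  $project: {
    tsDate: {
      $subtract: [
        "$TimeStamp",
        {
          $add: [
            { $multiply: [{ $hour: "$TimeStamp" }, 60 * 60 * 1000] },
            { $multiply: [{ $minute: "$TimeStamp" }, 60 * 1000] },
            { $multiply: [{ $second: "$TimeStamp" }, 1000] },
            { $millisecond: "$TimeStamp" },
          ],
        },
      ],
    },
  },
};

// The same arithmetic in plain JavaScript, using UTC parts of the date.
function truncateToUtcDay(date) {
  const partOfDay =
    date.getUTCHours() * 60 * 60 * 1000 +
    date.getUTCMinutes() * 60 * 1000 +
    date.getUTCSeconds() * 1000 +
    date.getUTCMilliseconds();
  return new Date(date.getTime() - partOfDay);
}

console.log(truncateToUtcDay(new Date("2013-11-13T19:15:05.600Z")).toISOString());
console.log(truncateToUtcDay(new Date("2014-02-01T10:00:00Z")).toISOString());
```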
This is what I use in one of my projects :
collection.aggregate(
// group results by date
{$group : {
_id : { date : "$date" }
// do whatever you want here, like $push, $sum...
}},
// _id is the date
{$sort : { _id : -1}})
.toArray()
Where $date is a Date object in mongo. I get results indexed by date.
My daily collection has documents like:
..
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "ED", "san" : 7046.25, "izm" : 1243.96 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "UA", "san" : 0, "izm" : 0 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "PAL", "san" : 0, "izm" : 169.9 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "PAL", "san" : 0, "izm" : 0 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "CTA_TR", "san" : 0, "izm" : 0 }
{ "date" : ISODate("2013-01-04T00:00:00Z"), "vid" : "CAD", "san" : 0, "izm" : 169.9 }
{ "date" : ISODate("2013-01-04T00:00:00Z"), "vid" : "INT", "san" : 0, "izm" : 169.9 }
...
I left off _id field to spare the space here.
My task is to "fetch all documents within last 15 days". As you can see I need somehow to:
Get 15 unique dates. The newest one should be the date of the newest document in the collection (that is, not necessarily today's date, just the latest value of the date field). As for the oldest, it may not be necessary to define it strictly in the query; what I need is a kind of top 15 starting from the newest day, that is, 15 unique days.
db.daily.find() all documents, that have date field in that range of 15 days.
In the result, I should see all documents within 15 days starting from the newest in collection.
I just tested the following query against your data sample and it worked perfectly:
db.datecol.find(
{
"date":
{
$gte: new Date((new Date().getTime() - (15 * 24 * 60 * 60 * 1000)))
}
}
).sort({ "date": -1 })
Starting in Mongo 5, it's a nice use case for the $dateSubtract operator:
// { date: ISODate("2021-12-05") }
// { date: ISODate("2021-12-02") }
// { date: ISODate("2021-12-02") }
// { date: ISODate("2021-11-28") } <= older than 5 days
db.collection.aggregate([
{ $match: {
$expr: {
$gt: [
"$date",
{ $dateSubtract: { startDate: "$$NOW", unit: "day", amount: 5 } }
]
}
}}
])
// { date: ISODate("2021-12-05") }
// { date: ISODate("2021-12-02") }
// { date: ISODate("2021-12-02") }
With $dateSubtract, we create the oldest date after which we keep documents, by subtracting 5 (amount) "days" (unit) out of the current date $$NOW (startDate).
And you can obviously add a $sort stage to sort documents by date.
You need to run the distinct command to get all the unique dates. Below is an example. The "values" array holds all the unique dates in the collection, from which you need to pick the most recent 15 on the client side:
db.runCommand ( { distinct: 'datecol', key: 'date' } )
{
"values" : [
ISODate("2013-01-03T00:00:00Z"),
ISODate("2013-01-04T00:00:00Z")
],
"stats" : {
"n" : 2,
"nscanned" : 2,
"nscannedObjects" : 2,
"timems" : 0,
"cursor" : "BasicCursor"
},
"ok" : 1
}
You then use the $in operator with the most recent 15 dates from step 1. Below is an example that finds all documents that belong to one of the mentioned two dates.
db.datecol.find({
"date":{
"$in":[
new ISODate("2013-01-03T00:00:00Z"),
new ISODate("2013-01-04T00:00:00Z")
]
}
})
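The client-side step between the two commands can be sketched as follows. This is a minimal illustration, assuming the "values" array from distinct is already available as JavaScript Date objects; mostRecentDates and the sample values are hypothetical, and n would be 15 for the question as asked.

```javascript
// Returns the `n` most recent distinct dates, newest first.
function mostRecentDates(values, n) {
  const uniqueMs = [...new Set(values.map((d) => d.getTime()))];
  return uniqueMs
    .sort((a, b) => b - a)
    .slice(0, n)
    .map((ms) => new Date(ms));
}

// Made-up stand-in for the "values" array returned by distinct.
const distinctValues = [
  new Date("2013-01-03T00:00:00Z"),
  new Date("2013-01-04T00:00:00Z"),
  new Date("2013-01-02T00:00:00Z"),
];

const newestDates = mostRecentDates(distinctValues, 2);

// Filter document that could be passed to find(), as in the $in example above.
const dateFilter = { date: { $in: newestDates } };
console.log(JSON.stringify(dateFilter));
```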
I'm new to MongoDb and have a job for (I suppose) MapReduce or Aggregation.
I have an "invoices" collection with documents in this format:
{
date: 'some unix timestamp',
total: 12345,
paid: true
}
I need to display a table with months (jan-dec) as columns, a row for each year and the sum of total in the month (divided in paid and unpaid) in the cell. Like this:
| Jan | Feb | ...
2013 | 1,222 / 200 | 175 / 2,122 | ...
...
Can you help me get the mongo command right?
Maybe I'm better off writing some JS code to execute in mongo?
I've now found a solution using MapReduce. Here it is in use from PHP:
$map = new MongoCode('
function() {
var d = new Date(this.date*1000);
emit({y: d.getFullYear(), m: d.getMonth()}, {
total: this.total,
notPaid: this.paid ? 0 : this.total,
count: 1
});
};
');
$reduce = new MongoCode('
function(month, values) {
var result = { total: 0, notPaid: 0, count: 0 };
for (var i = 0; i < values.length; i++) {
result.total += values[i].total;
result.notPaid += values[i].notPaid;
result.count += values[i].count;
}
return result;
};
');
$result = $db->command(array(
'mapreduce' => 'invoices',
'map' => $map,
'reduce' => $reduce,
'out' => 'temp'
));
echo $result['timeMillis'];
Now the results are in the "temp" collection, one document per month. Could it be optimized or enhanced?
You can do this with aggregation framework like this:
db.invoices.aggregate( [
{
"$project" : {
"yr" : {
"$year" : "$date"
},
"mo" : {
"$month" : "$date"
},
"total" : 1,
"unpaid" : {
"$cond" : [
"$paid",
0,
"$total"
]
}
}
},
{
"$group" : {
"_id" : {
"y" : "$yr",
"m" : "$mo"
},
"total" : {
"$sum" : "$total"
},
"unpaid" : {
"$sum" : "$unpaid"
}
}
}
] )
You can use another $project at the end to pretty-up the output, and a $sort to order it, but that's the basic functioning core of it.
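To check what the $project and $group stages produce, the same computation can be written in plain JavaScript. This is a minimal sketch, assuming the date field holds a real date value (which $year and $month require) rather than a raw Unix timestamp; sumByMonth and the sample invoices are made up for illustration.

```javascript
// Mirrors the pipeline: derive year and month, compute unpaid via the
// same condition as $cond, then sum total and unpaid per (year, month).
function sumByMonth(invoices) {
  const groups = new Map();
  for (const inv of invoices) {
    const y = inv.date.getUTCFullYear();
    const m = inv.date.getUTCMonth() + 1; // $month is 1-based, getUTCMonth() is 0-based
    const key = y + "-" + m;
    const group = groups.get(key) || { _id: { y: y, m: m }, total: 0, unpaid: 0 };
    group.total += inv.total;
    group.unpaid += inv.paid ? 0 : inv.total;
    groups.set(key, group);
  }
  return [...groups.values()];
}

// Made-up sample invoices.
const sampleInvoices = [
  { date: new Date("2013-01-05T00:00:00Z"), total: 1000, paid: true },
  { date: new Date("2013-01-20T00:00:00Z"), total: 222, paid: false },
  { date: new Date("2013-02-03T00:00:00Z"), total: 175, paid: true },
];
const monthly = sumByMonth(sampleInvoices);
console.log(JSON.stringify(monthly));
```

If the stored value is a Unix timestamp in seconds, it would first have to be converted to a date, for example with new Date(seconds * 1000) as the map function in the question does.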