I have an aggregate query that returns the number of records each property has:
db.collection.aggregate([
  {
    $group : {
      _id : "$propertyId",
      count: { $sum: 1 }
    }
  },
  {
    $sort : { count: 1 }
  }
],
{
  allowDiskUse: true
});
This gives me a result that looks like this.
{ "_id" : 1234, "count" : 1 }
{ "_id" : 1235, "count" : 1 }
{ "_id" : 1236, "count" : 2 }
{ "_id" : 1237, "count" : 3 }
{ "_id" : 1238, "count" : 3 }
Now I want to count the counts, so the result above would turn into this:
{ "_id" : 1, "count" : 2 }
{ "_id" : 2, "count" : 1 }
{ "_id" : 3, "count" : 2 }
Is this possible to do with a query, or do I need to write some code to get this done?
I updated the query to add another "step" (a second $group stage) that counts the counts. This is how it looks:
db.collection.aggregate([
  {
    $group : {
      _id : "$propertyId",
      count: { $sum: 1 }
    }
  },
  {
    $group : {
      _id : "$count",
      countOfCounts: { $sum: 1 }
    }
  },
  {
    $sort : { countOfCounts: 1 }
  }
],
{
  allowDiskUse: true
});
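As a sanity check of what the two $group stages compute, here is a plain JavaScript sketch (runnable in Node.js, not in the mongo shell) that applies the same logic to an in-memory array. The `docs` array and the `groupCount` helper are hypothetical; the data simply reproduces the per-property counts shown above. The final sort is by `_id` to match the desired output, whereas the query above sorts by `countOfCounts`.

```javascript
// Hypothetical input: one document per record, reproducing the counts above
// (1234 and 1235 appear once, 1236 twice, 1237 and 1238 three times).
const docs = [
  { propertyId: 1234 },
  { propertyId: 1235 },
  { propertyId: 1236 }, { propertyId: 1236 },
  { propertyId: 1237 }, { propertyId: 1237 }, { propertyId: 1237 },
  { propertyId: 1238 }, { propertyId: 1238 }, { propertyId: 1238 },
];

// Generic "$group with $sum: 1": returns [{ _id, n }] for each distinct key.
function groupCount(items, keyOf) {
  const counts = new Map();
  for (const item of items) {
    const key = keyOf(item);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, n]) => ({ _id: key, n }));
}

// Stage 1: number of records per property.
const perProperty = groupCount(docs, d => d.propertyId)
  .map(g => ({ _id: g._id, count: g.n }));

// Stage 2: how many properties share each count, sorted by that count.
const countOfCounts = groupCount(perProperty, g => g.count)
  .map(g => ({ _id: g._id, countOfCounts: g.n }))
  .sort((a, b) => a._id - b._id);

console.log(JSON.stringify(countOfCounts));
// [{"_id":1,"countOfCounts":2},{"_id":2,"countOfCounts":1},{"_id":3,"countOfCounts":2}]
```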
I have the following query on a collection with these fields: key, time, p, email
use app_db;
db.getCollection("app_log").aggregate(
  [
    {
      "$match" : {
        "key" : "login"
      }
    },
    {
      "$group" : {
        "_id" : {
          "$substr" : [
            "$time",
            0.0,
            10.0
          ]
        },
        "total" : {
          "$sum" : "$p"
        },
        "count" : {
          "$sum" : 1.0
        }
      }
    }
  ]
);
and the output is something like this:
{
  "_id" : "2019-08-25",
  "total" : NumberInt(623),
  "count" : 400.0
}
{
  "_id" : "2019-08-24",
  "total" : NumberInt(2195),
  "count" : 1963.0
}
{
  "_id" : "2019-08-23",
  "total" : NumberInt(1294),
  "count" : 1706.0
}
{
  "_id" : "2019-08-22",
  "total" : NumberInt(53),
  "count" : 1302.0
}
But I need the count to be distinct on the email field, i.e. the number of distinct email addresses that logged in per day and whose p value is greater than 0.
You need $addToSet to get an array of unique email values per day, and then $size to count the number of items in that array:
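db.getCollection("app_log").aggregate(
  [
    {
      "$match" : {
        "key" : "login",
        // only logins whose p value is greater than 0
        // (note: this also excludes documents with p <= 0 from "total")
        "p" : { "$gt" : 0 }
      }
    },
    {
      "$group" : {
        "_id" : {
          "$substr" : [
            "$time",
            0.0,
            10.0
          ]
        },
        "total" : {
          "$sum" : "$p"
        },
        // unique email values for the day
        "emails" : {
          "$addToSet": "$email"
        }
      }
    },
    {
      $project: {
        _id: 1,
        total: 1,
        // number of distinct emails = size of the set
        countDistinct: { $size: "$emails" }
      }
    }
  ]
);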
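The same per-day logic can be sketched in plain JavaScript, with a `Set` playing the role of $addToSet and its `size` the role of $size. The sample `logs` documents and the `distinctEmailsPerDay` helper are hypothetical; the p > 0 condition from the question is applied as a filter before grouping, which assumes that logins with p ≤ 0 should not contribute to `total` either.

```javascript
// Hypothetical sample of app_log documents (field names from the question).
const logs = [
  { key: "login",  time: "2019-08-25T08:00:00", p: 2, email: "a@example.com" },
  { key: "login",  time: "2019-08-25T09:30:00", p: 1, email: "a@example.com" },
  { key: "login",  time: "2019-08-25T10:00:00", p: 4, email: "c@example.com" },
  { key: "login",  time: "2019-08-25T11:00:00", p: 0, email: "b@example.com" },
  { key: "login",  time: "2019-08-24T11:00:00", p: 3, email: "b@example.com" },
  { key: "logout", time: "2019-08-24T12:00:00", p: 5, email: "c@example.com" },
];

// Equivalent of: $match on key and p > 0, $group by the first 10 characters
// of time with $sum of p and $addToSet of email, then $size of that set.
function distinctEmailsPerDay(docs) {
  const days = new Map(); // day string -> { total, emails: Set }
  for (const d of docs) {
    if (d.key !== "login" || !(d.p > 0)) continue;
    const day = d.time.substring(0, 10);
    if (!days.has(day)) days.set(day, { total: 0, emails: new Set() });
    const g = days.get(day);
    g.total += d.p;
    g.emails.add(d.email); // a Set ignores duplicates, like $addToSet
  }
  return [...days].map(([day, g]) => ({ _id: day, total: g.total, countDistinct: g.emails.size }));
}

console.log(JSON.stringify(distinctEmailsPerDay(logs)));
// [{"_id":"2019-08-25","total":7,"countDistinct":2},{"_id":"2019-08-24","total":3,"countDistinct":1}]
```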
I have the following (already aggregated) collection:
{ "_id" : { "day" : "2015-02-01" }, "total" : 2 }
{ "_id" : { "day" : "2015-02-02" }, "total" : 3 }
{ "_id" : { "day" : "2015-02-03" }, "total" : 10 }
{ "_id" : { "day" : "2015-02-04" }, "total" : 10 }
{ "_id" : { "day" : "2015-02-05" }, "total" : 5 }
What I need is to calculate a cumulative total for each day, i.e. that day's total plus the totals of all previous days. For the data above, the expected result would be:
{ "_id" : { "day" : "2015-02-01" }, "absolutetotalforday" : 2 }
{ "_id" : { "day" : "2015-02-02" }, "absolutetotalforday" : 5 }
{ "_id" : { "day" : "2015-02-03" }, "absolutetotalforday" : 15 }
{ "_id" : { "day" : "2015-02-04" }, "absolutetotalforday" : 25 }
{ "_id" : { "day" : "2015-02-05" }, "absolutetotalforday" : 30 }
Currently I have no clue how to achieve this with one query. Of course I could run a sum for each day I'm interested in, but that might be a long time range.
Any help is appreciated.
Because the aggregation framework has no mechanism for reading the value of a previous document, or the previous "grouped" value of a document, your best bet is to use Map-Reduce in this case. (This held for the MongoDB versions of the time; MongoDB 5.0 added the $setWindowFields stage, which can compute running totals inside an aggregation pipeline.)
Map-Reduce will give you the running total at the end of each day, although it won't be in the desired key absolutetotalforday but in a key called value, since reduced values are always stored in the value key.
The following mapReduce() operation will give you the desired result, assuming the results from the previous aggregation operation were output to a separate collection named agg_results:
db.agg_results.mapReduce(
  function() { emit( this._id, this.total ); },
  function(key, values) { return Array.sum(values); },
  {
    "scope": { "total": 0 },
    "finalize": function(key, value) {
      total += value;
      return total;
    },
    "out": { "inline": 1 }
  }
);
Sample Output
{
  "results" : [
    {
      "_id" : {
        "day" : "2015-02-01"
      },
      "value" : 2
    },
    {
      "_id" : {
        "day" : "2015-02-02"
      },
      "value" : 5
    },
    {
      "_id" : {
        "day" : "2015-02-03"
      },
      "value" : 15
    },
    {
      "_id" : {
        "day" : "2015-02-04"
      },
      "value" : 25
    },
    {
      "_id" : {
        "day" : "2015-02-05"
      },
      "value" : 30
    }
  ],
  "timeMillis" : 0,
  "counts" : {
    "input" : 5,
    "emit" : 5,
    "reduce" : 0,
    "output" : 5
  },
  "ok" : 1
}
Sorting the results does not work with inline output and string-typed dates. Instead, try converting the date strings to JavaScript Date objects, writing the results to a collection, and then sorting that collection:
db.agg_results.mapReduce(
  function() { emit( new Date(this._id.day), this.total ); },
  function(key, values) { return Array.sum(values); },
  {
    "scope": { "total": 0 },
    "finalize": function(key, value) {
      total += value;
      return total;
    },
    "out": "tmpResults"
  }
);
Sample Output (with sort)
> db.tmpResults.find().sort({_id: 1})
{ "_id" : ISODate("2015-02-01T00:00:00Z"), "value" : 2 }
{ "_id" : ISODate("2015-02-02T00:00:00Z"), "value" : 5 }
{ "_id" : ISODate("2015-02-03T00:00:00Z"), "value" : 15 }
{ "_id" : ISODate("2015-02-04T00:00:00Z"), "value" : 25 }
{ "_id" : ISODate("2015-02-05T00:00:00Z"), "value" : 30 }
>
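The finalize approach relies on `total` in `scope` being shared across finalize calls and on the keys reaching finalize in ascending day order; as far as I know, mapReduce does not document that ordering as guaranteed. The plain JavaScript sketch below computes the same running total but makes the order explicit by sorting first. `aggResults` is an in-memory copy of the question's sample data (deliberately shuffled), and `runningTotals` is a hypothetical helper.

```javascript
// In-memory copy of the already-aggregated documents from the question, shuffled.
const aggResults = [
  { _id: { day: "2015-02-03" }, total: 10 },
  { _id: { day: "2015-02-01" }, total: 2 },
  { _id: { day: "2015-02-05" }, total: 5 },
  { _id: { day: "2015-02-02" }, total: 3 },
  { _id: { day: "2015-02-04" }, total: 10 },
];

function runningTotals(docs) {
  // "YYYY-MM-DD" strings compare chronologically, so a plain string sort works.
  const sorted = [...docs].sort((a, b) =>
    a._id.day < b._id.day ? -1 : a._id.day > b._id.day ? 1 : 0);
  let total = 0; // plays the role of the shared "scope" variable
  return sorted.map(d => {
    total += d.total; // what finalize does with each day's value
    return { _id: d._id, absolutetotalforday: total };
  });
}

console.log(runningTotals(aggResults).map(d => d.absolutetotalforday).join(", "));
// 2, 5, 15, 25, 30
```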
This question was closed as a duplicate of: Fill missing dates in records
I have a product collection with the following documents:
{ "_id" : 1, "item" : "abc", created: ISODate("2014-10-01T08:12:00Z") }
{ "_id" : 2, "item" : "jkl", created: ISODate("2014-10-02T09:13:00Z") }
{ "_id" : 3, "item" : "hjk", created: ISODate("2014-10-02T09:18:00Z") }
{ "_id" : 4, "item" : "sdf", created: ISODate("2014-10-07T09:14:00Z") }
{ "_id" : 5, "item" : "xyz", created: ISODate("2014-10-15T09:15:00Z") }
{ "_id" : 6, "item" : "iop", created: ISODate("2014-10-16T09:15:00Z") }
I want to draw a chart of product count by day, so I use the MongoDB aggregation framework to count products grouped by day:
var proj1 = {
  "$project": {
    "created": 1,
    "_id": 0,
    "h": {"$hour": "$created"},
    "m": {"$minute": "$created"},
    "s": {"$second": "$created"},
    "ml": {"$millisecond": "$created"}
  }
};
var proj2 = {
  "$project": {
    "created": {
      "$subtract": [
        "$created", {
          "$add": [
            "$ml",
            {"$multiply": ["$s", 1000]},
            {"$multiply": ["$m", 60, 1000]},
            {"$multiply": ["$h", 60, 60, 1000]}
          ]
        }]
    }
  }
};
db.product.aggregate([
  proj1,
  proj2,
  {$group: {
    _id: "$created",
    count: {$sum: 1}
  }},
  {$sort: {_id: 1}}
])
The result in mongo shell is:
{
  "result" : [
    {
      "_id" : ISODate("2014-10-01T00:00:00.000Z"),
      "count" : 1
    },
    {
      "_id" : ISODate("2014-10-02T00:00:00.000Z"),
      "count" : 2
    },
    {
      "_id" : ISODate("2014-10-07T00:00:00.000Z"),
      "count" : 1
    },
    {
      "_id" : ISODate("2014-10-15T00:00:00.000Z"),
      "count" : 1
    },
    {
      "_id" : ISODate("2014-10-16T00:00:00.000Z"),
      "count" : 1
    }
  ],
  "ok" : 1
}
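proj1 and proj2 truncate each timestamp to midnight by subtracting its hour, minute, second and millisecond components, converted to milliseconds. Here is a plain JavaScript sketch of that arithmetic plus the per-day count. It uses the UTC getters because MongoDB's date operators (without a timezone argument) work in UTC; `truncateToDay` is a hypothetical helper name and `products` copies the sample documents above.

```javascript
// Hypothetical helper mirroring proj1 + proj2:
// created - (ms + s * 1000 + m * 60 * 1000 + h * 60 * 60 * 1000)
function truncateToDay(created) {
  const offset =
    created.getUTCMilliseconds() +
    created.getUTCSeconds() * 1000 +
    created.getUTCMinutes() * 60 * 1000 +
    created.getUTCHours() * 60 * 60 * 1000;
  return new Date(created.getTime() - offset);
}

// Sample documents from the question (only "created" matters here).
const products = [
  { _id: 1, created: new Date("2014-10-01T08:12:00Z") },
  { _id: 2, created: new Date("2014-10-02T09:13:00Z") },
  { _id: 3, created: new Date("2014-10-02T09:18:00Z") },
  { _id: 4, created: new Date("2014-10-07T09:14:00Z") },
  { _id: 5, created: new Date("2014-10-15T09:15:00Z") },
  { _id: 6, created: new Date("2014-10-16T09:15:00Z") },
];

// Equivalent of the $group stage: count products per truncated day.
const countsByDay = new Map(); // midnight timestamp (ms) -> count
for (const p of products) {
  const day = truncateToDay(p.created).getTime();
  countsByDay.set(day, (countsByDay.get(day) || 0) + 1);
}

// Equivalent of {$sort: {_id: 1}}
for (const [day, count] of [...countsByDay].sort((a, b) => a[0] - b[0])) {
  console.log(new Date(day).toISOString(), count);
}
// prints 2014-10-01T00:00:00.000Z 1, then 2014-10-02T00:00:00.000Z 2, and so on
```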
Of course, there are no products on some days, and the chart drawn from the result set above looks like this:
[chart image not available]
But the desired chart should look like this:
[chart image not available]
So the question is: how can I add the missing days (of the last 30 days, for example) to the result set with count = 0? That is, the desired result set should look like this:
{
  "result" : [
    {
      "_id" : ISODate("2014-09-16T00:00:00.000Z"),
      "count" : 0
    },
    {
      "_id" : ISODate("2014-09-17T00:00:00.000Z"),
      "count" : 0
    },
    ...
    {
      "_id" : ISODate("2014-10-01T00:00:00.000Z"),
      "count" : 1
    },
    {
      "_id" : ISODate("2014-10-02T00:00:00.000Z"),
      "count" : 2
    },
    {
      "_id" : ISODate("2014-10-03T00:00:00.000Z"),
      "count" : 0
    },
    ...
    {
      "_id" : ISODate("2014-10-07T00:00:00.000Z"),
      "count" : 1
    },
    {
      "_id" : ISODate("2014-10-08T00:00:00.000Z"),
      "count" : 0
    },
    ...
    {
      "_id" : ISODate("2014-10-15T00:00:00.000Z"),
      "count" : 1
    },
    {
      "_id" : ISODate("2014-10-16T00:00:00.000Z"),
      "count" : 1
    },
    // also, add some extra days
    {
      "_id" : ISODate("2014-10-17T00:00:00.000Z"),
      "count" : 0
    },
    {
      "_id" : ISODate("2014-10-18T00:00:00.000Z"),
      "count" : 0
    }
  ],
  "ok" : 1
}
Handling this entirely within the aggregation framework is a pain, but it can be done (MongoDB 2.6+ required, since the pipeline uses $$ROOT and $setUnion):
// proj1 + proj2: truncate "created" to midnight of its day (as in the question)
var proj1 = {
"$project" : {
"created" : 1,
"_id" : 0,
"h" : {
"$hour" : "$created"
},
"m" : {
"$minute" : "$created"
},
"s" : {
"$second" : "$created"
},
"ml" : {
"$millisecond" : "$created"
}
}
};
var proj2 = {
"$project" : {
"created" : {
"$subtract" : [ "$created", {
"$add" : [ "$ml", {
"$multiply" : [ "$s", 1000 ]
}, {
"$multiply" : [ "$m", 60, 1000 ]
}, {
"$multiply" : [ "$h", 60, 60, 1000 ]
} ]
} ]
}
}
};
// group1: count products per day
var group1 = {
$group : {
_id : "$created",
count : {
$sum : 1
}
}
};
// group2: gather all per-day documents into one document and find the latest day
var group2 = {
$group : {
_id : 0,
origin : {
$push : "$$ROOT"
},
maxDate : {
$max : "$_id"
}
}
};
var step = 24 * 60 * 60 * 1000; // milliseconds of one day
// project3: build zero-count entries for the 29 days before maxDate
var project3 = {
$project : {
origin : 1,
extents : {
$map : {
"input" : [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29],
"as" : "e",
"in" : {
_id : {
$subtract : [ "$maxDate", {
$multiply : [ step, "$$e"]
}]
},
count : {
$add : [0]
}
}
}
}
}
};
// project4: merge the real per-day entries with the zero-count entries
var project4 = {
$project : {
_id : 0,
values : {
$setUnion : [ "$origin", "$extents"]
}
}
};
// unwind1 + group3: one document per day; $max keeps the real count over the 0 placeholder
var unwind1 = {
$unwind : "$values"
};
var group3 = {
$group : {
_id : "$values._id",
count : {
$max : "$values.count"
}
}
};
db.product.aggregate([ proj1, proj2, group1, group2, project3, project4,
unwind1, group3, {
$sort : {
_id : 1
}
} ]);
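The idea behind group2 through group3 is: take the latest day, generate zero-count placeholders for the days before it, merge them with the real counts, and keep the larger count per day. Below is a plain JavaScript sketch of that merge, using the per-day counts from the question's output as input. `fillMissingDays` and `DAY_MS` are hypothetical names and, like project3, the sketch fills `days - 1` days before the latest one.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000; // milliseconds in one day

// Per-day counts as produced by group1 (values from the question's output).
const dailyCounts = [
  { _id: new Date("2014-10-01T00:00:00Z"), count: 1 },
  { _id: new Date("2014-10-02T00:00:00Z"), count: 2 },
  { _id: new Date("2014-10-07T00:00:00Z"), count: 1 },
  { _id: new Date("2014-10-15T00:00:00Z"), count: 1 },
  { _id: new Date("2014-10-16T00:00:00Z"), count: 1 },
];

function fillMissingDays(origin, days) {
  // group2: latest day present in the data
  const maxDate = Math.max(...origin.map(d => d._id.getTime()));

  // project3: zero-count placeholders for the (days - 1) days before maxDate
  const extents = [];
  for (let e = 1; e < days; e++) {
    extents.push({ _id: new Date(maxDate - e * DAY_MS), count: 0 });
  }

  // project4 + unwind1 + group3: merge both lists, keeping the max count per day
  const merged = new Map();
  for (const d of origin.concat(extents)) {
    const key = d._id.getTime();
    merged.set(key, merged.has(key) ? Math.max(merged.get(key), d.count) : d.count);
  }

  // final $sort on _id
  return [...merged.keys()]
    .sort((a, b) => a - b)
    .map(key => ({ _id: new Date(key), count: merged.get(key) }));
}

const filled = fillMissingDays(dailyCounts, 30);
console.log(filled.length); // 30
```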
I would rather fill in the missing days on the application side, something like this for your reference:
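// Comparator: sort documents by their _id date, ascending
function sortResult(x, y) {
  var t1 = x._id.getTime();
  var t2 = y._id.getTime();
  if (t1 < t2) {
    return -1;
  } else if (t1 == t2) {
    return 0;
  } else {
    return 1;
  }
}

// Run the per-day pipeline from the question (proj1 and proj2 as defined there).
// In the 2.6+ shell aggregate() returns a cursor, so convert it to an array.
var result = db.product.aggregate([
  proj1,
  proj2,
  { $group: { _id: "$created", count: { $sum: 1 } } },
  { $sort: { _id: 1 } }
]).toArray();

// Latest day in the data (result is sorted ascending by _id)
var endDateMilliseconds = result[result.length - 1]._id.getTime();
var step = 24 * 60 * 60 * 1000; // milliseconds of one day

// Index the existing per-day documents by timestamp
var map = {};
result.forEach(function (doc) {
  map[doc._id.getTime()] = doc;
});

// Add a zero-count document for each of the 29 preceding days that has no data
for (var ms = endDateMilliseconds, x = 1; x < 30; x++) {
  ms -= step;
  if (!(ms in map)) {
    map[ms] = { _id: new Date(ms), count: 0 };
  }
}

var finalResult = [];
for (var key in map) {
  finalResult.push(map[key]);
}
finalResult.sort(sortResult);
printjson(finalResult);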
OK, first of all: non-existing values evaluate to null (roughly "nada", "nothing", "not there"), which isn't equal to 0, a well-defined value.
MongoDB has no semantic understanding of the difference between 0 and 42, for example. So how should MongoDB decide which value to assume for a day in the time range (of which it has no semantic understanding either)?
Basically, you have two choices: save a 0 for each day on which there is no value to record, or iterate in your application over the days in the range you want to chart and substitute 0 for each day that has no value. I'd suggest the former, since it makes it possible to use the aggregation framework.
I am fairly new to MongoDB and I am playing with the aggregate framework. One of the examples from the documentation shows the following, which returns total number of new user joins per month and lists the month joined:
db.users.aggregate(
  [
    { $project : { month_joined : { $month : "$joined" } } },
    { $group : { _id : {month_joined:"$month_joined"} , number : { $sum : 1 } } },
    { $sort : { "_id.month_joined" : 1 } }
  ]
)
The code outputs the following:
{
  "_id" : {
    "month_joined" : 1
  },
  "number" : 3
},
{
  "_id" : {
    "month_joined" : 2
  },
  "number" : 9
},
{
  "_id" : {
    "month_joined" : 3
  },
  "number" : 5
}
Is it possible to also have each object contain the sum of all users who have joined since the start, so I don't have to iterate over the objects programmatically and calculate it myself?
Example desired output:
{
  "_id" : {
    "month_joined" : 1
  },
  "number" : 3,
  "total": 3
},
{
  "_id" : {
    "month_joined" : 2
  },
  "number" : 9,
  "total": 12
},
{
  "_id" : {
    "month_joined" : 3
  },
  "number" : 5,
  "total": 17
}
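Once the documents are sorted by month, the requested `total` is a prefix sum of `number`. A plain JavaScript sketch over the output shown above follows; `withRunningTotal` is a hypothetical helper, and it assumes the input is already sorted by `_id.month_joined`, as the $sort stage ensures.

```javascript
// Output of the $project/$group/$sort pipeline, copied from the question.
const monthly = [
  { _id: { month_joined: 1 }, number: 3 },
  { _id: { month_joined: 2 }, number: 9 },
  { _id: { month_joined: 3 }, number: 5 },
];

// Hypothetical helper: adds a cumulative "total" to each document (input must be sorted).
function withRunningTotal(docs) {
  let total = 0;
  return docs.map(d => {
    total += d.number;
    return Object.assign({}, d, { total: total });
  });
}

console.log(JSON.stringify(withRunningTotal(monthly).map(d => d.total))); // [3,12,17]
```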