If I have data in the following format:
[
{
_id: 1,
startDate: ISODate("2017-01-1T00:00:00.000Z"),
endDate: ISODate("2017-02-25T00:00:00.000Z"),
type: 'CAR'
},
{
_id: 2,
startDate: ISODate("2017-02-17T00:00:00.000Z"),
endDate: ISODate("2017-03-22T00:00:00.000Z"),
type: 'HGV'
}
]
Is it possible to retrieve data grouped by 'type', but also with a count of the type for each month in a given date range? E.g. between 2017/1/1 and 2017/4/1 it would return:
[
{
_id: 'CAR',
monthCounts: [
/*January*/
{
from: ISODate("2017-01-1T00:00:00.000Z"),
to: ISODate("2017-01-31T23:59:59.999Z"),
count: 1
},
/*February*/
{
from: ISODate("2017-02-1T00:00:00.000Z"),
to: ISODate("2017-02-28T23:59:59.999Z"),
count: 1
},
/*March*/
{
from: ISODate("2017-03-1T00:00:00.000Z"),
to: ISODate("2017-03-31T23:59:59.999Z"),
count: 0
},
]
},
{
_id: 'HGV',
monthCounts: [
{
from: ISODate("2017-01-1T00:00:00.000Z"),
to: ISODate("2017-01-31T23:59:59.999Z"),
count: 0
},
{
from: ISODate("2017-02-1T00:00:00.000Z"),
to: ISODate("2017-02-28T23:59:59.999Z"),
count: 1
},
{
from: ISODate("2017-03-1T00:00:00.000Z"),
to: ISODate("2017-03-31T23:59:59.999Z"),
count: 1
},
]
}
]
The returned format is not really important, but what I am trying to achieve is in a single query to retrieve a number of counts for the same grouping (one per month). The input could be simply a start and end date to report from or more likely it could be an array of the date ranges to group by.
The algorithm for this is basically to "iterate" values over the interval between the two dates. MongoDB has a couple of ways to deal with this: mapReduce(), which has always been present, and the newer features available to the aggregate() method.
I'm going to expand on your selection to deliberately show an overlapping month, since your examples did not have one. This will result in the "HGV" values appearing in "three" months of output.
{
"_id" : 1,
"startDate" : ISODate("2017-01-01T00:00:00Z"),
"endDate" : ISODate("2017-02-25T00:00:00Z"),
"type" : "CAR"
}
{
"_id" : 2,
"startDate" : ISODate("2017-02-17T00:00:00Z"),
"endDate" : ISODate("2017-03-22T00:00:00Z"),
"type" : "HGV"
}
{
"_id" : 3,
"startDate" : ISODate("2017-02-17T00:00:00Z"),
"endDate" : ISODate("2017-04-22T00:00:00Z"),
"type" : "HGV"
}
Aggregate - Requires MongoDB 3.4
db.cars.aggregate([
{ "$addFields": {
"range": {
"$reduce": {
"input": { "$map": {
"input": { "$range": [
{ "$trunc": {
"$divide": [
{ "$subtract": [ "$startDate", new Date(0) ] },
1000
]
}},
{ "$trunc": {
"$divide": [
{ "$subtract": [ "$endDate", new Date(0) ] },
1000
]
}},
60 * 60 * 24
]},
"as": "el",
"in": {
"$let": {
"vars": {
"date": {
"$add": [
{ "$multiply": [ "$$el", 1000 ] },
new Date(0)
]
}
},
"in": {
"$add": [
{ "$multiply": [ { "$year": "$$date" }, 100 ] },
{ "$month": "$$date" }
]
}
}
}
}},
"initialValue": [],
"in": {
"$cond": {
"if": { "$in": [ "$$this", "$$value" ] },
"then": "$$value",
"else": { "$concatArrays": [ "$$value", ["$$this"] ] }
}
}
}
}
}},
{ "$unwind": "$range" },
{ "$group": {
"_id": {
"type": "$type",
"month": "$range"
},
"count": { "$sum": 1 }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": "$_id.type",
"monthCounts": {
"$push": { "month": "$_id.month", "count": "$count" }
}
}}
])
The key to making this work is the $range operator, which takes values for a "start" and an "end" as well as an "interval" to apply. The result is an array of values starting from "start" and incremented until "end" is reached.
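To illustrate $range in isolation (run against any collection with at least one document; "demo" is just a placeholder name here), note that the "end" value is exclusive:

db.demo.aggregate([
  { "$project": { "_id": 0, "seq": { "$range": [ 0, 10, 3 ] } } }
])
// { "seq" : [ 0, 3, 6, 9 ] }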
We use this with startDate and endDate to generate the possible dates in between those values. You will note that we need to do some math here, since $range only takes a 32-bit integer, but we can divide the millisecond timestamp values down to seconds so they fit.
Because we want "months", the operations applied extract the month and year values from the generated range. We actually generate the range as the "days" in between since "months" are difficult to deal with in math. The subsequent $reduce operation takes only the "distinct months" from the date range.
The result therefore of the first aggregation pipeline stage is a new field in the document which is an "array" of all the distinct months covered between startDate and endDate. This gives an "iterator" for the rest of the operation.
By "iterator" I mean than when we apply $unwind we get a copy of the original document for every distinct month covered in the interval. This then allows the following two $group stages to first apply a grouping to the common key of "month" and "type" in order to "total" the counts via $sum, and next $group makes the key just the "type" and puts the results in an array via $push.
This gives the result on the above data:
{
"_id" : "HGV",
"monthCounts" : [
{
"month" : 201702,
"count" : 2
},
{
"month" : 201703,
"count" : 2
},
{
"month" : 201704,
"count" : 1
}
]
}
{
"_id" : "CAR",
"monthCounts" : [
{
"month" : 201701,
"count" : 1
},
{
"month" : 201702,
"count" : 1
}
]
}
Note that the coverage of "months" is only present where there is actual data. While it is possible to produce zero values over a range, it requires quite a bit of wrangling and is not very practical. If you want zero values then it is better to add them in post-processing on the client once the results have been retrieved.
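For example, a minimal client-side zero-fill sketch, assuming the YYYYMM "month" keys shown above and a caller-supplied overall range:

function zeroFill(monthCounts, startMonth, endMonth) {
  var filled = [];
  for ( var m = startMonth; m <= endMonth;
        m = ( m % 100 === 12 ) ? m + 89 : m + 1 ) { // e.g. 201712 -> 201801
    var found = monthCounts.filter(function(e) { return e.month === m })[0];
    filled.push({ month: m, count: found ? found.count : 0 });
  }
  return filled;
}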
If you really have your heart set on producing the zero values within the database, then you should separately query for the $min and $max values, and pass these in to "brute force" the pipeline into generating the copies for each supplied possible range value.
So this time the "range" is made externally from all documents, and you then use a $cond statement in the accumulator to test whether the current document falls within the grouped range produced. Also, since the generation is "external", we really don't need the MongoDB 3.4 $range operator, so this can be applied to earlier versions as well (for versions before 3.4, swap the $addFields stage, itself a 3.4 addition, for an equivalent $project that includes the other fields):
// Get min and max separately
var ranges = db.cars.aggregate(
{ "$group": {
"_id": null,
"startRange": { "$min": "$startDate" },
"endRange": { "$max": "$endDate" }
}}
).toArray()[0]
// Make the range array externally from all possible values
var range = [];
for ( var d = new Date(ranges.startRange.valueOf()); d <= ranges.endRange; d.setUTCMonth(d.getUTCMonth()+1)) {
var v = ( d.getUTCFullYear() * 100 ) + ( d.getUTCMonth() + 1 );
range.push(v);
}
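// With the three sample documents above, this loop yields
// range = [ 201701, 201702, 201703, 201704 ]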
// Run conditional aggregation
db.cars.aggregate([
{ "$addFields": { "range": range } },
{ "$unwind": "$range" },
{ "$group": {
"_id": {
"type": "$type",
"month": "$range"
},
"count": {
"$sum": {
"$cond": {
"if": {
"$and": [
{ "$gte": [
"$range",
{ "$add": [
{ "$multiply": [ { "$year": "$startDate" }, 100 ] },
{ "$month": "$startDate" }
]}
]},
{ "$lte": [
"$range",
{ "$add": [
{ "$multiply": [ { "$year": "$endDate" }, 100 ] },
{ "$month": "$endDate" }
]}
]}
]
},
"then": 1,
"else": 0
}
}
}
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": "$_id.type",
"monthCounts": {
"$push": { "month": "$_id.month", "count": "$count" }
}
}}
])
Which produces the consistent zero fills for all possible months on all groupings:
{
"_id" : "HGV",
"monthCounts" : [
{
"month" : 201701,
"count" : 0
},
{
"month" : 201702,
"count" : 2
},
{
"month" : 201703,
"count" : 2
},
{
"month" : 201704,
"count" : 1
}
]
}
{
"_id" : "CAR",
"monthCounts" : [
{
"month" : 201701,
"count" : 1
},
{
"month" : 201702,
"count" : 1
},
{
"month" : 201703,
"count" : 0
},
{
"month" : 201704,
"count" : 0
}
]
}
MapReduce
All versions of MongoDB support mapReduce, and the simple case of the "iterator" as mentioned above is handled by a for loop in the mapper. We can get output as generated up to the first $group from above by simply doing:
db.cars.mapReduce(
function () {
// walk from startDate to endDate one month at a time,
// copying the date so the source value is not mutated
for ( var d = new Date(this.startDate.valueOf()); d <= this.endDate;
d.setUTCMonth(d.getUTCMonth()+1) )
{
// normalise to the first of the month before emitting
var m = new Date(0);
m.setUTCFullYear(d.getUTCFullYear());
m.setUTCMonth(d.getUTCMonth());
emit({ id: this.type, date: m }, 1);
}
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
Which produces:
{
"_id" : {
"id" : "CAR",
"date" : ISODate("2017-01-01T00:00:00Z")
},
"value" : 1
},
{
"_id" : {
"id" : "CAR",
"date" : ISODate("2017-02-01T00:00:00Z")
},
"value" : 1
},
{
"_id" : {
"id" : "HGV",
"date" : ISODate("2017-02-01T00:00:00Z")
},
"value" : 2
},
{
"_id" : {
"id" : "HGV",
"date" : ISODate("2017-03-01T00:00:00Z")
},
"value" : 2
},
{
"_id" : {
"id" : "HGV",
"date" : ISODate("2017-04-01T00:00:00Z")
},
"value" : 1
}
So it does not have the second grouping to compound the results into arrays, but we did produce the same basic aggregated output.
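If you do want the compounded array form, a small follow-up sketch in the shell can reshape the inline results (assuming the return value of the mapReduce() call above is held in a variable; res, mapper and reducer here are hypothetical names):

var res = db.cars.mapReduce(mapper, reducer, { "out": { "inline": 1 } }); // same map/reduce functions as above
var grouped = {};
res.results.forEach(function(r) {
  grouped[r._id.id] = grouped[r._id.id] || [];
  grouped[r._id.id].push({ month: r._id.date, count: r.value });
});
// grouped is now { "CAR": [ { month: ..., count: 1 }, ... ], "HGV": [ ... ] }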
Related
I have documents stored in MongoDB like this:
{
"_id" : "XBpNKbdGSgGfnC2MJ",
"po" : 72134185,
"machine" : 40940,
"location" : "02A01",
"inDate" : ISODate("2017-07-19T06:10:13.059Z"),
"requestDate" : ISODate("2017-07-19T06:17:04.901Z"),
"outDate" : ISODate("2017-07-19T06:30:34Z")
}
And I want the sum, by day, of inDate and outDate.
I can retrieve, on one side, the sum of documents by inDate day and, on the other side, the sum of documents by outDate day, but I would like the sum of each together.
Currently, I use this pipeline:
$group: {
_id: {
year: { $year: '$inDate' },
month: { $month: '$inDate' },
day: { $dayOfMonth: '$inDate' },
},
count: { $sum: 1 },
},
and it gives:
{ "_id" : { "year" : 2017, "month" : 7, "day" : 24 }, "count" : 1 }
{ "_id" : { "year" : 2017, "month" : 7, "day" : 21 }, "count" : 11 }
{ "_id" : { "year" : 2017, "month" : 7, "day" : 19 }, "count" : 20 }
But I would like, if possible:
{ "_id" : { "year" : 2017, "month" : 7, "day" : 24 }, "countIn" : 1, "countOut" : 4 }
{ "_id" : { "year" : 2017, "month" : 7, "day" : 21 }, "countIn" : 11, "countOut" : 23 }
{ "_id" : { "year" : 2017, "month" : 7, "day" : 19 }, "countIn" : 20, "countOut" : 18 }
Any idea?
Many thanks :-)
You can also split the documents at the source, by essentially combining each value into an array of entries by "type" for "in" and "out". You can do this simply using $map and $cond to select the fields, then $unwind the array, and then determine which field to "count" by again inspecting with $cond:
collection.aggregate([
{ "$project": {
"dates": {
"$filter": {
"input": {
"$map": {
"input": [ "in", "out" ],
"as": "type",
"in": {
"type": "$$type",
"date": {
"$cond": {
"if": { "$eq": [ "$$type", "in" ] },
"then": "$inDate",
"else": "$outDate"
}
}
}
}
},
"as": "dates",
"cond": { "$ne": [ "$$dates.date", null ] }
}
}
}},
{ "$unwind": "$dates" },
{ "$group": {
"_id": {
"year": { "$year": "$dates.date" },
"month": { "$month": "$dates.date" },
"day": { "$dayOfMonth": "$dates.date" }
},
"countIn": {
"$sum": {
"$cond": {
"if": { "$eq": [ "$dates.type", "in" ] },
"then": 1,
"else": 0
}
}
},
"countOut": {
"$sum": {
"$cond": {
"if": { "$eq": [ "$dates.type", "out" ] },
"then": 1,
"else": 0
}
}
}
}}
])
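Against the single sample document above (where both dates fall on July 19) this returns:

{ "_id" : { "year" : 2017, "month" : 7, "day" : 19 }, "countIn" : 1, "countOut" : 1 }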
That's a safe way to do this, one that does not risk breaking the BSON limit no matter what size of data you throw at it.
Personally, I would rather run these as separate processes and "combine" the aggregated results separately, but that would depend on the environment you are running in, which is not mentioned in the question.
For an example of "parallel" execution, you can structure in Meteor somewhere along these lines:
import { Meteor } from 'meteor/meteor';
import { Source } from '../imports/source';
import { Target } from '../imports/target';
Meteor.startup(async () => {
// code to run on server at startup
await Source.remove({});
await Target.remove({});
console.log('Removed');
Source.insert({
"_id" : "XBpNKbdGSgGfnC2MJ",
"po" : 72134185,
"machine" : 40940,
"location" : "02A01",
"inDate" : new Date("2017-07-19T06:10:13.059Z"),
"requestDate" : new Date("2017-07-19T06:17:04.901Z"),
"outDate" : new Date("2017-07-19T06:30:34Z")
});
console.log('Inserted');
await Promise.all(
["In","Out"].map( f => new Promise((resolve,reject) => {
let cursor = Source.rawCollection().aggregate([
{ "$match": { [`${f.toLowerCase()}Date`]: { "$exists": true } } },
{ "$group": {
"_id": {
"year": { "$year": `$${f.toLowerCase()}Date` },
"month": { "$month": `$${f.toLowerCase()}Date` },
"day": { "$dayOfYear": `$${f.toLowerCase()}Date` }
},
[`count${f}`]: { "$sum": 1 }
}}
]);
cursor.on('data', async (data) => {
cursor.pause();
data.date = data._id;
delete data._id;
await Target.upsert(
{ date: data.date },
{ "$set": data }
);
cursor.resume();
});
cursor.on('end', () => resolve('done'));
cursor.on('error', (err) => reject(err));
}))
);
console.log('Mapped');
let targets = await Target.find().fetch();
console.log(targets);
});
Which is essentially going to output to the target collection, as was mentioned in the comments, like:
{
"_id" : "XdPGMkY24AcvTnKq7",
"date" : {
"year" : 2017,
"month" : 7,
"day" : 200
},
"countIn" : 1,
"countOut" : 1
}
Riiiight. I came up with the following query. Admittedly, I have seen simpler and nicer ones in my life but it certainly gets the job done:
db.getCollection('test').aggregate([
{ $facet: // split aggregation into two pipelines
  {
  "in": [
    { "$match": { "inDate": { "$ne": null } } }, // get rid of null values
    { $group: { // compute sum per inDate
      "_id": { "y": { "$year": "$inDate" }, "m": { "$month": "$inDate" }, "d": { "$dayOfMonth": "$inDate" } },
      "cIn": { $sum : 1 }
    }}
  ],
  "out": [
    { "$match": { "outDate": { "$ne": null } } }, // get rid of null values
    { $group: { // compute sum per outDate
      "_id": { "y": { "$year": "$outDate" }, "m": { "$month": "$outDate" }, "d": { "$dayOfMonth": "$outDate" } },
      "cOut": { $sum : 1 }
    }}
  ]
  }
},
{ $project: { "result": { $setUnion: [ "$in", "$out" ] } } }, // merge results into new array
{ $unwind: "$result" }, // unwind array into individual documents
{ $replaceRoot: { newRoot: "$result" } }, // get rid of the additional field level
{ $group: { // group into final result
  _id: { year: "$_id.y", "month": "$_id.m", "day": "$_id.d" },
  "countIn": { $sum: "$cIn" },
  "countOut": { $sum: "$cOut" }
}}
])
As always with MongoDB aggregations you can get an idea of what's going on by simply reducing the projection stages step by step starting from the end of the query.
EDIT:
As you can see in the comments below there was a bit of a discussion around document size limits and the general applicability of this solution.
So let's look at those aspects in greater detail, and let's also compare the performance of the $facet based solution to the one based on $map (suggested by @NeilLunn to avoid potential document size issues).
I created 2 million test records that have random dates assigned to both the "inDate" and the "outDate" field:
{
"_id" : ObjectId("597857e0fa37b3f66959571a"),
"inDate" : ISODate("2016-07-29T22:00:00.000Z"),
"outDate" : ISODate("1988-07-14T22:00:00.000Z")
}
The date range covered was from 01.01.1970 all the way to 01.01.2050, a total of 29220 distinct days. Given the random distribution of the 2 million test records across this time range, both queries can be expected to return the full 29220 possible results (which both did).
Then I ran both queries five times each, after freshly restarting my single MongoDB instance, and the results in milliseconds looked like this:
$facet: 5663, 5400, 5380, 5460, 5520
$map: 9648, 9134, 9058, 9085, 9132
I also measured the size of the single document returned by the facet stage, which was 3.19MB, so reasonably far from the MongoDB document size limit (16MB at the time of writing), which, however, only applies to the result document anyway and would not be a problem during pipeline processing.
Bottom line: if you want performance, use the solution suggested here. Be careful about the document size limit, though, in particular if your use case is not the exact one described in the question above (e.g. when you need to collect even more/bigger data). Also, I am not sure whether both solutions still expose the same performance characteristics in a sharded scenario...
I need records with an output of gender and count per updated hour, over two days.
db.FaceData.aggregate([
{ $match: {
  'Timestamp' : { $gte : 1448121600000, $lt : 1448294399000 },
  'DID' : "ABFR001"
}},
{ $group: {
  _id: { 'Gen': '$Gen' },
  count : { $sum : 1 }
}}
]);
output:
------
{ "_id" : { "Gen" : 1 }, "count" : 3055 }
{ "_id" : { "Gen" : 0 }, "count" : 2866 }
In the above output I need to also group by hour across the two days. For example, for every hour I need the gender counts, over the 2 days.
The Timestamp is in milliseconds.
You would need a mechanism to get the actual date object from the unix timestamp; one way is to add the timestamp to a zero-milliseconds Date() object using the $add operator, in a $project stage before the actual grouping stage of the aggregation pipeline.
Once you get the date, extract the hour part by using the $hour operator, something like the following:
db.FaceData.aggregate([
{
"$match": {
"Timestamp" : { $gte : 1448121600000, $lt : 1448294399000 },
"DID" : "ABFR001"
}
},
{
$project : {
"hourPart" : {
"$hour": { "$add": [ new Date(0), "$Timestamp" ] }
},
"Gen": 1
}
},
{
"$group": {
"_id": "$hourPart",
"Gen_0_count" : {
"$sum": {
"$cond": [ { "$eq": [ "$Gen", 0 ] }, 1, 0 ]
}
},
"Gen_1_count" : {
"$sum": {
"$cond": [ { "$eq": [ "$Gen", 1 ] }, 1, 0 ]
}
}
}
}
]);
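Note that $hour on its own returns only the 0-23 clock hour, so the same hour from both days would land in one bucket. If each day's hours need to stay distinct, you can instead group on the timestamp truncated to the start of its hour; the $subtract/$mod math below turns each millisecond timestamp into an hours-since-epoch bucket number: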
{"$match": {
"Timestamp" : { $gte : 1448121600000, $lt : 1448294399000 },
"DID" : "ABFR001"
}} ,
{ "$group" : {
"_id" : {
"$divide" : [{ "$subtract" : [{"$divide" : ["$Timestamp", 1000]}, { "$mod" : [{"$divide" : ["$Tstmp", 1000]}, 3600] }] }, 3600 ]
},
"Male" : {
"$sum": {
"$cond": [ { "$eq": [ "$Gen", 0 ] }, 1, 0 ]
}
},
"Female" : {
"$sum": {
"$cond": [ { "$eq": [ "$Gen", 1 ] }, 1, 0 ]
}
}
} }
I have a collection in my MongoDB which stores the service given to a customer along with their email address, something like below:
{
"_id" : ObjectId("56a84627f8fd4a136c0e944a"),
"Vehicle" : "Honda",
"ServiceSelected" : "FULL SERVICE",
"FullName" : "xyz",
"Email" : "xyz#xyz.com",
"BookingTime" : ISODate("2015-12-27T06:00:00.000Z")
},
{
"_id" : ObjectId("56a84627f8fd4a136c0e944b"),
"Vehicle" : "AUDI",
"ServiceSelected" : "FLAT TYRE",
"FullName" : "abc",
"Email" : "abc#abc.com",
"BookingTime" : ISODate("2015-12-26T06:00:00.000Z")
},
{
"_id" : ObjectId("56a84627f8fd4a136c0e944c"),
"Vehicle" : "BMW",
"ServiceSelected" : "OTHERS",
"FullName" : "def",
"Email" : "def#def.com",
"BookingTime" : ISODate("2015-12-25T06:00:00.000Z")
},
{
"_id" : ObjectId("56a84627f8fd4a136c0e944d"),
"Vehicle" : "BMW",
"ServiceSelected" : "OTHERS",
"FullName" : "def",
"Email" : "def#def.com",
"BookingTime" : ISODate("2015-12-30T06:00:00.000Z")
},
{
"_id" : ObjectId("56a84627f8fd4a136c0e944a"),
"Vehicle" : "Honda",
"ServiceSelected" : "FULL SERVICE",
"FullName" : "xyz",
"Email" : "xyz#xyz.com",
"BookingTime" : ISODate("2016-01-27T06:00:00.000Z")
}
From the above collection I want to fetch all the documents for customers that have taken our service with a gap of at least 30 days, i.e. from the above collection "Email" : "xyz@xyz.com" should be returned but not "Email" : "def@def.com", as the second service was taken within 5 days.
I know there is a flaw in the design and an additional flag could be set while inserting the record from the application, but I need to fetch the data for the existing records.
You need to use the $min and $max operators, which respectively return the minimum and maximum values for "BookingTime", in your $group stage. The last stage in the pipeline is the $redact stage, where you use simple "date" math with the $divide and $subtract arithmetic operators to return those documents where the number of days between the first "service" and the last "service" is greater than 30:
db.collection.aggregate( [
{ "$group": {
"_id": "$Email",
"date1": { "$min": "$BookingTime" },
"date2": { "$max": "$BookingTime" }
}},
{ "$redact": {
"$cond": [
{ "$gte": [
{ "$divide": [
{ "$subtract": [ "$date2", "$date1" ] },
1000 * 60 * 60 * 24
]},
30
]},
"$$KEEP",
"$$PRUNE"
]
}}
])
Which returns:
{
"_id" : "xyz#xyz.com",
"date1" : ISODate("2015-12-27T06:00:00Z"),
"date2" : ISODate("2016-01-27T06:00:00Z")
}
Another way to do this is to use the $cond operator in a $project stage instead of the $redact stage:
db.collection.aggregate( [
{ "$group": {
"_id": "$Email",
"date1": { "$min": "$BookingTime" },
"date2": { "$max": "$BookingTime" },
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gte": 2 } } },
{ "$project": {
"emails": {
"$cond": [
{ "$gte": [
{ "$divide": [
{ "$subtract": [ "$date2", "$date1" ] },
1000 * 60 * 60 * 24
]},
30
] },
"$_id",
false
]
}
}},
{ "$match": { "emails": { "$ne": false } } }
])
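Against the sample documents this returns only the qualifying email:

{ "_id" : "xyz@xyz.com", "emails" : "xyz@xyz.com" }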
You can get the first sales date and the last sales date with $min and $max:
db.services.aggregate({
$group: {
"_id" :"$Email",
lastSalesDate: { $max: "$BookingTime" },
firstSalesDate: { $min: "$BookingTime" }
}
}
)
After that you can add a filter based on lastSalesDate. You can calculate the ISO date for 30 days before, e.g. ISODate("2015-12-28T00:00:00.000Z"). With $lt you will then get the customers whose last sale was before that cutoff:
db.services.aggregate(
{
$group: {
"_id" :"$Email",
lastSalesDate: { $max: "$BookingTime" },
firstSalesDate: { $min: "$BookingTime" }
}
},
{
$match : {
"lastSalesDate" : { $lt: ISODate("2015-12-28T00:00:00.000Z") }
}
}
)
Results like:
{
"_id" : "abc#abc.com",
"lastSalesDate" : ISODate("2015-12-26T06:00:00.000+0000"),
"firstSalesDate" : ISODate("2015-12-26T06:00:00.000+0000")
}
This is what I used in the end:
db.services.aggregate([
{$group: {
  "_id" : "$Email",
  count: {$sum: 1},
  lastSalesDate: { $max: "$BookingTime" },
  firstSalesDate: { $min: "$BookingTime" }
}},
{$project: {
  _id: 1,
  count: 1,
  dateDifference: { $divide: [ {$subtract: [ "$lastSalesDate", "$firstSalesDate" ]}, 86400000 ] }
}},
{$match: {
  count: {$gt: 1},
  dateDifference: {$gt: 20}
}}
])
count > 1 helped to filter out the records which never repeated, and dateDifference > 20 is in days, as I had already converted milliseconds to days with the division operation.
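As a worked check with the xyz@xyz.com documents above: 2016-01-27 minus 2015-12-27 is 31 days, so $subtract yields 2678400000 ms, and 2678400000 / 86400000 = 31, which passes both filters.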
My collection will look like this:
{
"_id" : ObjectId("55c8bd1d85b83e06dc54c0eb"),
"name" : "xxx",
"salary" : 10000,
"type" : "type1"
}
{
"_id" : ObjectId("55c8bd1d85b83e06dc54c0eb"),
"name" : "aaa",
"salary" : 10000,
"type" : "type2"
}
{
"_id" : ObjectId("55c8bd1d85b83e06dc54c0eb"),
"name" : "ccc",
"salary" : 10000,
"type" : "type2"
}
My query params will come in as:
{salary=10000, type=type2}
so based on the query I need to fetch the counts for the above query params.
The result should be something like this:
{ category: 'type1', count: 500 } { category: 'type2', count: 200 } { category: 'name', count: 100 }
Currently I am getting the counts by hitting three different queries and constructing the result, or by server-side iteration.
Can anyone suggest a good way to get the above result?
Your question is not very clearly presented, but what it seems you want to do here is count the occurrences of the data in the fields, optionally filtering those fields by values that match the criteria.
Here the $cond operator allows you to transform a logical condition into a value:
db.collection.aggregate([
{ "$group": {
"_id": null,
"name": { "$sum": 1 },
"salary": {
"$sum": {
"$cond": [
{ "$gte": [ "$salary", 1000 ] },
1,
0
]
}
},
"type": {
"$sum": {
"$cond": [
{ "$eq": [ "$type", "type2" ] },
1,
0
]
}
}
}}
])
All values are in the same result document, and it does not really make any sense to split them up here, as that is additional work in the pipeline.
{ "_id" : null, "name" : 3, "salary" : 3, "type" : 2 }
Otherwise, in the long form, which is not very performant due to needing to make a copy of each document for every key, it looks like this:
db.collection.aggregate([
{ "$project": {
"name": 1,
"salary": 1,
"type": 1,
"category": { "$literal": ["name","salary","type"] }
}},
{ "$unwind": "$category" },
{ "$group": {
"_id": "$category",
"count": {
"$sum": {
"$cond": [
{ "$and": [
{ "$eq": [ "$category", "name"] },
{ "$ifNull": [ "$name", false ] }
]},
1,
{ "$cond": [
{ "$and": [
{ "$eq": [ "$category", "salary" ] },
{ "$gte": [ "$salary", 1000 ] }
]},
1,
{ "$cond": [
{ "$and": [
{ "$eq": [ "$category", "type" ] },
{ "$eq": [ "$type", "type2" ] }
]},
1,
0
]}
]}
]
}
}
}}
])
And its output:
{ "_id" : "type", "count" : 2 }
{ "_id" : "salary", "count" : 3 }
{ "_id" : "name", "count" : 3 }
If your documents do not have uniform key names, or you otherwise cannot specify each key in your pipeline condition, then apply mapReduce instead:
db.collection.mapReduce(
function() {
var doc = this;
delete doc._id;
Object.keys(this).forEach(function(key) {
var value = (( key == "salary") && ( doc[key] < 1000 ))
? 0
: (( key == "type" ) && ( doc[key] != "type2" ))
? 0
: 1;
emit(key,value);
});
},
function(key,values) {
return Array.sum(values);
},
{
"out": { "inline": 1 }
}
);
And its output:
"results" : [
{
"_id" : "name",
"value" : 3
},
{
"_id" : "salary",
"value" : 3
},
{
"_id" : "type",
"value" : 2
}
]
Which is basically the same thing as the conditional count, except that you only specify the "reverse" of the conditions you want, and only for the fields you want to filter on. And of course this output format is simple to emit as separate documents.
The approach is the same in each case: test whether the condition is met on the fields you want to filter, and return 1 where the condition is met or 0 where it is not, then sum those values for the count.
You can use an aggregation like the following query:
db.collection.aggregate({
$match: {
salary: 10000,
//add any other condition here
}
}, {
$group: {
_id: "$type",
"count": {
$sum: 1
}
}
}, {
$project: {
"category": "$_id",
"count": 1,
_id: 0
}
})
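Against the three sample documents above this would return, in no particular order:

{ "count" : 1, "category" : "type1" }
{ "count" : 2, "category" : "type2" }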
I have an hourly report in MongoDB which has some data for each hour. Now I want to get a bi-hourly report from it, meaning it will sum the "count" and "value" fields over every two hours. How do I write the aggregation? Thanks a lot!
Before, hourly data:
/* 1 */
{
"count" : 63713,
"value" : 46151,
"timestamp" : ISODate("2014-09-17T18:59:04.247+03:00"),
}
/* 2 */
{
"count" : 63743,
"value" : 48327,
"timestamp" : ISODate("2014-09-17T19:59:04.281+03:00"),
}
/* 3 */
{
"count" : 63761,
"value" : 51650,
"timestamp" : ISODate("2014-09-17T20:59:04.295+03:00"),
}
/* 4 */
{
"count" : 63756,
"value" : 52865,
"timestamp" : ISODate("2014-09-17T21:59:04.298+03:00"),
}
After, bi-hourly data:
/* sum of documents 1&2 */
{
"count" : 117456,
"value" : 94478,
"timestamp" : ISODate("2014-09-17T18:59:04.247+03:00"),
}
/* sum of documents 3&4 */
{
"count" : 127517,
"value" : 104515,
"timestamp" : ISODate("2014-09-17T20:59:04.295+03:00"),
}
Actually your "bi-hourly" data in a day would cover three time periods from the sample as given. So Document 1 is in the first of a two hour block, 2 & 3 are in the second and 4 is in the third.
So you can really just apply some take math here to get 12 two hour intervals within a day:
db.times.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$timestamp", new Date("1970-01-01") ] },
{ "$mod": [
{ "$subtract": [ "$timestamp", new Date("1970-01-01") ] },
1000 * 60 * 60 * 2
]}
],
},
"count": { "$sum": "$count" },
"value": { "$sum": "$value" }
}},
{ "$sort": { "_id": 1 } }
])
Which would produce a timestamp value representing each two-hour interval. Or you could just use the date aggregation operators instead:
db.times.aggregate([
{ "$group": {
"_id": {
"day": { "$dayOfYear": "$timestamp" },
"hour": {
"$subtract": [
{ "$hour": "$timestamp" },
{ "$mod": [ { "$hour": "$timestamp" }, 2 ] }
]
}
},
"count": { "$sum": "$count" },
"value": { "$sum": "$value" }
}},
{ "$sort": { "_id": 1 } }
])
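If you would rather have the first pipeline's millisecond _id back as a BSON date, one option (a sketch, reusing the same epoch math in reverse) is to append a $project stage that adds the bucket value back to the zero date:

{ "$project": {
  "_id": 0,
  "timestamp": { "$add": [ new Date(0), "$_id" ] },
  "count": 1,
  "value": 1
}}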