How can I get a running total with the MongoDB aggregation framework? - mongodb

I am fairly new to MongoDB and I am playing with the aggregation framework. One of the examples from the documentation shows the following, which returns the total number of new users who joined in each month, along with the month:
db.users.aggregate(
[
{ $project : { month_joined : { $month : "$joined" } } } ,
{ $group : { _id : {month_joined:"$month_joined"} , number : { $sum : 1 } } },
{ $sort : { "_id.month_joined" : 1 } }
]
)
The code outputs the following:
{
"_id" : {
"month_joined" : 1
},
"number" : 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5
}
Is it possible to also have each object contain the sum of all users that have joined since the start, so I don't have to run over the objects programmatically and calculate it myself?
Example desired output:
{
"_id" : {
"month_joined" : 1
},
"number" : 3,
"total": 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9,
"total": 12
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5,
"total": 17
}
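On MongoDB 5.0 or later, one possible way to get this is the $setWindowFields stage, which can sum over all documents up to the current one. The following is only a sketch under that version assumption, reusing the users collection from the question. The runningTotal helper is hypothetical and is included only so the expected numbers can be checked without a database.

```javascript
// Sketch: assumes MongoDB 5.0+ ($setWindowFields does not exist in older versions).
const pipeline = [
  { $project: { month_joined: { $month: "$joined" } } },
  { $group: { _id: { month_joined: "$month_joined" }, number: { $sum: 1 } } },
  {
    $setWindowFields: {
      sortBy: { "_id.month_joined": 1 },
      output: {
        // Sum "number" from the first document up to and including the current one.
        total: { $sum: "$number", window: { documents: ["unbounded", "current"] } }
      }
    }
  }
];
// In the shell: db.users.aggregate(pipeline)

// Hypothetical in-memory equivalent of the window stage, for checking the numbers.
function runningTotal(docs) {
  let total = 0;
  return docs
    .slice()
    .sort((a, b) => a._id.month_joined - b._id.month_joined)
    .map(doc => {
      total += doc.number;
      return { ...doc, total };
    });
}

const monthly = [
  { _id: { month_joined: 1 }, number: 3 },
  { _id: { month_joined: 2 }, number: 9 },
  { _id: { month_joined: 3 }, number: 5 }
];
console.log(runningTotal(monthly).map(d => d.total)); // [ 3, 12, 17 ]
```

On older servers the same loop can simply be run client-side over the aggregation result.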

Related

MongoDB: Aggregation Group operation over a dynamic key-value pair

I have a simple document with one field as a key-value pair. I want to just perform a group operation in Aggregation over those keys and add their values. But the keys in the pair are not fixed and can be anything.
Here is a sample document.
{
_id: 349587843,
matchPair: {
3 : 21,
9 : 4,
7 : 32
}
},
{
_id: 349587478,
matchPair: {
7 : 11,
54 : 32,
9 : 7,
2 : 19
}
}
And I want a result something like the following.
{
_id : 3,
count : 21
},
{
_id : 9,
count : 11
},
{
_id : 7,
count : 43
},
{
_id : 54,
count : 32
},
{
_id : 2,
count : 19
}
I have the following query in mind and tried using the $unwind operation, but it doesn't work, probably because "matchPair" isn't an array, and I don't know what to specify for the $sum operation.
db.MatchPairs.aggregate([
{ "$unwind" : "$matchPair" },
{ "$group" : {
_id: "$matchPair",
count : { $sum : $matchPair }
} }
]);
I could also try Map-Reduce but for that too I need to emit() keys and values by name.
I'm sure there's a simple solution to this but I can't figure it out.
You could start by reshaping the matchPair field into an array with $objectToArray (new in version 3.4.4):
{
$project: {
matchPair: { $objectToArray: '$matchPair' }
}
}
which would give
{
matchPair: [{ k: "3", v: 21 }, { k: "9", v: 4 }, ...]
}
Then $unwind based on matchPair
{
$unwind: '$matchPair'
}
which would give
{
matchPair: { k: "3", v: 21 }
}
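Then $group by the key and sum the values, so that keys appearing in several documents are added together
{
$group: {
_id: '$matchPair.k',
count: { $sum: '$matchPair.v' }
}
}
That should give the output you want. Note that $objectToArray returns the keys as strings, so _id will be "3", "9", and so on. Altogether it would be
db.MatchPairs.aggregate([
{
$project: {
matchPair: { $objectToArray: '$matchPair' }
}
},
{ $unwind: '$matchPair' },
{
$group: {
_id: '$matchPair.k',
count: { $sum: '$matchPair.v' }
}
}
])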
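To sanity-check the per-key sums from the question without a database, the same reshaping can be mimicked in plain JavaScript. This is only a sketch; sumByKey is a hypothetical helper, not part of MongoDB.

```javascript
// Sample documents copied from the question.
const docs = [
  { _id: 349587843, matchPair: { 3: 21, 9: 4, 7: 32 } },
  { _id: 349587478, matchPair: { 7: 11, 54: 32, 9: 7, 2: 19 } }
];

// Hypothetical helper: sums the values of matchPair per key across all documents.
function sumByKey(documents) {
  const totals = new Map();
  for (const doc of documents) {
    // Object.entries stands in for $objectToArray plus $unwind:
    // it yields one [key, value] pair at a time, with the key as a string.
    for (const [k, v] of Object.entries(doc.matchPair)) {
      totals.set(k, (totals.get(k) || 0) + v);
    }
  }
  // Shape the result like the $group output: { _id, count }.
  return [...totals].map(([k, count]) => ({ _id: k, count }));
}

console.log(sumByKey(docs));
```

The result contains one entry per distinct key, for example a count of 11 for "9" and 43 for "7".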
From the MongoDB documentation for $unwind:
Deconstructs an array field from the input documents to output a
document for each element.
So you have to change your schema to something like:
{
"_id" : ObjectId("5880b57b039a3c89c1db145a"),
"matchPair" : [
{
"_id" : "3",
"count" : 21
},
{
"_id" : "9",
"count" : 4
},
{
"_id" : "7",
"count" : 32
}
]
},
{
"_id" : ObjectId("5880b58c039a3c89c1db145b"),
"matchPair" : [
{
"_id" : "7",
"count" : 11
},
{
"_id" : "54",
"count" : 32
},
{
"_id" : "9",
"count" : 7
},
{
"_id" : "2",
"count" : 19
}
]
}
Then doing:
db.MatchPairs.aggregate([
{ $unwind : "$matchPair" }
]);
will return:
{
"_id" : ObjectId("5880b57b039a3c89c1db145a"),
"matchPair" : {
"_id" : "3",
"count" : 21
}
},
{
"_id" : ObjectId("5880b57b039a3c89c1db145a"),
"matchPair" : {
"_id" : "9",
"count" : 4
}
},
{
"_id" : ObjectId("5880b57b039a3c89c1db145a"),
"matchPair" : {
"_id" : "7",
"count" : 32
}
},
{
"_id" : ObjectId("5880b58c039a3c89c1db145b"),
"matchPair" : {
"_id" : "7",
"count" : 11
}
},
{
"_id" : ObjectId("5880b58c039a3c89c1db145b"),
"matchPair" : {
"_id" : "54",
"count" : 32
}
},
{
"_id" : ObjectId("5880b58c039a3c89c1db145b"),
"matchPair" : {
"_id" : "9",
"count" : 7
}
},
{
"_id" : ObjectId("5880b58c039a3c89c1db145b"),
"matchPair" : {
"_id" : "2",
"count" : 19
}
}
Then you just have to do your grouping.
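For the grouping step, a pipeline along these lines should work. It is a sketch that assumes the array-based schema shown above, with the field names _id and count inside each matchPair element.

```javascript
// Sketch: assumes matchPair is an array of { _id, count } sub-documents.
const pipeline = [
  // One output document per array element.
  { $unwind: "$matchPair" },
  // Group by the element's key and add up its counts.
  { $group: { _id: "$matchPair._id", count: { $sum: "$matchPair.count" } } }
];
// In the shell: db.MatchPairs.aggregate(pipeline)
```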

How to change the output format when using $group in a MongoDB aggregation

I need to change the output format of the following MongoDB query.
Query
db.Response.aggregate([{
"$match": {
"$and": [{
"job_details.owner_id" : 482,
}, {
"job_details.owner_type" : 'searches',
}],
},
},
{
"$group": {
"_id": "$candidate_sublocation_name_string",
"count": {
"$sum": 1,
},
},
}])
Actual output
{ "_id" : "Central Delhi ", "count" : 1 }
{ "_id" : "Adyar ", "count" : 1 }
{ "_id" : "Bommanahalli", "count" : 2 }
{ "_id" : "DLF Phase 3 ", "count" : 2 }
{ "_id" : "Aai Colony", "count" : 1 }
Needed output
{ "Central Delhi" : 1 }
{ "Adyar" : 1 }
{ "Bommanahalli" : 2 }
{ "DLF Phase 3 ": 2 }
{ "Aai Colony": 1 }
Is it possible to change the output format like this?
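If doing it on the server is not a requirement, one simple option is to reshape the result on the client after the aggregation has run. A minimal sketch, where toKeyedCounts is a hypothetical helper and the rows are copied from the actual output above (names are kept exactly as stored, including trailing spaces):

```javascript
// Rows as returned by the $group stage in the question.
const results = [
  { _id: "Central Delhi ", count: 1 },
  { _id: "Adyar ", count: 1 },
  { _id: "Bommanahalli", count: 2 },
  { _id: "DLF Phase 3 ", count: 2 },
  { _id: "Aai Colony", count: 1 }
];

// Hypothetical helper: turns { _id: name, count: n } into { name: n }.
function toKeyedCounts(rows) {
  return rows.map(row => ({ [row._id]: row.count }));
}

console.log(toKeyedCounts(results)[2]); // { Bommanahalli: 2 }
```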

How can I split a MongoDB collection into 3 and assign a new field?

I have a json collection with 300 records like this:
{
salesNumber: 23839,
batch: null
},
{
salesNumber: 389230,
batch: null
}
...etc.
I need to divide this collection into 3 different batches. So, when sorted by salesNumber, the first 100 would be in batch 1, the next 100 would be batch 2, and the last 100 would be batch 3. How do I do this?
I wrote a script to select the first 100, but when I tried to turn it into an array to use in an update, the result was 0 records.
var firstBatchCompleteRecords = db.properties.find(
{
"auction": ObjectId("50")
}
).sort("saleNumber").limit(100);
// This returned 174 records as expected with all the fields
var firstBatch = firstBatchCompleteRecords.distinct( "saleNumber", {});
// This returned 0 records
I was going to take the results of that last query and use it in an update statement:
db.properties.update(
{
"saleNumber":
{
"$in": firstBatch
}
}
,
{
$set:
{
batch: "1"
}
}
,
{
multi: true
}
);
...then I would have created an array using distinct of the next 100 and update those, but I never got that far.
You can get the result with the aggregation framework and store it in a new collection. You can then iterate over that collection and update the fields in the source collection.
Have fun!
db.sn.aggregate([{
$sort : {
salesNumber : 1
}
}, {
$group : {
_id : null,
arrayOfData : {
$push : "$$ROOT"
},
}
}, {
$project : {
_id : 0,
firstHundred : {
$slice : ["$arrayOfData", 0, 100]
},
secondHundred : {
$slice : ["$arrayOfData", 100, 100]
},
thirdHundred : {
$slice : ["$arrayOfData", 200, 100]
},
}
}, {
$project : {
"firstHundred.batch" : {
$literal : 1
},
"firstHundred.salesNumber" : 1,
"firstHundred._id" : 1,
"secondHundred.batch" : {
$literal : 2
},
"secondHundred.salesNumber" : 1,
"secondHundred._id" : 1,
"thirdHundred.batch" : {
$literal : 3
},
"thirdHundred.salesNumber" : 1,
"thirdHundred._id" : 1,
}
}, {
$project : {
allValues : {
$setUnion : ["$firstHundred", "$secondHundred", "$thirdHundred"]
}
}
}, {
$unwind : "$allValues"
}, {
$project : {
_id : "$allValues._id",
salesNumber : "$allValues.salesNumber",
batch : "$allValues.batch",
}
}, {
$out : "collectionName"
}
])
db.collectionName.find()
and the output generated for 6 documents split into batches of 2:
{
"_id" : ObjectId("5733ade7eeeccba2bd546121"),
"salesNumber" : 389230,
"batch" : 2
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546120"),
"salesNumber" : 23839,
"batch" : 1
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546122"),
"salesNumber" : 43839,
"batch" : 1
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546124"),
"salesNumber" : 63839,
"batch" : 2
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546123"),
"salesNumber" : 589230,
"batch" : 3
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546125"),
"salesNumber" : 789230,
"batch" : 3
}
Any comments welcome!
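The batch assignment itself boils down to integer division of the sorted position by the batch size. A minimal client-side sketch of that arithmetic, using the six salesNumber values from the output above; assignBatches is a hypothetical helper, not a MongoDB function:

```javascript
// Hypothetical helper: sorts by salesNumber and numbers the batches from 1.
function assignBatches(records, batchSize) {
  return records
    .slice() // leave the caller's array untouched
    .sort((a, b) => a.salesNumber - b.salesNumber)
    .map((rec, i) => ({ ...rec, batch: Math.floor(i / batchSize) + 1 }));
}

const sample = [389230, 23839, 43839, 63839, 589230, 789230]
  .map(n => ({ salesNumber: n, batch: null }));

console.log(assignBatches(sample, 2));
```

With 300 records and a batch size of 100 the same function yields batches 1, 2 and 3. Each resulting batch value could then be written back with one update per record or per batch.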

How to group MongoDB mapReduce output?

I have a question about the mapReduce framework in MongoDB. I have a result of key-value pairs from a mapReduce function, and now I want to run a query on this mapReduce output.
I am using mapReduce to compute per-user stats like this:
db.order.mapReduce(function() { emit (this.customer,{count:1,orderDate:this.orderDate.interval_start}) },
function(key,values){
var sum =0 ; var lastOrderDate;
values.forEach(function(value) {
if(value['orderDate']){
lastOrderDate=value['orderDate'];
}
sum+=value['count'];
});
return {count:sum,lastOrderDate:lastOrderDate};
},
{ query:{status:"DELIVERED"},out:"order_total"}).find()
which gives me output like this
{ "_id" : ObjectId("5443765ae4b05294c8944d5b"), "value" : { "count" : 1, "orderDate" : ISODate("2014-10-18T18:30:00Z") } }
{ "_id" : ObjectId("54561911e4b07a0a501276af"), "value" : { "count" : 2, "lastOrderDate" : ISODate("2015-03-14T18:30:00Z") } }
{ "_id" : ObjectId("54561b9ce4b07a0a501276b1"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-01T18:30:00Z") } }
{ "_id" : ObjectId("5458712ee4b07a0a501276c2"), "value" : { "count" : 2, "lastOrderDate" : ISODate("2014-11-03T18:30:00Z") } }
{ "_id" : ObjectId("545f64e7e4b07a0a501276db"), "value" : { "count" : 15, "lastOrderDate" : ISODate("2015-06-04T18:30:00Z") } }
{ "_id" : ObjectId("54690771e4b0070527c657ed"), "value" : { "count" : 6, "lastOrderDate" : ISODate("2015-06-03T18:30:00Z") } }
{ "_id" : ObjectId("54696c64e4b07f3c07010b4a"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-18T18:30:00Z") } }
{ "_id" : ObjectId("546980d1e4b07f3c07010b4d"), "value" : { "count" : 4, "lastOrderDate" : ISODate("2015-03-24T18:30:00Z") } }
{ "_id" : ObjectId("54699ac4e4b07f3c07010b51"), "value" : { "count" : 30, "lastOrderDate" : ISODate("2015-05-23T18:30:00Z") } }
{ "_id" : ObjectId("54699d0be4b07f3c07010b55"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-16T18:30:00Z") } }
{ "_id" : ObjectId("5469a1dce4b07f3c07010b59"), "value" : { "count" : 2, "lastOrderDate" : ISODate("2015-04-29T18:30:00Z") } }
{ "_id" : ObjectId("5469a96ce4b07f3c07010b5e"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-16T18:30:00Z") } }
{ "_id" : ObjectId("5469c1ece4b07f3c07010b64"), "value" : { "count" : 9, "lastOrderDate" : ISODate("2015-04-15T18:30:00Z") } }
{ "_id" : ObjectId("5469f422e4b0ce7d5ee021ad"), "value" : { "count" : 5, "lastOrderDate" : ISODate("2015-06-01T18:30:00Z") } }
......
Now I want to run a query that groups the users into categories based on count, for example users with a count of less than 5 in one group, 5-10 in another, and so on.
I want output something like this:
{userLessThan5: 9 }
{user5to10: 2 }
{user10to15: 1 }
{user15to20: 0 }
....
Try this:
db.order.mapReduce(
function() { emit(this.customer, { count: 1, orderDate: this.orderDate.interval_start }); },
function(key, values) {
var sum = 0; var lastOrderDate;
values.forEach(function(value) {
// accept either field name so that re-reduced values keep their date
var date = value.lastOrderDate || value.orderDate;
if (date) { lastOrderDate = date; }
sum += value.count;
});
return { count: sum, lastOrderDate: lastOrderDate };
},
{
query: { status: "DELIVERED" },
out: "order_total",
// finalize runs once per key, including keys with a single order, for which reduce is never called,
// so this is the place to mark each user with its category
finalize: function(key, value) {
var sum = value.count;
if (sum < 5) { value.category = "userLessThan5"; }
if (sum >= 5 && sum <= 10) { value.category = "user5to10"; }
if (sum > 10 && sum <= 15) { value.category = "user10to15"; }
if (sum > 15 && sum <= 20) { value.category = "user15to20"; }
// .... (further ranges)
return value;
}
}).find()
db.order_total.aggregate([{ $group: { "_id": "$value.category", "users": { $sum: 1 } } }]);
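This returns one document per category, with the category name in _id and the number of users in users. For the counts in your desired result that would be:
{ "_id" : "userLessThan5", "users" : 9 }
{ "_id" : "user5to10", "users" : 2 }
{ "_id" : "user10to15", "users" : 1 }
....
Categories without any users (user15to20 in your example) do not appear in the output.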
Here is an aggregation-based approach using your data. As far as I know it works, but there may be a better way to solve this problem.
var a=db.test.aggregate([{$match:{"value.count":{$lt:5}}},
{ $group: { _id:"$value.count",total:{"$sum":1}}},
{$group:{_id:"less than 5",total:{$sum:"$total"}}}])
var b=db.test.aggregate([{$match:{"value.count":{$gte:5,$lt:10}}},
{ $group: { _id:"$value.count",total:{"$sum":1}}},
{$group:{_id:"between 5 and 10",total:{$sum:"$total"}}}])
var c=db.test.aggregate([{$match:{"value.count":{$gte:10,$lt:15}}},
{ $group: { _id:"$value.count",total:{"$sum":1}}},
{$group:{_id:"between 10 and 15",total:{$sum:"$total"}}}])
Then insert a, b and c into another collection. The ranges use $gte on the lower bound so that counts of exactly 5 or 10 are not dropped.
You could also group the mapReduce output into intervals of 5 by count with an aggregation like the one below:
db.data.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$value.count", 0 ] },
{ "$mod": [
{ "$subtract": [ "$value.count", 0 ] },
5
]}
]
},
"count": { "$sum": 1 }
}}
])
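The $subtract/$mod expression rounds each count down to the nearest multiple of 5, and that multiple becomes the group key. The arithmetic can be checked in plain JavaScript. This is a sketch with hypothetical helper names (bucketStart, countPerBucket); the counts are the ones visible in the sample mapReduce output from the question.

```javascript
// Lower bound of the interval a count falls into: count - (count mod width).
function bucketStart(count, width) {
  return count - (count % width);
}

// Number of users per interval, keyed by the interval's lower bound.
function countPerBucket(counts, width) {
  const buckets = {};
  for (const c of counts) {
    const start = bucketStart(c, width);
    buckets[start] = (buckets[start] || 0) + 1;
  }
  return buckets;
}

const counts = [1, 2, 1, 2, 15, 6, 1, 4, 30, 1, 2, 1, 9, 5];
// Interval 0 holds 9 users, 5 holds 3, 15 holds 1 and 30 holds 1.
console.log(countPerBucket(counts, 5));
```

Note that intervals with no users simply do not show up, and each interval is half-open: a count of 5 lands in the interval starting at 5, not in the one starting at 0.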

MongoDB Aggregate - Absolute count for each day

I have the following (already aggregated) collection:
{ "_id" : { "day" : "2015-02-01" }, "total" : 2 }
{ "_id" : { "day" : "2015-02-02" }, "total" : 3 }
{ "_id" : { "day" : "2015-02-03" }, "total" : 10 }
{ "_id" : { "day" : "2015-02-04" }, "total" : 10 }
{ "_id" : { "day" : "2015-02-05" }, "total" : 5 }
What I need is to calculate an absolute value for each day by summing the totals of that day and all previous days. So the expected result in the case above would be:
{ "_id" : { "day" : "2015-02-01" }, "absolutetotalforday" : 2 }
{ "_id" : { "day" : "2015-02-02" }, "absolutetotalforday" : 5 }
{ "_id" : { "day" : "2015-02-03" }, "absolutetotalforday" : 15 }
{ "_id" : { "day" : "2015-02-04" }, "absolutetotalforday" : 25 }
{ "_id" : { "day" : "2015-02-05" }, "absolutetotalforday" : 30 }
I currently have no clue how to achieve this with one query. Of course I could run a sum for each day I'm interested in, but the time range might be long.
Any help appreciated.
Because the aggregation framework has no mechanism for knowing the value of a previous document, or the previous "grouped" value of a document (at least before the $setWindowFields stage introduced in MongoDB 5.0), your best bet would be to use Map-Reduce in this case.
Map-Reduce will give you the running total at the end of each day, although not in the desired key absolutetotalforday but in a key called value, since the reduced values always end up in the value key.
The following mapReduce() operation will give you the desired result, assuming the results from the previous aggregation operation were output to a separate collection named agg_results:
db.agg_results.mapReduce(
function() { emit( this._id, this.total ); },
function(key, values) { return Array.sum(values); },
{
"scope": { "total": 0 },
"finalize": function(key, value) {
total += value;
return total;
},
"out": { "inline": 1 }
}
);
Sample Output
{
"results" : [
{
"_id" : {
"day" : "2015-02-01"
},
"value" : 2
},
{
"_id" : {
"day" : "2015-02-02"
},
"value" : 5
},
{
"_id" : {
"day" : "2015-02-03"
},
"value" : 15
},
{
"_id" : {
"day" : "2015-02-04"
},
"value" : 25
},
{
"_id" : {
"day" : "2015-02-05"
},
"value" : 30
}
],
"timeMillis" : 0,
"counts" : {
"input" : 5,
"emit" : 5,
"reduce" : 0,
"output" : 5
},
"ok" : 1
}
Sorting will not work on inline results, nor on dates stored as strings. Instead, convert the date strings to JavaScript Date objects, write the results to a collection and then run a sort on that collection:
db.agg_results.mapReduce(
function() { emit( new Date(this._id.day), this.total ); },
function(key, values) { return Array.sum(values); },
{
"scope": { "total": 0 },
"finalize": function(key, value) {
total += value;
return total;
},
"out": "tmpResults"
}
);
Sample Output (with sort)
> db.tmpResults.find().sort({_id: 1})
{ "_id" : ISODate("2015-02-01T00:00:00Z"), "value" : 2 }
{ "_id" : ISODate("2015-02-02T00:00:00Z"), "value" : 5 }
{ "_id" : ISODate("2015-02-03T00:00:00Z"), "value" : 15 }
{ "_id" : ISODate("2015-02-04T00:00:00Z"), "value" : 25 }
{ "_id" : ISODate("2015-02-05T00:00:00Z"), "value" : 30 }
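The finalize function works because the scope variable total survives between calls, so each key adds its value to what came before. That only gives a correct running total if the keys are processed in ascending day order. The same accumulation can be written as a small client-side sketch; cumulativeTotals is a hypothetical helper and the input is the aggregated collection from the question.

```javascript
const aggResults = [
  { _id: { day: "2015-02-01" }, total: 2 },
  { _id: { day: "2015-02-02" }, total: 3 },
  { _id: { day: "2015-02-03" }, total: 10 },
  { _id: { day: "2015-02-04" }, total: 10 },
  { _id: { day: "2015-02-05" }, total: 5 }
];

// Hypothetical helper: running total per day, in ascending day order.
function cumulativeTotals(docs) {
  let total = 0; // plays the role of the "scope" variable in the mapReduce call
  return docs
    .slice()
    // yyyy-mm-dd strings sort chronologically when compared as plain strings
    .sort((a, b) => (a._id.day < b._id.day ? -1 : a._id.day > b._id.day ? 1 : 0))
    .map(doc => {
      total += doc.total;
      return { _id: doc._id, absolutetotalforday: total };
    });
}

console.log(cumulativeTotals(aggResults).map(d => d.absolutetotalforday)); // [ 2, 5, 15, 25, 30 ]
```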