MongoDB Aggregate - Absolute count for each day - mongodb

I have the following (already aggregated) collection:
{ "_id" : { "day" : "2015-02-01" }, "total" : 2 }
{ "_id" : { "day" : "2015-02-02" }, "total" : 3 }
{ "_id" : { "day" : "2015-02-03" }, "total" : 10 }
{ "_id" : { "day" : "2015-02-04" }, "total" : 10 }
{ "_id" : { "day" : "2015-02-05" }, "total" : 5 }
What I need is to calculate an absolute (cumulative) value for each day by summing the totals of all previous days. So for the data above the expected result would be:
{ "_id" : { "day" : "2015-02-01" }, "absolutetotalforday" : 2 }
{ "_id" : { "day" : "2015-02-02" }, "absolutetotalforday" : 5 }
{ "_id" : { "day" : "2015-02-03" }, "absolutetotalforday" : 15 }
{ "_id" : { "day" : "2015-02-04" }, "absolutetotalforday" : 25 }
{ "_id" : { "day" : "2015-02-05" }, "absolutetotalforday" : 30 }
I currently have no clue how to achieve this with one query. Of course I could run a sum for each day I'm interested in, but the time range might be long.
Any help appreciated.

Because the aggregation framework has no mechanism for knowing the value of a previous document, or the previous "grouped" value of a document, your best bet is to use Map-Reduce in this case.
Map-Reduce will give you the "running total" at the end of each day that you require, although it won't be in the desired key, absolutetotalforday, but in a key called value, since the reduced values are always stored in the value key.
The following mapReduce() operation will give you the desired result, assuming the results from the previous aggregation operation were output to a separate collection named agg_results:
db.agg_results.mapReduce(
    function() { emit( this._id, this.total ); },
    function(key, values) { return Array.sum(values); },
    {
        "scope": { "total": 0 },
        "finalize": function(key, value) {
            total += value;
            return total;
        },
        "out": { "inline": 1 }
    }
);
Sample Output
{
"results" : [
{
"_id" : {
"day" : "2015-02-01"
},
"value" : 2
},
{
"_id" : {
"day" : "2015-02-02"
},
"value" : 5
},
{
"_id" : {
"day" : "2015-02-03"
},
"value" : 15
},
{
"_id" : {
"day" : "2015-02-04"
},
"value" : 25
},
{
"_id" : {
"day" : "2015-02-05"
},
"value" : 30
}
],
"timeMillis" : 0,
"counts" : {
"input" : 5,
"emit" : 5,
"reduce" : 0,
"output" : 5
},
"ok" : 1
}
Sorting the results will not work with inline output or with dates of String type. Instead, convert the date strings to JavaScript Date objects, write the results to a collection, and then run a sort on that collection:
db.agg_results.mapReduce(
    function() { emit( new Date(this._id.day), this.total ); },
    function(key, values) { return Array.sum(values); },
    {
        "scope": { "total": 0 },
        "finalize": function(key, value) {
            total += value;
            return total;
        },
        "out": "tmpResults"
    }
);
Sample Output (with sort)
> db.tmpResults.find().sort({_id: 1})
{ "_id" : ISODate("2015-02-01T00:00:00Z"), "value" : 2 }
{ "_id" : ISODate("2015-02-02T00:00:00Z"), "value" : 5 }
{ "_id" : ISODate("2015-02-03T00:00:00Z"), "value" : 15 }
{ "_id" : ISODate("2015-02-04T00:00:00Z"), "value" : 25 }
{ "_id" : ISODate("2015-02-05T00:00:00Z"), "value" : 30 }
>
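To see why a scoped total in finalize produces a running total, here is a minimal plain-JavaScript sketch that imitates the map, reduce and finalize steps outside MongoDB. The helper miniMapReduce is hypothetical (it is not a MongoDB API), and it assumes keys are finalized in ascending day order, which is what the approach above also relies on.

```javascript
// Hypothetical stand-in for mapReduce: groups emitted values by key,
// reduces each group, then finalizes the keys in ascending order.
function miniMapReduce(docs, map, reduce, finalize) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));
  return [...groups.keys()].sort().map((key) => {
    const values = groups.get(key);
    // Only call reduce when a key has more than one value.
    const reduced = values.length > 1 ? reduce(key, values) : values[0];
    return { _id: key, value: finalize(key, reduced) };
  });
}

// Sample data from the question, deliberately out of order.
const aggResults = [
  { _id: { day: "2015-02-03" }, total: 10 },
  { _id: { day: "2015-02-01" }, total: 2 },
  { _id: { day: "2015-02-05" }, total: 5 },
  { _id: { day: "2015-02-02" }, total: 3 },
  { _id: { day: "2015-02-04" }, total: 10 },
];

let total = 0; // plays the role of the "scope" variable
const runningTotals = miniMapReduce(
  aggResults,
  (doc, emit) => emit(doc._id.day, doc.total),
  (key, values) => values.reduce((a, b) => a + b, 0),
  (key, value) => { total += value; return total; }
);
console.log(runningTotals);
```

The running total is only meaningful because the keys are visited in date order; if the finalize calls happened in a different order, the same code would produce different per-day values.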

Related

Get lowest per date from multiple arrays in mongodb

I've the following structure of docs:
{
"_id" : ObjectId("5786458371d24d924d8b4575"),
"uniqueNumber" : "3899822714",
"lastUpdatedAt" : ISODate("2016-07-13T20:11:11.000Z"),
"new" : [
{
"price" : 8.4,
"created" : ISODate("2016-07-13T13:11:28.000Z")
},
{
"price" : 10.0,
"created" : ISODate("2016-07-13T14:50:56.000Z")
}
],
"used" : [
{
"price" : 10.99,
"created" : ISODate("2016-07-08T13:46:31.000Z")
},
{
"price" : 8.59,
"created" : ISODate("2016-07-13T13:11:28.000Z")
}
]
}
Now I need to get a list that gives me the lowest price of each array per date.
So, as example:
{
"uniqueNumber" : 1234,
"prices" : {
"created" : 2016-07-08,
"minNew" : 123,
"minUsed" : 22
}
}
So far I've built the following query:
db.getCollection('col').aggregate([
    {
        $match : {
            "uniqueNumber" : "3899822714"
        }
    },
    {
        $unwind : "$used"
    },
    {
        $project : {
            "uniqueNumber" : "$uniqueNumber",
            "price" : "$used.price",
            "ts" : "$used.created"
        }
    },
    {
        $sort : { "ts" : 1 }
    },
    {
        $group : {_id: "$uniqueNumber", priceOfMaxTS : { $min: "$price" }, ts : { $last: "$ts" }}
    }
]);
But this one only gives me the lowest price together with the latest date. I couldn't find anything that points me in the right direction to get the desired result.
UPDATE
I've found a way to get the lowest price of the used array grouped by day with this query:
db.getCollection('col').aggregate([
    {
        $match : {
            "uniqueNumber" : "3899822714"
        }
    },
    {
        $unwind : "$used"
    },
    {
        $project : {
            "asin" : "$uniqueNumber",
            "price" : "$used.price",
            "ts" : "$used.created",
            "y" : { "$year" : "$used.created" },
            "m" : { "$month" : "$used.created" },
            "d" : { "$dayOfMonth" : "$used.created" }
        }
    },
    {
        $group : { _id : { "year" : "$y", "month" : "$m", "day" : "$d" }, minPriceOfDay : { $min: "$price" }}
    }
]);
Now I only need to find a way to do the same for the new array in the same query.
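One way to think about the missing step is to tag each entry with the array it came from, then group by day and take the minimum per tag. The plain-JavaScript sketch below illustrates that logic on the sample document. It is not an aggregation pipeline, and minPricesPerDay is a hypothetical helper introduced only for illustration; days are taken in UTC.

```javascript
// Sample document from the question (dates as JavaScript Date objects).
const doc = {
  uniqueNumber: "3899822714",
  new: [
    { price: 8.4, created: new Date("2016-07-13T13:11:28.000Z") },
    { price: 10.0, created: new Date("2016-07-13T14:50:56.000Z") },
  ],
  used: [
    { price: 10.99, created: new Date("2016-07-08T13:46:31.000Z") },
    { price: 8.59, created: new Date("2016-07-13T13:11:28.000Z") },
  ],
};

// Hypothetical helper: lowest price per UTC day for both arrays.
function minPricesPerDay(d) {
  const byDay = new Map();
  const visit = (entries, field) => {
    for (const { price, created } of entries) {
      const day = created.toISOString().slice(0, 10); // "YYYY-MM-DD"
      if (!byDay.has(day)) {
        byDay.set(day, { created: day, minNew: null, minUsed: null });
      }
      const row = byDay.get(day);
      row[field] = row[field] === null ? price : Math.min(row[field], price);
    }
  };
  visit(d.new, "minNew");
  visit(d.used, "minUsed");
  const prices = [...byDay.values()].sort((a, b) =>
    a.created.localeCompare(b.created)
  );
  return { uniqueNumber: d.uniqueNumber, prices };
}

const perDay = minPricesPerDay(doc);
console.log(JSON.stringify(perDay, null, 2));
```

A day that appears in only one of the arrays keeps null for the other field, which makes the gap explicit rather than silently dropping the day.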

Return count and average of field in subdocument

I have a .json file that I have imported into my collection.
{
"_id" : ObjectId("5739ee85daa49f685e316fc6"),
"id" : 38,
"title" : "It Takes Two (1995)",
"genre" : "Comedy",
"ratings" : [
{
"userId" : 26,
"rating" : 2
},
{
"userId" : 531,
"rating" : 2
},
{
"userId" : 1054,
"rating" : 2
},
{
"userId" : 1068,
"rating" : 2
},
{
"userId" : 1221,
"rating" : 5
},
{
"userId" : 1434,
"rating" : 4
},
{
"userId" : 1448,
"rating" : 1
},
{
"userId" : 1645,
"rating" : 5
},
{
"userId" : 1647,
"rating" : 1
},
{
"userId" : 1958,
"rating" : 3
},
{
"userId" : 2010,
"rating" : 1
},
{
"userId" : 2042,
"rating" : 1
},
{
"userId" : 2063,
"rating" : 1
},
{
"userId" : 2106,
"rating" : 1
},
{
"userId" : 2116,
"rating" : 3
},
{
"userId" : 2541,
"rating" : 5
},
{
"userId" : 2777,
"rating" : 3
},
{
"userId" : 3013,
"rating" : 2
},
{
"userId" : 3029,
"rating" : 2
},
{
"userId" : 3111,
"rating" : 4
},
{
"userId" : 4387,
"rating" : 1
},
{
"userId" : 4572,
"rating" : 5
},
{
"userId" : 5361,
"rating" : 5
}
]
}
I want to use map-reduce to show every user with their total number of reviews and their average rating.
I tried:
var map = function() { emit(this.ratings.userId, 1); }
var reduce = function(key, values) {
    var res = 0;
    values.forEach(function(v) { res += 1 });
    return {count: res};
}
db.movie.mapReduce(map, reduce, { out: "users" });
db.users.find()
{ "_id" : null, "value" : { "count" : 39 } }
I have no idea why it shows "_id" : null. I suppose this.ratings.userId was wrong, but this.ratings[userId] doesn't work either.
I expect something like:
userId:10, count:2000
userId:20, count:500
Can you please help?
You are using the wrong tool. You need the aggregate() method, which gives access to the aggregation pipeline. In your pipeline, de-normalise the "ratings" array using the $unwind operator. From there, simply group your documents by "userId" and use the $sum and $avg accumulator operators, which here return the number of ratings and the average rating respectively.
db.movie.aggregate([
    { "$unwind": "$ratings" },
    { "$group": {
        "_id": "$ratings.userId",
        "count": { "$sum": 1 },
        "average": { "$avg": "$ratings.rating" }
    }}
])
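As a rough illustration of what $unwind followed by $group computes, the plain-JavaScript sketch below performs the same count and average on a tiny made-up data set. The movies array and the userStats helper are assumptions for illustration only, not part of the original data.

```javascript
// Made-up sample data with the same shape as the documents in the question.
const movies = [
  { title: "A", ratings: [{ userId: 26, rating: 2 }, { userId: 531, rating: 4 }] },
  { title: "B", ratings: [{ userId: 26, rating: 5 }] },
];

// Hypothetical helper mirroring the pipeline: one row per rating ("$unwind"),
// then a count and an average per userId ("$group").
function userStats(docs) {
  const acc = new Map();
  for (const doc of docs) {
    for (const { userId, rating } of doc.ratings) {
      const s = acc.get(userId) || { count: 0, sum: 0 };
      s.count += 1;
      s.sum += rating;
      acc.set(userId, s);
    }
  }
  return [...acc].map(([userId, s]) => ({
    _id: userId,
    count: s.count,
    average: s.sum / s.count,
  }));
}

const stats = userStats(movies);
console.log(stats);
```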
I found the solution:
var mapFunction = function() {
    // emit one (userId, {count, rating}) pair per rating in the array
    for (var idx = 0; idx < this.ratings.length; idx++) {
        var key = this.ratings[idx].userId;
        var value = {
            count: 1,
            rating: this.ratings[idx].rating
        };
        emit(key, value);
    }
};
var reduceFunction = function(keyUSERID, countObjVals) {
    // declared with var so it does not leak into the global scope
    var reducedVal = { count: 0, rating: 0 };
    for (var idx = 0; idx < countObjVals.length; idx++) {
        reducedVal.count += countObjVals[idx].count;
        reducedVal.rating += countObjVals[idx].rating;
    }
    return reducedVal;
};
var finalizeFunction = function (key, reducedVal) {
    reducedVal.avg = reducedVal.rating/reducedVal.count;
    return reducedVal;
};
db.movies.mapReduce( mapFunction,
    reduceFunction,
    {
        out: "users",
        finalize: finalizeFunction
    }
)
db.users.find() gives me:
{ "_id" : 1, "value" : { "count" : 56, "rating" : 237, "avg" : 4.232142857142857 } }
{ "_id" : 2, "value" : { "count" : 129, "rating" : 479, "avg" : 3.7131782945736433 } }
{ "_id" : 3, "value" : { "count" : 51, "rating" : 199, "avg" : 3.9019607843137254 } }
{ "_id" : 4, "value" : { "count" : 21, "rating" : 88, "avg" : 4.190476190476191 } }
{ "_id" : 5, "value" : { "count" : 198, "rating" : 623, "avg" : 3.1464646464646466 } }
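A reduce function may be called more than once for the same key, with earlier reduce results mixed in among the emitted values, so its output needs the same shape as its input. The plain-JavaScript sketch below checks that a reduce function like the one above has this property: reducing in one pass and reducing in two passes give the same result. The sample values are made up for illustration.

```javascript
// Same logic as the reduce and finalize functions above, as plain functions.
function reduceFunction(keyUSERID, countObjVals) {
  var reducedVal = { count: 0, rating: 0 };
  for (var idx = 0; idx < countObjVals.length; idx++) {
    reducedVal.count += countObjVals[idx].count;
    reducedVal.rating += countObjVals[idx].rating;
  }
  return reducedVal;
}

function finalizeFunction(key, reducedVal) {
  reducedVal.avg = reducedVal.rating / reducedVal.count;
  return reducedVal;
}

// Made-up emitted values for one user.
const emitted = [
  { count: 1, rating: 2 },
  { count: 1, rating: 5 },
  { count: 1, rating: 4 },
];

// Reduce everything at once...
const onePass = reduceFunction(26, emitted);
// ...or reduce a partial result together with the remaining value.
const partial = reduceFunction(26, emitted.slice(0, 2));
const twoPass = reduceFunction(26, [partial, emitted[2]]);

// Finalize a copy so the reduced value itself stays untouched.
console.log(finalizeFunction(26, { ...onePass }));
```

This is also why the average is computed in finalize rather than in reduce: an average of averages would not survive a second reduce pass, whereas sums and counts do.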

How to Group mongodb - mapReduce output?

I have a question about the mapReduce framework in MongoDB. I have key-value pairs produced by a mapReduce function, and now I want to run a query on that output.
I am using mapReduce to compute per-user stats like this:
db.order.mapReduce(function() { emit (this.customer,{count:1,orderDate:this.orderDate.interval_start}) },
function(key,values){
var sum =0 ; var lastOrderDate;
values.forEach(function(value) {
if(value['orderDate']){
lastOrderDate=value['orderDate'];
}
sum+=value['count'];
});
return {count:sum,lastOrderDate:lastOrderDate};
},
{ query:{status:"DELIVERED"},out:"order_total"}).find()
which gives me output like this:
{ "_id" : ObjectId("5443765ae4b05294c8944d5b"), "value" : { "count" : 1, "orderDate" : ISODate("2014-10-18T18:30:00Z") } }
{ "_id" : ObjectId("54561911e4b07a0a501276af"), "value" : { "count" : 2, "lastOrderDate" : ISODate("2015-03-14T18:30:00Z") } }
{ "_id" : ObjectId("54561b9ce4b07a0a501276b1"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-01T18:30:00Z") } }
{ "_id" : ObjectId("5458712ee4b07a0a501276c2"), "value" : { "count" : 2, "lastOrderDate" : ISODate("2014-11-03T18:30:00Z") } }
{ "_id" : ObjectId("545f64e7e4b07a0a501276db"), "value" : { "count" : 15, "lastOrderDate" : ISODate("2015-06-04T18:30:00Z") } }
{ "_id" : ObjectId("54690771e4b0070527c657ed"), "value" : { "count" : 6, "lastOrderDate" : ISODate("2015-06-03T18:30:00Z") } }
{ "_id" : ObjectId("54696c64e4b07f3c07010b4a"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-18T18:30:00Z") } }
{ "_id" : ObjectId("546980d1e4b07f3c07010b4d"), "value" : { "count" : 4, "lastOrderDate" : ISODate("2015-03-24T18:30:00Z") } }
{ "_id" : ObjectId("54699ac4e4b07f3c07010b51"), "value" : { "count" : 30, "lastOrderDate" : ISODate("2015-05-23T18:30:00Z") } }
{ "_id" : ObjectId("54699d0be4b07f3c07010b55"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-16T18:30:00Z") } }
{ "_id" : ObjectId("5469a1dce4b07f3c07010b59"), "value" : { "count" : 2, "lastOrderDate" : ISODate("2015-04-29T18:30:00Z") } }
{ "_id" : ObjectId("5469a96ce4b07f3c07010b5e"), "value" : { "count" : 1, "orderDate" : ISODate("2014-11-16T18:30:00Z") } }
{ "_id" : ObjectId("5469c1ece4b07f3c07010b64"), "value" : { "count" : 9, "lastOrderDate" : ISODate("2015-04-15T18:30:00Z") } }
{ "_id" : ObjectId("5469f422e4b0ce7d5ee021ad"), "value" : { "count" : 5, "lastOrderDate" : ISODate("2015-06-01T18:30:00Z") } }
......
Now I want to run a query that groups the users by count into different categories, e.g. users with a count below 5 in one group, 5-10 in another, and so on,
and I want output something like this:
{userLessThan5: 9 }
{user5to10: 2 }
{user10to15: 1 }
{user15to20: 0 }
....
Try this,
db.order.mapReduce(
    function() { emit(this.customer, { count: 1, orderDate: this.orderDate.interval_start }); },
    function(key, values) {
        var category; // add this new field
        var sum = 0; var lastOrderDate;
        values.forEach(function(value) {
            if (value['orderDate']) {
                lastOrderDate = value['orderDate'];
            }
            sum += value['count'];
        });
        // at this point you already know which category the record falls into, so just add a new field to mark it
        // (note: reduce is not called for a key with a single emitted value, so those records get no category here)
        if (sum < 5) { category = "userLessThan5"; }
        if (sum >= 5 && sum <= 10) { category = "user5to10"; }
        if (sum > 10 && sum <= 15) { category = "user10to15"; }
        if (sum > 15 && sum <= 20) { category = "user15to20"; }
        // ....
        return { count: sum, lastOrderDate: lastOrderDate, category: category };
    },
    { query: { status: "DELIVERED" }, out: "order_total" }
).find()
db.order_total.aggregate([{ $group: { "_id": "$value.category", "users": { $sum: 1 } } }]);
This should give you the desired grouping:
{userLessThan5: 9 }
{user5to10: 2 }
{user10to15: 1 }
{user15to20: 0 }
....
I wrote a query against your data using aggregation; there may be a better way to solve this problem.
// counts below 5
var a=db.test.aggregate([{$match:{"value.count":{$lt:5}}},
    { $group: { _id:"$value.count",total:{"$sum":1}}},
    {$group:{_id:"less than 5",total:{$sum:"$total"}}}])
// counts from 5 up to (but not including) 10; $gte so that exactly 5 is not dropped
var b=db.test.aggregate([{$match:{"value.count":{$lt:10,$gte:5}}},
    { $group: { _id:"$value.count",total:{"$sum":1}}},
    {$group:{_id:"between 5 and 10",total:{$sum:"$total"}}}])
// counts from 10 up to (but not including) 15
var c=db.test.aggregate([{$match:{"value.count":{$lt:15,$gte:10}}},
    { $group: { _id:"$value.count",total:{"$sum":1}}},
    {$group:{_id:"between 10 and 15",total:{$sum:"$total"}}}])
Then insert a, b and c into another collection.
You could also try grouping the mapReduce output into intervals of 5 by count, using aggregate as below:
db.data.aggregate([
    { "$group": {
        "_id": {
            "$subtract": [
                { "$subtract": [ "$value.count", 0 ] },
                { "$mod": [
                    { "$subtract": [ "$value.count", 0 ] },
                    5
                ]}
            ]
        },
        "count": { "$sum": 1 }
    }}
])
There may also be a related question covering this.
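The $subtract / $mod expression above rounds each count down to the nearest multiple of 5 and uses that as the group key. A minimal plain-JavaScript sketch of the same arithmetic, run on the counts visible in the sample output from the question (bucketCounts is a hypothetical helper, not a MongoDB function):

```javascript
// Counts taken from the sample mapReduce output shown in the question.
const counts = [1, 2, 1, 2, 15, 6, 1, 4, 30, 1, 2, 1, 9, 5];

// Hypothetical helper: the key is the lower bound of the bucket,
// i.e. count - (count % width), the same arithmetic as $subtract / $mod.
function bucketCounts(values, width) {
  const buckets = new Map();
  for (const v of values) {
    const lower = v - (v % width);
    buckets.set(lower, (buckets.get(lower) || 0) + 1);
  }
  return [...buckets]
    .sort((a, b) => a[0] - b[0])
    .map(([lower, count]) => ({ _id: lower, count }));
}

const histogram = bucketCounts(counts, 5);
console.log(histogram);
```

With this scheme a count of exactly 5 lands in the bucket starting at 5, and empty buckets simply do not appear in the output, so the result differs slightly from the inclusive 5-10 ranges and the zero rows sketched in the question.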

How can I get a running total with mongodb aggregate framework?

I am fairly new to MongoDB and I am playing with the aggregate framework. One of the examples from the documentation shows the following, which returns total number of new user joins per month and lists the month joined:
db.users.aggregate(
[
{ $project : { month_joined : { $month : "$joined" } } } ,
{ $group : { _id : {month_joined:"$month_joined"} , number : { $sum : 1 } } },
{ $sort : { "_id.month_joined" : 1 } }
]
)
The code outputs the following:
{
"_id" : {
"month_joined" : 1
},
"number" : 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5
}
Is it possible to also have each object contain the sum of all users that have joined since the start, so I don't have to run over the objects programmatically and calculate it myself?
Example desired output:
{
"_id" : {
"month_joined" : 1
},
"number" : 3,
"total": 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9,
"total": 12
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5,
"total": 17
}
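For illustration, the sketch below shows the client-side fallback the question hopes to avoid: a single pass over the already sorted results that carries a running total. As an assumption on my part, on MongoDB 5.0 or later a $setWindowFields stage along the lines of runningTotalStage could be appended after the $sort to do the same on the server; that object is only defined here, not executed, and both names in the block are hypothetical.

```javascript
// Output of the pipeline from the question, sorted by month.
const monthly = [
  { _id: { month_joined: 1 }, number: 3 },
  { _id: { month_joined: 2 }, number: 9 },
  { _id: { month_joined: 3 }, number: 5 },
];

// Client-side fallback: carry a running total across the sorted documents.
function addRunningTotal(docs) {
  let total = 0;
  return docs.map((doc) => {
    total += doc.number;
    return { ...doc, total };
  });
}

const withTotals = addRunningTotal(monthly);
console.log(withTotals);

// Assumption: requires MongoDB 5.0+. A cumulative-sum window stage that
// could follow the $sort stage in the pipeline (shown, not executed here).
const runningTotalStage = {
  $setWindowFields: {
    sortBy: { "_id.month_joined": 1 },
    output: {
      total: {
        $sum: "$number",
        window: { documents: ["unbounded", "current"] },
      },
    },
  },
};
```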

Mongo map-reduce output, how to read results back?

I have a map-reduce query that "works" and does what I want. However, I have so far spectacularly failed to make use of my output data because I cannot work out how to read it back. Let me explain. Here is my emit:
emit( { jobid: this.job_id, type: this.type}, { count: 1 })
and the reduce function:
reduce: function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].count;
    }
    // return the same shape as the emitted value; "this" is not the source document inside reduce
    return { count: total };
},
It functions and the output I get in the results collection looks like this:
{ "_id" : { "jobid" : "5051ef142a120", "type" : 3 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051ef142a120", "type" : 5 }, "value" : { "count" : 43 } }
{ "_id" : { "jobid" : "5051f1a9d5442", "type" : 2 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051f1a9d5442", "type" : 3 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051f299340b1", "type" : 2 }, "value" : { "count" : 1 } }
{ "_id" : { "jobid" : "5051f299340b1", "type" : 3 }, "value" : { "count" : 1 } }
BUT HOW the hell do I issue a query that says find me all entries by "jobid" whilst ignoring the type? I tried this initially, expecting two rows of output, but got none!
db.mrtest.find( { "_id": { "jobid" : "5051f299340b1" }} );
I have also tried and failed with:
db.mrtest.find( { "_id": { "jobid" : "5051f299340b1" }} );
and whilst:
db.mrtest.find( { "_id" : { "jobid" : "5051f299340b1", "type" : 2 }} )
does produce one row of output as hoped for, changing it to this again fails to produce anything:
db.mrtest.find( { "_id" : { "jobid" : "5051f299340b1", "type" : { $in: [2] }}} )
I get the impression that you can't do such things with the _id field, or can you? I am thinking I need to re-organise my mr output instead but that feels like failing somehow ?!?!
Help!
PS: If anybody can explain why the count is contained in a field called "value", that would also be welcome!
Have you tried:
db.mrtest.find( { "_id.jobid": "506ea3a85e126" })
That works for me!
db.mrtest.find( { "_id.jobid": "506ea3a85e126" })
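The difference between the two query styles can be sketched in plain JavaScript. Matching against a whole embedded document compares the entire _id, so { jobid: ... } alone never equals { jobid: ..., type: ... }, whereas dot notation only looks at the named field. The findExact and findByPath helpers below are hypothetical stand-ins for those two behaviours, not MongoDB functions.

```javascript
// Sample map-reduce output from the question.
const mrtest = [
  { _id: { jobid: "5051ef142a120", type: 3 }, value: { count: 1 } },
  { _id: { jobid: "5051ef142a120", type: 5 }, value: { count: 43 } },
  { _id: { jobid: "5051f1a9d5442", type: 2 }, value: { count: 1 } },
  { _id: { jobid: "5051f1a9d5442", type: 3 }, value: { count: 1 } },
  { _id: { jobid: "5051f299340b1", type: 2 }, value: { count: 1 } },
  { _id: { jobid: "5051f299340b1", type: 3 }, value: { count: 1 } },
];

// Exact match on the embedded document: all fields, in the same order.
function findExact(docs, id) {
  return docs.filter((d) => JSON.stringify(d._id) === JSON.stringify(id));
}

// Dot notation: follow the path and compare only that one field.
function findByPath(docs, path, expected) {
  return docs.filter((d) => {
    const actual = path
      .split(".")
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), d);
    return actual === expected;
  });
}

const exactPartial = findExact(mrtest, { jobid: "5051f299340b1" });
const exactFull = findExact(mrtest, { jobid: "5051f299340b1", type: 2 });
const dotted = findByPath(mrtest, "_id.jobid", "5051f299340b1");

console.log(exactPartial.length, exactFull.length, dotted.length);
```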