MongoDB aggregation over a range - mongodb

I have documents of the following format:
[
{
date:"2014-07-07",
value: 20
},
{
date:"2014-07-08",
value: 29
},
{
date:"2014-07-09",
value: 24
},
{
date:"2014-07-10",
value: 21
}
]
I want to run an aggregation query that gives me results in date ranges, for example:
[
{ sum: 49 },
{ sum:45 },
]
These are daily values. I need the sum of the value field for the last 7 days, and for the 7 days before those: for example, the sum from May 1 to May 7 and then the sum from May 8 to May 14.
Can I use aggregation with multiple groups and range to get this result in a single mongodb query?

You can use aggregation to group by anything that can be computed from the source documents, as long as you know exactly what you want to do.
Based on your document content and sample output, I'm guessing that you are summing over two-day intervals. Here is how you would write an aggregation that produces this output from your sample data:
var range1={$and:[{"$gte":["$date","2014-07-07"]},{$lte:["$date","2014-07-08"]}]}
var range2={$and:[{"$gte":["$date","2014-07-09"]},{$lte:["$date","2014-07-10"]}]}
db.range.aggregate(
{$project:{
dateRange:{$cond:{if:range1, then:"dateRange1",else:{$cond:{if:range2, then:"dateRange2", else:"NotInRange"}}}},
value:1}
},
{$group:{_id:"$dateRange", sum:{$sum:"$value"}}}
)
{ "_id" : "dateRange2", "sum" : 45 }
{ "_id" : "dateRange1", "sum" : 49 }
Substitute your own dates for the strings in range1 and range2. Optionally, add a $match stage first so that the pipeline only processes documents that fall inside the ranges you are aggregating over.
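For the "last 7 days and the 7 days before" case from the question, the two ranges can be computed from a reference date instead of being typed in by hand. The following is a minimal sketch, assuming dates are stored as "YYYY-MM-DD" strings as in the sample documents. The helper names (toDay, daysBefore, buildWeeklyPipeline) and the labels "last7" and "previous7" are made up for illustration.

```javascript
// Hypothetical helper: formats a Date as "YYYY-MM-DD" in UTC,
// matching the string dates used in the sample documents.
function toDay(d) {
  return d.toISOString().slice(0, 10);
}

// Hypothetical helper: returns the day that lies n days before `day`.
function daysBefore(day, n) {
  const d = new Date(day + "T00:00:00Z");
  d.setUTCDate(d.getUTCDate() - n);
  return toDay(d);
}

// Builds a pipeline that sums `value` over the last 7 days (including
// `today`) and over the 7 days before those.
function buildWeeklyPipeline(today) {
  const startThis = daysBefore(today, 6);
  const startPrev = daysBefore(today, 13);
  const inRange = (from, to) => ({
    $and: [{ $gte: ["$date", from] }, { $lte: ["$date", to] }]
  });
  return [
    // Keep only the 14 days of interest, so everything that is not in
    // the last 7 days must belong to the previous 7.
    { $match: { date: { $gte: startPrev, $lte: today } } },
    { $project: {
        value: 1,
        dateRange: {
          $cond: { if: inRange(startThis, today), then: "last7", else: "previous7" }
        }
    } },
    { $group: { _id: "$dateRange", sum: { $sum: "$value" } } }
  ];
}

const pipeline = buildWeeklyPipeline("2014-07-14");
// In the mongo shell: db.range.aggregate(pipeline)
```

Comparing the dates as strings only works because "YYYY-MM-DD" sorts the same way alphabetically and chronologically.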

Related

Aggregation using $sample

With an aggregation using { $sample: { size: 3 } }, I'll get 3 random documents returned.
How can I use a percentage of all documents instead?
Something that'd look like { $sample: { size: 50% } }?
You cannot do this: the size argument to $sample must be a positive integer.
If you still need to use $sample, you can get the total count of documents in the collection, halve it, and then run $sample:
1) Count the documents in the collection and halve the count (mongo shell):
var halfDocumentsCount = Math.floor(db.yourCollectionName.count() / 2)
print(halfDocumentsCount) // Replace it with console.log() in code
2) Run $sample for random documents:
db.yourCollectionName.aggregate([{$sample : {size : halfDocumentsCount}}])
Note:
If you want half of the documents in the collection (i.e. 50% of them), $sample might not be a good option, because it can become an inefficient query. The result of $sample can also contain duplicate documents, so you might not really get a unique 50% of the documents. See the $sample documentation for details.
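The percentage-to-size conversion can be wrapped in a small function. This is only a sketch: sampleStageForPercent is a made-up name, and clamping the size to at least 1 is an assumption made here so that the stage stays valid for very small collections.

```javascript
// Hypothetical helper: turns a percentage into a $sample stage,
// given the number of documents in the collection.
function sampleStageForPercent(totalDocs, percent) {
  // $sample needs a positive integer, so round down but never below 1.
  const size = Math.max(1, Math.floor(totalDocs * percent / 100));
  return { $sample: { size: size } };
}

const stage = sampleStageForPercent(1001, 50);
console.log(JSON.stringify(stage)); // {"$sample":{"size":500}}

// In the mongo shell, with a collection named yourCollectionName:
//   var total = db.yourCollectionName.count()
//   db.yourCollectionName.aggregate([sampleStageForPercent(total, 50)])
```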
If someone is looking for this solution in PHP, add the following stage at the end of your aggregate pipeline (i.e. before the projection) and avoid using limit and sort:
[
'$sample' => [
'size' => 30
]
]
Starting in Mongo 4.4, you can use the $sampleRate operator:
// { x: 1 }
// { x: 2 }
// { x: 3 }
// { x: 4 }
// { x: 5 }
// { x: 6 }
db.collection.aggregate([ { $match: { $sampleRate: 0.33 } } ])
// { x: 3 }
// { x: 5 }
This matches a random selection of input documents (33%). The number of documents selected approximates the sample rate expressed as a percentage of the total number of documents.
Note that this is equivalent to assigning each document a random number between 0 and 1 and keeping the document if that number is below 0.33. As a result, you may get more or fewer documents in the output, and running this several times won't necessarily give you the same output.
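The "random number per document" description can be mimicked in plain JavaScript. This illustrates the idea only and is not how the server implements $sampleRate; sampleByRate and its optional random parameter are made up for this sketch.

```javascript
// Keeps each document independently with probability `rate`.
// `random` can be replaced to make the behaviour reproducible.
function sampleByRate(docs, rate, random = Math.random) {
  return docs.filter(() => random() < rate);
}

const docs = [{ x: 1 }, { x: 2 }, { x: 3 }, { x: 4 }, { x: 5 }, { x: 6 }];
const sampled = sampleByRate(docs, 0.33);
// The size of `sampled` varies between runs: anywhere from 0 to 6 documents.
console.log(sampled.length);
```

Because every document is decided independently, the output size is only close to 33% on average, which is why small collections can return noticeably more or fewer documents.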

Bucketing and counting for histogram in MongoDB

I want to implement a histogram based on data stored in MongoDB, getting counts per bucket. The buckets have to be created from a single input value, the number of groups, for example group = 4.
Suppose multiple transactions are running and we store the transaction time as one of the fields. I want to count transactions by the time required to finish them.
How can I use the aggregation framework or map-reduce to create the buckets?
Sample data:
{
"transactions": {
"149823": {
"timerequired": 5
},
"168243": {
"timerequired": 4
},
"168244": {
"timerequired": 10
},
"168257": {
"timerequired": 15
},
"168258": {
"timerequired": 8
},
"timerequired": 18
}
}
In the output I want to print each bucket's range and the count of transactions that fall into that bucket.
Bucket   Count
0-5      2
5-10     2
10-15    1
15-20    1
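From MongoDB version 3.4, the aggregation stages $bucket and $bucketAuto are available. They can handle this kind of request; $bucketAuto only needs the number of buckets and picks the boundaries itself, trying to spread the documents evenly across them: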
db.transactions.aggregate( [
{
$bucketAuto: {
groupBy: "$timerequired",
buckets: 4
}
}
])
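If the buckets should have equal width, as in the 0-5, 5-10, 10-15, 15-20 table from the question, the boundaries can be computed up front and passed to $bucket. This is a sketch under a few assumptions: each transaction is its own document with a top-level timerequired field (the nested sample data would have to be reshaped first), the overall range 0 to 20 is taken from the expected output, and the helper names are made up.

```javascript
// Hypothetical helper: equal-width boundaries for `groups` buckets
// between min and max.
function bucketBoundaries(min, max, groups) {
  const width = (max - min) / groups;
  const boundaries = [];
  for (let i = 0; i <= groups; i++) {
    boundaries.push(min + i * width);
  }
  return boundaries;
}

// Builds a $bucket stage that counts the documents in each bucket.
function buildBucketStage(min, max, groups) {
  return {
    $bucket: {
      groupBy: "$timerequired",
      boundaries: bucketBoundaries(min, max, groups),
      default: "other", // catches values outside the boundaries
      output: { count: { $sum: 1 } }
    }
  };
}

const stage = buildBucketStage(0, 20, 4);
console.log(JSON.stringify(stage.$bucket.boundaries)); // [0,5,10,15,20]
// In the mongo shell: db.transactions.aggregate([stage])
```

Note that $bucket treats each lower boundary as inclusive and each upper boundary as exclusive, so a value of exactly 5 is counted in the 5-10 bucket, which differs from the counts in the table above.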

What does $sum:1 mean in Mongo

I have a collection foo:
{ "_id" : ObjectId("5837199bcabfd020514c0bae"), "x" : 1 }
{ "_id" : ObjectId("583719a1cabfd020514c0baf"), "x" : 3 }
{ "_id" : ObjectId("583719a6cabfd020514c0bb0") }
I use this query:
db.foo.aggregate({$group:{_id:1, avg:{$avg:"$x"}, sum:{$sum:1}}})
Then I get a result:
{ "_id" : 1, "avg" : 2, "sum" : 3 }
What does {$sum:1} mean in this query?
From the official docs:
When used in the $group stage, $sum has the following syntax and returns the collective sum of all the numeric values that result from applying a specified expression to each document in a group of documents that share the same group by key:
{ $sum: < expression > }
Since in your example the expression is 1, it will aggregate a value of one for each document in the group, thus yielding the total number of documents per group.
Basically, it adds up the value of the expression for each row. Since there are 3 rows here, the result is 1+1+1 = 3. For more details, see the MongoDB documentation: https://docs.mongodb.com/v3.2/reference/operator/aggregation/sum/
For example if the query was:
db.foo.aggregate({$group:{_id:1, avg:{$avg:"$x"}, sum:{$sum:"$x"}}})
then the sum value would be 1+3=4
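The three accumulators can be imitated in plain JavaScript on the same three documents. This only mirrors the arithmetic described above and does not talk to MongoDB; the variable names are made up.

```javascript
// The three documents from the question, without their _id fields.
const foo = [{ x: 1 }, { x: 3 }, {}];

// {$sum: 1}: add 1 for every document in the group, i.e. count them.
const count = foo.reduce((total) => total + 1, 0);

// {$sum: "$x"}: add up x, skipping documents where x is missing
// or not a number.
const numericX = foo.map((doc) => doc.x).filter((x) => typeof x === "number");
const sumX = numericX.reduce((total, x) => total + x, 0);

// {$avg: "$x"}: average over the numeric values only.
const avgX = sumX / numericX.length;

console.log(count, sumX, avgX); // 3 4 2
```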
I'm not sure which MongoDB version was current 6 years ago or whether it had all these goodies, but {$sum:1} is really just an idiom for counting, which newer versions let you write as {$count:{}}.
The $count accumulator is available in $group from MongoDB 5.0 and is documented as equivalent to {$sum:1}, so I would not expect a performance difference between the two. The real argument is readability: think of why you're even asking. It is because {$sum:1} is a less-than-obvious hack.
My option would be:
db.foo.aggregate({$group:{_id:1, avg:{$avg:"$x"}, sum:{$count:{}}}})
I just tried this on Mongo 5.0.14 and it runs fine.
The good old "Just because you can, doesn't mean you should." is still a thing, no?

How do I calculate the average of top 20 percent of a collection in MongoDB Aggregate?

In a collection like: books : [{ stars: 10, valid: true }, { stars: 24, valid: false }, { stars: 76, valid: true }, ...], it is simple to calculate the average with:
db.books.aggregate([
{ $match : {
valid: true
}},
{ $group : {
_id: null,
avg: { $avg: "$stars" } // <- How to calculate $avg of the top 20%?
}}
])
But what if I want the average of the top 20 percent of stars instead of the average of all stars?
PS: This has to work without knowing the size of the collection (valid: true), because unlike in my example, I perform a lot of $unwind
OBS:
> db.version()
2.4.10
You need to run two queries to achieve this.
1) Get the total count of books whose valid attribute is true:
var bookCount = db.books.count({"valid":true});
2) Calculate how many records make up the top 20%, i.e. the number of records to average over:
var limit = Math.ceil(.2*bookCount);
3) Perform the aggregation operation:
Match only those records whose valid attribute is true.
Sort the records by the stars attribute in descending order, so that the top stars come first.
Limit the result to the top 20% of the records.
Group them and calculate their average.
The code:
db.books.aggregate([
{$match:{"valid":true}},
{$sort:{"stars":-1}},
{$limit:limit},
{$group:{"_id":null,"avg":{$avg:"$stars"}}}
])
You say "I perform a lot of $unwind", but neither your sample data nor your code reflects this.
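The count, sort, limit and average steps can be checked locally on a plain array before running them against the collection. This is only a sketch of the same arithmetic: topFractionAverage is a made-up name and the star values are invented sample data.

```javascript
// Averages the top `fraction` of the values, mirroring the
// $sort / $limit / $group steps of the pipeline.
function topFractionAverage(stars, fraction) {
  const limit = Math.ceil(fraction * stars.length); // same rounding as above
  const top = [...stars].sort((a, b) => b - a).slice(0, limit);
  return top.reduce((total, s) => total + s, 0) / top.length;
}

// Invented star values for ten books with valid: true.
const validStars = [10, 76, 50, 30, 90, 20, 40, 60, 70, 80];

// The top 20% of 10 values are the 2 highest: 90 and 80.
console.log(topFractionAverage(validStars, 0.2)); // 85
```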

mongodb complex map/reduce - or so I think

I have a mongodb collection that contains every sale and looks like this
{_id: '999',
buyer:{city:'Dallas','state':'Texas',...},
products: {...},
order_value:1000,
date:"2011-11-23T11:34:33Z"
}
I need to show stats about order volumes, by state, in the last 30, 60 and 90 days,
to get something like this:
State     Last 30   Last 60   Last 90
Arizona   12000     22000     35000
Texas     5000      9000      16000
How would you do this in a single query?
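That's not very difficult:
map = function() {
// emit(key, value): the key is the state, the value is the order amount
emit(this.buyer.state, this.order_value);
}
reduce = function(key, values) {
// add up all order amounts emitted for this state
var sum = 0;
values.forEach(function(o) {
sum += o;
});
return sum;
}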
Then run map-reduce on your collection with a query such as {date : {$gt : [today minus 30 days]}}
(I don't remember the exact syntax, but you should read the excellent map-reduce documentation on the MongoDB site).
To make more efficient use of map-reduce, consider incremental map-reduce: first query the last 30 days, then map-reduce again (incrementally) over -60 to -30 days to get the totals for the last 60 days. Finally, run an incremental map-reduce over -90 to -60 days to get the last 90 days.
This is not bad: you have 3 queries, but you only recompute the aggregation on data you haven't processed yet.
I can provide an example, but you should be able to do it yourself now.
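The per-state sums for the three windows can be tried out locally in plain JavaScript. This sketch does not call mapReduce and, for simplicity, recomputes each window from scratch instead of incrementally. The orders, the reference date and the helper names (volumeByState, daysAgo) are invented for illustration, and dates are assumed to be ISO 8601 UTC strings as in the question.

```javascript
// Invented sample orders in the shape shown in the question.
const orders = [
  { buyer: { state: "Texas" },   order_value: 1000, date: "2011-11-23T11:34:33Z" },
  { buyer: { state: "Texas" },   order_value: 500,  date: "2011-10-10T09:00:00Z" },
  { buyer: { state: "Arizona" }, order_value: 700,  date: "2011-09-15T08:00:00Z" }
];

// Local stand-in for the map and reduce functions: sums order_value
// per state for all orders placed on or after `since`.
function volumeByState(docs, since) {
  const totals = {};
  for (const doc of docs) {
    // ISO 8601 UTC strings compare correctly as plain strings.
    if (doc.date >= since) {
      const state = doc.buyer.state;
      totals[state] = (totals[state] || 0) + doc.order_value;
    }
  }
  return totals;
}

// Hypothetical helper: ISO timestamp `days` days before `now`.
function daysAgo(now, days) {
  const ms = now.getTime() - days * 24 * 60 * 60 * 1000;
  return new Date(ms).toISOString().replace(/\.\d{3}Z$/, "Z");
}

const now = new Date("2011-12-01T00:00:00Z"); // fixed "today" for the example
const report = {};
for (const days of [30, 60, 90]) {
  const totals = volumeByState(orders, daysAgo(now, days));
  for (const [state, sum] of Object.entries(totals)) {
    report[state] = report[state] || { 30: 0, 60: 0, 90: 0 };
    report[state][days] = sum;
  }
}

console.log(report.Texas);   // { '30': 1000, '60': 1500, '90': 1500 }
console.log(report.Arizona); // { '30': 0, '60': 0, '90': 700 }
```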