I'd like to get percentages from a group pipeline in a MongoDB aggregate.
My data:
{
_id : 1,
name : 'hello',
type : 'big'
},
{
_id : 2,
name : 'bonjour',
type : 'big'
},
{
_id : 3,
name : 'hi',
type : 'short'
},
{
_id : 4,
name : 'salut',
type : 'short'
},
{
_id : 5,
name : 'ola',
type : 'short'
}
My request groups by type and counts:
[{
$group : {
_id : {
type : '$type'
},
"count" : {
"$sum" : 1
}
}
}]
Result:
[
{
_id : {
type : 'big',
},
count : 2
},
{
_id : {
type : 'short',
},
count : 3
}
]
But I'd like to have count AND percentage, like this:
[
{
_id : {
type : 'big',
},
count: 2,
percentage: 40%
},
{
_id : {
type : 'short',
},
count: 3,
percentage: 60%
}
]
But I've no idea how to do that. I've tried $divide and other things, but without success. Could you please help me?
Well, I think percentage should be a string if the value contains %.
First you will need to count the number of documents.
var nums = db.collection.count();
db.collection.aggregate(
[
{ "$group": { "_id": {"type": "$type"}, "count": { "$sum": 1 }}},
{ "$project": {
"count": 1,
"percentage": {
"$concat": [ { "$substr": [ { "$multiply": [ { "$divide": [ "$count", {"$literal": nums }] }, 100 ] }, 0,2 ] }, "", "%" ]}
}
}
]
)
Result
{ "_id" : { "type" : "short" }, "count" : 3, "percentage" : "60%" }
{ "_id" : { "type" : "big" }, "count" : 2, "percentage" : "40%" }
First find the total number of documents in the collection using the count method, then use that count variable to calculate the percentage in the aggregation, like this:
var totalDocument = db.collectionName.count() // count total documents
Use totalDocument in the aggregation as below:
db.collectionName.aggregate({"$group":{"_id":{"type":"$type"},"count":{"$sum":1}}},
{"$project":{"count":1,"percentage":{"$multiply":[{"$divide":[100,totalDocument]},"$count"]}}})
EDIT
If you need to do this in a single aggregation query, you can use $unwind, but it causes a Cartesian-product problem: every document is first pushed into a single array and then unwound again. Check the aggregation query below:
db.collectionName.aggregate({"$group":{"_id":null,"count":{"$sum":1},"data":{"$push":"$$ROOT"}}},
{"$unwind":"$data"},
{"$group":{"_id":{"type":"$data.type"},"count":{"$sum":1},
"total":{"$first":"$count"}}},
{"$project":{"count":1,"percentage":{"$multiply":[{"$divide":[100,"$total"]},"$count"]}}}
).pretty()
I recommend first finding the total count and using that count in the aggregation, as in the first query.
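As a sanity check on the arithmetic, the two-step approach (count everything first, then divide each group's count by the total) can be sketched in plain JavaScript without a database. The sampleDocs array copies the sample data from the question; percentagesByType is a hypothetical helper name used only for this illustration, not a MongoDB API.

```javascript
// Sample documents copied from the question.
const sampleDocs = [
  { _id: 1, name: "hello", type: "big" },
  { _id: 2, name: "bonjour", type: "big" },
  { _id: 3, name: "hi", type: "short" },
  { _id: 4, name: "salut", type: "short" },
  { _id: 5, name: "ola", type: "short" },
];

// Hypothetical helper: mirrors count() followed by $group and $project.
function percentagesByType(documents) {
  const total = documents.length; // plays the role of db.collection.count()
  const counts = new Map();
  for (const doc of documents) {
    // Equivalent of $group with { $sum: 1 }
    counts.set(doc.type, (counts.get(doc.type) || 0) + 1);
  }
  // Equivalent of $project with $multiply / $divide
  return Array.from(counts, ([type, count]) => ({
    _id: { type },
    count,
    percentage: (count * 100) / total,
  }));
}

const percentages = percentagesByType(sampleDocs);
console.log(percentages);
```

With the five sample documents this gives a count of 2 and a percentage of 40 for 'big', and a count of 3 and a percentage of 60 for 'short'.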
Related
Below is a document that has an array named datum. I want to filter the records based on StatusCode, group by year, and sum the Amount value from the most recent record of each distinct Type.
{
"_id" : ObjectId("5fce46ca6ac9808276dfeb8c"),
"year" : 2018,
"datum" : [
{
"StatusCode" : "A",
"Type" : "1",
"Amount" : NumberDecimal("100"),
"Date" : ISODate("2018-05-30T00:46:12.784Z")
},
{
"StatusCode" : "A",
"Type" : "1",
"Amount" : NumberDecimal("300"),
"Date" : ISODate("2023-05-30T00:46:12.784Z")
},
{
"StatusCode" : "A",
"Type" : "2",
"Amount" : NumberDecimal("420"),
"Date" : ISODate("2032-05-30T00:46:12.784Z")
},
{
"StatusCode" : "B",
"Type" : "2",
"Amount" : NumberDecimal("420"),
"Date" : ISODate("2032-05-30T00:46:12.784Z")
}
]
}
In my case, the expected result is:
{
Total : 720
}
I want to achieve the result in the following aggregate Query pattern
db.collection.aggregate([
{
$addFields: {
datum: {
$reduce: {
input: "$datum",
initialValue: {},
"in": {
$cond: [
{
$and: [
{ $in: ["$$this.StatusCode", ["A"]] }
]
},
"$$this",
"$$value"
]
}
}
}
}
},
{
$group: {
_id: "$year",
RecentValue: { $sum: "$datum.Amount" }
}
}
])
You can first $unwind the datum array, then filter on StatusCode and sort by Date. A $group on year and Type then keeps the most recent record for each Type, and a final $group by year calculates the sum.
Here is a mongo playground for your reference.
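Since the playground link is not reproduced here, the following is a minimal sketch of what such a pipeline could look like, not necessarily the exact one from the playground. It assumes "most recent" means the highest Date per (year, Type) pair. The pipeline array could be passed to db.collection.aggregate(); totalOfLatestPerType is a hypothetical plain-JavaScript helper that reproduces the same logic on the sample document, with plain numbers standing in for NumberDecimal.

```javascript
// Sketch of the stages described above (unwind, filter, sort, group, group).
const pipeline = [
  { $unwind: "$datum" },
  { $match: { "datum.StatusCode": "A" } },
  { $sort: { "datum.Date": -1 } }, // newest first
  // $first after the sort keeps the newest Amount for each year/Type pair
  { $group: { _id: { year: "$year", type: "$datum.Type" }, latest: { $first: "$datum.Amount" } } },
  { $group: { _id: "$_id.year", Total: { $sum: "$latest" } } },
];

// Sample document from the question (numbers instead of NumberDecimal).
const yearDoc = {
  year: 2018,
  datum: [
    { StatusCode: "A", Type: "1", Amount: 100, Date: new Date("2018-05-30T00:46:12.784Z") },
    { StatusCode: "A", Type: "1", Amount: 300, Date: new Date("2023-05-30T00:46:12.784Z") },
    { StatusCode: "A", Type: "2", Amount: 420, Date: new Date("2032-05-30T00:46:12.784Z") },
    { StatusCode: "B", Type: "2", Amount: 420, Date: new Date("2032-05-30T00:46:12.784Z") },
  ],
};

// Hypothetical helper: same logic as the pipeline, for a single document.
function totalOfLatestPerType(document, statusCode) {
  const latest = new Map();
  for (const entry of document.datum) {
    if (entry.StatusCode !== statusCode) continue; // $match
    const seen = latest.get(entry.Type);
    if (!seen || entry.Date > seen.Date) latest.set(entry.Type, entry); // $sort + $first
  }
  let total = 0;
  for (const entry of latest.values()) total += entry.Amount; // final $sum
  return total;
}

console.log({ Total: totalOfLatestPerType(yearDoc, "A") });
```

For the sample document, the newest 'A' record of Type 1 has Amount 300 and the only 'A' record of Type 2 has Amount 420, which gives the expected total of 720.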
I want to aggregate my data and build an array of the stored dates, grouped by user and day, for the current day only. For example, with this data (assuming today is February 24th):
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 1,
"heure" : ISODate("2017-02-24T22:33:27.858Z")
}
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 1,
"heure" : ISODate("2017-02-24T23:33:27.858Z")
}
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 2,
"heure" : ISODate("2017-02-24T22:34:27.858Z")
}
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 1,
"heure" : ISODate("2017-02-25T07:21:27.858Z")
}
I want to get this:
{
"_id" : {user : 1, jour : 55},
"date" : [ISODate("2017-02-24T22:33:27.858Z"), ISODate("2017-02-24T23:33:27.858Z") ]
}
{
"_id" : {user : 2, jour : 55},
"date" : [ISODate("2017-02-24T22:34:27.858Z") ]
}
I tried using $push and $match, but everything failed.
Optionally, I want the time between two dates: for user 1, for example, an additional field containing 1 hour. But I want to use each date at most once, so with 4 dates in the array I need a single sum: the difference between the first and second plus the difference between the third and fourth. I would like to see this to learn how to use $cond properly.
Here is my current pipeline:
[
{ $match : {$eq : [{$dayOfYear : "$heure"}, {$dayOfYear : ISODate()}] } },
{
$group : {
_id : {
user : "$user",
},
date : {$push: "$heure"},
nombre: { $sum : 1 }
}
}
]
For now, I don't handle the second part of the aggregate function.
For the filter part you need to use the $redact stage: its $cond compares the $dayOfYear of heure with today's, returns the $$KEEP system variable for documents that match the condition, and discards the others with $$PRUNE.
Consider composing your final aggregation pipeline as:
[
{
"$redact": {
"$cond": [
{
"$eq": [
{ "$dayOfYear": "$heure" },
{ "$dayOfYear": new Date() }
]
},
"$$KEEP",
"$$PRUNE"
]
}
},
{
"$group": {
"_id": {
"user": "$user",
"jour": { "$dayOfYear": "$heure" }
},
"date": { "$push": "$heure" },
"nombre": { "$sum": 1 }
}
}
]
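On MongoDB 3.6 or later, $expr lets $match evaluate aggregation expressions, so the $redact stage could be replaced by a $match. The sketch below builds that variant as a plain array and, to show how the grouping key is formed, reproduces the logic in plain JavaScript on the question's sample data. dayOfYearUTC and groupByUserAndDay are hypothetical helpers, and the fixed today value stands in for new Date(). It assumes UTC, which is what $dayOfYear uses when no timezone is given.

```javascript
// Variant of the pipeline using $match with $expr (MongoDB 3.6+).
const pipeline = [
  { $match: { $expr: { $eq: [{ $dayOfYear: "$heure" }, { $dayOfYear: new Date() }] } } },
  { $group: {
      _id: { user: "$user", jour: { $dayOfYear: "$heure" } },
      date: { $push: "$heure" },
      nombre: { $sum: 1 },
  } },
];

// Sample data from the question (the _id fields are omitted).
const rows = [
  { user: 1, heure: new Date("2017-02-24T22:33:27.858Z") },
  { user: 1, heure: new Date("2017-02-24T23:33:27.858Z") },
  { user: 2, heure: new Date("2017-02-24T22:34:27.858Z") },
  { user: 1, heure: new Date("2017-02-25T07:21:27.858Z") },
];

// Hypothetical helper: day of the year in UTC (1 for January 1st).
function dayOfYearUTC(date) {
  const lastDayOfPreviousYear = Date.UTC(date.getUTCFullYear(), 0, 0);
  const midnight = Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate());
  return (midnight - lastDayOfPreviousYear) / 86400000;
}

// Hypothetical helper: filter on today's day of year, then group by user.
function groupByUserAndDay(documents, today) {
  const jour = dayOfYearUTC(today);
  const groups = new Map();
  for (const doc of documents) {
    if (dayOfYearUTC(doc.heure) !== jour) continue; // $match / $redact
    if (!groups.has(doc.user)) {
      groups.set(doc.user, { _id: { user: doc.user, jour }, date: [], nombre: 0 });
    }
    const group = groups.get(doc.user);
    group.date.push(doc.heure); // $push
    group.nombre += 1;          // $sum: 1
  }
  return Array.from(groups.values());
}

const groups = groupByUserAndDay(rows, new Date("2017-02-24T12:00:00Z"));
console.log(JSON.stringify(groups, null, 2));
```

Comparing only the day of the year also matches the same day in other years, so a real query would probably add a year or date-range condition as well.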
I want to project all the objects from an array, if it matches the given condition.
I have following data
{
_id : 1,
em : 'abc#12s.net',
name : 'NewName',
od :
[
{
"oid" : ObjectId("1234"),
"ca" : ISODate("2016-05-05T13:20:10.718Z")
},
{
"oid" : ObjectId("2345"),
"ca" : ISODate("2016-05-11T13:20:10.718Z")
},
{
"oid" : ObjectId("57766"),
"ca" : ISODate("2016-05-13T13:20:10.718Z")
}
]
},
{
_id : 2,
em : 'ab6c#xyz.net',
name : 'NewName2',
od :
[
{
"oid" : ObjectId("1234"),
"ca" : ISODate("2016-05-11T13:20:10.718Z")
},
{
"oid" : ObjectId("2345"),
"ca" : ISODate("2016-05-12T13:20:10.718Z")
},
{
"oid" : ObjectId("57766"),
"ca" : ISODate("2016-05-05T13:20:10.718Z")
}
]
}
I want to get all the objects from the od array whose 'od.ca' falls within a range, say, greater than 10th May and less than 15th May.
I tried using MongoDB's aggregate method, which I am new to. My query is given below.
db.userDetail.aggregate(
{
$match:
{
'od.ca':
{
'$gte': '10/05/2016',
'$lte': '15/05/2016'
},
lo: { '$ne': 'd' }
}
},
{
$redact:
{
$cond:
{
if:
{
$gte: [ "$$od.ca", '10/05/2016' ],
$lte : ["$$od.ca" , '15/05/2016']
},
then: "$$DESCEND",
else: "$$PRUNE"
}
}
})
When I try to run this command, I get this error:
assert: command failed: {
"errmsg" : "exception: Use of undefined variable: od",
"code" : 17276,
"ok" : 0
} : aggregate failed
Since I am using MongoDB 3.0.0, I cannot use $filter, so I tried using $redact.
Can someone tell me what I am doing wrong? Is the query correct?
I also referred to a related question, but since I am not using MongoDB 3.2 (as mentioned), I cannot use its accepted answer.
Query explanation:
$match - matches documents against the criteria, limiting the documents to process
$unwind - deconstructs the od array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.
$match - matches the unwound documents against the criteria
$group - the opposite of $unwind in our case, so we are recreating the result array
If you are expecting documents like this:
{
"_id" : 2,
"od" : [{
"oid" : 1234,
"ca" : ISODate("2016-05-11T13:20:10.718Z")
}, {
"oid" : 2345,
"ca" : ISODate("2016-05-12T13:20:10.718Z")
}
]
}, {
"_id" : 1,
"od" : [{
"oid" : 2345,
"ca" : ISODate("2016-05-11T13:20:10.718Z")
}, {
"oid" : 57766,
"ca" : ISODate("2016-05-13T13:20:10.718Z")
}
]
}
you can use the query below:
db.userDetail.aggregate([{
$match : {
"od.ca" : {
$lt : new Date(new Date().setDate(new Date().getDate() + 2)),
$gte : new Date(new Date().setDate(new Date().getDate() - 4))
}
}
}, {
$unwind : "$od"
}, {
$match : {
"od.ca" : {
$lt : new Date(new Date().setDate(new Date().getDate() + 2)),
$gte : new Date(new Date().setDate(new Date().getDate() - 4))
}
}
}, {
$group : {
_id : "$_id",
od : {
$push : "$od"
}
}
}
])
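The following query gave me the desired result.
If you are using MongoDB 2.6.x up to 3.0.x, you can use this solution.
// ISO 8601 strings parse unambiguously; "10/05/2016" is typically parsed as month/day/year (October 5).
var object = {st : "2016-05-10", et : "2016-05-13"};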
db.userDetail.aggregate(
[
{
$match:
{
"od.ca":
{
'$gte': new Date(object.st),
'$lte': new Date(object.et)
},
"lo" : {$ne : 'd'}
}
},
{
$project:
{
em: 1,
fna : 1,
lna : 1,
ca :1,
od:
{
"$setDifference":
[{
"$map":
{
"input": "$od",
"as": "o",
"in":
{
"$cond":[
{
"$and":
[
{ "$gte": [ "$$o.ca", new Date(object.st) ] },
{ "$lte": [ "$$o.ca", new Date(object.et) ] },
{ "$ne": [ "$$o.oid", ObjectID(config.pid.toString())
] }
]
},
"$$o",false]
}
}
},[false]
]
}
}
},
{$sort : {_id : 1}}
]
);
If you are using 3.2.X, use $filter to get the result.
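For reference, a minimal sketch of what the $filter version could look like on MongoDB 3.2 or later, assuming the range from the question (10 to 15 May 2016) and keeping only the od.ca condition. The pipeline array could be passed to db.userDetail.aggregate(); the users array and the filtered result are a plain-JavaScript illustration of what the cond expression keeps, with string oid values standing in for ObjectId.

```javascript
// Range from the question, as unambiguous ISO 8601 dates (UTC).
const start = new Date("2016-05-10T00:00:00Z");
const end = new Date("2016-05-15T00:00:00Z");

// Sketch of a $filter-based pipeline (MongoDB 3.2+).
const pipeline = [
  // Pre-filter so only documents with a candidate element are processed.
  { $match: { "od.ca": { $gte: start, $lte: end } } },
  { $project: {
      em: 1,
      od: {
        $filter: {
          input: "$od",
          as: "o",
          cond: { $and: [{ $gte: ["$$o.ca", start] }, { $lte: ["$$o.ca", end] }] },
        },
      },
  } },
];

// Sample data from the question, reduced to the fields that matter here.
const users = [
  { _id: 1, od: [
    { oid: "1234", ca: new Date("2016-05-05T13:20:10.718Z") },
    { oid: "2345", ca: new Date("2016-05-11T13:20:10.718Z") },
    { oid: "57766", ca: new Date("2016-05-13T13:20:10.718Z") },
  ] },
  { _id: 2, od: [
    { oid: "1234", ca: new Date("2016-05-11T13:20:10.718Z") },
    { oid: "2345", ca: new Date("2016-05-12T13:20:10.718Z") },
    { oid: "57766", ca: new Date("2016-05-05T13:20:10.718Z") },
  ] },
];

// Plain-JavaScript equivalent of the $filter condition.
const filtered = users.map((user) => ({
  _id: user._id,
  od: user.od.filter((o) => o.ca >= start && o.ca <= end),
}));

console.log(JSON.stringify(filtered, null, 2));
```

On the sample data this keeps oid 2345 and 57766 for _id 1, and oid 1234 and 2345 for _id 2, which matches the expected documents shown in the other answer.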
I'm looking for a way to take data such as this
{ "_id" : 5, "count" : 1, "arr" : [ "aga", "dd", "a" ] },
{ "_id" : 6, "count" : 4, "arr" : [ "aga", "ysdf" ] },
{ "_id" : 7, "count" : 4, "arr" : [ "sad", "aga" ] }
I would like to sum the count based on the 1st item(index) of arr. In another aggregation I would like to do the same with the 1st and the 2nd item in the arr array.
I've tried using unwind, but that breaks up the data and the hierarchy is then lost.
I've also tried using
$group: {
_id: {
arr_0:'$arr.0'
},
total:{
$sum: '$count'
}
}
but the result is blank arrays
Actually, you can't use dot notation to group your documents by the element at a specified index. To do that you have two options:
First, the optimal way is to use the $arrayElemAt operator, new in MongoDB 3.2, which returns the element at a specified index in the array.
db.collection.aggregate([
{ "$group": {
"_id": { "$arrayElemAt": [ "$arr", 0 ] },
"count": { "$sum": 1 }
}}
])
In MongoDB 3.0 and earlier you will need to de-normalise your array with $unwind, then first $group by _id and use the $first operator to return the first item in the array. From there you regroup your documents on that value and use $sum to get the sum. But this only works for the first and last index, because MongoDB also provides the $last operator.
db.collection.aggregate([
{ "$unwind": "$arr" },
{ "$group": {
"_id": "$_id",
"arr": { "$first": "$arr" }
}},
{ "$group": {
"_id": "$arr",
"count": { "$sum": 1 }
}}
])
which yields something like this:
{ "_id" : "sad", "count" : 1 }
{ "_id" : "aga", "count" : 2 }
To group using the element at position p in your array, you will have a better chance using the mapReduce function.
var mapFunction = function(){ emit(this.arr[0], 1); };
var reduceFunction = function(key, value) { return Array.sum(value); };
db.collection.mapReduce(mapFunction, reduceFunction, { "out": { "inline": 1 } } )
Which returns:
{
"results" : [
{
"_id" : "aga",
"value" : 2
},
{
"_id" : "sad",
"value" : 1
}
],
"timeMillis" : 27,
"counts" : {
"input" : 3,
"emit" : 3,
"reduce" : 1,
"output" : 2
},
"ok" : 1
}
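Note that the examples above count documents ($sum: 1), whereas the question asked for the total of the count field. A hedged sketch of that variant follows: it sums "$count" grouped by the first element, and by the first two elements using the $slice aggregation operator (also available from MongoDB 3.2). Either pipeline array could be passed to db.collection.aggregate(); sumByPrefix is a hypothetical plain-JavaScript helper that reproduces the grouping on the question's sample data.

```javascript
// Group by the first element of arr and total the count field.
const byFirst = [
  { $group: { _id: { $arrayElemAt: ["$arr", 0] }, total: { $sum: "$count" } } },
];

// Group by the first two elements of arr (the _id is a two-element array).
const byFirstTwo = [
  { $group: { _id: { $slice: ["$arr", 2] }, total: { $sum: "$count" } } },
];

// Sample data from the question.
const items = [
  { _id: 5, count: 1, arr: ["aga", "dd", "a"] },
  { _id: 6, count: 4, arr: ["aga", "ysdf"] },
  { _id: 7, count: 4, arr: ["sad", "aga"] },
];

// Hypothetical helper: total `count` per leading n elements of `arr`.
function sumByPrefix(documents, n) {
  const totals = new Map();
  for (const doc of documents) {
    const key = JSON.stringify(doc.arr.slice(0, n)); // arrays need a string key
    totals.set(key, (totals.get(key) || 0) + doc.count);
  }
  return Array.from(totals, ([key, total]) => ({ _id: JSON.parse(key), total }));
}

console.log(sumByPrefix(items, 1));
console.log(sumByPrefix(items, 2));
```

On the sample data, grouping by the first element gives a total of 5 for "aga" and 4 for "sad"; grouping by the first two elements gives three groups with totals 1, 4 and 4.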
Assuming I have a collection called "posts" (in reality it is a more complex collection, posts is too simple) with the following structure:
> db.posts.find()
{ "_id" : ObjectId("50ad8d451d41c8fc58000003"), "title" : "Lorem ipsum", "author" :
"John Doe", "content" : "This is the content", "tags" : [ "SOME", "RANDOM", "TAGS" ] }
I expect this collection to grow to hundreds of thousands, perhaps millions, of documents. I need to query for posts by tags, group the results by tag, and display the results paginated. This is where the aggregation framework comes in. I plan to use the aggregate() method to query the collection:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
The catch is that to create the paginator I would need to know the length of the output array. I know that to do that you can do:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} },
{ "$group" : {
_id: null,
total: { $sum: 1 }
} }
]);
But that would discard the output from the previous pipeline stage (the first group). Is there a way to combine the two operations while preserving each stage's output? I know that the output of the whole aggregate operation can be cast to an array in some language and have its contents counted, but the pipeline output may exceed the 16 MB limit. Also, performing the same query again just to obtain the count seems like a waste.
So is obtaining the document result and count at the same time possible? Any help is appreciated.
Use $project to save tag and count into tmp.
Use $push or $addToSet to store tmp into your data list.
Code:
db.test.aggregate(
{$unwind: '$tags'},
{$group:{_id: '$tags', count:{$sum:1}}},
{$project:{tmp:{tag:'$_id', count:'$count'}}},
{$group:{_id:null, total:{$sum:1}, data:{$addToSet:'$tmp'}}}
)
Output:
{
"result" : [
{
"_id" : null,
"total" : 5,
"data" : [
{
"tag" : "SOME",
"count" : 1
},
{
"tag" : "RANDOM",
"count" : 2
},
{
"tag" : "TAGS1",
"count" : 1
},
{
"tag" : "TAGS",
"count" : 1
},
{
"tag" : "SOME1",
"count" : 1
}
]
}
],
"ok" : 1
}
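On MongoDB 3.4 or later, the $facet and $count stages can return one page of groups together with the total number of groups in a single pass. This is a minimal sketch under that version assumption; buildTagPage, page and pageSize are hypothetical names, and the field name "value" inside $count is arbitrary. The whole $facet result is a single document and is still subject to the 16 MB document limit, which is why the data facet is cut down to one page.

```javascript
// Hypothetical helper: builds a pipeline returning one page of tag counts
// plus the total number of distinct tags. Pages are numbered from 1.
function buildTagPage(page, pageSize) {
  return [
    { $unwind: "$tags" },
    { $group: { _id: "$tags", count: { $sum: 1 } } },
    { $sort: { _id: 1 } }, // fixed order so that pages do not overlap
    { $facet: {
        total: [{ $count: "value" }],
        data: [{ $skip: (page - 1) * pageSize }, { $limit: pageSize }],
    } },
  ];
}

const pipeline = buildTagPage(2, 10);
console.log(JSON.stringify(pipeline, null, 2));
```

In the mongo shell this could be run as db.posts.aggregate(buildTagPage(2, 10)); the single result document then holds the page in data and the number of groups in total.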
I'm not sure you need the aggregation framework for this, other than for counting all the tags, e.g.:
db.posts.aggregate(
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
);
For paginating through per tag you can just use the normal query syntax - like so:
db.posts.find({tags: "RANDOM"}).skip(10).limit(10)