MongoDB Aggregation: Counting distinct fields

I am trying to write an aggregation to identify accounts that use multiple payment sources. Typical data would be:
{
account:"abc",
vendor:"amazon",
}
...
{
account:"abc",
vendor:"overstock",
}
Now, I'd like to produce a list of accounts similar to this:
{
account:"abc",
vendorCount:2
}
How would I write this in Mongo's aggregation framework?

I figured this out by using the $addToSet and $unwind operators (see also the question "Mongodb Aggregation count array/set size"):
db.collection.aggregate([
{
$group: { _id: { account: '$account' }, vendors: { $addToSet: '$vendor'} }
},
{
$unwind:"$vendors"
},
{
$group: { _id: "$_id", vendorCount: { $sum:1} }
}
]);
Hope it helps someone.

I think it's better to run a query like the following, which avoids $unwind:
db.t2.insert({_id:1,account:"abc",vendor:"amazon"});
db.t2.insert({_id:2,account:"abc",vendor:"overstock"});
db.t2.aggregate([
{ $group : { _id : { "account" : "$account", "vendor" : "$vendor" }, number : { $sum : 1 } } },
{ $group : { _id : "$_id.account", number : { $sum : 1 } } }
]);
This shows the following result, which is what's expected:
{ "_id" : "abc", "number" : 2 }
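To see what the two $group stages compute, here is a plain JavaScript sketch that mimics them in memory. It does not talk to MongoDB; the sample documents and the function name `countDistinctVendors` are illustrative assumptions. The first pass collapses duplicate account/vendor pairs, and the second counts the surviving pairs per account.

```javascript
// In-memory sketch of the two-$group pipeline; not a MongoDB API.
function countDistinctVendors(docs) {
  // Stage 1: group by (account, vendor); duplicate pairs collapse to one key.
  const pairs = new Map();
  for (const { account, vendor } of docs) {
    pairs.set(JSON.stringify([account, vendor]), { account, vendor });
  }
  // Stage 2: group by account and count the distinct pairs ({ $sum: 1 }).
  const counts = new Map();
  for (const { account } of pairs.values()) {
    counts.set(account, (counts.get(account) || 0) + 1);
  }
  return [...counts].map(([account, number]) => ({ _id: account, number }));
}

const sample = [
  { account: "abc", vendor: "amazon" },
  { account: "abc", vendor: "overstock" },
  { account: "abc", vendor: "amazon" }, // duplicate pair, counted once
];
console.log(JSON.stringify(countDistinctVendors(sample))); // [{"_id":"abc","number":2}]
```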

You can use sets: collect the distinct vendors with $addToSet, then count them with $size:
db.test.aggregate([
{$group: {
_id: "$account",
uniqueVendors: {$addToSet: "$vendor"}
}},
{$project: {
_id: 1,
vendorsCount: {$size: "$uniqueVendors"}
}}
]);

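I do not see why somebody would have to use $group twice:
db.t2.aggregate([ { $group: {"_id":"$account" , "number":{$sum:1}} } ])
This works fine as long as each account/vendor pair appears only once. Note that it counts documents per account, not distinct vendors, so a vendor used more than once by the same account is counted more than once.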
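The difference between the two counts only shows up when an account repeats a vendor. This plain JavaScript sketch (in-memory, not MongoDB; the sample data is made up) compares the single-$group document count with a distinct-vendor count of the kind $addToSet plus $size produces:

```javascript
// Sample data with a repeated account/vendor pair (made up for illustration).
const payments = [
  { account: "abc", vendor: "amazon" },
  { account: "abc", vendor: "amazon" },
  { account: "abc", vendor: "overstock" },
];

const docCount = {};   // like { $group: { _id: "$account", number: { $sum: 1 } } }
const vendorSets = {}; // like { $addToSet: "$vendor" } per account
for (const { account, vendor } of payments) {
  docCount[account] = (docCount[account] || 0) + 1;
  if (!vendorSets[account]) vendorSets[account] = new Set();
  vendorSets[account].add(vendor);
}
// like { $size: "$vendors" }
const distinctCount = Object.fromEntries(
  Object.entries(vendorSets).map(([account, set]) => [account, set.size])
);

console.log(JSON.stringify(docCount));      // {"abc":3}
console.log(JSON.stringify(distinctCount)); // {"abc":2}
```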

This approach doesn't use $unwind or a second $group, and it keeps working if you add more accumulators to the $group stage. That is a flaw in the accepted answer: any other accumulated fields in its first $group would be duplicated by the $unwind stage and then dropped by the second $group, which only outputs _id and vendorCount.
db.collection.aggregate([{
"$group": {
"_id": "$account",
"vendors": {"$addToSet": "$vendor"}
}
},
{
"$addFields": {
"vendorCount": {
"$size": "$vendors"
}
}
}])

To identify accounts that use multiple payment sources:
Use a $group stage to group the account records by account and count them.
Use a $match stage to keep only the accounts whose count is greater than one.
db.payment_collection.aggregate([ { $group: {"_id":"$account" ,
"number":{$sum:1}} }, {
"$match": {
"number": { "$gt": 1 }
}
} ])
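This works as long as each account/vendor pair appears at most once. Because it counts payment records rather than distinct vendors, an account that paid the same vendor twice would also be returned.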
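A stricter check counts distinct vendors before filtering, which is what $addToSet, $size and a $match on the resulting count would do. Here is a plain JavaScript sketch of that logic (in-memory, not a MongoDB call; `multiVendorAccounts` and the sample records are illustrative assumptions):

```javascript
// Returns accounts that used more than one distinct vendor.
function multiVendorAccounts(docs) {
  const vendorsByAccount = new Map();
  for (const { account, vendor } of docs) {
    if (!vendorsByAccount.has(account)) vendorsByAccount.set(account, new Set());
    vendorsByAccount.get(account).add(vendor);
  }
  // Like { $match: { vendorCount: { $gt: 1 } } } after the $size step.
  return [...vendorsByAccount]
    .filter(([, vendors]) => vendors.size > 1)
    .map(([account, vendors]) => ({ _id: account, vendorCount: vendors.size }));
}

const records = [
  { account: "abc", vendor: "amazon" },
  { account: "abc", vendor: "overstock" },
  { account: "xyz", vendor: "amazon" },
  { account: "xyz", vendor: "amazon" }, // two records, but only one vendor
];
console.log(JSON.stringify(multiVendorAccounts(records))); // [{"_id":"abc","vendorCount":2}]
```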

db.UserModule.aggregate([
{ $group : { _id : { "companyauthemail" : "$companyauthemail", "email" : "$email" }, number : { $sum : 1 } } },
{ $group : { _id : "$_id.companyauthemail", number : { $sum : 1 } } }
]);

An example outside the aggregation framework, which runs one count query per distinct value:
db.collection.distinct("example.item").forEach(function (item) {
print(item + " ==>> " + db.collection.countDocuments({ "example.item": item }));
});

Related

Select latest document after grouping them by a field in MongoDB

I got a question that I would expect to be pretty simple, but I cannot figure it out. What I want to do is this:
Find all documents in a collection and:
sort the documents by a certain date field
apply distinct on one of its other fields, but return the whole document
Best shown in an example.
This is a mock input:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("1998-11-04T18:46:14.000Z")
},
{
"commandName" : "migration_a",
"executionDate" : ISODate("1970-05-09T20:16:37.000Z")
},
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
The expected output is:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
Or, in other words:
Group the input data by the commandName field
Inside each group sort the documents
Return the newest document from each group
My attempts to write this query have failed:
The distinct() function will only return the value of the field I am distinct-ing on, not the whole document. That makes it unsuitable for my case.
Tried writing an aggregate query, but ran into the problem of how to sort and select a single document from inside each group. The $sort aggregation stage sorts the groups relative to one another, which is not what I want.
I am not too well-versed in Mongo and this is where I hit a wall. Any ideas on how to continue?
For reference, this is the work-in-progress aggregation query I am trying to expand on:
db.getCollection('some_collection').aggregate([
{ $group: { '_id': '$commandName', 'docs': {$addToSet: '$$ROOT'} } },
{ $sort: {'_id.docs.???': 1}}
])
Post-resolved edit
Thank you for the answers. I got what I needed. For future reference, this is the full query that will do what was requested and also return a list of the filtered documents, not groups.
db.getCollection('some_collection').aggregate([
{ $sort: {'executionDate': 1}},
{ $group: { '_id': '$commandName', 'result': { $last: '$$ROOT'} } },
{ $replaceRoot: {newRoot: '$result'} }
])
The query result without the $replaceRoot stage would be:
[
{
"_id": "migration_a",
"result": {
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
}
},
{
"_id": "migration_b",
"result": {
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
}
]
The outer _id and result fields are just "group wrappers" around the actual document I want, which is nested under the result key. Moving the nested document to the root of the result is done with the $replaceRoot stage. The query result when using that stage is:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
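The sort-then-$last idea can be mimicked in plain JavaScript to see why it works. This is an in-memory sketch, not MongoDB, and `latestPerCommand` is a hypothetical helper. After sorting ascending by executionDate, the last document seen for each commandName is the newest, and keeping the whole document corresponds to $last: '$$ROOT' followed by $replaceRoot.

```javascript
function latestPerCommand(docs) {
  // $sort: { executionDate: 1 } (copy first so the input is not mutated)
  const sorted = [...docs].sort((a, b) => a.executionDate - b.executionDate);
  // $group with $last: '$$ROOT': later documents overwrite earlier ones.
  const byCommand = new Map();
  for (const doc of sorted) byCommand.set(doc.commandName, doc);
  // $replaceRoot: return the documents themselves, not { _id, result } wrappers.
  return [...byCommand.values()];
}

const runs = [
  { commandName: "migration_a", executionDate: new Date("1998-11-04T18:46:14Z") },
  { commandName: "migration_a", executionDate: new Date("1970-05-09T20:16:37Z") },
  { commandName: "migration_a", executionDate: new Date("2005-11-08T11:58:52Z") },
  { commandName: "migration_b", executionDate: new Date("2016-06-02T19:48:34Z") },
];
for (const d of latestPerCommand(runs)) {
  console.log(d.commandName, d.executionDate.toISOString());
}
// migration_a 2005-11-08T11:58:52.000Z
// migration_b 2016-06-02T19:48:34.000Z
```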
Try this:
db.getCollection('some_collection').aggregate([
{ $sort: {'executionDate': -1}},
{ $group: { '_id': '$commandName', 'doc': {$first: '$$ROOT'} } }
])
I believe this will result in what you're looking for:
db.collection.aggregate([
{
$group: {
"_id": "$commandName",
"executionDate": {
"$last": "$executionDate"
}
}
}
])
Note that $last only returns the newest date if the documents reach the $group stage in ascending executionDate order, so add a $sort stage before the $group (using $max instead of $last would avoid relying on input order):
{
$sort: {
"executionDate": 1
}
}
The use case in the question is almost exactly covered by the $last aggregation operator documentation, which can be summarised as:
the $group stage should follow a $sort stage so that the input documents are in a defined order, since $last simply picks the last document from each group.
Query:
db.collection.aggregate([
{
$sort: {
executionDate: 1
}
},
{
$group: {
_id: "$commandName",
executionDate: {
$last: "$executionDate"
}
}
}
]);
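A small in-memory illustration of why the $sort matters (plain JavaScript, not MongoDB; the dates are taken from the question): $last returns whatever arrives last, so its answer changes with input order, while a maximum over executionDate, as $max computes, does not depend on order.

```javascript
const dates = [
  new Date("1998-11-04T18:46:14Z"),
  new Date("2005-11-08T11:58:52Z"),
  new Date("1970-05-09T20:16:37Z"),
];

// $last without a preceding $sort: depends on arrival order.
const lastUnsorted = dates[dates.length - 1];

// $sort ascending, then $last: always the newest date.
const sortedDates = [...dates].sort((a, b) => a - b);
const lastSorted = sortedDates[sortedDates.length - 1];

// $max: independent of order.
const maxDate = new Date(Math.max(...dates.map((d) => d.getTime())));

console.log(lastUnsorted.toISOString()); // 1970-05-09T20:16:37.000Z
console.log(lastSorted.toISOString());   // 2005-11-08T11:58:52.000Z
console.log(maxDate.toISOString());      // 2005-11-08T11:58:52.000Z
```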

MongoDB: Create Object in Aggregation result

I want to return an object as a field in my aggregation result, similar to the solution in this question. However, in that solution the aggregation produces an array of objects with just one item in it, not a standalone object. For example, a query like the following, with a $push operation
$group:{
_id: "$publisherId",
'values' : { $push:{
newCount: { $sum: "$newField" },
oldCount: { $sum: "$oldField" } }
}
}
returns a result like this
{
"_id" : 2,
"values" : [
{
"newCount" : 100,
"oldCount" : 200
}
]
}
not one like this
{
"_id" : 2,
"values" : {
"newCount" : 100,
"oldCount" : 200
}
}
The latter is the result that I require. So how do I rewrite the query to get a result like that? Is it possible or is the former result the best I can get?
You don't need the $push operator; just add a final $project stage that creates the embedded document. Follow this guideline:
var pipeline = [
{
"$group": {
"_id": "$publisherId",
"newCount": { "$sum": "$newField" },
"oldCount": { "$sum": "$oldField" }
}
},
{
"$project": {
"values": {
"newCount": "$newCount",
"oldCount": "$oldCount"
}
}
}
];
db.collection.aggregate(pipeline);
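To make the reshaping concrete, here is an in-memory JavaScript sketch (not MongoDB; the sample documents are invented, and the field names follow the question). It first sums per publisherId as the $group does, then builds the embedded values object for each result as the $project does. Because each result document is reshaped on its own, values ends up as a plain object rather than a one-element array.

```javascript
const docs = [
  { publisherId: 2, newField: 40, oldField: 150 },
  { publisherId: 2, newField: 60, oldField: 50 },
];

// $group: one accumulator object per publisherId.
const groups = new Map();
for (const { publisherId, newField, oldField } of docs) {
  const g = groups.get(publisherId) || { _id: publisherId, newCount: 0, oldCount: 0 };
  g.newCount += newField;
  g.oldCount += oldField;
  groups.set(publisherId, g);
}

// $project: nest the two totals under "values", one document at a time.
const results = [...groups.values()].map((g) => ({
  _id: g._id,
  values: { newCount: g.newCount, oldCount: g.oldCount },
}));
console.log(JSON.stringify(results)); // [{"_id":2,"values":{"newCount":100,"oldCount":200}}]
```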

Return all fields MongoDB Aggregate

I tried searching on here but couldn't really find what I need. I have documents like this:
{
appletype: "Granny",
color: "Green",
datePicked: "2015-01-26",
dateRipe: "2015-01-24",
numPicked: 3
},
{
appletype: "Granny",
color: "Green",
datePicked: "2015-01-01",
dateRipe: "2014-12-28",
numPicked: 6
}
I would like to return only the apples picked most recently, with all fields. Essentially, I want my query to return only the first document. When I try this:
db.collection.aggregate([
{ $match : { "appletype" : "Granny" } },
{ $sort : { "datePicked" : 1 } },
{ $group : { "_id" : { "appletype" : "$appletype" },
"datePicked" : { $max : "$datePicked" } },
])
It does return the apples picked most recently, but only with appletype: Granny and datePicked: 2015-01-26. I need the remaining fields. I tried using $project and adding all the fields, but it didn't give me what I needed. Also, when I added the other fields to the group, it returned both records, since datePicked is unique.
How can I go about returning all fields, for only the latest datePicked?
Thanks!
From your description, it sounds like you want one document for each type of apple in your collection: the one with the most recent datePicked value.
Here is an aggregate query for that:
db.collection.aggregate([
{ $sort: { "datePicked": -1 } },
{ $group: { _id: "$appletype", color: { $first: "$color" }, datePicked: { $first: "$datePicked" }, dateRipe: { $first: "$dateRipe" }, numPicked: { $first: "$numPicked" } } },
{ $project: { _id: 0, color: 1, datePicked: 1, dateRipe: 1, numPicked: 1, appletype: "$_id" } }
])
But based on the aggregate query you've written, it looks like you're trying to get this:
db.collection.find({appletype: "Granny"}).sort({datePicked: -1}).limit(1);

MongoDB query using aggregation not returning expected results

I have a few documents that look like this example:
{
"_id": ObjectId("540f4b6496f35c16af001dc4"),
"groups": [
1,
46105,
46106,
53241,
55397,
55406,
62840
],
"vehicleid": 123,
"vehiclename": "123 - CAN BC",
"totaldistancetraveled": 472.0,
"date_num": 20140901
}
I need to find the total distance driven by all vehicles that belong to group 46105 and whose date_num matches 20140901.
I tried the following aggregation query:
db.vehicle_performance_monthly.aggregate(
{ $unwind : "$groups"},
{$group:
{_id: "$groups",
totalMiles: { $sum: "$totaldistancetraveled"}}},
{$match:{_id: {$in:[46106]}},{"$date_num":{$in:20140901}}}
)
But multiple matches are not being returned. Any help is appreciated.
This should work.
db.vehicle_performance_monthly.aggregate([ {
$match : {
groups : 46106,
date_num : 20140901
}
}, {
$unwind : "$groups"
}, {
$match : {
groups : 46106
}
}, {
$group : {
_id : "$groups",
totalMiles : {
$sum : "$totaldistancetraveled"
}
}
} ]);
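The order of stages in that pipeline can be traced in plain JavaScript. This is an in-memory sketch, not MongoDB; the first document is shortened from the question and the other two are invented for the example. Whole documents are filtered first, then groups is unwound, only the wanted group is kept, and the distances are summed.

```javascript
const vehicles = [
  { vehicleid: 123, groups: [1, 46105, 46106], totaldistancetraveled: 472.0, date_num: 20140901 },
  { vehicleid: 456, groups: [46106], totaldistancetraveled: 28.0, date_num: 20140901 },
  { vehicleid: 789, groups: [46106], totaldistancetraveled: 99.0, date_num: 20140801 }, // other date
];

const GROUP = 46106;
const DATE = 20140901;

const totalMiles = vehicles
  // First $match on whole documents (date_num is still present here).
  .filter((v) => v.groups.includes(GROUP) && v.date_num === DATE)
  // $unwind: one document per element of groups.
  .flatMap((v) => v.groups.map((g) => ({ ...v, groups: g })))
  // Second $match: drop the unwound copies for the other groups.
  .filter((v) => v.groups === GROUP)
  // $group with $sum over totaldistancetraveled.
  .reduce((sum, v) => sum + v.totaldistancetraveled, 0);

console.log(totalMiles); // 500
```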
Analysis of your original query:
db.vehicle_performance_monthly.aggregate(
{ $unwind : "$groups"},
{$group:
{_id: "$groups",
totalMiles: { $sum: "$totaldistancetraveled"}}}, // $group does not carry date_num through, so it is lost here.
{$match:{_id: {$in:[46106]}},{"$date_num":{$in:20140901}}} // syntax error: both conditions must be in one object, e.g. {$match: {_id: {$in: [46106]}, date_num: {$in: [20140901]}}}; even then, date_num no longer exists after the $group.
)
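Run $match first to improve performance: it reduces the number of documents that reach $unwind and $group, and a $match at the start of the pipeline can use an index.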

mongodb aggregation framework group + project

I have the following issue:
This query returns 1 result, which is what I want:
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } } }])
{
"result" : [
{
"_id" : "b91e51e9-6317-4030-a9a6-e7f71d0f2161",
"version" : 1.2000000000000002
}
],
"ok" : 1
}
This query (I just added a projection so I can later query for the entire document) returns multiple results. What am I doing wrong?
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } }, $project: { _id : 1 } }])
{
"result" : [
{
"_id" : ObjectId("5139310a3899d457ee000003")
},
{
"_id" : ObjectId("513931053899d457ee000002")
},
{
"_id" : ObjectId("513930fd3899d457ee000001")
}
],
"ok" : 1
}
Found the answer:
1. First, I need to get all the _ids:
db.items.aggregate( [
{ '$match': { 'owner.id': '9e748c81-0f71-4eda-a710-576314ef3fa' } },
{ '$group': { _id: '$item.id', dbid: { $max: "$_id" } } }
]);
2. Then I need to query the documents:
db.items.find({ _id: { '$in': "IDs returned from aggregate" } });
which will look like this:
db.items.find({ _id: { '$in': [ '1', '2', '3' ] } });
(I know it's late, but I'm still answering so that other people don't have to search for the right answer elsewhere.)
See Deka's answer; it will do the job.
Not all accumulators are available in the $project stage. We need to consider what we can do in $project with respect to accumulators and what we can do in $group. Let's take a look at this:
db.companies.aggregate([{
$match: {
funding_rounds: {
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
funding: {
$push: {
amount: "$funding_rounds.raised_amount",
year: "$funding_rounds.funded_year"
}
}
}
}, ]).pretty()
Here we first keep only companies whose funding_rounds array is not empty. The array is then unwound, so $sort and the later stages see one document for each element of the funding_rounds array of every company. The first thing we do after that is $sort based on:
funding_rounds.funded_year
funding_rounds.funded_month
funding_rounds.funded_day
In the $group stage, which groups by company name, the funding array is built with $push. $push is used as the value of a field that we name in the $group stage, and it can push any valid expression. Here we push documents built from raised_amount and funded_year, and each pushed document is appended to the end of the array being accumulated for that company. The output of the $group stage is therefore a stream of documents whose _id specifies the company name.
Notice that $push is available in $group stages but not in the $project stage. This is because $group stages are designed to take a sequence of documents and accumulate values based on that stream of documents.
$project, on the other hand, works with one document at a time. So we can calculate an average over an array within an individual document inside a $project stage. But seeing documents one at a time and, for every document that passes through, pushing a new value onto an accumulated array is something the $project stage is not designed to do. For that type of operation we want to use $group.
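A plain JavaScript analogy of that difference (an in-memory sketch, not MongoDB; the company data is invented): the $group-style step keeps state across the whole stream and appends to one array per key, while the $project-style step maps each document independently and only ever sees that one document.

```javascript
// Documents after $unwind and $sort: one per funding round (invented data).
const rounds = [
  { name: "Acme", funding_rounds: { raised_amount: 1000, funded_year: 2005 } },
  { name: "Acme", funding_rounds: { raised_amount: 5000, funded_year: 2007 } },
  { name: "Beta", funding_rounds: { raised_amount: 2000, funded_year: 2006 } },
];

// $group + $push: accumulate across many documents, keyed by company.
const funding = new Map();
for (const doc of rounds) {
  if (!funding.has(doc.name)) funding.set(doc.name, []);
  funding.get(doc.name).push({
    amount: doc.funding_rounds.raised_amount,
    year: doc.funding_rounds.funded_year,
  });
}

// $project-style reshaping: each output depends only on its own input document.
const projected = rounds.map((doc) => ({
  company: doc.name,
  amount: doc.funding_rounds.raised_amount,
}));

console.log(JSON.stringify(funding.get("Acme"))); // [{"amount":1000,"year":2005},{"amount":5000,"year":2007}]
console.log(projected.length);                    // 3
```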
Let's take a look at another example:
db.companies.aggregate([{
$match: {
funding_rounds: {
$exists: true,
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
first_round: {
$first: "$funding_rounds"
},
last_round: {
$last: "$funding_rounds"
},
num_rounds: {
$sum: 1
},
total_raised: {
$sum: "$funding_rounds.raised_amount"
}
}
}, {
$project: {
_id: 0,
company: "$_id.company",
first_round: {
amount: "$first_round.raised_amount",
article: "$first_round.source_url",
year: "$first_round.funded_year"
},
last_round: {
amount: "$last_round.raised_amount",
article: "$last_round.source_url",
year: "$last_round.funded_year"
},
num_rounds: 1,
total_raised: 1,
}
}, {
$sort: {
total_raised: -1
}
}]).pretty()
In the $group stage, we're using the $first and $last accumulators. As with $push, we can't use $first and $last as accumulators in $project stages, because $project stages are not designed to accumulate values across multiple documents; they reshape documents one at a time. The total number of rounds is calculated with the $sum operator: summing the value 1 for each document counts the documents grouped under a given _id value. The $project stage may look complex, but it only makes the output pretty, and it passes num_rounds and total_raised through from the previous stage.
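The accumulators in that $group stage can be mirrored in plain JavaScript. This is an in-memory sketch, not MongoDB; the rounds below are invented and assumed to be already in date order, as the $sort stage guarantees. $first keeps the first value seen per key, $last is overwritten by every document, $sum: 1 counts, and $sum on a field adds.

```javascript
// One document per funding round, already in date order (invented data).
const sortedRounds = [
  { name: "Acme", funding_rounds: { raised_amount: 1000, funded_year: 2005 } },
  { name: "Acme", funding_rounds: { raised_amount: 5000, funded_year: 2007 } },
  { name: "Acme", funding_rounds: { raised_amount: 3000, funded_year: 2009 } },
];

const stats = new Map();
for (const { name, funding_rounds: round } of sortedRounds) {
  let s = stats.get(name);
  if (!s) {
    // $first: captured only when the key is seen for the first time.
    s = { first_round: round, last_round: round, num_rounds: 0, total_raised: 0 };
    stats.set(name, s);
  }
  s.last_round = round;                  // $last: overwritten by every document
  s.num_rounds += 1;                     // $sum: 1
  s.total_raised += round.raised_amount; // $sum: "$funding_rounds.raised_amount"
}

const acme = stats.get("Acme");
console.log(acme.first_round.funded_year, acme.last_round.funded_year); // 2005 2009
console.log(acme.num_rounds, acme.total_raised);                        // 3 9000
```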