Aggregation in MongoDB, using unwind - mongodb

I need to aggregate all tags from records like this:
https://gist.github.com/sbassi/5642925
(there are 2 sample records in this snippet) and sort them by size (first the tag that appears with more frequency). But I don't want to take into account data that have specific "user_id" (lets say, 2,3,6 and 12).
Here is my try (just the aggregation, without filtering and sorting):
db.user_library.aggregate( { $unwind : "$annotations.data.tags" }, {
$group : { _id : "$annotations.data.tags" ,totalTag : { $sum : 1 } } }
)
And I got:
{ "result" : [ ], "ok" : 1 }

Right now you can't unwind an array that is nested inside another array. See SERVER-6436
Consider structuring the data differently, having an array field with all tags for that document or possibly unwinding annotations and then unwinding annotations.data.tags in a stacked unwind like this:
db.user_library.aggregate([
{ $project: { 'annotations.data.tags': 1 } },
{ $unwind: '$annotations' },
{ $unwind: '$annotations.data.tags' },
{ $group: { _id: '$annotations.data.tags', totalTag: { $sum: 1 } } }
])

Related

MongoDB - count by field, and sort by count

I am new to MongoDB, and new to making more than super basic queries and i didn't succeed to create a query that does as follows:
I have such collection, each document represents one "use" of a benefit (e.g first row states the benefit "123" was used once):
[
{
"id" : "1111",
"benefit_id":"123"
},
{
"id":"2222",
"benefit_id":"456"
},
{
"id":"3333",
"benefit_id":"456"
},
{
"id":"4444",
"benefit_id":"789"
}
]
I need to create q query that output an array. at the top is the most top used benefit and how many times is was used.
for the above example the query should output:
[
{
"benefit_id":"456",
"cnt":2
},
{
"benefit_id":"123",
"cnt": 1
},
{
"benefit_id":"789",
"cnt":1
}
]
I have tried to work with the documentation and with $sortByCount but with no success.
$group
$group by benefit_id and get count using $sum
$sort by count descending order
db.collection.aggregate([
{
$group: {
_id: "$benefit_id",
count: { $sum: 1 }
}
},
{ $sort: { count: -1 } }
])
Playground
$sortByCount
Same operation using $sortByCount operator
db.collection.aggregate([
{ $sortByCount: "$benefit_id" }
])
Playground

Select latest document after grouping them by a field in MongoDB

I got a question that I would expect to be pretty simple, but I cannot figure it out. What I want to do is this:
Find all documents in a collection and:
sort the documents by a certain date field
apply distinct on one of its other fields, but return the whole document
Best shown in an example.
This is a mock input:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("1998-11-04T18:46:14.000Z")
},
{
"commandName" : "migration_a",
"executionDate" : ISODate("1970-05-09T20:16:37.000Z")
},
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
The expected output is:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
Or, in other words:
Group the input data by the commandName field
Inside each group sort the documents
Return the newest document from each group
My attempts to write this query have failed:
The distinct() function will only return the value of the field I am distinct-ing on, not the whole document. That makes it unsuitable for my case.
Tried writing an aggregate query, but ran into an issue of how to sort-and-select a single document from inside of each group? The sort aggreation stage will sort the groups among one other, which is not what I want.
I am not too well-versed in Mongo and this is where I hit a wall. Any ideas on how to continue?
For reference, this is the work-in-progress aggregation query I am trying to expand on:
db.getCollection('some_collection').aggregate([
{ $group: { '_id': '$commandName', 'docs': {$addToSet: '$$ROOT'} } },
{ $sort: {'_id.docs.???': 1}}
])
Post-resolved edit
Thank you for the answers. I got what I needed. For future reference, this is the full query that will do what was requested and also return a list of the filtered documents, not groups.
db.getCollection('some_collection').aggregate([
{ $sort: {'executionDate': 1}},
{ $group: { '_id': '$commandName', 'result': { $last: '$$ROOT'} } },
{ $replaceRoot: {newRoot: '$result'} }
])
The query result without the $replaceRoot stage would be:
[
{
"_id": "migration_a",
"result": {
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
}
},
{
"_id": "migration_b",
"result": {
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
}
]
The outer _id and _result are just "group-wrappers" around the actual document I want, which is nested under the result key. Moving the nested document to the root of the result is done using the $replaceRoot stage. The query result when using that stage is:
[
{
"commandName" : "migration_a",
"executionDate" : ISODate("2005-11-08T11:58:52.000Z")
},
{
"commandName" : "migration_b",
"executionDate" : ISODate("2016-06-02T19:48:34.000Z")
}
]
Try this:
db.getCollection('some_collection').aggregate([
{ $sort: {'executionDate': -1}},
{ $group: { '_id': '$commandName', 'doc': {$first: '$$ROOT'} } }
])
I believe this will result in what you're looking for:
db.collection.aggregate([
{
$group: {
"_id": "$commandName",
"executionDate": {
"$last": "$executionDate"
}
}
}
])
You can check it out here
Of course, if you want to match your expected output exactly, you can add a sort (this may not be necessary since your goal is to simply return the newest document from each group):
{
$sort: {
"executionDate": 1
}
}
You can check this version out here.
The use-case the question presents is nearly covered in the $last aggregation operator documentation.
Which summarises:
the $group stage should follow a $sort stage to have the input
documents in a defined order. Since $last simply picks the last
document from a group.
Query: Link
db.collection.aggregate([
{
$sort: {
executionDate: 1
}
},
{
$group: {
_id: "$commandName",
executionDate: {
$last: "$executionDate"
}
}
}
]);

Return all fields MongoDB Aggregate

I tried searching on here but couldn't really find what I need. I have documents like this:
{
appletype:Granny,
color:Green,
datePicked:2015-01-26,
dateRipe:2015-01-24,
numPicked:3
},
{
appletype:Granny,
color:Green,
datePicked:2015-01-01,
dateRipe:2014-12-28,
numPicked:6
}
I would like to return only those apples picked latest, will all fields. I want my query to return me the first document only essentially. When I try to do:
db.collection.aggregate([
{ $match : { "appletype" : "Granny" } },
{ $sort : { "datePicked" : 1 } },
{ $group : { "_id" : { "appletype" : "$appletype" },
"datePicked" : { $max : "$datePicked" } },
])
It does return me all the apples picked latest, however with only appletype:Granny and datePicked:2015-01-26. I need the remaining fields. I tries using $project and adding all the fields, but it didn't get me what I needed. Also, when I added the other fields to the group, since datePicked is unique, it returned both records.
How can I go about returning all fields, for only the latest datePicked?
Thanks!
From your description, it sounds like you want one document for each of the types of apple in your collection and showing the document with the most recent datePicked value.
Here is an aggregate query for that:
db.collection.aggregate([
{ $sort: { "datePicked": -1 },
{ $group: { _id: "$appletype", color: { $first: "$color" }, datePicked: { $first: "$datePicked" }, dateRipe: { $first: "$dateRipe" }, numPicked: { $first: "$numPicked" } } },
{ $project: { _id: 0, color: 1, datePicked: 1, dateRipe: 1, numPicked: 1, appletype: "$_id" } }
])
But then based on the aggregate query you've written, it looks like you're trying to get this:
db.collection.find({appletype: "Granny"}).sort({datePicked: -1}).limit(1);

MongoDB Aggregation: Counting distinct fields

I am trying to write an aggregation to identify accounts that use multiple payment sources. Typical data would be.
{
account:"abc",
vendor:"amazon",
}
...
{
account:"abc",
vendor:"overstock",
}
Now, I'd like to produce a list of accounts similar to this
{
account:"abc",
vendorCount:2
}
How would I write this in Mongo's aggregation framework
I figured this out by using the $addToSet and $unwind operators.
Mongodb Aggregation count array/set size
db.collection.aggregate([
{
$group: { _id: { account: '$account' }, vendors: { $addToSet: '$vendor'} }
},
{
$unwind:"$vendors"
},
{
$group: { _id: "$_id", vendorCount: { $sum:1} }
}
]);
Hope it helps someone
I think its better if you execute query like following which will avoid unwind
db.t2.insert({_id:1,account:"abc",vendor:"amazon"});
db.t2.insert({_id:2,account:"abc",vendor:"overstock"});
db.t2.aggregate([
{ $group : { _id : { "account" : "$account", "vendor" : "$vendor" }, number : { $sum : 1 } } },
{ $group : { _id : "$_id.account", number : { $sum : 1 } } }
]);
Which will show you following result which is expected.
{ "_id" : "abc", "number" : 2 }
You can use sets
db.test.aggregate([
{$group: {
_id: "$account",
uniqueVendors: {$addToSet: "$vendor"}
}},
{$project: {
_id: 1,
vendorsCount: {$size: "$uniqueVendors"}
}}
]);
I do not see why somebody would have to use $group twice
db.t2.aggregate([ { $group: {"_id":"$account" , "number":{$sum:1}} } ])
This will work perfectly fine.
This approach doesn't make use of $unwind and other extra operations. Plus, this won't affect anything if new things are added into the aggregation. There's a flaw in the accepted answer. If you have other accumulated fields in the $group, it would cause issues in the $unwind stage of the accepted answer.
db.collection.aggregate([{
"$group": {
"_id": "$account",
"vendors": {"$addToSet": "$vendor"}
}
},
{
"$addFields": {
"vendorCount": {
"$size": "$vendors"
}
}
}])
To identify accounts that use multiple payment sources:
Use grouping to count data from multiple account records and group the result by account with count
Use a match case is to filter only such accounts having more than one payment method
db.payment_collection.aggregate([ { $group: {"_id":"$account" ,
"number":{$sum:1}} }, {
"$match": {
"number": { "$gt": 1 }
}
} ])
This will work perfectly fine,
db.UserModule.aggregate(
{ $group : { _id : { "companyauthemail" : "$companyauthemail", "email" : "$email" }, number : { $sum : 1 } } },
{ $group : { _id : "$_id.companyauthemail", number : { $sum : 1 } } }
);
An example
db.collection.distinct("example.item").forEach( function(docs) {
print(docs + "==>>" + db.collection.count({"example.item":docs}))
});

Retrieving the length of a list in MongoDB

I'm trying to do a mongo query where I get the length of an array in each document, without retrieving the full contents of the list. Ideally, this would be a projection option along these lines:
db.log.find({},{entries:{$length: 1}})
but this isn't supported. Maybe this is possible in an elegant way with the new aggregation framework? What I've come up with is this:
db.log.find({},{"entries.length": 1})
Which returns results like this:
{ "_id" : ObjectId("50d2fb07e64cfa55431de693"), "entries" : [ { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { }, { } ] }
This is ugly but basically serves my needs since I can count the length of this list without the network weight of getting the full contents. But I have no idea why this works. What is this query actually doing?
Now, I could think in two approachs:
1) Using aggregation framework:
db.log.aggregate([ { $unwind : "$entries" }, { $group : { _id : "$_id", entries : {$sum:1} } } ]);
2) Or you can add a field to the document that holds the entries count. So, each time that you push a new value to entries array, you must increment the counter. The update will be like this:
db.log.update({ _id : 123 }, { $push : { entries : 'value' }, $inc : { entriesCount : 1 } })
Clearly, you have a trade-off here: the aggregation framework is too expensive for this simple operation. But adding a field to document, every update should increment the counter.
IMHO, the counter looks more reasonable, though it looks a workaround.
According to the mongodb documentation:
You can use as well $size:
db.log.aggregate([{$project:{'_id':1, 'count':{$size: "$entriesCount"}}}]);