How to use MongoDB aggregate and retrieve entire documents

I am seriously baffled by MongoDB's aggregate function. All I want is to find the newest document in my collection. Let's say each record has a field "created":
db.collection.aggregate({
  $group: {
    _id: 0,
    'id': {$first: "$_id"},
    'max': {$max: "$created"}
  }
})
yields the correct maximum, but I want the entire document in the result. How would I do that?
This is the structure of the document:
{
  "_id" : ObjectId("52310da847cf343c8c000093"),
  "created" : 1389073358,
  "image" : ObjectId("52cb93dd47cf348786d63af2"),
  "images" : [
    ObjectId("52cb93dd47cf348786d63af2"),
    ObjectId("52f67c8447cf343509d63af2")
  ],
  "organization" : ObjectId("522949d347cf3402c3000001"),
  "published" : 1392601521,
  "status" : "PUBLISHED",
  "tags" : [ ],
  "updated" : 1392601521,
  "user_id" : ObjectId("52214ce847cf344902000000")
}

In the documentation I found that the $$ROOT system variable addresses this problem.
From the docs:
http://docs.mongodb.org/manual/reference/operator/aggregation/group/#group-documents-by-author

query = [
  {
    '$sort': {
      'created': -1
    }
  },
  {
    $group: {
      '_id': null,
      'max': {'$first': "$$ROOT"}
    }
  }
]
db.collection.aggregate(query)
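To make the role of each stage concrete, here is a plain-JavaScript sketch that mimics the two pipeline stages on an in-memory array. It does not talk to MongoDB, and newestDocument is a hypothetical helper name. Since $first simply keeps the first document a group receives, it only lines up with the newest "created" value because of the preceding $sort; the same reasoning explains why $first: "$_id" in the question, with no sort, is not guaranteed to belong to the newest document.

```javascript
// Plain-JS sketch of
//   [{$sort: {created: -1}}, {$group: {_id: null, max: {$first: "$$ROOT"}}}]
// applied to an in-memory array. Hypothetical helper, not a MongoDB API.
function newestDocument(docs) {
  // $sort: {created: -1} -> copy, then sort descending (input is not mutated)
  const sorted = [...docs].sort((a, b) => b.created - a.created);
  // $group on an empty input produces no output documents at all
  if (sorted.length === 0) return [];
  // _id: null puts everything in one group; $first: "$$ROOT" keeps the
  // whole first document that group receives, i.e. the newest one
  return [{ _id: null, max: sorted[0] }];
}

const docs = [
  { _id: 1, created: 1389073358, status: "PUBLISHED" },
  { _id: 2, created: 1392601521, status: "DRAFT" },
  { _id: 3, created: 1380000000, status: "PUBLISHED" },
];

console.log(newestDocument(docs)[0].max._id); // 2
```

The sample documents are made up; only the "created" field matters here.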

db.collection.aggregate([
  {
    $group: {
      '_id': "$_id",
      // "$$ROOT" (two dollar signs, quoted) refers to the whole input document
      'otherFields': { $push: { fields: "$$ROOT" } }
    }
  }
])

I think I figured it out. For example, say my documents contain an array of images (or pointers to images), and I want to find the document with the most images:
var results = [];
// In MongoDB 2.6+ shells aggregate() returns a cursor, so forEach is called on it
// directly (the 2.4 shell returned an object with a .result array instead).
db.collection.aggregate([
  {$unwind: "$images"},
  {$group: {_id: "$_id", 'imagecount': {$sum: 1}}},
  {$group: {_id: "$_id", 'max': {$max: "$imagecount"}}},
  {$sort: {max: -1}},
  {$group: {_id: 0, 'id': {$first: '$_id'}, 'max': {$first: "$max"}}}
]).forEach(function (d) {
  results.push(db.collection.findOne({_id: d.id}));
});
Now the results array contains the document with the most images. Since images is an array, I use $unwind, then group by document id with $sum: 1 to count the images, pipe that into a $group that finds the max, sort by max in descending order, and $group out the first result. Finally I fetch the document with findOne() and push it into the results array.
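What the pipeline computes can be checked against a plain-JavaScript sketch that does the same thing on an in-memory array. mostImages is a hypothetical name, and the sketch assumes images is always an array when it is present. It also shows that the second $group and the $sort only serve to keep the largest count.

```javascript
// Plain-JS sketch of the "document with the most images" pipeline,
// run on an in-memory array. Hypothetical helper, not a MongoDB API.
function mostImages(docs) {
  let best = null;
  for (const doc of docs) {
    // $unwind emits nothing for a missing or empty images array,
    // so such documents never reach the $group stages
    const count = Array.isArray(doc.images) ? doc.images.length : 0;
    if (count === 0) continue;
    // {$group: {_id: "$_id", imagecount: {$sum: 1}}}: one per unwound element;
    // {$sort: {max: -1}} followed by $first keeps the largest count
    // (on ties the pipeline's pick is unspecified; this keeps the first seen)
    if (best === null || count > best.max) {
      best = { id: doc._id, max: count };
    }
  }
  // the pipeline emits no document at all when nothing survives $unwind
  return best;
}

const sample = [
  { _id: "a", images: ["img1", "img2"] },
  { _id: "b", images: ["img1", "img2", "img3"] },
  { _id: "c", images: [] },
  { _id: "d" },
];

console.log(mostImages(sample)); // { id: 'b', max: 3 }
```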

To get the newest document you don't need aggregate() at all; use db.collection.find() with a descending sort and limit(1):
db.collection.find().sort({"created":-1}).limit(1)

Related

How to extract grouped results from array in $group stage and return as separate fields?

I'm running an aggregation query, and the $group stage is as follows:
$group:
{
_id:
{
year_month: { $dateToString: { "date": "$updated_at", "format": "%Y-%m" } }
,client_name: "$clients_docs.client_name"
,client_label: "$clients_docs.client_label"
,client_code: "$clients_docs.client_code"
,client_country: "$clients_docs.client_country"
,base_curr: "$clients_docs.client_base_currency"
,inv_curr: "$clients_docs.client_invoice_currency"
,dest_curr: "$store.destination_currency"
}
,total_vol: { $sum: "$USD_Value" }
,total_tran: { $sum: 1 }
}
It returns the correct results, but all the grouped fields come back inside the _id: {} array.
I now want to extract all those fields from the array and return them at the top level, so I can more easily export the output to a spreadsheet.
I tried using this stage:
{
$project:
{
year_month: 1
,client_name: 1
,client_label: 1
,client_code: 1
,client_country: 1
,base_curr: 1
,inv_curr: 1
,dest_curr: 1
,total_vol: 1
,total_tran : 1
}
},
But that returned the same results as the $group stage:
{
"_id" : {
"year_month" : "2022-01",
"client_name" : "client A",
"client_label" : "client A",
"client_code" : NumberInt(0000),
"client_country" : "TH",
"base_curr" : "USD",
"inv_curr" : "USD",
"dest_curr" : "HKD"
},
"total_vol" : 100000,
"total_tran" : 100.0
}
I want the "year_month" through "dest_curr" fields at the same level as the "total_vol" and "total_tran", so that when the data is exported they all appear as separate columns (now it's all captured as one "_id" column, and a "total_vol" and "total_tran" column). What's the best way to do this?
From a terminology perspective, you currently have an embedded document (or nested fields) rather than an array.
The straightforward way to do this is to simply enumerate each field in a $project stage, e.g.:
"year_month": "$_id.year_month",
There are fancier ways to do this, but as you only have a handful of fields this should suffice.
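Writing out one such line per field by hand is tedious, and since the mongo shell is JavaScript, the $project stage can also be generated. The sketch below uses a hypothetical helper, flattenIdStage (not a MongoDB function), with the field names from the question; the returned object would go right after the $group stage in the pipeline array.

```javascript
// Hypothetical helper (not part of MongoDB): builds a $project stage that
// lifts each listed field out of the grouped _id and keeps the accumulators.
function flattenIdStage(idFields, otherFields) {
  const project = { _id: 0 }; // drop the nested _id from the output
  for (const f of idFields) {
    project[f] = "$_id." + f; // e.g. year_month: "$_id.year_month"
  }
  for (const f of otherFields) {
    project[f] = 1; // keep top-level fields such as total_vol unchanged
  }
  return { $project: project };
}

const stage = flattenIdStage(
  ["year_month", "client_name", "client_label", "client_code",
   "client_country", "base_curr", "inv_curr", "dest_curr"],
  ["total_vol", "total_tran"]
);

console.log(JSON.stringify(stage));
```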
An alternative ("fancier") approach is to use the $replaceWith stage with the $mergeObjects operator inside it, then $unset the old _id field afterwards (both $replaceWith and the $unset stage require MongoDB 4.2+). It would look like this:
db.collection.aggregate([
{
"$replaceWith": {
"$mergeObjects": [
"$$ROOT",
"$_id"
]
}
},
{
$unset: "_id"
}
])
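The effect of these two stages on a single grouped document can be mimicked in plain JavaScript. This also shows why the order of the $mergeObjects arguments matters: later arguments overwrite fields of earlier ones. It is an in-memory sketch only, flattenGroupedDoc is a hypothetical name, and the sample values are taken from the output in the question.

```javascript
// Plain-JS sketch of
//   {$replaceWith: {$mergeObjects: ["$$ROOT", "$_id"]}}, {$unset: "_id"}
// applied to one in-memory document. Hypothetical helper, not a MongoDB API.
function flattenGroupedDoc(doc) {
  // $mergeObjects: fields of the later argument (_id) are laid over $$ROOT
  const merged = { ...doc, ...doc._id };
  // $unset: "_id" removes the now-redundant nested _id
  delete merged._id;
  return merged;
}

const grouped = {
  _id: { year_month: "2022-01", client_name: "client A", dest_curr: "HKD" },
  total_vol: 100000,
  total_tran: 100,
};

console.log(flattenGroupedDoc(grouped));
```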

MongoDB get document nested array by id

I have a processes document with a nested attachments array. I want to get the fileName, given the process _id and the attachment _id.
I have tried many options, but my latest attempt still returns all items from the attachments array. I only want the attachment that matches the attachment id passed in.
db.getCollection('processes').find(
{$and: [ { "_id" : ObjectId("5a9455d7854cd987a40b1ba4") },
{ "attachments._id" : ObjectId("5a983da6201ba5a2302fb38f") }]},
{'attachments._id': 1, 'attachments.fileName': 1}
)
Any suggestion is greatly appreciated, thanks!
You can use $elemMatch in the projection to return only the first matching subdocument from the nested array:
db.getCollection('processes').find(
{ "_id" : ObjectId("5a9455d7854cd987a40b1ba4") },
{ attachments: { $elemMatch: { _id: ObjectId("5a983da6201ba5a2302fb38f") } } } )
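The projected attachments field is still an array, just with a single element, so the file name is read from attachments[0].fileName. The plain-JavaScript sketch below mimics that projection on an in-memory document; projectElemMatch is a hypothetical helper, and plain strings stand in for the ObjectId values.

```javascript
// Plain-JS sketch of the projection
//   {attachments: {$elemMatch: {_id: attachmentId}}}
// on one in-memory document. Hypothetical helper, not a MongoDB API.
function projectElemMatch(doc, attachmentId) {
  const out = { _id: doc._id }; // _id is included by default
  const arr = Array.isArray(doc.attachments) ? doc.attachments : [];
  // $elemMatch keeps only the FIRST element that matches the condition
  const match = arr.find((a) => a._id === attachmentId);
  if (match !== undefined) out.attachments = [match]; // still an array
  // if no element matches, the attachments field is left out entirely
  return out;
}

const processDoc = {
  _id: "p1",
  name: "example",
  attachments: [
    { _id: "a1", fileName: "one.pdf" },
    { _id: "a2", fileName: "two.pdf" },
  ],
};

console.log(projectElemMatch(processDoc, "a2").attachments[0].fileName); // two.pdf
```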

MongoDB aggregation/map-reduce

I'm new to MongoDB and I need to do an aggregation which seems quite difficult to me. A document looks something like this:
{
"_id" : ObjectId("568192aef8bd6b0cd0f649c6"),
"conference" : "IEEE International Conference on Acoustics, Speech and Signal Processing",
"prism:aggregationType" : "Conference Proceeding",
"children-id" : [
"SCOPUS_ID:84948148564",
"SCOPUS_ID:84927603733",
"SCOPUS_ID:84943521758",
"SCOPUS_ID:84905234683",
"SCOPUS_ID:84876113709"
],
"dc:identifier" : "SCOPUS_ID:84867598678"
}
The example contains just the fields I need for the aggregation. prism:aggregationType can have 5 different values (conference proceeding, book, journal, etc.). children-id lists the documents that cite this document (SCOPUS_ID is a unique ID for each document).
What I want to do is to group first by conference, and then, for each conference, find out for each prism:aggregationType how many citing documents there are (only counts greater than 0).
For example, let's say there are 100 documents with the conference above, and these 100 documents are cited by 250 documents. I want to know how many of those 250 documents have "prism:aggregationType" : "Conference Proceeding", how many have "prism:aggregationType" : "Journal", etc.
An output could look like this:
{
"conference" : "IEEE International Conference on Acoustics, Speech and Signal Processing",
"aggregationTypes" : [{"Conference Proceeding" : 50} , {"Journal" : 200}]
}
It is not important if it is done with aggregation pipeline or map-reduce.
EDIT
Is there any way to combine these two queries into one aggregation?
db.articles.aggregate([
{ $match:{
conference : {$ne : null}
}},
{$unwind:'$children-id'},
{$group: {
_id: {conference: '$conference'},
'cited-by':{$push:{'dc:identifier':"$children-id"}}
}}
]);
db.articles.find( { 'dc:identifier': { $in: [ 'SCOPUS_ID:84943302953', 'SCOPUS_ID:84927603733'] } }, {'prism:aggregationType':1} );
In the find() query I want to replace the array passed to $in with the array built by $push.
Please try this aggregation:
db.articles.aggregate([
  // 1. get the size of the `children-id` array through $project
  //    ($ifNull guards against documents without the field, since $size needs an array)
  {$project: {
    conference: 1,
    'prism:aggregationType': 1,
    'children-id': {$size: {$ifNull: ['$children-id', []]}}
  }},
  // 2. group by `conference` and `prism:aggregationType` and sum the sizes of `children-id`
  {$group: {
    _id: {
      conference: '$conference',
      aggregationType: '$prism:aggregationType'
    },
    ids: {$sum: '$children-id'}
  }},
  // 3. group by `conference` and push one {type, count} entry per aggregation type
  //    ($cond is not an accumulator, so it cannot be used directly in $group)
  {$group: {
    _id: '$_id.conference',
    aggregationTypes: {
      $push: {type: '$_id.aggregationType', count: '$ids'}
    }
  }}
]);
As we discussed in chat, using $lookup in the aggregation pipeline requires MongoDB 3.2, which is not an option here: the R driver can only use MongoDB 2.6, and the source documents are in more than one collection.
The code I wrote in the EDIT section is also the final solution I came up with (slightly modified):
db.articles.aggregate([
{ $match:{
conference : {$ne : null}
}},
{$unwind:'$children-id'},
{$group: {
_id: '$conference',
'cited-by':{$push:"$children-id"}
}}
]);
db.articles.find( { 'dc:identifier': { $in: [ 'SCOPUS_ID:84943302953', 'SCOPUS_ID:84927603733'] } }, {'prism:aggregationType':1} );
The result will look like this for each conference:
{
"_id" : "Annual Conference on Privacy, Security and Trust",
"cited-by" : [
"SCOPUS_ID:84942789431",
"SCOPUS_ID:84928151617",
"SCOPUS_ID:84939229259",
"SCOPUS_ID:84946407175",
"SCOPUS_ID:84933039513",
"SCOPUS_ID:84942789431",
"SCOPUS_ID:84942607254",
"SCOPUS_ID:84948165954",
"SCOPUS_ID:84926379258",
"SCOPUS_ID:84946771354",
"SCOPUS_ID:84944223683",
"SCOPUS_ID:84942789431",
"SCOPUS_ID:84939169499",
"SCOPUS_ID:84947104346",
"SCOPUS_ID:84948764343",
"SCOPUS_ID:84938075139",
"SCOPUS_ID:84946196118",
"SCOPUS_ID:84930820238",
"SCOPUS_ID:84947785321",
"SCOPUS_ID:84933496680",
"SCOPUS_ID:84942789431"
]
}
I iterate through all the documents I get (around 250) and pass each cited-by array to $in. I have an index on dc:identifier, so it runs almost instantly.
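Because the second step runs in client code, the grouping and counting logic can be sketched in plain JavaScript on an in-memory array (this is not the R code, and countCitingTypes is a hypothetical name). It builds the cited-by list per conference like the $unwind/$group pipeline, looks up each citing document's prism:aggregationType the way the find() with $in does, and returns one entry per type in the output shape from the question. Since $in returns each matching document only once, duplicate ids in cited-by (such as SCOPUS_ID:84942789431 in the output above) are also counted once here.

```javascript
// Plain-JS sketch of the two-step approach, on an in-memory array of articles.
// Hypothetical helper, not a MongoDB or R API.
function countCitingTypes(articles) {
  // dc:identifier -> prism:aggregationType (the role of the dc:identifier index)
  const typeById = new Map();
  for (const a of articles) {
    typeById.set(a["dc:identifier"], a["prism:aggregationType"]);
  }

  // Step 1: conference -> citing ids, like $match/$unwind/$group with $push
  const citedBy = new Map();
  for (const a of articles) {
    if (a.conference == null) continue; // {conference: {$ne: null}}
    const children = a["children-id"] || [];
    if (children.length === 0) continue; // $unwind drops empty/missing arrays
    const ids = citedBy.get(a.conference) || [];
    ids.push(...children);
    citedBy.set(a.conference, ids);
  }

  // Step 2: per conference, count the types of the citing documents
  const result = [];
  for (const [conference, ids] of citedBy) {
    const counts = {};
    for (const id of new Set(ids)) { // $in matches each document once
      const type = typeById.get(id);
      if (type === undefined) continue; // citing document not found (or untyped)
      counts[type] = (counts[type] || 0) + 1;
    }
    result.push({
      conference,
      aggregationTypes: Object.entries(counts).map(([type, n]) => ({ [type]: n })),
    });
  }
  return result;
}

const articles = [
  { conference: "C1", "dc:identifier": "S1", "prism:aggregationType": "Conference Proceeding",
    "children-id": ["S2", "S3", "S3", "S9"] },
  { "dc:identifier": "S2", "prism:aggregationType": "Journal" },
  { "dc:identifier": "S3", "prism:aggregationType": "Conference Proceeding" },
];

console.log(JSON.stringify(countCitingTypes(articles)));
```

The ids and conference name in the sample are made up; S9 stands for a citing document that is not in the collection.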
$lookup could do this inside the aggregation pipeline, but the R packages do not support server versions above 2.6.
Thank you for your time anyway :)

Mongodb: Trying to find all documents with specific subdocument field, why is my query not working?

Here is an example of a document from the collection I am querying:
meteor:PRIMARY> db.research.findOne({_id: 'Z2zzA7dx6unkzKiSn'})
{
"_id" : "Z2zzA7dx6unkzKiSn",
"_userId" : "NtE3ANq2b2PbWSEqu",
"collaborators" : [
{
"userId" : "aTPzFad8DdFXxRrX4"
}
],
"name" : "new one",
"pending" : {
"collaborators" : [ ]
}
}
I want to find all documents in this collection where either _userId is 'aTPzFad8DdFXxRrX4' or the collaborators array contains an entry with userId: 'aTPzFad8DdFXxRrX4'.
So I want to look through the collection and check whether the _userId field is 'aTPzFad8DdFXxRrX4'. If not, check the collaborators array of the document for an object with userId: 'aTPzFad8DdFXxRrX4'.
Here is the query I am trying to use:
db.research.find({$or: [{_userId: 'aTPzFad8DdFXxRrX4'}, {collaborators: {$in: [{userId: 'aTPzFad8DdFXxRrX4'}]}}] })
It does not find the document and gives me a syntax error. What is my issue here? Thanks
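The $in operator is essentially shorthand for an $or of equality checks on a single field, and you only have one value here, so you don't need it. Also, {collaborators: {$in: [{userId: ...}]}} only matches an array element that is exactly equal to that whole subdocument, so it stops matching as soon as a collaborator object has any other field. Use dot notation instead: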
db.research.find({
'$or': [
{ '_userId': 'aTPzFad8DdFXxRrX4'},
{ 'collaborators.userId': 'aTPzFad8DdFXxRrX4'}
]
})
If you need more than one value then use $in:
db.research.find({
'$or': [
{ '_userId': 'aTPzFad8DdFXxRrX4'},
{ 'collaborators.userId': {
'$in': ['aTPzFad8DdFXxRrX4','aTPzFad8DdFXxRrX5']
}}
]
})
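The difference between the two query shapes can be sketched in plain JavaScript on in-memory documents. These are hypothetical helpers that model only string equality, and the extra role field in the sample document is invented to show the failure case: dot notation matches when any array element has the given userId, while the $in-with-a-subdocument form only matches an element that is exactly {userId: ...}.

```javascript
// Plain-JS sketch contrasting the two query shapes (hypothetical helpers).

// {$or: [{_userId: uid}, {'collaborators.userId': uid}]}:
// dot notation matches if ANY array element has userId === uid
function matchesDotNotation(doc, uid) {
  return doc._userId === uid ||
    (doc.collaborators || []).some((c) => c.userId === uid);
}

// {collaborators: {$in: [{userId: uid}]}}:
// an element matches only if it is EXACTLY the subdocument {userId: uid}
// (same single field, nothing extra)
function matchesWholeSubdoc(doc, uid) {
  return (doc.collaborators || []).some((c) => {
    const keys = Object.keys(c);
    return keys.length === 1 && keys[0] === "userId" && c.userId === uid;
  });
}

const researchDoc = {
  _id: "Z2zzA7dx6unkzKiSn",
  _userId: "NtE3ANq2b2PbWSEqu",
  collaborators: [{ userId: "aTPzFad8DdFXxRrX4", role: "editor" }],
};

console.log(matchesDotNotation(researchDoc, "aTPzFad8DdFXxRrX4")); // true
console.log(matchesWholeSubdoc(researchDoc, "aTPzFad8DdFXxRrX4")); // false
```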

How to remove duplicate entries from an array?

In the following example, "Algorithms in C++" is present twice.
The $unset modifier can remove a particular field, but how do I remove a single entry from an array field?
{
"_id" : ObjectId("4f6cd3c47156522f4f45b26f"),
"favorites" : {
"books" : [
"Algorithms in C++",
"The Art of Computer Programming",
"Graph Theory",
"Algorithms in C++"
]
},
"name" : "robert"
}
As of MongoDB 2.2 you can use the aggregation framework with $unwind, $group and $project stages to achieve this:
db.users.aggregate([{$unwind: '$favorites.books'},
{$group: {_id: '$_id',
books: {$addToSet: '$favorites.books'},
name: {$first: '$name'}}},
{$project: {'favorites.books': '$books', name: '$name'}}
])
Note the need for the $project to rename the favorites field, since $group aggregate fields cannot be nested.
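The easiest solution is to use $setUnion (the $setUnion operator needs MongoDB 2.6+, and the $addFields stage used here needs 3.4+). Note that $setUnion does not guarantee the order of the resulting array: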
db.users.aggregate([
{'$addFields': {'favorites.books': {'$setUnion': ['$favorites.books', []]}}}
])
Another (more lengthy) version, based on the idea from @kynan's answer, preserves all the other fields without listing them explicitly (MongoDB 3.6+, because of $mergeObjects):
> db.users.aggregate([
{'$unwind': {
'path': '$favorites.books',
// output the document even if its list of books is empty
'preserveNullAndEmptyArrays': true
}},
{'$group': {
'_id': '$_id',
'books': {'$addToSet': '$favorites.books'},
// arbitrary name that doesn't exist on any document
'_other_fields': {'$first': '$$ROOT'},
}},
{
// the field, in the resulting document, has the value from the last document merged for the field. (c) docs
// so the new deduped array value will be used
'$replaceRoot': {'newRoot': {'$mergeObjects': ['$_other_fields', "$$ROOT"]}}
},
// this stage wouldn't be necessary if the field wasn't nested
{'$addFields': {'favorites.books': '$books'}},
{'$project': {'_other_fields': 0, 'books': 0}}
])
{ "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), "name" : "robert", "favorites" :
{ "books" : [ "The Art of Computer Programming", "Graph Theory", "Algorithms in C++" ] } }
What you have to do is use MapReduce to detect and count the duplicate entries, then use $set to replace the entire books array of the affected document, e.g. the one with { "_id" : ObjectId("4f6cd3c47156522f4f45b26f") }.
This has been discussed several times here; please see:
Removing duplicate records using MapReduce
Fast way to find duplicates on indexed column in mongodb
http://csanz.posterous.com/look-for-duplicates-using-mongodb-mapreduce
http://www.mongodb.org/display/DOCS/MapReduce
How to remove duplicate record in MongoDB by MapReduce?
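// Returns a copy of arr without duplicates, keeping first occurrences in order.
// The hash keys are strings, so this is meant for arrays of strings.
function unique(arr) {
  var hash = {}, result = [];
  for (var i = 0, l = arr.length; i < l; ++i) {
    // call hasOwnProperty via the prototype so an entry named "hasOwnProperty" is safe
    if (!Object.prototype.hasOwnProperty.call(hash, arr[i])) {
      hash[arr[i]] = true;
      result.push(arr[i]);
    }
  }
  return result;
}
// Rewrite favorites.books only on documents that actually have it
db.collection.find({ "favorites.books": { $exists: true } }).forEach(function (doc) {
  db.collection.update({ _id: doc._id }, { $set: { "favorites.books": unique(doc.favorites.books) } });
});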
Starting in Mongo 4.4, the $function aggregation operator allows applying a custom javascript function to implement behaviour not supported by the MongoDB Query Language.
For instance, in order to remove duplicates from an array:
// {
// "favorites" : { "books" : [
// "Algorithms in C++",
// "The Art of Computer Programming",
// "Graph Theory",
// "Algorithms in C++"
// ]},
// "name" : "robert"
// }
db.collection.aggregate(
{ $set:
{ "favorites.books":
{ $function: {
body: function(books) { return books.filter((v, i, a) => a.indexOf(v) === i) },
args: ["$favorites.books"],
lang: "js"
}}
}
}
)
// {
// "favorites" : { "books" : [
// "Algorithms in C++",
// "The Art of Computer Programming",
// "Graph Theory"
// ]},
// "name" : "robert"
// }
This has the advantages of:
keeping the original order of the array (if that's not a requirement, prefer @Dennis Golomazov's $setUnion answer)
being more efficient than a combination of expensive $unwind and $group stages.
$function takes 3 parameters:
body, which is the function to apply, whose parameter is the array to modify.
args, which contains the fields from the record that the body function takes as parameter. In our case "$favorites.books".
lang, which is the language in which the body function is written. Only js is currently available.
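The $function body above keeps the first occurrence of each title. Outside the database, the same order-preserving de-duplication can also be written with a Set, which remembers insertion order and avoids an indexOf scan for every element. The sketch below is plain JavaScript run locally. Whether the server-side engine behind $function accepts the Set/spread syntax is not checked here, so treat dedupeSet as a client-side alternative. Both function names are hypothetical.

```javascript
// Two order-preserving de-duplication strategies, run locally.
// dedupeFilter is the same logic as the $function body above (quadratic time);
// dedupeSet relies on Set keeping insertion order (linear time).
const dedupeFilter = (books) => books.filter((v, i, a) => a.indexOf(v) === i);
const dedupeSet = (books) => [...new Set(books)];

const books = [
  "Algorithms in C++",
  "The Art of Computer Programming",
  "Graph Theory",
  "Algorithms in C++",
];

console.log(dedupeFilter(books)); // first occurrences, original order
console.log(dedupeSet(books));    // same result
```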