Elegantly return only Subdocuments satisfying elemMatch in MongoDb aggregation result [duplicate] - mongodb

This question already has answers here:
Retrieve only the queried element in an object array in MongoDB collection
(18 answers)
Closed 5 years ago.
I've tried several ways of creating an aggregation pipeline which returns just the matching entries from a document's embedded array, and have not found any practical way to do this.
Is there some MongoDB feature which would avoid my very clumsy and error-prone approach?
A document in the 'workshop' collection looks like this...
{
"_id": ObjectId("57064a294a54b66c1f961aca"),
"type": "normal",
"version": "v1.4.5",
"invitations": [],
"groups": [
{
"_id": ObjectId("57064a294a54b66c1f961acb"),
"role": "facilitator"
},
{
"_id": ObjectId("57064a294a54b66c1f961acc"),
"role": "contributor"
},
{
"_id": ObjectId("57064a294a54b66c1f961acd"),
"role": "broadcaster"
},
{
"_id": ObjectId("57064a294a54b66c1f961acf"),
"role": "facilitator"
}
]
}
Each entry in the groups array provides a unique ID so that a group member is assigned the given role in the workshop when they hit a URL with that salted ID.
Given a _id matching an entry in a groups array like ObjectId("57064a294a54b66c1f961acb"), I need to return a single record like this from the aggregation pipeline - basically returning the matching entry from the embedded groups array only.
{
"_id": ObjectId("57064a294a54b66c1f961acb"),
"role": "facilitator",
"workshopId": ObjectId("57064a294a54b66c1f961aca")
},
In this example, the workshopId has been added as an extra field to identify the parent document, but the rest should be ALL the fields from the original group entry having the matching _id.
The approach I have adopted can just about achieve this but has lots of problems and is probably inefficient (with repetition of the filter clause).
return workshopCollection.aggregate([
{$match:{groups:{$elemMatch:{_id:groupId}}}},
{$unwind:"$groups"},
{$match:{"groups._id":groupId}},
{$project:{
_id:"$groups._id",
role:"$groups.role",
workshopId:"$_id",
}},
]).toArray();
Worse, since it explicitly includes named fields from the entry, it will omit any future fields which are added to the records. I also can't generalise this lookup operation to the case of 'invitations' or other embedded named arrays unless I can know what the array entries' fields are in advance.
I have wondered if using the $ or $elemMatch operators within a $project stage of the pipeline is the right approach, but so far they have either been ignored or triggered operator validity errors when running the pipeline.
QUESTION
Is there another aggregation operator or alternative approach which would help me with this fairly mainstream problem - to return only the matching entries from a document's array?

The implementation below can handle arbitrary queries, serves results as a 'top-level document' and avoids duplicate filtering in the pipeline.
function retrieveArrayEntry(collection, arrayName, itemMatch){
var match = {};
match[arrayName]={$elemMatch:itemMatch};
var project = {};
project[arrayName+".$"] = true;
return collection.findOne(
match,
project
).then(function(doc){
if(doc !== null){
var result = doc[arrayName][0];
result._docId = doc._id;
return result;
}
else{
return null;
}
});
}
It can be invoked like so...
retrieveArrayEntry(workshopCollection, "groups", {_id:ObjectId("57064a294a54b66c1f961acb")})
However, it relies on the collection findOne(...) method instead of aggregate(...) so will be limited to serving the first matching array entry from the first matching document. Projections referencing an array match clause are apparently not possible through aggregate(...) in the same way they are through findXXX() methods.
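One way around that limitation, assuming MongoDB 3.2 or newer, is the $filter expression, which trims an array inside a $project stage. The helper below is only a sketch: buildFilterPipeline and its parameters are names invented here, and it handles a single equality test on one field rather than an arbitrary match.

```javascript
// Hypothetical helper (not from the original answer): builds a pipeline that
// keeps only the array entries whose `field` equals `value`.
// Assumes MongoDB 3.2+ for the $filter expression.
function buildFilterPipeline(arrayName, field, value) {
  var match = {};
  match[arrayName + "." + field] = value; // dot notation selects the parent documents

  var project = {};
  project[arrayName] = {
    $filter: {
      input: "$" + arrayName,                     // the embedded array
      as: "item",                                 // variable bound to each entry
      cond: { $eq: ["$$item." + field, value] }   // keep entries that match
    }
  };

  return [{ $match: match }, { $project: project }];
}

var pipeline = buildFilterPipeline("groups", "role", "facilitator");
console.log(JSON.stringify(pipeline, null, 2));
```

Passing the result to workshopCollection.aggregate(pipeline).toArray() should return each matching workshop with its groups array trimmed to the matching entries. The entries stay nested under the workshop's own _id, so this does not yet produce the flat "top-level" records asked for.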
A still more general (but confusing and inefficient) implementation allows retrieval of multiple matching documents and subdocuments. It works around the difficulty MongoDb has with syntax consistency of Document and Subdocument matching through the unpackMatch method, so that an incorrect 'equality' criterion e.g. ...
{greetings:{_id:ObjectId("437908743")}}
...gets transferred into the required syntax for a 'match' criterion (as discussed at Within a mongodb $match, how to test for field MATCHING , rather than field EQUALLING )...
{"greetings._id":ObjectId("437908743")}
Leading to the following implementation...
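// Prefix every key of an item-level match with the array name,
// e.g. {_id: x} becomes {"greetings._id": x}
function unpackMatch(pathPrefix, match){
    var unpacked = {};
    Object.keys(match).forEach(function(key){
        unpacked[pathPrefix + "." + key] = match[key];
    });
    return unpacked;
}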
function retrieveArrayEntries(collection, arrayName, itemMatch){
var matchDocs = {},
projectItems = {},
unwindItems = {},
matchUnwoundByMap = {};
matchDocs.$match={};
matchDocs.$match[arrayName]={$elemMatch:itemMatch};
projectItems.$project = {};
projectItems.$project[arrayName]=true;
unwindItems.$unwind = "$" + arrayName;
matchUnwoundByMap.$match = unpackMatch(arrayName, itemMatch);
return collection.aggregate([matchDocs, projectItems, unwindItems, matchUnwoundByMap]).toArray().then(function(docs){
return docs.map(function(doc){
var result = doc[arrayName];
result._docId = doc._id;
return result;
});
});
}
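The post-processing in retrieveArrayEntries can also be pushed into the pipeline itself. The following is a sketch, assuming MongoDB 3.6 or newer (for $mergeObjects; $replaceRoot itself needs 3.4), in which a final $replaceRoot stage promotes each unwound entry to a top-level document and copies the parent _id into _docId. buildArrayEntriesPipeline is a name made up for this example, and like unpackMatch it only handles plain field criteria, not top-level operators such as $or.

```javascript
// Prefix every key of an item-level match with the array name,
// e.g. {role: "x"} -> {"groups.role": "x"}. Plain field keys only.
function unpackMatch(pathPrefix, match) {
  var unpacked = {};
  Object.keys(match).forEach(function (key) {
    unpacked[pathPrefix + "." + key] = match[key];
  });
  return unpacked;
}

// Hypothetical builder: returns the stages without touching a database.
function buildArrayEntriesPipeline(arrayName, itemMatch) {
  var elemMatch = {};
  elemMatch[arrayName] = { $elemMatch: itemMatch };

  return [
    { $match: elemMatch },                         // select the parent documents
    { $unwind: "$" + arrayName },                  // one document per array entry
    { $match: unpackMatch(arrayName, itemMatch) }, // keep only the matching entries
    // Promote the entry to the root and add the parent's _id as _docId
    { $replaceRoot: { newRoot: { $mergeObjects: ["$" + arrayName, { _docId: "$_id" }] } } }
  ];
}

console.log(JSON.stringify(buildArrayEntriesPipeline("groups", { role: "facilitator" })));
```

Because the whole entry is merged into the new root, fields added to the array entries later are carried along without naming them in a $project stage.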

Related

Mongoose Updating an array in multiple documents by passing an array of filters to update query

I have multiple documents(3 documents in this example) in one collection that looks like this:
{
_id:123,
bizs:[{_id:'',name:'a'},{_id:'',name:'b'}]
},
{
_id:456,
bizs:[{_id:'',name:'e'},{_id:'',name:'f'}]
}
{
_id:789,
bizs:[{_id:'',name:'x'},{_id:'',name:'y'}]
}
Now, I want to update the bizs subdocument by matching with my array of ids.
That is to say, my array filter for update query is [123,789], which will match against the _id fields of each document.
I have tried using findByIdAndUpdate(), but that doesn't allow an array for the update query.
How can I update the 2 matching documents (like my example above) without having to put findByIdAndUpdate inside a for loop to match each array element with the _id?
You cannot use findByIdAndUpdate to update multiple documents. findByIdAndUpdate comes from Mongoose and is a wrapper around native MongoDB's findOneAndUpdate. When you pass a single string as the id, as in Collection.findByIdAndUpdate('5e179dac627ef7823643cd97', {}), Mongoose internally converts the string to an ObjectId and forms the filter { _id: ObjectId('5e179dac627ef7823643cd97') } before executing findOneAndUpdate, so only one document can be updated at a time. To update multiple documents, use update with the option {multi: true}, or updateMany.
Assuming you want to push a new object to bizs in the two documents from your example, this is how the query looks:
collection.updateMany({ _id: { $in: [123, 789] } }, {
$push: {
bizs: {
"_id": "",
"name": "new"
}
}
})
Note: update operations don't return the documents in the response; they return a write result with the number of documents matched and the number modified.
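A small sketch of how the array of ids from the question could be turned into the arguments for updateMany. buildBizPush is a hypothetical helper, and Model stands for whatever Mongoose model or driver collection is in use; only the filter and update documents are built here.

```javascript
// Hypothetical helper: builds the filter and update documents for a single
// updateMany call that pushes `newBiz` onto `bizs` in every listed document.
function buildBizPush(ids, newBiz) {
  return {
    filter: { _id: { $in: ids } },       // matches any document whose _id is in the list
    update: { $push: { bizs: newBiz } }  // appends one subdocument to the array
  };
}

var op = buildBizPush([123, 789], { _id: "", name: "new" });
console.log(JSON.stringify(op));

// With a model or collection assumed to be called Model:
//   Model.updateMany(op.filter, op.update).then(function (result) { /* write result */ });
```

Building the filter from the array keeps the whole change in one round trip, which is what the for loop around findByIdAndUpdate was trying to avoid.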

MongoDB: Several fields to a list

I currently have a collection that follows a format like this:
{ "_id": ObjectId(...),
"name" : "Name",
"red": 0,
"blue": 0,
"yellow": 1,
"green": 0,
...}
and so on (a bunch of colors). What I would like to do is to create a new array named colors, whose elements are those colors that have a value of 1.
For example:
{ "_id": ObjectId(...),
"name" : "Name",
"colors": ["yellow"]
}
Is this something I can do on the Mongo shell? Or should I do it in a program?
I'm pretty sure I can do it using Python, however I am having difficulties trying to do it directly in the shell. If it can be done in the shell, can anyone point me in the right direction?
Thanks.
Yes, it can easily be done in the shell, or by adapting the example below to any language.
The key is to look at the fields that are "colors" and construct an update statement that removes those fields from the document while testing each one to see whether it belongs in the array, then adds that array to the same update:
var bulk = db.collection.initializeOrderedBulkOp(),
count = 0;
db.collection.find().forEach(function(doc) {
doc.colors = doc.colors || [];
var update = { "$unset": {}};
Object.keys(doc).filter(function(key) {
// Anchor the whole alternation so that only these exact keys are excluded
return !/^(_id|name|colors)$/.test(key)
}).forEach(function(key) {
update.$unset[key] = "";
if ( doc[key] == 1)
doc.colors.push(key);
});
update["$addToSet"] = { "colors": { "$each": doc.colors } };
bulk.find({ "_id": doc._id }).updateOne(update);
count++;
if ( count % 1000 == 0 ) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp()
}
});
if ( count % 1000 != 0 )
bulk.execute();
The Bulk Operations usage means that batches of updates are sent rather than one request and response per document, so this will process a lot faster than merely issuing singular updates back and forth.
The main operators here are $unset to remove the existing fields and $addToSet to add the new evaluated array. Both are built up by cycling the keys of the document that make up the possible colors and excluding the other keys you don't want to modify using a regex filter.
Using $addToSet together with this line:
doc.colors = doc.colors || [];
ensures that any document that was already partially converted, or was otherwise touched by a code change that had already started storing the correct array, is not adversely affected or overwritten by the update process.
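To make the per-document logic easier to check outside the shell, the body of the loop can be isolated into a pure function. This is a sketch only: buildColorsUpdate and its keep parameter are names invented here, and the list of non-color fields is an assumption about the schema.

```javascript
// Hypothetical helper: given one document, returns the update that removes
// the per-color fields and records the colors whose value is 1.
// `keep` lists the fields that are not colors (an assumption about the schema).
function buildColorsUpdate(doc, keep) {
  var update = { $unset: {}, $addToSet: { colors: { $each: [] } } };
  Object.keys(doc).forEach(function (key) {
    if (keep.indexOf(key) !== -1) return;   // skip _id, name, colors, ...
    update.$unset[key] = "";                // every color field is removed
    if (doc[key] == 1) update.$addToSet.colors.$each.push(key); // loose == as in the loop above
  });
  return update;
}

var sample = { _id: 1, name: "Name", red: 0, blue: 0, yellow: 1, green: 0 };
console.log(JSON.stringify(buildColorsUpdate(sample, ["_id", "name", "colors"])));
// {"$unset":{"red":"","blue":"","yellow":"","green":""},"$addToSet":{"colors":{"$each":["yellow"]}}}
```

One caveat: a document with no color fields left yields an empty $unset, which servers older than MongoDB 5.0 reject, so it may be worth skipping such documents before queuing the update.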
tl;dr, spoiler
MongoDB's shell has access to some JavaScript-like methods on its objects. You can query your collection with db.yourCollectionName.find(), which returns a cursor (see the cursor methods). Then iterate through it to get each document, iterate through the keys, filter out keys like _id and name, check whether the value is 1, and if so store that key in an array.
Once done, you'd probably want to use db.yourCollectionName.update() or db.yourCollectionName.findAndModify() to find the record by _id and use $set to add a new field and set its value to the array of keys.

Update nested array document

Say I have this model:
{
_id : 1,
ref: '1',
children: [
{
ref:'1.1',
grandchildren: [
{
ref:'1.1.1',
visible: true
}
]
}
]
}
I'm aware that the positional operator for nested arrays isn't available yet:
https://jira.mongodb.org/browse/SERVER-831
but I wondered whether it's possible to atomically update the document in the nested array?
In my example, I'd like to update the visible flag to false for the document with ref 1.1.1.
I have the children record ref == '1.1' and the grandchildren ref == '1.1.1'.
Thanks
Yes, this is possible, but only if you know beforehand the index of the children array entry that holds the grandchildren object to be updated. The update query then combines that index with the positional operator as follows:
db.collection.update(
{
"children.ref": "1.1",
"children.grandchildren.ref": "1.1.1"
},
{
"$set": {
"children.0.grandchildren.$.visible": false
}
}
)
However, if you don't know the array index positions beforehand, you should consider generating the $set conditions dynamically with MapReduce. The basic idea with MapReduce is that it uses JavaScript as its query language, but it tends to be considerably slower than the aggregation framework and is not recommended for real-time data analysis.
In your MapReduce operation, you need to define a couple of steps i.e. the mapping step (which maps an operation into every document in the collection, and the operation can either do nothing or emit some object with keys and projected values) and reducing step (which takes the list of emitted values and reduces it to a single element).
For the map step, you ideally would want to get for every document in the collection, the index for each children array field and another key that contains the $set keys.
Your reduce step would be a function (which does nothing) simply defined as var reduce = function() {};
The final step in your MapReduce operation then writes a separate collection (update_collection in the code below) that contains the emitted children entries along with a field holding the $set key paths. This collection can be refreshed periodically by re-running the MapReduce operation on the original collection.
Altogether, this MapReduce method would look like:
var map = function(){
for(var i = 0; i < this.children.length; i++){
emit(
{
"_id": this._id,
"index": i
},
{
"index": i,
"children": this.children[i],
"update": {
"ref": "children." + i.toString() + ".grandchildren.$.ref",
"visible": "children." + i.toString() + ".grandchildren.$.visible"
}
}
);
}
};
var reduce = function(){};
db.collection.mapReduce(
map,
reduce,
{
"out": {
"replace": "update_collection"
}
}
);
You can then use the cursor from the db.update_collection.find() method to iterate over and update your collection accordingly:
var cur = db.update_collection.find(
{
"value.children.ref": "1.1",
"value.children.grandchildren.ref": "1.1.1"
}
);
// Iterate through results and update using the update query object set dynamically by using the array-index syntax.
while (cur.hasNext()) {
var doc = cur.next();
var update = { "$set": {} };
// set the update query object
update["$set"][doc.value.update.visible] = false;
db.collection.update(
{
"children.ref": "1.1",
"children.grandchildren.ref": "1.1.1"
},
update
);
};
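SERVER-831 has since been addressed: assuming MongoDB 3.6 or newer, the filtered positional operator ($[<identifier>] together with the arrayFilters option) can target the nested entry without knowing any index and without the MapReduce detour. The sketch below only builds the three arguments; buildVisibleUpdate and the identifiers child and grandchild are names chosen for this example.

```javascript
// Hypothetical helper: builds filter, update and options for one atomic update
// of a grandchild's `visible` flag. Assumes MongoDB 3.6+ (arrayFilters).
function buildVisibleUpdate(childRef, grandchildRef, visible) {
  return {
    // Select the document that contains the nested entry
    filter: {
      "children.ref": childRef,
      "children.grandchildren.ref": grandchildRef
    },
    // $[child] and $[grandchild] are resolved through arrayFilters below
    update: {
      $set: { "children.$[child].grandchildren.$[grandchild].visible": visible }
    },
    options: {
      arrayFilters: [
        { "child.ref": childRef },
        { "grandchild.ref": grandchildRef }
      ]
    }
  };
}

var op = buildVisibleUpdate("1.1", "1.1.1", false);
console.log(JSON.stringify(op, null, 2));

// In the shell this would be passed on as:
//   db.collection.updateOne(op.filter, op.update, op.options);
```

Since this is a single update on a single document, it is applied atomically, which is what the question asked for.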

MongoDB - Aggregation on referenced field

I've got a question on the design of documents in order to be able to efficiently perform aggregation. I will take a dummy example of document :
{
product: "Name of the product",
description: "A new product",
comments: [ObjectId(xxxxx), ObjectId(yyyy),....]
}
As you can see, I have a simple document which describes a product and references some comments on it. Imagine this product is so popular that it has millions of comments. A comment is a simple document with a date, a text and possibly some other fields. The problem is that such a product can easily grow larger than 16MB, so I cannot embed the comments in the product and need to keep them in a separate collection.
What I would like to do now, is to perform aggregation on the product collection, a first step could be for example to select various products and sort the comments by date. It is a quite easy operation with embedded documents, but how could I do with such a design ? I only have the ObjectId of the comments and not their content. Of course, I'd like to perform this aggregation in a single operation, i.e. I don't want to have to perform the first part of the aggregation, then query the results and perform another aggregation.
I don't know if that's clear enough ? ^^
I would go about it this way: create a temp collection that is the exact copy of the product collection with the only exception being the change in the schema on the comments array, which would be modified to include a comment object instead of the object id. The comment object will only have the _id and the date field. The above can be done in one step:
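db.product.find().forEach( function (doc){
    // Start a fresh array for every product; a single shared array
    // would accumulate the comments of all previous products
    var comments = [];
    doc.comments.forEach( function(x) {
        var obj = {"_id": x };
        var comment = db.comment.findOne(obj);
        if (comment !== null) {   // skip references to missing comments
            obj["date"] = comment.date;
            comments.push(obj);
        }
    });
    doc.comments = comments;
    db.temp.insert(doc);
});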
You can then run your aggregation query against the temp collection:
db.temp.aggregate([
{
$match: {
// your match query
}
},
{
$unwind: "$comments"
},
{
$sort: { "comments.date": 1 } // sort the pipeline by comments date
}
]);
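Assuming MongoDB 3.4 or newer, the copy into a temp collection can be avoided with a $lookup stage, which joins the referenced comments inside the same pipeline. A sketch follows; the collection name comment is taken from the answer above, while buildCommentsPipeline and the commentDocs field are names invented for the example.

```javascript
// Hypothetical builder: joins each product with its referenced comments and
// sorts the result by comment date. Assumes MongoDB 3.4+, where $lookup
// accepts an array of ids as localField.
function buildCommentsPipeline(productMatch) {
  return [
    { $match: productMatch },        // select the products of interest
    { $lookup: {
        from: "comment",             // collection holding the comment documents
        localField: "comments",      // array of ObjectIds on the product
        foreignField: "_id",
        as: "commentDocs"            // joined comments land in this field
    } },
    { $unwind: "$commentDocs" },     // one document per (product, comment) pair
    { $sort: { "commentDocs.date": 1 } }
  ];
}

console.log(JSON.stringify(buildCommentsPipeline({ product: "Name of the product" }), null, 2));

// In the shell:
//   db.product.aggregate(buildCommentsPipeline({ product: "Name of the product" }));
```

Placing $unwind directly after $lookup matters for very popular products: the server can combine the two stages, so the joined comments need not be materialised as one oversized array first. Whether that is enough for millions of comments would have to be measured.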

Mongo: Query to get count of objects in a nested field

I need some help creating a count query on nested objects in a field, across all documents. Each JSON document has many fields. One particular field called "hotlinks" comprises many dynamically keyed inner objects.
Doc1:
{
hotlinks : { 112222:{....} , 333333: {.....} , 545555: {.....} }
}
Doc2:
{
hotlinks : { 67756:{....} , 756767: {.....} , 1111111: {.....} }
}
Each document has a hotlinks field. The hotlinks field comprises a varying set of inner hotlink objects. Each key is a unique id generated in Java, and each value is an object that contains data (inner fields).
I need a way to get the count of all the inner nested objects of the field 'hotlinks'.
For example, the total number of inner objects of hotlinks in doc1 and doc2 would be 6.
Is there any way to get the count across all documents with a single query?
Thanks a lot,
Karan
This is quite possible with MongoDB 3.6 and newer through the aggregation framework.
Use the $objectToArray operator within an aggregation pipeline to convert the document to an array. The returned array contains an element for each field/value pair in the original document. Each element in the returned array is a document that contains two fields, k and v.
Once you have the array, you can apply the $size operator, which returns the number of elements in the given array and thus gives you the count per document.
Getting the count across all the documents requires a $group stage where you specify an _id key of null or a constant value, which calculates the accumulated values for all the input documents as a whole.
All this can be done in a single pipeline by nesting the expressions as follows:
db.collection.aggregate([
{ "$group": {
"_id": null,
"count": {
"$sum": {
"$size": { "$objectToArray": "$hotlinks" }
}
}
} }
])
Example Output
{
"_id" : null,
"count" : 6
}
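This may not be the best approach, but you can define a JavaScript variable and sum up the counts. Since hotlinks is an object rather than an array, count its keys instead of reading .length:
var hotlinkTotal = 0;
db.collection.find().forEach(function(x){ hotlinkTotal += Object.keys(x.hotlinks || {}).length; });
print(hotlinkTotal);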
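Both answers assume every document has a hotlinks field. The sketch below shows a variant of the pipeline that treats a missing field as an empty object via $ifNull, plus an in-memory equivalent that is handy for checking the expected number on sample data. The function names and the third sample document are made up for illustration.

```javascript
// Pipeline variant that tolerates documents without a hotlinks field.
// Assumes MongoDB 3.6+ as in the answer above.
function buildHotlinksCountPipeline() {
  return [
    { $group: {
        _id: null,
        count: {
          $sum: { $size: { $objectToArray: { $ifNull: ["$hotlinks", {}] } } }
        }
    } }
  ];
}

// In-memory equivalent: sums the number of keys under hotlinks.
function countHotlinks(docs) {
  return docs.reduce(function (total, doc) {
    return total + Object.keys(doc.hotlinks || {}).length;
  }, 0);
}

var docs = [
  { hotlinks: { "112222": {}, "333333": {}, "545555": {} } },
  { hotlinks: { "67756": {}, "756767": {}, "1111111": {} } },
  { name: "document without hotlinks" }
];
console.log(countHotlinks(docs)); // 6
```

The pipeline keeps the counting on the server, while the in-memory version pulls every document to the client, so the former is usually preferable for large collections.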