My collection documents are:
{
"_id" : 1,
"fruits" : [ {"name":"pears"},
{"name":"grapes"},
{"name":"bananas"} ],
}
{
"_id" : 2,
"fruits" : [ {"name":"bananas"} ],
}
I need to remove the whole document when the fruits array contains only "bananas", or just remove the "bananas" entry when there is more than one fruit in the array.
My final collection after running the required query should be:
{
"_id" : 1,
"fruits" : [ {"name":"pears"},
{"name":"grapes"}],
}
I am currently using two queries to get this done:
db.collection.remove({'fruits':{$size:1, $elemMatch:{'name': 'bananas'} }}) [this removes the document when only one fruit is present]
and
db.collection.update({},{$pull:{'fruits':{'name':'bananas'}}},{multi: true}) [this removes the 'bananas' entry from the array]
Is there any way to combine these into one query?
EDIT: Final take
-- I guess there is no "one query" to perform the above tasks since the intents are very different of both the actions.
-- The best that can be performed is: club the actions into a bulk_write query which saves on the network I/O(as suggested in the answer by Neil). This is believe is more beneficial when you have multiple such actions being fired. Also, bulk_write can provide the feature of locking in the sense that the "ordered" mode of the bulk_write makes the actions sequential, breaking and halting execution in case of error.
Hence bulk_write is more beneficial when the actions performed need to be sequential. Somewhat like "chaining" in JS. There is also the option to perform un-ordered bulk_writes.
Also, the actions specified in the bulk write, operate on the collection level as individual actions.
You basically want bulk_write() here to do them both. Also use $exists to test whether the array has just one element or more than one:
from pymongo import UpdateMany, DeleteMany

db.collection.bulk_write(
    [
        UpdateMany(
            { "fruits.1": { "$exists": True }, "fruits.name": "bananas" },
            { "$pull": { "fruits": { "name": "bananas" } } }
        ),
        DeleteMany(
            { "fruits.1": { "$exists": False }, "fruits.name": "bananas" }
        )
    ],
    ordered=False
)
You don't really need $elemMatch for "one" condition, and you should be using update_many(), or in this case UpdateMany(), instead of { "multi": true }; that option is spelled differently in "pymongo" anyway. Then of course there is delete_many(), or DeleteMany() for the "bulk" context.
Bulk operations send one request with one response, which is better than sending multiple requests. Also, "update" and "delete" are two different things, but a single request can combine them just like this.
The $size operator is valid but $exists can apply to a "range" where $size cannot, so it's generally a bit more flexible.
i.e. just as an $exists "range" example:
# Array between 2 and 4 elements
db.collection.update_many(
    {
        "fruits.1": { "$exists": True },
        "fruits.4": { "$exists": False },
        "fruits.name": "bananas"
    },
    { "$pull": { "fruits": { "name": "bananas" } } }
)
And of course in the context here you actually want to distinguish documents with other possible things in the array from those with "only" a single "bananas".
The ordered=False here actually refers to two different ways that "bulk write" requests can be handled:
Ordered - Where True (which is the default), the operations are executed in "serial order" as they appear in the array of operations sent with the "bulk write". If any error occurs, the batch stops execution at the point of the error and returns an exception.
Unordered - Where False, the operations are executed in "parallel" within reasonable constraints on the server. If any error occurs an exception is still raised, but this does not stop other operations within the "bulk write" from completing. Any errors are returned with the "array index" from the list provided to the command, identifying which operation caused the error.
This option can used to "tune" the desired behavior in particular to error reporting and continuation, and also allows a degree of "parallelism" to the execution where "serial" is not actually required of the operations. Since these two statements do not actually depend on one or the other and will in fact select different documents anyway, then ordered=False is probably the better option in terms of efficiency here.
db.users.aggregate([
    {
        $project: {
            data: {
                $filter: {
                    input: "$fruits",
                    as: "filterData",
                    cond: { $ne: [ "$$filterData.name", 'bananas' ] }
                }
            }
        }
    },
    {
        $unwind: {
            path: "$data",
            preserveNullAndEmptyArrays: false
        }
    },
    {
        $group: {
            _id: "$_id",
            data: { $addToSet: "$data" }
        }
    }
])
I think the above query would give you the desired results; note that it returns the filtered output rather than modifying the stored documents.
Related
I am trying to update a value in the nested array but can't get it to work.
My object is like this
{
"_id": {
"$oid": "1"
},
"array1": [
{
"_id": "12",
"array2": [
{
"_id": "123",
"answeredBy": [], // need to push "success"
},
{
"_id": "124",
"answeredBy": [],
}
],
}
]
}
I need to push a value to "answeredBy" array.
In the example below, I tried pushing the string "success" to the "answeredBy" array of the object with _id "123", but it does not work.
callback = function(err,value){
if(err){
res.send(err);
}else{
res.send(value);
}
};
conditions = {
"_id": 1,
"array1._id": 12,
"array2._id": 123
};
updates = {
$push: {
"array2.$.answeredBy": "success"
}
};
options = {
upsert: true
};
Model.update(conditions, updates, options, callback);
I found this link, but its answer only says I should use an object-like structure instead of arrays. This cannot be applied in my situation; I really need my objects to be nested in arrays.
It would be great if you could help me out here. I've spent hours trying to figure this out.
Thank you in advance!
General Scope and Explanation
There are a few things wrong with what you are doing here. Firstly, your query conditions: you are referring to several _id values where you should not need to, and at least one of them is not at the top level.
In order to get at a "nested" value, and presuming that the _id value is unique and would not appear in any other document, your query form should be like this:
Model.update(
{ "array1.array2._id": "123" },
{ "$push": { "array1.0.array2.$.answeredBy": "success" } },
function(err,numAffected) {
// something with the result in here
}
);
Now that would actually work, but really it is only a fluke that it does, as there are very good reasons why it should not work for you.
The important reading is in the official documentation for the positional $ operator under the subject of "Nested Arrays". What this says is:
The positional $ operator cannot be used for queries which traverse more than one array, such as queries that traverse arrays nested within other arrays, because the replacement for the $ placeholder is a single value
Specifically what that means is the element that will be matched and returned in the positional placeholder is the value of the index from the first matching array. This means in your case the matching index on the "top" level array.
So if you look at the query notation as shown, we have "hardcoded" the first ( or 0 index ) position in the top level array, and it just so happens that the matching element within "array2" is also the zero index entry.
To demonstrate this you can change the matching _id value to "124" and the result will $push a new entry onto the element with _id "123", as they are both in the zero index entry of "array1" and that is the value returned to the placeholder.
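As a quick sketch of that "fluke" in action, using the same hardcoded zero index as above:
Model.update(
    { "array1.array2._id": "124" },
    { "$push": { "array1.0.array2.$.answeredBy": "success" } },
    function(err, numAffected) {
        // the $ placeholder resolved against the outer array, so the
        // "success" entry lands on the element with _id "123", not "124"
    }
);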
So that is the general problem with nesting arrays. You could remove one of the levels and you would still be able to $push to the correct element in your "top" array, but there would still be multiple levels.
Try to avoid nesting arrays as you will run into update problems as is shown.
The general case is to "flatten" the things you "think" are "levels" and actually make theses "attributes" on the final detail items. For example, the "flattened" form of the structure in the question should be something like:
{
"answers": [
{ "by": "success", "type2": "123", "type1": "12" }
]
}
Or even, when accepting that the inner array is $push only and never updated:
{
"array": [
{ "type1": "12", "type2": "123", "answeredBy": ["success"] },
{ "type1": "12", "type2": "124", "answeredBy": [] }
]
}
Both of which lend themselves to atomic updates within the scope of the positional $ operator.
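For instance, a hedged sketch against the second "flattened" form above, where the single array level makes the positional match unambiguous:
Model.update(
    { "array.type2": "124" },
    { "$push": { "array.$.answeredBy": "success" } }
);
// with only one array level, $ reliably points at the matched element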
MongoDB 3.6 and Above
From MongoDB 3.6 there are new features available to work with nested arrays. This uses the positional filtered $[<identifier>] syntax in order to match the specific elements and apply different conditions through arrayFilters in the update statement:
Model.update(
{
"_id": 1,
"array1": {
"$elemMatch": {
"_id": "12","array2._id": "123"
}
}
},
{
"$push": { "array1.$[outer].array2.$[inner].answeredBy": "success" }
},
{
"arrayFilters": [{ "outer._id": "12" },{ "inner._id": "123" }]
}
)
The "arrayFilters" as passed to the options for .update() or even
.updateOne(), .updateMany(), .findOneAndUpdate() or .bulkWrite() method specifies the conditions to match on the identifier given in the update statement. Any elements that match the condition given will be updated.
Because the structure is "nested", we actually use "multiple filters" as is specified with an "array" of filter definitions as shown. The marked "identifier" is used in matching against the positional filtered $[<identifier>] syntax actually used in the update block of the statement. In this case inner and outer are the identifiers used for each condition as specified with the nested chain.
This new expansion makes the update of nested array content possible, but it does not really help with the practicality of "querying" such data, so the same caveats apply as explained earlier.
You typically really "mean" to express as "attributes", even if your brain initially thinks "nesting", it's just usually a reaction to how you believe the "previous relational parts" come together. In reality you really need more denormalization.
Also see How to Update Multiple Array Elements in mongodb, since these new update operators actually match and update "multiple array elements" rather than just the first, which has been the previous action of positional updates.
NOTE Somewhat ironically, since this is specified in the "options" argument for .update() and like methods, the syntax is generally compatible with all recent release driver versions.
However this is not true of the mongo shell, since the way the method is implemented there ( "ironically for backward compatibility" ) the arrayFilters argument is not recognized and removed by an internal method that parses the options in order to deliver "backward compatibility" with prior MongoDB server versions and a "legacy" .update() API call syntax.
So if you want to use the command in the mongo shell or other "shell based" products (notably Robo 3T), you need a recent version from either the development branch or a production release of 3.6 or greater.
See also positional all $[], which also updates "multiple array elements", but without applying specified conditions: it applies to all elements in the array, where that is the desired action.
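As a brief hedged sketch of that all-positional form against the structure here, for the case where you really do want every element touched:
Model.update(
    { "_id": 1 },
    { "$push": { "array1.$[].array2.$[].answeredBy": "success" } }
);
// $[] walks every element of both arrays, with no arrayFilters conditions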
I know this is a very old question, but I just struggled with this problem myself, and found, what I believe to be, a better answer.
A way to solve this problem is to use sub-documents. This is done by nesting schemas within your schemas:
Array2Schema = new mongoose.Schema({
    answeredBy: [...]
})
Array1Schema = new mongoose.Schema({
    array2: [Array2Schema]
})
MainSchema = new mongoose.Schema({
    array1: [Array1Schema]
})
This way the object will look like the one you show, but now each array is filled with sub-documents. This makes it possible to dot your way into the sub-document you want. Instead of using .update you then use .find or .findOne to get the document you want to update.
Main.findOne({ _id: 1 })
    .exec(function(err, result) {
        result.array1.id(12).array2.id(123).answeredBy.push('success');
        result.save(function(err) {
            console.log(result);
        });
    });
I haven't used the .push() function this way myself, so the syntax might not be right, but I have used both .set() and .remove(), and both work perfectly fine.
I need to delete documents, but with multiple conditions on the same field.
db.getCollection('system_parameter').remove(
{
"variable":/^pickup_kota/,
"variable":
{
$nin:
[
/^pickup_kota_jakarta/
]
}
}
)
What I'm trying to do is delete all the data with the same prefix ('pickup_kota'), but excluding the 'pickup_kota_jakarta' documents.
If I execute the query above, ALL of the data is removed, including documents with, say, the prefix 'some_doc', but excluding 'pickup_kota_jakarta'.
All MongoDB query arguments are already AND conditions, so just include them on the same key; in your attempt the duplicate "variable" key meant the second condition overwrote the first, which is why everything else was removed:
db.getCollection('system_parameter').remove({
"variable": {
"$regex": /^pickup_kota/,
"$nin": [
/^pickup_kota_jakarta/
]
}
})
Or you can always write the "long form" with $and:
db.getCollection('system_parameter').remove({
    "$and": [
        { "variable": /^pickup_kota/ },
        { "variable": { "$nin": [/^pickup_kota_jakarta/] } }
    ]
})
So with two documents like this:
{ "variable" : "pickup_kota_somewhere" }
{ "variable" : "pickup_kota_jakarta" }
Only the first one gets removed.
But as long as you can use a different operator such as $regex here to keep the conditions under the same key, you don't need the full form.
Also, since those are both anchored to the start of the string, it's more efficient for MongoDB to do the two comparisons than to attempt a single regular expression meeting both possible conditions. The only regular expression that could combine them would break that anchoring, and obviate the efficiency gained by searching anchored to the beginning of the string, which can use an index.
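As a sketch of that index point, a plain ascending index on "variable" lets the anchored prefix expression walk an index range while the $nin exclusion filters within it:
db.getCollection('system_parameter').createIndex({ "variable": 1 })
// ^pickup_kota is a prefix expression, so it becomes an index range scan;
// the $nin condition is then applied to the candidates in that range
db.getCollection('system_parameter').remove({
    "variable": { "$regex": /^pickup_kota/, "$nin": [/^pickup_kota_jakarta/] }
})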
You can try the $all operator:
db.getCollection('system_parameter').remove({
"variable": {$all: [
{$regex: /^pickup_kota/},
{$nin:[/^pickup_kota_jakarta/]}
]}
})
I have a large collection that I'd like to export to CSV, but I'd like to do some trimming to some of the fields. (e.g. I just need to know the number of elements in some, and just to know if others exist or not in the doc)
I would like to do the equivalent to a map function on the fields, so that fields that contain a list will be exported to the list size, and some fields that sometimes exist and sometimes do not, I would like to have them exported as boolean flags.
e.g. if my rows look like this
{_id:"id1", listField:[1,2,3], optionalField: "...", ... }
{_id:"id2", listField:[1,2,3,4], ... }
I'd like to run a mongoexport to CSV that will result in this
_id, listField.length, optionalField.exists
"id1", 3, true
"id2", 4, false
Is that possible using mongoexport? (assume MongoDB version 3.0)
If not, is there another way to do that?
The mongoexport utility itself is pretty spartan and just a basic tool bundled in the suite. You can add "query" filters, but pretty much just like .find() queries in general, the intention is to return documents "as is" rather than "manipulate" the content.
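For illustration, a hedged sketch of the stock tool at work; the database and collection names are placeholders, and the flags shown are those of the 3.0 tools:
mongoexport --db test --collection collection \
    --type=csv --fields _id,listField,optionalField \
    --query '{ "optionalField": { "$exists": true } }' \
    --out out.csv
That exports the matched documents "as is"; there is no hook to transform a field on the way out.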
Just as with other query operations, the .aggregate() method is something useful for document manipulation. So in order to "manipulate" the output to something different from the original document source, you would do:
db.collection.aggregate([
{ "$project": {
"listField": { "$size": "$listField" },
"optionalField": {
"$cond": [
{ "$ifNull": [ "$optionalField", false ] },
true,
false
]
}
}}
])
The $size operator returns the "size" of the array, and the $ifNull tests for the presence, either returning the field value or the alternate. Pass that result into $cond to get a true/false return rather than the field value. "_id" is always implicit, unless you specifically ask to omit it.
That would give you the "reduced" output, but in order to go to CSV then you would have to code that export yourself, as mongoexport does not run aggregation pipeline queries.
But the code to do so should be quite trivial ( pick a library for your language ), and the aggregation statement is also trivial as you can see here.
For the "really basic" approach, then just send a script to the mongo shell, as a very rudimentary form of programming:
db.collection.aggregate([
{ "$project": {
"listField": { "$size": "$listField" },
"optionalField": {
"$cond": [
{ "$ifNull": [ "$optionalField", false ] },
true,
false
]
}
}}
]).forEach(function(doc) {
print(Object.keys(doc).map(function(key) {
return doc[key]
}).join(","));
});
Which would output:
id1,3,true
id2,4,false
I am looking to query a JSON document which has a nested array structure. Each design element has multiple SLID and status entries. I want to write a MongoDB query to get the designs whose highest SLID has status "OLD".
Here is the sample JSON:
{
"_id" : ObjectId("55cddc30f1a3c59ca1e88f30"),
"designs" : [
{
"Deid" : 1,
"details" : [
{
"SLID" : 1,
"status" : "OLD"
},
{
"SLID" : 2,
"status" : "NEW"
}
]
},
{
"Deid" : 2,
"details" : [
{
"SLID" : 1,
"status" : "NEW"
},
{
"SLID" : 2,
"status" : "NEW"
},
{
"SLID" : 3,
"status" : "OLD"
}
]
}
]
}
In this sample the expected query should return the following, as its SLID is the highest with status "OLD".
{
"_id" : ObjectId("55cddc30f1a3c59ca1e88f30"),
"designs" : [
{
"Deid" : 2,
"details" : [
{
"SLID" : 3,
"status" : "OLD"
}
]
}
]
}
I have tried the following query, but it kept returning the other details array element (which has status "NEW") along with the above element.
db.Collection.find({"designs": {$all: [{$elemMatch: {"details.status": "OLD"}}]}},
{"designs.details":{$slice:-1}})
Edit:
To summarize the problem:
The requirement is to get every design from the document set whose highest SLID (always the last item in the details array) has a status of "OLD".
Present Problem
What you should have picked up from the previously linked question is that the positional $ operator itself is only capable of matching the first matched element within an array. When you have nested arrays like you do, this means the reported position can only ever refer to the "outer" array, never the actual matched position within the inner array, nor any more than a single match.
Other examples show usage of the aggregation framework for MongoDB in order to "filter" elements from the array, generally processing with $unwind and then using conditions to match the array elements that you require. This is generally what you need to do in this case to get matches from your "inner" array. While there have been improvements since the first answers, your "last match", effectively a "slice" condition, rules out the other presently available possibilities. Therefore:
db.junk.aggregate([
{ "$match": {
"designs.details.status": "OLD"
}},
{ "$unwind": "$designs" },
{ "$unwind": "$designs.details" },
{ "$group": {
"_id": {
"_id": "$_id",
"Deid": "$designs.Deid"
},
"details": { "$last": "$designs.details"}
}},
{ "$match": {
"details.status": "OLD"
}},
{ "$group": {
"_id": "$_id",
"details": { "$push": "$details"}
}},
{ "$group": {
"_id": "$_id._id",
"designs": {
"$push": {
"Deid": "$_id.Deid",
"details": "$details"
}
}
}}
])
Which would return on your document or any others like it a result like:
{
"_id" : ObjectId("55cddc30f1a3c59ca1e88f30"),
"designs" : [
{
"Deid" : 2,
"details" : [
{
"SLID" : 3,
"status" : "OLD"
}
]
}
]
}
The key there being to $unwind both arrays and then $group back on the relevant unique elements in order to "slice" the $last element from each "inner" array.
The next of your conditions requires a $match in order to see that the "status" field of that element is the value that you want. Then of course, since the documents have been essentially "de-normalized" by the $unwind operations and even by the subsequent $group, the following $group statements re-construct the document into its original form.
Aggregation pipelines can either be quite simple or quite difficult depending on what you want to do, and reconstruction of documents with filtering like this means you need to take care in the steps, particularly if there are other fields involved. As you should also appreciate here, this process of $unwind to de-normalize and $group operations is not very efficient, and can cause significant overhead depending on the number of possible documents that can be met by the initial $match query.
Better Solution
While currently only available in the present development branch, there are some new operators available to the aggregation pipeline that make this much more efficient, and effectively "on par" with the performance of a general query. These are notably the $filter and $slice operators, which can be employed in this case as follows:
db.junk.aggregate([
{ "$match": {
"designs.details.status": "OLD"
}},
{ "$redact": {
"$cond": [
{ "$gt": [
{ "$size":{
"$filter": {
"input": "$designs",
"as": "designs",
"cond": {
"$anyElementTrue":[
{ "$map": {
"input": {
"$slice": [
"$$designs.details",
-1
]
},
"as": "details",
"in": {
"$eq": [ "$$details.status", "OLD" ]
}
}}
]
}
}
}},
0
]},
"$$KEEP",
"$$PRUNE"
]
}},
{ "$project": {
"designs": {
"$map": {
"input": {
"$filter": {
"input": "$designs",
"as": "designs",
"cond": {
"$anyElementTrue":[
{ "$map": {
"input": {
"$slice": [
"$$designs.details",
-1
]
},
"as": "details",
"in": {
"$eq": [ "$$details.status", "OLD" ]
}
}}
]
}
}
},
"as": "designs",
"in": {
"Deid": "$$designs.Deid",
"details": { "$slice": [ "$$designs.details", -1] }
}
}
}
}}
])
This effectively makes the operations just a $match and a $project stage, which is basically what is done with a general .find() operation. The only real addition here is a $redact stage, which allows the documents to be additionally filtered from the initial query condition by further logical conditions that can inspect the document.
In this case, we can see not only whether the document contains an "OLD" status, but also whether the last element of at least one of the inner arrays carries that status in its own last entry; otherwise the document is "pruned" from the results for not meeting that condition.
In both the $redact and $project, the $slice operator is used to get the last entry from the "details" array within the "designs" array. In the initial case it is applied with $filter to remove any elements where the condition did not match from the "outer" or "designs" array, and then later in the $project to just show the last element from the "designs" array in final presentation. That last "reshape" is done by $map to replace the whole arrays with the last element slice only.
Whilst the logic there seems much more long-winded than the initial statement, the performance gain is potentially "huge" due to being able to treat each document as a "unit" without the need to de-normalize or otherwise de-construct it until the final projection is made.
Best Solution for Now
In summary, the processes you can currently use to achieve the result are simply not efficient for solving the problem. It would in fact be more efficient to simply match the documents that meet the basic condition (contain a "status" that is "OLD") in conjunction with a $where condition to test the last element of each array. The actual "filtering" of the arrays in the output, however, is best left to client code:
db.junk.find({
"designs.details.status": "OLD",
"$where": function() {
return this.designs.some(function(design){
return design.details.slice(-1)[0].status == "OLD";
});
}
}).forEach(function(doc){
doc.designs = doc.designs.filter(function(design) {
return design.details.slice(-1)[0].status == "OLD";
}).map(function(design) {
design.details = design.details.slice(-1);
return design;
});
printjson(doc);
});
So the query condition at least only returns the documents that match all conditions, and then the client side code filters out the content from arrays by testing the last elements and then just slicing out that final element as the content to display.
Right now, that is probably the most efficient way to do this, as it mirrors the future aggregation operation capabilities.
The problems are really in the structure of your data. While it may suit the display purposes of your application, the usage of nested arrays makes this notoriously difficult to query, as well as "impossible" to atomically update, due to the limitations of the positional operator mentioned before.
While you can always $push new elements to the inner array by matching its existence within, or just the presence of, the outer array element, what you cannot do is alter the "status" of an inner element in an atomic operation. In order to modify it in such a way, you need to retrieve the entire document, then modify the contents in code, and save back the result.
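A hedged shell sketch of that read-modify-write cycle (the SLID value and the new status here are purely illustrative):
var doc = db.junk.findOne({ "_id": ObjectId("55cddc30f1a3c59ca1e88f30") });
doc.designs.forEach(function(design) {
    design.details.forEach(function(detail) {
        if (detail.SLID === 3) detail.status = "NEW";  // change made in client code
    });
});
db.junk.save(doc);  // writes the whole document back, replacing the stored copy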
The problems with that process mean you are likely to "collide" on updates and possibly overwrite the changes made by another concurrent request to the one you are currently processing.
For these reasons you really should reconsider all of your design goals for the application and the suitability of such a structure. Keeping a more denormalized form may cost you in some other areas, but it sure would make things much simpler to both query and update with the kind of inspection level this seems to require.
The end conclusion here should be that you reconsider the design. Though getting your results is both possible now and in the future, the other operational blockers should be enough to warrant a change.
Is there a way to use a user-defined function saved as db.system.js.save(...) in pipeline or mapreduce?
Any function you save to system.js is available for use by "JavaScript" processing statements such as the $where operator and mapReduce, and can be referenced by the _id value it was assigned:
db.system.js.save({
"_id": "squareThis",
"value": function(a) { return a*a }
})
And some data inserted into the "sample" collection:
{ "_id" : ObjectId("55aafd2bacbed38e06f9eccf"), "a" : 1 }
{ "_id" : ObjectId("55aafea6acbed38e06f9ecd0"), "a" : 2 }
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }
Then:
db.sample.mapReduce(
function() {
emit(null, squareThis(this.a));
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
);
Gives:
"results" : [
{
"_id" : null,
"value" : 14
}
],
Or with $where:
db.sample.find(function() { return squareThis(this.a) == 9 })
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }
But in "neither" case can you use globals such as the database db reference or other functions. Both $where and mapReduce documentation contain information of the limits of what you can do here. So if you thought you were going to do something like "look up data in another collection", then you can forget it because it is "Not Allowed".
Every MongoDB command action is actually a call to a "runCommand" action "under the hood" anyway. But unless what that command is actually doing is "calling a JavaScript processing engine" then the usage becomes irrelevant. There are only a few commands anyway that do this, being mapReduce, group or eval, and of course the find operations with $where.
The aggregation framework does not use JavaScript in any way at all. You might be mistaking, just as others have done, a statement like this for something it is not; it does not do what you think it does:
db.sample.aggregate([
{ "$match": {
"a": { "$in": db.sample.distinct("a") }
}}
])
So that is "not running inside" the aggregation pipeline, but rather the "result" of that .distinct() call is "evaluated" before the pipeline is sent to the server. Much as with an external variable is done anyway:
var items = [1,2,3];
db.sample.aggregate([
{ "$match": {
"a": { "$in": items }
}}
])
Both essentially send to the server in the same way:
db.sample.aggregate([
{ "$match": {
"a": { "$in": [1,2,3] }
}}
])
So it is "not possible" to "call" any JavaScript function in the aggregation pipeline, nor is there really any point is "passing in" results in general from something saved in system.js. The "code" needs to be "loaded to the client" and only a JavaScript engine can actually do anything with it.
With the aggregation framework, all of the "operators" available are actually natively coded functions as opposed to the "free form" JavaScript interpretation provided for mapReduce. So instead of writing "JavaScript", you use the operators themselves:
db.sample.aggregate([
{ "$group": {
"_id": null,
"sqared": { "$sum": {
"$multiply": [ "$a", "$a" ]
}}
}}
])
{ "_id" : null, "sqared" : 14 }
So there are limitations on what you can do with functions saved in system.js, and the chances are that what you want to do is either:
Not allowed, such as accessing data from another collection
Not really required as the logic is generally self contained anyway
Or probably better implemented in client logic or other different form anyway
Just about the only practical use I can really think of is that you have a number of "mapReduce" operations that cannot be done any other way and you have various "shared" functions that you would rather just store on the server than maintain within every mapReduce function call.
But then again, the 90% reason for mapReduce over the aggregation framework is usually that the "document structure" of the collections has been poorly chosen and the JavaScript functionality is "required" to traverse the document for search and analysis.
So you can use it under the allowed constraints, but in most cases you probably should not be using this at all, but fixing the other issues that caused you to believe you needed this feature in the first place.